Hadoop ToolRunner Implementation for Mapreduce Driver Example

When learning MapReduce, most people write the job's driver code directly in a static main method. This works, but every configuration value ends up hardcoded, which is not how industry-level code is usually written. This post shows how to implement a MapReduce driver with the Hadoop Tool interface and ToolRunner.

Why move a Hadoop MapReduce driver from hardcoded configuration to the Hadoop Tool interface? The main difference shows up when someone wants to change one of your configuration properties on the fly, for example the number of reducers set in the driver code.

With the Tool interface, the driver configuration can be changed easily from the command line. With hardcoded configuration, you have to edit the driver and rebuild the jar file for every change. Implementing the Tool interface in your MapReduce driver code avoids this.

Hadoop Configurations

By implementing the Tool interface and extending the Configured class, you can set your Hadoop Configuration object from the command line via GenericOptionsParser, a utility that parses command-line arguments generic to the Hadoop framework. This makes your code more portable (and slightly cleaner), as you no longer need to hardcode any specific configuration.
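To make the mechanism concrete, here is a minimal sketch of GenericOptionsParser applied directly to a Configuration. The class name GenericOptionsDemo and the helper applyGenericOptions are hypothetical names chosen for illustration, and the property name mapreduce.job.reduces assumes Hadoop 2 or later (older releases used mapred.reduce.tasks).

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

// Hypothetical demo class: shows how generic options reach a Configuration.
public class GenericOptionsDemo {

    // Applies generic options such as "-D key=value" to conf and
    // returns the arguments that are left for the application itself.
    static String[] applyGenericOptions(Configuration conf, String[] args) throws IOException {
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        return parser.getRemainingArgs();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        String[] remaining = applyGenericOptions(conf, args);

        // Any -D mapreduce.job.reduces=N given on the command line is now in conf.
        System.out.println("mapreduce.job.reduces = " + conf.get("mapreduce.job.reduces"));
        System.out.println("remaining args = " + Arrays.toString(remaining));
    }
}
```

ToolRunner performs this parsing step for you, which is why a driver that implements Tool only ever sees the remaining application arguments.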

Here are two examples: one without the Tool interface and one with it.

Without Hadoop ToolRunner Tool Interface Example
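The original listing for this section is not available, so the following is only a sketch of what such a driver typically looks like. The class name HardcodedDriver and the helper createJob are hypothetical, and Hadoop's identity Mapper and Reducer stand in for your own classes.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver without the Tool interface: all settings live in the code.
public class HardcodedDriver {

    static Job createJob(Configuration conf, String input, String output) throws IOException {
        Job job = Job.getInstance(conf, "hardcoded driver");
        job.setJarByClass(HardcodedDriver.class);

        // Identity mapper and reducer; replace with your own classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // The default TextInputFormat emits (byte offset, line) pairs.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Hardcoded: changing this value means editing the code and rebuilding the jar.
        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: HardcodedDriver <input path> <output path>");
            System.exit(2);
        }
        Job job = createJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```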

The driver without the Tool interface takes only two arguments, the input path and the output path, and hardcodes the number of reduce tasks to 1. Because that value is compiled into the jar file, it cannot be changed on demand; any change means editing the code and rebuilding the jar.

With Hadoop ToolRunner Tool Interface

ToolRunner executes your MapReduce job through its static run method.
In this example we no longer need to hardcode the number of reducers, as it can be specified directly from the CLI (using the -D option).
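The original listing for this section is not available either, so this is a sketch under assumptions rather than the author's exact code. The class name ToolDriver and the helper createJob are hypothetical, and the identity Mapper and Reducer again stand in for real classes.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver using the Tool interface: settings come from the command line.
public class ToolDriver extends Configured implements Tool {

    static Job createJob(Configuration conf, String input, String output) throws IOException {
        Job job = Job.getInstance(conf, "tool driver");
        job.setJarByClass(ToolDriver.class);

        // Identity mapper and reducer; replace with your own classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // No setNumReduceTasks call: the value is read from the configuration.
        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already removed generic options such as -D from args.
        if (args.length != 2) {
            System.err.println("Usage: ToolDriver [generic options] <input path> <output path>");
            return 2;
        }
        // getConf() returns the Configuration populated by ToolRunner.
        Job job = createJob(getConf(), args[0], args[1]);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new ToolDriver(), args);
        System.exit(exitCode);
    }
}
```

A run with five reducers might then look like the line below, where the jar name and the paths are placeholders and the property name mapreduce.job.reduces assumes Hadoop 2 or later:

hadoop jar mydriver.jar ToolDriver -D mapreduce.job.reduces=5 /input /output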

Because this driver uses the Tool interface, its configuration can be changed on demand without modifying the code. This is the industry-level way to write a MapReduce driver with Hadoop ToolRunner.
