Commissioning and Decommissioning Nodes in a Hadoop Cluster

One of the great advantages of Hadoop is how easily nodes can be commissioned and decommissioned in a cluster. If a node has crashed or needs to be taken out of service, decommissioning is used; if we want to add more nodes to the cluster, commissioning is used. In fact, one of the most common tasks of a Hadoop administrator is to commission (add) and decommission (remove) DataNodes in a Hadoop cluster.

Why do we need decommissioning and commissioning?

You cannot simply remove a DataNode from a large or real-time cluster, as doing so would cause a lot of disturbance. If you want to take a machine away for a hardware upgrade, or bring down one or more nodes for any other reason, decommissioning is required because you cannot abruptly shut down DataNodes/slave nodes. Similarly, if you want to scale your cluster by adding new DataNodes without shutting it down, you need commissioning.

[Diagram: Decommission (Remove) Nodes]

The above diagram shows exactly what we do to decommission a node in a manually managed cluster (that is, without Ambari or Cloudera Manager). The first task is to update the ‘exclude‘ files for both HDFS (hdfs-site.xml) and MapReduce (mapred-site.xml).

Decommissioning Nodes in a Hadoop Cluster

  • mapred.hosts.exclude for the JobTracker contains the list of hosts that should be excluded by the JobTracker. If the value is empty, no hosts are excluded.
  • dfs.hosts.exclude for the NameNode contains a list of hosts that are not permitted to connect to the NameNode.
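The exclude file itself is just a plain text file listing one hostname per line. As a minimal sketch (the hostnames below are purely illustrative):

    slave03.mycluster.com
    slave04.mycluster.com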

Here is a sample configuration for the exclude file in both hdfs-site.xml and mapred-site.xml:

hdfs-site.xml
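A minimal sketch of the NameNode exclude property; the file path shown is illustrative, not prescribed:

    <property>
      <name>dfs.hosts.exclude</name>
      <value>/home/hadoop/excludes</value>
      <final>true</final>
    </property>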

mapred-site.xml
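And the corresponding JobTracker property, again with an illustrative path:

    <property>
      <name>mapred.hosts.exclude</name>
      <value>/home/hadoop/excludes</value>
      <final>true</final>
    </property>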

Note: The full pathname of the files must be specified.
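Once the exclude files are updated, the NameNode and JobTracker must be told to re-read them. On a Hadoop 1.x cluster (the JobTracker era this article assumes), the standard admin commands are:

    >hadoop dfsadmin -refreshNodes
    >hadoop mradmin -refreshNodes

The excluded DataNodes then show as ‘Decommission In Progress’ in the NameNode web UI while their blocks are re-replicated to the remaining nodes, and they can be shut down safely once the state changes to ‘Decommissioned’.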

Commissioning Nodes in a Hadoop Cluster

  • mapred.hosts for the JobTracker contains the list of nodes that may connect to the JobTracker. If the value is empty, all hosts are permitted.
  • dfs.hosts for the NameNode contains a list of hosts that are permitted to connect to the NameNode. If the value is empty, all hosts are permitted.
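A minimal sketch of the include-file properties, one in hdfs-site.xml and one in mapred-site.xml (the file path shown is illustrative):

    <property>
      <name>dfs.hosts</name>
      <value>/home/hadoop/includes</value>
      <final>true</final>
    </property>

    <property>
      <name>mapred.hosts</name>
      <value>/home/hadoop/includes</value>
      <final>true</final>
    </property>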

The ‘dfsadmin’ and ‘mradmin’ commands refresh the configuration so that the NameNode and JobTracker become aware of the new node.
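These are the same refresh commands used during decommissioning:

    >hadoop dfsadmin -refreshNodes
    >hadoop mradmin -refreshNodes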

The ‘slaves’ file on the master node contains the list of all DataNodes. It must also be updated, so that the Hadoop daemon start/stop scripts work correctly in future.
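For example, after commissioning a new node, the slaves file (in the Hadoop configuration directory on the master) might look like this, where slave05 is the newly added node and all hostnames are illustrative:

    slave01.mycluster.com
    slave02.mycluster.com
    slave05.mycluster.com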

[Diagram: Commission (Add) Nodes]

The above diagram shows exactly what we do to commission a node in a manually managed cluster (that is, without Ambari or Cloudera Manager). The first task is to update the ‘include‘ files.

An important step in the DataNode commissioning process is to run the cluster balancer:

>hadoop balancer -threshold 40

The balancer attempts to even out disk usage across DataNodes, to within the given utilization threshold (here, 40 percent), by moving block data from older, heavily used nodes to the newly commissioned ones.

So, this is how you commission and decommission nodes in a Hadoop cluster.

Got a question for us? Please mention it in the comments section and we will get back to you.
