Hadoop Interview Questions and Answers for Experienced (PDF)

Hadoop is currently one of the most in-demand technologies in the IT market. Every business depends on its data: it generates data, processes it, and uses past data to predict future results. Because business decisions now rest so heavily on analysis, demand for Hadoop has boomed. This post covers the most frequently asked Hadoop interview questions and answers for experienced candidates.

What is speculative execution in Hadoop?

If a task is running much slower than expected, Hadoop launches a duplicate of that task on another node. Whichever attempt finishes first is accepted; Hadoop then tells the TaskTrackers to abandon the other attempt and discard its output. In simple words, when a task is slowed down by a network problem or some other issue on its node, Hadoop launches another copy on a node that does not have the problem, and that copy completes the work and produces the output. This is called speculative execution.
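The race between the two attempts can be sketched as a toy model. This is not Hadoop's scheduler: the function name, its parameters and the timings are all hypothetical and exist only to show that the first attempt to finish wins.

```python
def speculative_winner(original_duration, duplicate_start, duplicate_duration):
    """Return (winner, finish_time) for a task with one speculative duplicate.

    All values are in seconds, measured from the start of the original attempt.
    """
    original_finish = original_duration
    duplicate_finish = duplicate_start + duplicate_duration
    # The attempt that finishes first supplies the output; the other is killed.
    if duplicate_finish < original_finish:
        return "duplicate", duplicate_finish
    return "original", original_finish


# A straggler needing 300 s loses to a duplicate launched at 60 s that needs 90 s.
winner, finish = speculative_winner(300, 60, 90)
print(winner, finish)
```

If the original attempt would have finished before the duplicate, the same function reports the original as the winner and the duplicate's work is thrown away.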

What is the single point of failure in Hadoop?

Hadoop 1.x has a single point of failure (SPOF) because a cluster has only one NameNode. If the NameNode fails in Hadoop 1.x, the entire cluster becomes unavailable. To solve this problem, Hadoop High Availability was introduced in Hadoop 2.x.

What is Hadoop High Availability?

NameNode High Availability was introduced in Hadoop 2.x. In Hadoop 1.x there is only one NameNode, and if it goes down, all processing stops. There is no standby NameNode ready to take over in Hadoop 1.x; the Secondary NameNode only holds a periodic checkpoint of the metadata, not the up-to-date metadata. Hadoop High Availability provides more than one NameNode in the cluster, so if one NameNode goes down, another takes over without stopping the cluster.

What is HDFS Federation?

Hadoop 1.x has only one NameNode and one namespace, and the size of that namespace is limited by the NameNode's RAM. Hadoop 2.x supports multiple NameNodes, each managing its own namespace. Scaling the namespace horizontally in this way reduces the load on any single NameNode.

What are the Active NameNode and the Passive NameNode?

Hadoop 1.x has only one NameNode. With High Availability in Hadoop 2.x, a cluster has two NameNodes: an Active NameNode and a Passive (Standby) NameNode. The Active NameNode is the one that serves the cluster. The Passive NameNode holds the same metadata as the Active NameNode. If the Active NameNode fails, the Passive NameNode takes its place in the cluster, so the NameNode is no longer a single point of failure.

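What is a JournalNode?

JournalNodes are used in a High Availability cluster to keep the Active and Passive NameNodes in sync. The Active NameNode writes every namespace edit to a group of JournalNodes, and the Passive NameNode reads those edits and applies them to its own copy of the namespace.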

What is the balancer?

The balancer is an HDFS tool that moves blocks between datanodes so that disk usage is spread evenly across the cluster. It should be run when datanodes differ widely in how much data they hold. Large sites with a significant number of node failures often run the balancer regularly.

What is the edit log?

The edit log (the edits file) is a transaction log maintained by the NameNode. Every change to the file system metadata, such as creating, renaming or deleting a file, is recorded in it. The current state of the namespace can be rebuilt by applying the edit log to the most recent FsImage.

What is the FsImage?

The entire file system namespace, including the mapping of blocks to files and the file system properties, is stored in a file called the FsImage. The FsImage is kept on disk and is also loaded into the NameNode's memory. In addition, the NameNode keeps the mapping of blocks to datanodes in memory.

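What is the Checkpoint node?

The NameNode stores the metadata of HDFS. The state of HDFS is stored in a file called fsimage, which is the base of the metadata. At runtime, modifications are only written to a log file called edits. The Checkpoint node periodically fetches fsimage and edits from the NameNode, merges them into a new fsimage, and sends the result back to the NameNode.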
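The relationship between fsimage and edits can be sketched as a toy model: a checkpoint applies the logged edits to the last image to produce a new image. The dictionary layout and the operation names below are invented for illustration and are not the real on-disk formats.

```python
def checkpoint(fsimage, edits):
    """Apply logged edits to a copy of the namespace image and return the new image."""
    new_image = dict(fsimage)  # leave the old image untouched
    for op, path in edits:
        if op == "create":
            new_image[path] = {"type": "file"}
        elif op == "mkdir":
            new_image[path] = {"type": "dir"}
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_image


# Hypothetical starting image and edit log.
fsimage = {"/data": {"type": "dir"}}
edits = [("create", "/data/a.txt"), ("mkdir", "/tmp"), ("delete", "/data/a.txt")]

merged = checkpoint(fsimage, edits)
print(sorted(merged))
```

After a checkpoint the merged image replaces the old one and the edit log can start again from empty, which keeps the log short and NameNode restarts fast.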

What is the Backup node?

The Backup node provides the same functionality as the Checkpoint node, but it is kept synchronized with the NameNode. It does not need to fetch the changes periodically, because it receives a stream of file system edits from the NameNode and applies them to its own in-memory copy of the namespace.

What is Hadoop checksum?

If you want to compare file1 on the local Linux file system with its copy in HDFS, you can use Hadoop's checksum command. It helps you verify whether the data was added to HDFS correctly.


hadoop fs -checksum hdfs://nn1.example.com/file1
hadoop fs -checksum file:///path/in/linux/file1

What is Rack Awareness?

The NameNode keeps the rack id of each datanode. Using this rack information to choose datanodes that are closer to each other is called Rack Awareness in Hadoop. A default Hadoop installation assumes that all the nodes belong to the same rack.

What is the default time interval for the heartbeat mechanism?

3 seconds. By default, each datanode sends a heartbeat to the NameNode every 3 seconds.
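As a rough illustration of how this interval feeds into failure detection, the sketch below computes how long the NameNode waits before treating a silent datanode as dead. It assumes the commonly documented HDFS defaults (dfs.heartbeat.interval of 3 seconds and dfs.namenode.heartbeat.recheck-interval of 5 minutes) and the formula 2 x recheck interval + 10 x heartbeat interval; the function and constant names are hypothetical, and the values on a real cluster should be checked in its configuration.

```python
HEARTBEAT_INTERVAL_S = 3      # assumed default of dfs.heartbeat.interval
RECHECK_INTERVAL_S = 5 * 60   # assumed default of dfs.namenode.heartbeat.recheck-interval


def dead_node_timeout(heartbeat_s=HEARTBEAT_INTERVAL_S, recheck_s=RECHECK_INTERVAL_S):
    """Seconds without a heartbeat before a datanode is considered dead."""
    return 2 * recheck_s + 10 * heartbeat_s


print(dead_node_timeout())  # 630 seconds, i.e. 10 minutes 30 seconds
```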

Why is the replication factor 3 in Hadoop?

A Hadoop cluster has multiple racks, and each rack has multiple datanodes. Keeping three replicas of each block lets HDFS tolerate both a datanode failure and a rack failure.

If one datanode fails, the same data can be read from another node, and if a whole rack fails, the same data can be read from a different rack. No two replicas are placed on the same datanode, and at least one replica is placed on a different rack.
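The placement rules above can be sketched as follows. This is a simplified, deterministic model of the commonly described default policy (first replica on the writer's node, the second and third on two nodes of another rack). Real HDFS picks nodes randomly and also considers load and free space; the rack and node names are hypothetical.

```python
def place_replicas(topology, writer_node):
    """Pick three datanodes for one block.

    topology maps a rack id to the list of datanode names on that rack.
    """
    # Rack that holds the node the client is writing from.
    local_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)
    # First other rack (in sorted order) that has at least two datanodes.
    remote_rack = next(
        rack for rack in sorted(topology)
        if rack != local_rack and len(topology[rack]) >= 2
    )
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas(topology, "dn1")
print(replicas)
```

With this layout the block survives the loss of any single datanode and the loss of either rack, which is the fault tolerance the answer describes.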

What is the distributed cache in the MapReduce framework?

The MapReduce framework provides a facility called the distributed cache. It is used to share files that a job needs, such as lookup files or jars, across the nodes in the cluster.

What is a Counter in Hadoop?

Counters are an important feature for understanding the internal behaviour of MapReduce programs. The MapReduce framework provides a number of built-in counters that measure basic I/O operations, such as FILE_BYTES_READ and FILE_BYTES_WRITTEN, and the input and output records of the map, combine and reduce phases.
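What counters record can be shown with a small stand-in for a word-count mapper. This sketch does not use the Hadoop API: run_mapper is a hypothetical function, the counter names only imitate the style of the built-in ones, and EMPTY_LINES plays the role of a user-defined counter.

```python
from collections import Counter


def run_mapper(lines):
    """Emit (word, 1) pairs and tally counters while doing so."""
    counters = Counter()
    output = []
    for line in lines:
        counters["MAP_INPUT_RECORDS"] += 1
        words = line.split()
        if not words:
            counters["EMPTY_LINES"] += 1  # custom counter for data-quality checks
            continue
        for word in words:
            output.append((word, 1))
            counters["MAP_OUTPUT_RECORDS"] += 1
    return output, counters


output, counters = run_mapper(["hadoop hdfs", "", "yarn"])
print(dict(counters))
```

In a real job the framework aggregates such counters across all tasks and reports the totals when the job finishes.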

Why is the block size 64 MB and 128 MB in Hadoop?

In Hadoop 1.x the default HDFS block size is 64 MB; from Hadoop 2.x the default is 128 MB. A larger block size means fewer blocks for the same amount of data, which reduces the metadata the NameNode has to keep, so the default was raised to 128 MB in Hadoop 2.x.
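The effect on metadata can be checked with simple arithmetic. The sketch below counts the blocks needed for a file at each block size; the 1 GB file size is only an example and block_count is a hypothetical helper.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed for a file (the last block may be partial)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


file_size = 1024 * MB  # a 1 GB file

print(block_count(file_size, 64 * MB))   # 16
print(block_count(file_size, 128 * MB))  # 8
```

Doubling the block size halves the number of block entries the NameNode tracks for this file.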

Can we change the default block size?

Yes, we can change the default block size in the hdfs-site.xml file. The property that controls it is dfs.block.size (named dfs.blocksize in Hadoop 2.x); for example, we can set it to 128 MB. Changing this setting does not affect the block size of files that are already in HDFS.

What is the difference between the MapReduce old API and the new API?

At the time of writing, Hadoop 2.7.2 is the latest release. Hadoop has been released in three lines: 0.x, 1.x and 2.x. The old API is the mapred package (org.apache.hadoop.mapred) and the new API is the mapreduce package (org.apache.hadoop.mapreduce). Releases up to Hadoop 0.20 were built around the old API, and the new API took over from around Hadoop 0.21; later releases ship both.

These are all frequently asked Hadoop interview questions at real IT companies.

