what is hadoop

What is Hadoop : Hadoop is one of the trending technology in IT market because the reason is Big data,Yes big data is a problem and Hadoop is a solution of that problem.Before going to what is hadoop concept first of all we have to know what is Big data and what are the main reasons to increase the data by data in this today world.I will show you some scenarios to you all after understanding this scenarios you all are get a full clarity on what is hadoop.

Hadoop Real Time Scenario 1:   Many banks today has more than 10 to 25 crores customers doing billions of transactions every month and the banks should have to analyze the transaction data and process the actual data and then provide a exact output.

Hadoop Real Time Scenario 2: In today’s internet world very large amount of data every day generated by social media websites and E-commerce websites.This websites track customer behavior on the website and then serve relevant information / product.

Many more Hadoop scenarios are available in this big data world.Here the main problem is traditional systems find it difficult to come up with this scale at required pace in cost-efficient manner,because traditional data bases are not supported for large amount of data and also not suitable for distributed processing,fault tolerance,data replication,rack awareness etc.These all features are supported by Hadoop.This Big Data is mainly depended on 3 characteristics and also called as big data 3V’s.

i) Volume (Large amount of data means Gb,Tb,Pb etc)

ii) Velocity (Batch processing,Near Real time etc)

iii) Variety (Structure,unstructured and semi structured data)

Here volume means Hadoop can handle very large amount of data with out any problems,velocity means hadoop can support batch processing,Iterative processing,Near real time processing also supported and last one variety means hadoop can store and process different types of data that means structure data (tables),unstructured data (images,audio,video,text etc) and semi structure data (json,Xml etc).

What is Hadoop

 Apache hadoop is an open source software platform distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.  Hadoop services provide for data storage, data processing, data access, data governance, security, and operations.

what is hadoop

what is hadoop

Advantages of Hadoop

The main reason behind hadoop success is benefits of hadoop,because of this benefits so many organizations are using hadoop in production level.Hadoop is its’ ability to store, manage and analyze vast amounts of structured and unstructured data quickly, reliably, flexibly and at low-cost.

Scalability 

by using hadoop we can increase the storage capacity of cluster without changing anything.we can adding more number of nodes horizontally,so we can store petabytes of data in hadoop cluster.

Performance

one of the main feature in hadoop is distributed processing,so at a time work load shared by number of nodes,so work burden reduced output also generated very fastly. This distributed processing is one of the main reason of performance.

Reliability

large computing clusters are prone to failure of individual nodes in the cluster. Hadoop is fundamentally resilient – when a node fails processing is re-directed to the remaining nodes in the cluster and data is automatically re-replicated in preparation for future node failures.

Flexibility

Hadoop can accept structure data,unstructured data or semi structured data,but coming to traditional databases that are accepted only structure schema data sets,in hadoop any kind of data is very flexible.In today internet world 80% of data is unstructured and semi structured data.

Low Cost

Hadoop is an open source software so anyone can use this hadoop with buying any license.Hadoop can work on low commodity hardware .

Hadoop mainly divide into two categories

i) Hadoop Distributed File system (HDFS)

ii) MapReduce

HDFS is responsible for storing structure data,unstructured data and semi structured data,It can handle thousands of petabytes data based on cluster node capacity.Map reduce is responsible processing the data in distributed way.This is the main content about what is hadoop .

Comments

  1. Basavaraj K says:

    Very useful.. Thank you for providing this information to us.

Speak Your Mind

*