How Facebook uses Hadoop and Hive

Many IT companies use Hadoop because it can store and process very large datasets. The Hadoop ecosystem includes a database (HBase) and a data warehouse (Hive). These two components work well together: HBase stores transactional data, and Hive generates reports from it. A traditional RDBMS is practical only up to a certain number of rows and columns, whereas HBase can store very large volumes of data in a column-oriented layout.

Facebook is one of Hadoop and big data’s biggest champions. It claims to operate the largest single Hadoop Distributed File System (HDFS) cluster anywhere, with more than 100 petabytes of disk space in a single system as of July 2012.

Just one of the several Hadoop clusters operated by the company spans more than 4,000 machines. Facebook also deployed Facebook Messages, its first user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop, designed to support billions of messages per day.

Facebook uses HBase to store transactional data such as messages, likes, and comments. When the company wants to know how many people liked or commented on a post, it can use Hive to generate the reports. Hadoop has traditionally been used together with Hive for the storage and analysis of large datasets. Many other analytics tools, such as MS-BI and OBIEE, are also available for generating reports.
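To make the "likes and comments per post" report more concrete, here is a minimal Python sketch of the aggregation involved. It is not Facebook's actual pipeline: the `events` rows, the `engagement_report` function, and the `interactions` table named in the comment are all hypothetical, and in Hive the same idea would be written as a GROUP BY query over a much larger table.

```python
from collections import Counter

# Hypothetical interaction log: (post_id, user_id, action) rows. In a real
# deployment these records would live in HBase/HDFS, not in a Python list.
events = [
    ("post1", "alice", "like"),
    ("post1", "bob", "comment"),
    ("post1", "carol", "like"),
    ("post2", "alice", "comment"),
    ("post2", "bob", "like"),
]


def engagement_report(rows):
    """Count likes and comments per post.

    Roughly what a Hive query such as
        SELECT post_id, action, COUNT(*)
        FROM interactions            -- hypothetical table name
        GROUP BY post_id, action;
    would compute.
    """
    counts = Counter((post_id, action) for post_id, _user, action in rows)
    report = {}
    for (post_id, action), n in counts.items():
        # Start every post with zero likes and comments, then fill in counts.
        report.setdefault(post_id, {"like": 0, "comment": 0})[action] = n
    return report


report = engagement_report(events)
for post_id in sorted(report):
    print(post_id, report[post_id])
```

The sketch runs on one machine; the point of Hive is that the same grouping can be spread across a Hadoop cluster when the log is too large for a single machine.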

Who generates the data on Facebook?

Lots of data is generated on Facebook:
500+ million active users
30 billion pieces of content shared every month (news stories, photos, blogs, etc.)

Let us look at Facebook's statistics per day

1) 20 TB of compressed new data added per day
2) 3 PB of compressed data scanned per day
3) 20K jobs on production cluster per day
4) 480K compute hours per day
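To get a feel for these numbers, the short Python sketch below derives a few averages from them. The variable names are made up for illustration, decimal units (1 PB = 1000 TB) are an assumption, and the results are plain averages that say nothing about how the load is actually distributed across jobs.

```python
# Per-day figures quoted in the list above.
NEW_DATA_TB = 20          # compressed new data added per day
SCANNED_PB = 3            # compressed data scanned per day
JOBS = 20_000             # jobs on the production cluster per day
COMPUTE_HOURS = 480_000   # compute hours per day

TB_PER_PB = 1000          # assumption: decimal units
GB_PER_TB = 1000          # assumption: decimal units

# Average amount of data scanned by one job, in GB.
avg_scanned_per_job_gb = SCANNED_PB * TB_PER_PB * GB_PER_TB / JOBS

# Average compute hours consumed by one job.
avg_compute_hours_per_job = COMPUTE_HOURS / JOBS

# How many compute units would have to run around the clock
# to supply that many compute hours in a 24-hour day.
always_busy_units = COMPUTE_HOURS / 24

# New compressed data per year at this daily rate, in PB.
new_data_per_year_pb = NEW_DATA_TB * 365 / TB_PER_PB

print(avg_scanned_per_job_gb, "GB scanned per job on average")
print(avg_compute_hours_per_job, "compute hours per job on average")
print(always_busy_units, "compute units busy around the clock")
print(new_data_per_year_pb, "PB of new compressed data per year")
```

Under these assumptions, an average job scans about 150 GB and uses about 24 compute hours, which is why such work is split across many machines rather than run on one.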

Nowadays in India, e-commerce plays a key role in business. There are many e-commerce websites where we can buy electronics, clothes, and so on. These companies also use Hadoop to store and process large amounts of product data. Suppose they want to know which itemsets people buy most frequently on a particular day, week, month, or year. Using Hadoop, they can generate such reports.
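As a small illustration of the frequent-itemset idea, the Python sketch below counts which pairs of items are bought together often. The `orders` data, the `frequent_pairs` function, and the `min_support` threshold are hypothetical, and this is a simple pair count for one small batch of orders, not a full frequent-itemset algorithm or what any particular company runs on Hadoop.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders for one day; each order is the set of items bought together.
orders = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"shirt", "jeans"},
    {"phone", "charger"},
    {"shirt", "jeans", "belt"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


result = frequent_pairs(orders, min_support=2)
for pair, n in sorted(result.items()):
    print(pair, n)
```

On a cluster, the counting step would typically be split across machines, with each one counting pairs in its share of the orders before the partial counts are added up.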

