Introduction to Hive

What is Hive 

Hive is data warehouse software that facilitates querying and managing large data sets residing in distributed storage. Its query language, HiveQL, looks almost like SQL. Hive is designed to enable easy data summarization. It also allows traditional MapReduce programmers to plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL, and it can be extended with user-defined functions (UDFs). Hive can easily be integrated with other data technologies through a Hive JDBC connection.
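As a rough sketch of what data summarization in HiveQL can look like, the query below counts requests and distinct visitors per day. The `web_logs` table and its columns are assumptions made up for illustration; they do not come from any real schema.

```sql
-- Hypothetical table, assumed to exist already:
--   web_logs(ip STRING, request_url STRING, log_date STRING)

-- Summarize the raw log rows into one row per day.
SELECT log_date,
       COUNT(*)           AS requests,  -- total log rows for the day
       COUNT(DISTINCT ip) AS visitors   -- distinct client addresses
FROM web_logs
GROUP BY log_date
ORDER BY log_date;
```

Apart from the table name, nothing here differs from what a SQL user would write against a relational database, which is the point of HiveQL.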

Introduction to Hive is not

What Hive is Not

Hive is a batch processing system, and Hive jobs take much longer to execute queries than traditional databases such as Oracle. In exchange, Hive scales further: where a traditional database deployment typically handles gigabytes of data, Hive can process terabytes and more. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets, or test queries. Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of immutable data (like web logs).
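Because classic (non-transactional) Hive tables have no row-level updates, a common batch pattern is to recompute a whole table or partition from the immutable source data. The sketch below illustrates that idea; the `web_logs` and `daily_hits` tables, their columns, and the date value are all hypothetical.

```sql
-- Hypothetical source table, assumed to exist already:
--   web_logs(ip STRING, request_url STRING, log_date STRING)

-- Hypothetical summary table, partitioned by day.
CREATE TABLE IF NOT EXISTS daily_hits (
  request_url STRING,
  hits        BIGINT
)
PARTITIONED BY (log_date STRING);

-- Rebuild one day's partition in a single batch job.
-- Existing rows in that partition are replaced, not updated in place.
INSERT OVERWRITE TABLE daily_hits PARTITION (log_date = '2014-01-01')
SELECT request_url,
       COUNT(*) AS hits
FROM web_logs
WHERE log_date = '2014-01-01'
GROUP BY request_url;
```

Rerunning the statement for the same date simply overwrites the partition again, which suits append-only inputs such as web logs.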

The Stinger Initiative successfully delivered a fundamentally new Apache Hive, which evolved Hive’s traditional architecture and made it faster, with richer SQL semantics and petabyte scalability.

Three Key Facets of Hive


Introduction to Hive Releases 

Recent Hive Releases



Apache Hive versions

Introduction to Hive Features 

Hive Features Included

i) Makes it easy to build tools for ETL (extract/transform/load)

ii) Can impose structure on a variety of data formats

iii) Accesses data stored directly in HDFS or in Apache HBase

iv) Executes queries internally as MapReduce jobs

v) Best used for batch jobs over large sets of append-only data (like web logs)

vi) Comfortable for users who already know SQL

vii) Developed at Facebook and later contributed to the Apache Software Foundation

viii) Custom scalar, aggregation, and table functions are available as UDFs (user-defined functions), UDAFs (user-defined aggregate functions), and UDTFs (user-defined table functions)

ix) Does not mandate a single "Hive format": Hive works equally well on Thrift, control-delimited, or custom data formats

x) Apache Derby is the default metastore database; MySQL can optionally be used instead

xi) Built-in file formats include TEXTFILE, SEQUENCEFILE, RCFILE, and ORC; newer releases add others such as PARQUET and AVRO
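Two of the features above, file formats and user-defined functions, can be sketched in HiveQL as follows. All table names, the jar path, and the Java class name are hypothetical placeholders, and the `STORED AS ORC` clause assumes a Hive release that includes ORC support.

```sql
-- Plain-text staging table with tab-separated fields (hypothetical).
CREATE TABLE page_views_text (
  ip          STRING,
  request_url STRING,
  view_time   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Same columns, stored in the ORC columnar format (hypothetical).
CREATE TABLE page_views_orc (
  ip          STRING,
  request_url STRING,
  view_time   STRING
)
STORED AS ORC;

-- Converting between formats is a batch rewrite of the data.
INSERT OVERWRITE TABLE page_views_orc
SELECT ip, request_url, view_time
FROM page_views_text;

-- Registering a custom UDF for the current session.
-- The jar path and class name are assumptions, not a real library.
ADD JAR /tmp/example-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.hive.udf.NormalizeUrl';

SELECT normalize_url(request_url) AS url,
       COUNT(*)                   AS views
FROM page_views_orc
GROUP BY normalize_url(request_url);
```

The file format is chosen per table, so the same data can be kept as text for loading and as ORC for querying.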

Introduction to Hive Applications

Hive Applications Include

  • Data Mining
  • Document Indexing
  • Predictive modeling, and Hypothesis testing
  • Customer-facing Business Intelligence (e.g., Google Analytics)
  • Log processing

Note: Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates.



  1. Manjinder Singh says:


    I have one question. How can we update data in a Hive table? Please explain it with a small example.

    Manjinder singh

    • mahesh chimmiri says:

      From local: LOAD DATA LOCAL INPATH '/urpath' INTO TABLE tablename;
      From HDFS: LOAD DATA INPATH '/hdfspath' INTO TABLE tablename;
