Finding Frequent Itemsets using Hadoop-MapReduce Model

Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, correlations, sequences, episodes, classifiers, and clusters. Mining association rules is one of the most popular of these problems. Identifying sets of items, products, symptoms, or characteristics that often occur together in a given database can be seen as one of the most basic tasks in data mining.

Apriori is the most established algorithm for finding frequent itemsets in a transactional dataset; however, it needs to scan the dataset many times and generates many candidate itemsets. When the dataset is huge, both memory use and computational cost become very expensive. In addition, a single processor’s memory and CPU resources are limited, which makes the algorithm inefficient. Furthermore, because of the exponential growth of worldwide information, enterprises and organizations have to deal with an ever-growing amount of data. As these data grow past hundreds of gigabytes towards a terabyte or more, it becomes nearly impossible to mine them on a single sequential machine. The solution to these problems is parallel and distributed computing, here with the Hadoop-MapReduce framework.
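To make the repeated scans and candidate generation concrete, here is a minimal sketch of Apriori written as map-style and reduce-style steps in plain Java. It is not the code from the download and uses no Hadoop APIs; the class name `AprioriPassSketch`, its methods, and the sample transactions are all made up for illustration. It also skips the usual subset-pruning step, so it may count more candidates than a full Apriori would.

```java
import java.util.*;

// Illustrative only: a single-machine imitation of the map/reduce split,
// not the downloadable Hadoop job.
public class AprioriPassSketch {

    // Hypothetical helper to build a transaction or itemset.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // "Map" step: for one transaction, emit every candidate it contains.
    static List<Set<String>> map(Set<String> transaction, List<Set<String>> candidates) {
        List<Set<String>> emitted = new ArrayList<>();
        for (Set<String> candidate : candidates) {
            if (transaction.containsAll(candidate)) {
                emitted.add(candidate);
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the occurrences of each candidate, keep those with count >= minSupport.
    static Map<Set<String>, Integer> reduce(List<Set<String>> emitted, int minSupport) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> itemset : emitted) {
            counts.merge(itemset, 1, Integer::sum);
        }
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }

    // Join step: combine frequent k-itemsets that differ in one item into (k+1)-item candidates.
    static List<Set<String>> nextCandidates(Set<Set<String>> frequent) {
        Set<Set<String>> result = new HashSet<>();
        for (Set<String> a : frequent) {
            for (Set<String> b : frequent) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == a.size() + 1) {
                    result.add(union);
                }
            }
        }
        return new ArrayList<>(result);
    }

    // Driver: one full scan of the transactions per itemset size.
    static Map<Set<String>, Integer> mine(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> allFrequent = new HashMap<>();
        Set<Set<String>> singles = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                singles.add(Collections.singleton(item));
            }
        }
        List<Set<String>> candidates = new ArrayList<>(singles);
        while (!candidates.isEmpty()) {
            List<Set<String>> emitted = new ArrayList<>();
            for (Set<String> t : transactions) {
                emitted.addAll(map(t, candidates));
            }
            Map<Set<String>, Integer> frequent = reduce(emitted, minSupport);
            allFrequent.putAll(frequent);
            candidates = nextCandidates(frequent.keySet());
        }
        return allFrequent;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = Arrays.asList(
                setOf("milk", "bread"),
                setOf("milk", "bread", "butter"),
                setOf("bread", "butter"),
                setOf("milk"));
        // Print each frequent itemset with its support count (iteration order is not fixed).
        mine(transactions, 2).forEach((itemset, count) ->
                System.out.println(itemset + " -> " + count));
    }
}
```

In a real Hadoop job the map calls would run in parallel over splits of the input file and the framework would group the emitted itemsets before the reduce step; the loop in `mine` only imitates that on one machine.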

Data flow diagram of the Apriori algorithm in the Hadoop-MapReduce framework:


Download the code for finding frequent itemsets below:



Run this command in a terminal: hadoop jar /mraprior.jar /groceries.csv /output1 /output2

In output1, we’ll see the frequent itemsets of size 1 to n.

In output2, we’ll see the final results (association rules).
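As a rough illustration of how association rules can be scored from the frequent itemset counts, the sketch below computes confidence and lift for a rule A -> B. The post does not say which measures the downloadable job reports, so treating them as confidence and lift is an assumption, and the class `RuleMetricsSketch`, its methods, and the sample counts are hypothetical.

```java
// Illustrative only: standard confidence and lift formulas,
// not taken from the downloadable job.
public class RuleMetricsSketch {

    // confidence(A -> B) = support(A and B together) / support(A)
    static double confidence(int supportAB, int supportA) {
        if (supportA <= 0) {
            throw new IllegalArgumentException("support of the antecedent must be positive");
        }
        return (double) supportAB / supportA;
    }

    // lift(A -> B) = confidence(A -> B) / (support(B) / total number of transactions)
    static double lift(int supportAB, int supportA, int supportB, int totalTransactions) {
        if (supportB <= 0 || totalTransactions <= 0) {
            // Without this guard, dividing a double by zero in Java
            // yields Infinity or NaN instead of throwing.
            throw new IllegalArgumentException("support of the consequent and transaction count must be positive");
        }
        double consequentFrequency = (double) supportB / totalTransactions;
        return confidence(supportAB, supportA) / consequentFrequency;
    }

    public static void main(String[] args) {
        // Made-up counts: {milk, bread} occurs 2 times, milk 3 times, bread 3 times, 4 transactions.
        System.out.println("confidence = " + confidence(2, 3));
        System.out.println("lift = " + lift(2, 3, 3, 4));
    }
}
```

A lift above 1 suggests the two sides of the rule occur together more often than they would if they were independent, and a lift below 1 suggests the opposite.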

output2 screenshot:




  1. I’m a beginner in hadoop. Can u pls explain the code ?

  2. I am beginner in hadoop please any one send the code for apriori algorithm for frequent itemset mining in hadoop

  3. please explain me this code

  4. y am i getting infinity in the place of 1.0,1.5 and all
    plz help!!!!!!!!!!!!!!!!!!!!!!!11

  5. Is this parallel implementation of Apriori ????????????
    Please do reply

  6. I setup and Install in Eclipse but after running I have the following code:
    java.lang.ArrayIndexOutOfBoundsException: 2
    at RuleMining.main(
