Simplifying Concepts

Essence of Life is in Simplicity

Association Rule Mining

Apriori Algorithm

Hi guys, welcome back! In this post I will cover the Apriori algorithm's association rules and, more specifically, its limitations. First, I will go through the pseudocode of the Apriori algorithm.

Pseudocode: (ignore this if you find it confusing; the steps in the previous post are sufficient)
Ck: candidate itemsets of size k.
Lk: frequent itemsets of size k.
L1 = {frequent 1-itemsets};
for (k = 1; Lk != empty; k++)

    1. Ck+1 = candidates generated from Lk (through the join operation);
    2. for each transaction t (say T100, T200, ...) in the database, increment the count of every candidate in Ck+1 that is contained in t;
    3. Lk+1 = candidates in Ck+1 whose count >= min_support;

end
return the union of all Lk;
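If you would rather see this as runnable code, here is a minimal Python sketch of the same loop. It is an illustration under my own assumptions (the function name apriori, transactions represented as Python sets, min_support given as an absolute count), not an optimized or canonical implementation:

from itertools import combinations

def apriori(transactions, min_support_count):
    # L1: count single items and keep the frequent 1-itemsets
    item_counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            item_counts[key] = item_counts.get(key, 0) + 1
    Lk = {s for s, c in item_counts.items() if c >= min_support_count}

    all_frequent = set(Lk)
    k = 1
    while Lk:
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        Ck1 = {c for c in Ck1
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Scan the database, counting candidates contained in each transaction
        counts = {c: sum(1 for t in transactions if c <= t) for c in Ck1}
        Lk = {c for c, n in counts.items() if n >= min_support_count}
        all_frequent |= Lk
        k += 1
    return all_frequent

# The nine transactions below are what I believe the previous post used;
# substitute your own data if they differ.
transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'},
    {'I1', 'I2', 'I4'}, {'I1', 'I3'}, {'I2', 'I3'},
    {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'}, {'I1', 'I2', 'I3'},
]
print(apriori(transactions, min_support_count=2))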

What is an Association Rule?
Representation: computer ➞ antivirus software
The rule depicts whether the purchase of a computer is associated with the purchase of antivirus software. That is: is antivirus software frequently bought when a customer buys a computer? This is what association means.

An association rule thus describes whether the item on the right-hand side of the arrow is frequently bought together with the item on the left-hand side. Association rules use the concept of confidence. Confidence is the percentage of transactions containing the left-hand item(s) that also contain the right-hand item(s). If an association's confidence >= the specified min_confidence, it is said to be a strong association. (Again, don't worry; this will become clearer in the example below.)

Confidence(S ➞ (L - S)) = support_count(L) / support_count(S)
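Expressed as code, the formula is a one-line ratio. A tiny Python helper (the name confidence and the sample numbers, taken from the worked example below, are my own):

def confidence(support_count_L, support_count_S):
    # confidence(S -> (L - S)) = support_count(L) / support_count(S)
    return support_count_L / support_count_S

# support_count(I1, I2, I5) = 2, support_count(I1) = 6
print(confidence(2, 6))  # 0.333..., i.e. 33.33%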

So from the previous post we got the frequent 3-itemsets L3 = {(I1, I2, I3), (I1, I2, I5)}.

Generating Association Rules:

For each frequent itemset L (L refers to an individual element of L3, i.e. (I1, I2, I3) or (I1, I2, I5)), generate all nonempty proper subsets of L.
S refers to the itemset on the left-hand side of a rule and is a nonempty proper subset of L.

General Association rule is depicted by S ➞ (L-S)

Considering L = (I1, I2, I5):
Possible nonempty proper subsets = ({I1}, {I2}, {I5}, {I1, I2}, {I2, I5}, {I1, I5})
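In Python these subsets fall straight out of itertools.combinations; a quick sanity check (the printing order inside each set may vary):

from itertools import combinations

L = ['I1', 'I2', 'I5']
subsets = [set(c) for size in range(1, len(L)) for c in combinations(L, size)]
print(subsets)
# [{'I1'}, {'I2'}, {'I5'}, {'I1', 'I2'}, {'I1', 'I5'}, {'I2', 'I5'}]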

When S = {I1}, the association rule (from S ➞ (L-S)) is {I1} ➞ {I2 ^ I5}.
Confidence = support_count(L)/support_count(S) = support_count(I1, I2, I5)/support_count(I1) = 2/6 = 33.33% < 50%, taking min_confidence = 50%. (Support_count(I1, I2, I5) = 2 and support_count(I1) = 6 come from the previous post's Apriori algorithm example.)
So the association {I1} ➞ {I2 ^ I5} is not a strong association.

Similarly, calculate the confidence for S = {I2}:
{I2} ➞ {I1 ^ I5}: confidence = support_count(I1, I2, I5)/support_count(I2) = 2/7 = 28.5% < 50%, so again this is not a strong association.
{I5} ➞ {I1 ^ I2}: confidence = support_count(I1, I2, I5)/support_count(I5) = 2/2 = 100% > 50%, so this is a strong association.

{I1 ^ I2} ➞ {I5}: confidence = support_count(I1, I2, I5)/support_count(I1, I2) = 2/4 = 50%, which meets the 50% threshold (confidence >= min_confidence), hence a strong association.
{I1 ^ I5} ➞ {I2}: confidence = 2/2 = 100% > 50%, hence a strong association.
{I2 ^ I5} ➞ {I1}: confidence = 2/2 = 100% > 50%, hence a strong association.
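These subset and confidence checks are mechanical, so they are easy to automate. A minimal Python sketch, assuming the support counts from the previous post's example (the helper name rules_from_itemset is mine):

from itertools import combinations

# Support counts copied from the worked example
support_count = {
    frozenset(['I1']): 6,
    frozenset(['I2']): 7,
    frozenset(['I5']): 2,
    frozenset(['I1', 'I2']): 4,
    frozenset(['I1', 'I5']): 2,
    frozenset(['I2', 'I5']): 2,
    frozenset(['I1', 'I2', 'I5']): 2,
}

def rules_from_itemset(L, min_confidence=0.5):
    # Generate S -> (L - S) for every nonempty proper subset S of L
    L = frozenset(L)
    for size in range(1, len(L)):
        for S in map(frozenset, combinations(sorted(L), size)):
            conf = support_count[L] / support_count[S]
            verdict = 'strong' if conf >= min_confidence else 'not strong'
            print(f"{sorted(S)} -> {sorted(L - S)}: "
                  f"confidence = {conf:.2%} ({verdict})")

rules_from_itemset(['I1', 'I2', 'I5'])

Running this reproduces the six rules above: the two rules with confidence below 50% are rejected, and the other four come out strong.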

Now follow the same procedure to generate the association rules for L = (I1, I2, I3).

Limitations of Apriori Algorithm
1. It involves generating candidate itemsets of size k. Generating a huge candidate set is not feasible for a large number of items, and even less so for a large number of transactions/records.
2. Furthermore, the database must be scanned repeatedly and checked against a large set of candidates by pattern matching, which is again a performance problem when there are many transactions.

This has led to the development and use of a newer algorithm for frequent itemset mining known as frequent-pattern growth (the FP-growth method).


Note: That's it! For any doubts and queries you can post in the comment section below. Keep visiting and sharing this page, because "sharing is caring".

