Lecture 4 – Association Rule Mining: Apriori, FP-Growth, Support, Confidence & Lift

Lecture 4 explains Association Rule Mining, including Apriori, FP-Growth, support, confidence, lift, frequent itemsets, market basket analysis, diagrams, formulas, step-by-step examples, and a Python implementation, making it ideal for BS CS, BS AI, BS IT, and Data Science students.

Association Rule Mining is one of the most widely used techniques in Data Mining. It is used to analyze customer behavior, discover hidden relationships, and find interesting correlations between items. The classic example is Market Basket Analysis, but association rules are now applied in healthcare, cybersecurity, web usage mining, fraud detection, and AI-driven recommendation engines.

This lecture covers the foundation of association rules, explains the Apriori and FP-Growth algorithms, shows how support, confidence, and lift are calculated, and gives real-world examples and Python demonstrations.

Introduction to Association Rule Mining

What Are Association Rules?

Association Rule Mining discovers relationships like:

“If a customer buys bread, they are likely to buy butter.”

In rule format:

Bread → Butter

Why Association Mining Is Used

  • Product recommendation
  • Inventory placement
  • Bundling strategies
  • Fraud detection
  • Web-clickstream analysis

Basic Terminology

Items & Itemsets

  • Item → a single product
  • Itemset → a group of items

Example itemset:

{Milk, Bread, Butter}

Transaction Database

A collection of customer transactions.

Example:

T1: Milk, Bread
T2: Milk, Butter
T3: Bread, Eggs

Frequent Itemsets

Itemsets that meet a minimum support threshold.

Rules & Metrics

Rules take the form:

X → Y

Here X (the antecedent) and Y (the consequent) are disjoint itemsets; the rule means that if X occurs in a transaction, Y is likely to occur as well.

Measures of Interestingness

Association rules are evaluated using mathematical measures.

1. Support

Support tells how often an itemset appears.

Formula
Support(X) = (Number of transactions containing X) / (Total transactions)

Example (using the five-transaction dataset from the Market Basket Analysis section below):

Support(Milk) = 3/5 = 0.6

2. Confidence

Confidence measures the probability that Y appears when X appears.

Formula
Confidence(X → Y) = Support(X ∪ Y) / Support(X)

Example (same dataset):

Confidence(Bread → Butter) = Support({Bread, Butter}) / Support(Bread) = (2/5) / (4/5) = 0.5

3. Lift

Lift tells whether X and Y occur together more than expected.

Formula
Lift(X → Y) = Confidence(X → Y) / Support(Y)

Interpretation

  • Lift > 1 → Positive correlation
  • Lift = 1 → Independent
  • Lift < 1 → Negative correlation

4. Conviction

Conviction indicates the reliability of a rule.

Formula
Conviction(X → Y) = (1 − Support(Y)) / (1 − Confidence(X → Y))

A conviction of 1 means X and Y are independent; higher values indicate a more reliable rule.


Market Basket Analysis (Real-World Example)

Consider this dataset:

Transaction | Items
T1 | Milk, Bread
T2 | Milk, Butter
T3 | Bread, Butter
T4 | Milk, Bread, Butter
T5 | Bread

Step-by-Step Example:

  • Support(Milk) = 3/5
  • Support(Bread) = 4/5
  • Support(Milk ∪ Bread) = 2/5

Now calculate confidence:

Confidence(Milk → Bread) = Support({Milk, Bread}) / Support(Milk) = (2/5) / (3/5) = 2/3 ≈ 0.67

Lift:

Lift(Milk → Bread) = Confidence(Milk → Bread) / Support(Bread) = (2/3) / (4/5) ≈ 0.83

Lift < 1 → negative correlation: Milk and Bread appear together slightly less often than expected if they were independent.
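
These hand calculations can be verified with a few lines of plain Python. The sketch below hardcodes the five-transaction dataset; the helper names support, confidence, and lift are illustrative, not taken from any library.

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Bread"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    # Confidence(X → Y) = Support(X ∪ Y) / Support(X)
    return support(X | Y) / support(X)

def lift(X, Y):
    # Lift(X → Y) = Confidence(X → Y) / Support(Y)
    return confidence(X, Y) / support(Y)

print(support({"Milk"}))                # 0.6
print(confidence({"Milk"}, {"Bread"}))  # 0.666...
print(lift({"Milk"}, {"Bread"}))        # 0.833...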

Apriori Algorithm

Apriori is the classical algorithm used to mine frequent itemsets.

Intuition Behind Apriori

If an itemset is frequent, all its subsets must also be frequent.

Example:
If {Milk, Bread, Butter} is frequent → {Milk, Bread} must also be frequent.


Step-by-Step Working of Apriori

STEP 1: Generate C1 (initial candidates)

Count individual item frequencies.

STEP 2: Generate L1 (frequent 1-itemsets)

Keep only items whose support ≥ min support.

STEP 3: Generate C2 (candidate pairs)

Pair items in L1.

STEP 4: Prune to get L2 (frequent 2-itemsets)

Remove pairs below min support.

STEP 5: Repeat for C3, L3, …

Process continues until no more frequent itemsets can be generated.
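
A minimal, self-contained sketch of this candidate-generate-and-prune loop is shown below. It is illustrative only: the function name apriori_frequent and the min_support value are assumptions, and the subset-based candidate pruning that real implementations add is omitted for brevity.

def apriori_frequent(transactions, min_support):
    # Illustrative Apriori loop: generate candidates C_k, keep frequent L_k, repeat
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # STEP 1-2: C1 -> L1 (frequent 1-itemsets)
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # STEP 3: candidate k-itemsets from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # STEP 4: keep only candidates that meet min support
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

data = [{"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
        {"Milk", "Bread", "Butter"}, {"Bread"}]
for itemset in apriori_frequent(data, min_support=0.4):
    print(set(itemset))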

Apriori Lattice

Level 1: {A} {B} {C} {D}

Level 2: {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}

Level 3: {A,B,C} {A,B,D} {A,C,D} {B,C,D}

Level 4: {A,B,C,D}

FP-Growth Algorithm

Apriori can be slow on large databases because it repeatedly generates and tests candidate itemsets.
FP-Growth avoids this problem.

Why FP-Growth is Faster

  • No candidate generation
  • Uses tree compression
  • Mines frequent patterns directly

Steps in FP-Growth

1. Build FP-Tree
  • Count item frequencies
  • Order the items in each transaction by descending global frequency
  • Insert the ordered transactions into a compact tree
2. Mine FP-Tree

Extract frequent itemsets using tree paths.
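
The tree-building step can be sketched in plain Python. The node layout and function names below are illustrative assumptions; a full FP-Growth implementation would also maintain a header table of node links and then mine conditional pattern bases, which is omitted here.

from collections import Counter

def build_fp_tree(transactions, min_support_count):
    # Illustrative FP-tree construction: count, order, insert
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_support_count}
    root = {"item": None, "count": 0, "children": {}}
    for t in transactions:
        # Order each transaction by descending global frequency (ties: alphabetical)
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            child = node["children"].setdefault(
                item, {"item": item, "count": 0, "children": {}})
            child["count"] += 1
            node = child
    return root

def print_tree(node, depth=0):
    # Print each node as item:count, indented by depth
    if node["item"] is not None:
        print("  " * depth + f'{node["item"]}:{node["count"]}')
    for child in node["children"].values():
        print_tree(child, depth + 1)

data = [{"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
        {"Milk", "Bread", "Butter"}, {"Bread"}]
print_tree(build_fp_tree(data, min_support_count=2))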

Apriori vs FP-Growth (Comparison Table)

Feature | Apriori | FP-Growth
Candidate generation | Yes | No
Speed | Slow | Fast
Memory usage | High | Low
Works with | Small datasets | Large datasets
Implementation | Easy | Complex

Real-World Applications

Retail & E-commerce

  • Amazon recommendations
  • Product bundling
  • Store layout optimization

Healthcare

  • Symptoms → Disease relationships
  • Drug interaction patterns

Cybersecurity

  • Detect suspicious user patterns

Web Usage Mining

  • Clickstream → Page recommendation

Python Example (Apriori)

from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# transactions.csv must already be one-hot encoded: one Boolean (or 0/1)
# column per item and one row per transaction, as mlxtend expects.
df = pd.read_csv("transactions.csv")

# Frequent itemsets with support >= 0.2, then rules with confidence >= 0.5
frequent = apriori(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)

print(rules.head())

Python Example (FP-Growth)

from mlxtend.frequent_patterns import fpgrowth, association_rules

# Reuses the same one-hot encoded DataFrame df as in the Apriori example
frequent = fpgrowth(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)

print(rules)

Common Mistakes in Association Mining

  • Choosing a support threshold that is too low, which floods you with patterns (see the snippet below)
  • Choosing a support threshold that is too high, which misses rare but useful patterns
  • Misinterpreting lift (values near 1 indicate near-independence, not a strong rule)
  • Ignoring negative patterns
  • Applying Apriori to very large datasets where FP-Growth would be a better fit
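
As a quick illustration of the first two mistakes, this sketch (reusing the one-hot encoded DataFrame df from the earlier examples; the two threshold values are arbitrary) counts how many frequent itemsets survive at a low versus a high support threshold.

from mlxtend.frequent_patterns import apriori

# Compare how many patterns survive at a low vs. a high support threshold
for min_sup in (0.05, 0.5):
    frequent = apriori(df, min_support=min_sup, use_colnames=True)
    print(f"min_support={min_sup}: {len(frequent)} frequent itemsets")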

Summary

Lecture 4 covered Association Rule Mining in depth, with complete explanations of Apriori, FP-Growth, support, confidence, lift, and conviction. You learned how frequent itemsets are generated, how rules are formed, how Market Basket Analysis works, how to interpret relationships, and how to implement everything in Python.

Next: Lecture 5 – Classification: Decision Trees, Naive Bayes, KNN & Logistic Regression

People also ask:

What is Association Rule Mining?

It finds relationships between items in transactional datasets.

What is support?

Support measures how frequently an itemset appears.

What is confidence?

The probability that Y appears when X appears.

What is lift?

Lift shows the strength of a rule compared to random chance.

Which is better: Apriori or FP-Growth?

FP-Growth is faster for large datasets; Apriori is easier to understand.
