Lecture 4 explains Association Rule Mining, including Apriori, FP-Growth, support, confidence, lift, frequent itemsets, market basket analysis, diagrams, formulas, step-by-step examples, and a Python implementation, making it ideal for BS CS, BS AI, BS IT, and Data Science students.
Association Rule Mining is one of the most famous techniques in Data Mining. It is used to analyze customer behavior, discover hidden relationships, and find interesting correlations between items. The classic example is Market Basket Analysis, but association rules are now used in healthcare, cybersecurity, web usage mining, fraud detection, and AI-driven recommendation engines.
This lecture covers the foundation of association rules, explains the Apriori and FP-Growth algorithms, shows how support, confidence, and lift are calculated, and gives real-world examples and Python demonstrations.
Introduction to Association Rule Mining
What Are Association Rules?
Association Rule Mining discovers relationships like:
“If a customer buys bread, they are likely to buy butter.”
In rule format:
Bread → Butter
Why Association Mining Is Used
- Product recommendation
- Inventory placement
- Bundling strategies
- Fraud detection
- Web-clickstream analysis
Basic Terminology
Items & Itemsets
- Item → a single product
- Itemset → a group of items
Example itemset:
{Milk, Bread, Butter}
Transaction Database
A collection of customer transactions.
Example:
T1: Milk, Bread
T2: Milk, Butter
T3: Bread, Eggs
Frequent Itemsets
Itemsets that meet a minimum support threshold.
Rules & Metrics
Rules take the form:
X → Y
Meaning: if X occurs, Y is likely to occur.
Measures of Interestingness
Association rules are evaluated using mathematical measures.
1. Support
Support tells how often an itemset appears.
Formula
Support(X) = (Number of transactions containing X) / (Total transactions)
Example (using the five-transaction dataset from the Market Basket Analysis section below):
Support(Milk) = 3/5 = 0.6
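In code, support is just a count divided by the total number of transactions. A minimal Python sketch, using the small three-transaction database shown earlier:
transactions = [{"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Eggs"}]
# Support(Milk) = transactions containing Milk / total transactions
support_milk = sum("Milk" in t for t in transactions) / len(transactions)
print(round(support_milk, 2))   # Milk appears in 2 of 3 transactions -> 0.67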
2. Confidence
Confidence measures the probability that Y appears when X appears.
Formula
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Example (same dataset):
Confidence(Bread → Butter) = Support(Bread ∪ Butter) / Support(Bread) = (2/5) / (4/5) = 0.5
3. Lift
Lift tells whether X and Y occur together more often than would be expected if they were independent.
Formula
Lift(X → Y) = Confidence(X → Y) / Support(Y)
Interpretation
- Lift > 1 → Positive correlation
- Lift = 1 → Independent
- Lift < 1 → Negative correlation
4. Conviction
Conviction indicates the reliability of a rule: it compares how often the rule would be wrong if X and Y were independent with how often it is actually wrong.
Formula
Conviction(X → Y) = (1 − Support(Y)) / (1 − Confidence(X → Y))
Market Basket Analysis (Real-World Example)
Consider this dataset:
| Transaction | Items |
|---|---|
| T1 | Milk, Bread |
| T2 | Milk, Butter |
| T3 | Bread, Butter |
| T4 | Milk, Bread, Butter |
| T5 | Bread |
Step-by-Step Example:
- Support(Milk) = 3/5
- Support(Bread) = 4/5
- Support(Milk ∪ Bread) = 2/5
Now calculate confidence:
Confidence(Milk → Bread) = Support(Milk ∪ Bread) / Support(Milk) = (2/5) / (3/5) = 2/3 ≈ 0.67
Lift:
Lift(Milk → Bread) = 0.67 / Support(Bread) = 0.67 / 0.80 ≈ 0.83
Lift < 1 → a slight negative correlation: Milk and Bread appear together a little less often than expected if they were independent.
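These numbers can be double-checked with a short, self-contained Python sketch; the five transactions are hard-coded here purely for illustration:
transactions = [
    {"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"}, {"Bread"},
]
n = len(transactions)
def support(itemset):
    # Fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n
conf = support({"Milk", "Bread"}) / support({"Milk"})   # 0.4 / 0.6 ≈ 0.67
lift = conf / support({"Bread"})                        # 0.67 / 0.80 ≈ 0.83
print(round(conf, 2), round(lift, 2))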
Apriori Algorithm
Apriori is the classical algorithm used to mine frequent itemsets.
Intuition Behind Apriori
If an itemset is frequent, all its subsets must also be frequent.
Example:
If {Milk, Bread, Butter} is frequent → {Milk, Bread} must also be frequent.
Equivalently, if {Milk, Bread} is not frequent, no superset such as {Milk, Bread, Butter} can be frequent, and this is exactly what lets Apriori prune candidates early.
Step-by-Step Working of Apriori
STEP 1: Generate C1 (initial candidates)
Count individual item frequencies.
STEP 2: Generate L1 (frequent 1-itemsets)
Keep only items whose support ≥ min support.
STEP 3: Generate C2 (candidate pairs)
Pair items in L1.
STEP 4: Prune to get L2 (frequent 2-itemsets)
Remove pairs below min support.
STEP 5: Repeat for C3, L3, …
Process continues until no more frequent itemsets can be generated.
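The level-wise procedure can be sketched in plain Python. This is a simplified illustration (the transactions and min_support value are made up, and a full implementation would also prune candidates that contain an infrequent subset):
transactions = [
    {"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"}, {"Bread"},
]
min_support = 0.4
n = len(transactions)
def support(itemset):
    # Fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n
# STEP 1-2: candidate 1-itemsets (C1) -> frequent 1-itemsets (L1)
items = {i for t in transactions for i in t}
level = [frozenset([i]) for i in items if support({i}) >= min_support]
frequent = list(level)
# STEP 3-5: join the previous level to form Ck, prune by min_support, repeat
k = 2
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)
    k += 1
for itemset in frequent:
    print(set(itemset), round(support(itemset), 2))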
Apriori Lattice
Level 1: {A} {B} {C} {D}
Level 2: {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
Level 3: {A,B,C} {A,B,D} {A,C,D} {B,C,D}
Level 4: {A,B,C,D}
FP-Growth Algorithm
Apriori can become slow on large databases because it repeatedly scans the data and generates many candidate itemsets. FP-Growth was designed to solve that problem.
Why FP-Growth is Faster
- No candidate generation
- Uses tree compression
- Mines frequent patterns directly
Steps in FP-Growth
1. Build FP-Tree
- Count item frequency
- Order items
- Insert transactions into a tree
2. Mine FP-Tree
Extract frequent itemsets using tree paths.
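A bare-bones Python sketch of the FP-Tree construction step follows; the class and variable names are only illustrative, and the mining step (extracting patterns from conditional pattern bases) is left out:
from collections import Counter
class FPNode:
    # One node of the FP-Tree: an item, its count, and links to parent and children
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
transactions = [
    ["Milk", "Bread"], ["Milk", "Butter"], ["Bread", "Butter"],
    ["Milk", "Bread", "Butter"], ["Bread"],
]
min_count = 2
# Count item frequency and keep only items meeting the minimum count
counts = Counter(i for t in transactions for i in t)
keep = {i for i, c in counts.items() if c >= min_count}
# Order each transaction by descending frequency and insert it into the tree
root = FPNode(None, None)
for t in transactions:
    ordered = sorted((i for i in t if i in keep), key=lambda i: (-counts[i], i))
    node = root
    for item in ordered:
        if item not in node.children:
            node.children[item] = FPNode(item, node)
        node = node.children[item]
        node.count += 1
def show(node, depth=0):
    # Pre-order print of the compressed tree
    if node.item is not None:
        print("  " * depth + f"{node.item}: {node.count}")
    for child in node.children.values():
        show(child, depth + 1)
show(root)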
Apriori vs FP-Growth (Comparison Table)
| Feature | Apriori | FP-Growth |
|---|---|---|
| Candidate generation | Yes | No |
| Speed | Slow | Fast |
| Memory usage | High | Low |
| Best suited for | Small to medium datasets | Large datasets |
| Implementation | Easy | Complex |
Real-World Applications
Retail & E-commerce
- Amazon recommendations
- Product bundling
- Store layout optimization
Healthcare
- Symptoms → Disease relationships
- Drug interaction patterns
Cybersecurity
- Detect suspicious user patterns
Web Usage Mining
- Clickstream → Page recommendation
Python Example (Apriori)
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd
# transactions.csv is assumed to be one-hot encoded:
# one True/False (or 0/1) column per item, one row per transaction
df = pd.read_csv("transactions.csv")
frequent = apriori(df, min_support=0.2, use_colnames=True)   # itemsets with support >= 0.2
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)   # rules with confidence >= 0.5
print(rules.head())
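Note that mlxtend's apriori (and fpgrowth below) expects a one-hot encoded DataFrame with one True/False column per item. If the data is stored as raw item lists instead, it can first be converted with mlxtend's TransactionEncoder; the small list here is only an illustration:
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
# Raw transactions as lists of item names (illustrative data)
raw = [["Milk", "Bread"], ["Milk", "Butter"], ["Bread", "Butter"],
       ["Milk", "Bread", "Butter"], ["Bread"]]
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(raw), columns=te.columns_)
print(df.head())
The resulting df can then be passed directly to apriori or fpgrowth.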
Python Example (FP-Growth)
from mlxtend.frequent_patterns import fpgrowth, association_rules
# Reuses the same one-hot encoded df as in the Apriori example above
frequent = fpgrowth(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)   # keep only rules with lift >= 1
print(rules)
Common Mistakes in Association Mining
- Choosing a support threshold that is too low (floods you with mostly uninteresting patterns)
- Choosing a support threshold that is too high (rare but useful patterns are missed)
- Misinterpreting lift (lift above 1 signals correlation, not causation)
- Ignoring negative patterns
- Applying Apriori to very large datasets (FP-Growth scales better)
Summary
Lecture 4 covered Association Rule Mining in-depth with complete explanations of Apriori, FP-Growth, support, confidence, lift, and conviction. You learned how frequent itemsets are generated, how rules are formed, how Market Basket Analysis works, how to interpret relationships, and how to implement everything using Python.
Next: Lecture 5 – Classification: Decision Trees, Naive Bayes, KNN & Logistic Regression
People also ask:
What does Association Rule Mining do?
It finds relationships between items in transactional datasets.
What is support?
Support measures how frequently an itemset appears.
What is confidence?
The probability that Y appears when X appears.
What is lift?
Lift shows the strength of a rule compared to random chance.
Which is better, Apriori or FP-Growth?
FP-Growth is faster for large datasets; Apriori is easier to understand.