Lecture 4 explains Association Rule Mining, including Apriori, FP-Growth, support, confidence, lift, frequent itemsets, market basket analysis, diagrams, formulas, step-by-step examples, and a Python implementation, making it ideal for BS CS, BS AI, BS IT, and Data Science students.
Association Rule Mining is one of the most famous techniques in Data Mining. It is used to analyze customer behavior, discover hidden relationships, and find interesting correlations between items. The classic example is Market Basket Analysis, but association rules are now used in healthcare, cybersecurity, web usage mining, fraud detection, and AI-driven recommendation engines.
This lecture covers the foundation of association rules, explains the Apriori and FP-Growth algorithms, shows how support, confidence, and lift are calculated, and gives real-world examples and Python demonstrations.
Introduction to Association Rule Mining
What Are Association Rules?
Association Rule Mining discovers relationships like:
“If a customer buys bread, they are likely to buy butter.”
In rule format:
Bread → Butter
Why Association Mining Is Used
- Product recommendation
- Inventory placement
- Bundling strategies
- Fraud detection
- Web-clickstream analysis
Basic Terminology
Items & Itemsets
- Item → a single product
- Itemset → a group of items
Example itemset:
{Milk, Bread, Butter}
Transaction Database
A collection of customer transactions.
Example:
T1: Milk, Bread
T2: Milk, Butter
T3: Bread, Eggs
Frequent Itemsets
Itemsets that meet a minimum support threshold.
Rules & Metrics
Rules take the form:
X → Y
Meaning: if X occurs, Y is likely to occur.
Measures of Interestingness
Association rules are evaluated using mathematical measures.
1. Support
Support tells how often an itemset appears.
Formula
Support(X) = (Number of transactions containing X) / (Total transactions)
Example (using the five-transaction dataset from the Market Basket Analysis section below):
Support(Milk) = 3/5 = 0.6
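In code, support is just a count divided by the total number of transactions. A minimal Python sketch, using the small three-transaction database shown earlier:
transactions = [{"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Eggs"}]
# Support(Milk) = transactions containing Milk / total transactions
support_milk = sum("Milk" in t for t in transactions) / len(transactions)
print(round(support_milk, 2))   # Milk appears in 2 of 3 transactions -> 0.67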
2. Confidence
Confidence measures the probability that Y appears when X appears.
Formula
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Example (same dataset):
Confidence(Bread → Butter) = Support(Bread ∪ Butter) / Support(Bread) = (2/5) / (4/5) = 0.5
3. Lift
Lift tells whether X and Y occur together more often than would be expected if they were independent.
Formula
Lift(X → Y) = Confidence(X → Y) / Support(Y)
Interpretation
- Lift > 1 → Positive correlation
- Lift = 1 → Independent
- Lift < 1 → Negative correlation
4. Conviction
Conviction indicates the reliability of a rule: it compares how often the rule would be wrong if X and Y were independent with how often it is actually wrong.
Formula
Conviction(X → Y) = (1 − Support(Y)) / (1 − Confidence(X → Y))
Market Basket Analysis (Real-World Example)
Consider this dataset:
| Transaction | Items |
|---|---|
| T1 | Milk, Bread |
| T2 | Milk, Butter |
| T3 | Bread, Butter |
| T4 | Milk, Bread, Butter |
| T5 | Bread |
Step-by-Step Example:
- Support(Milk) = 3/5
- Support(Bread) = 4/5
- Support(Milk ∪ Bread) = 2/5
Now calculate confidence:
Confidence(Milk → Bread) = Support(Milk ∪ Bread) / Support(Milk) = (2/5) / (3/5) = 2/3 ≈ 0.67
Lift:
Lift(Milk → Bread) = 0.67 / Support(Bread) = 0.67 / 0.80 ≈ 0.83
Lift < 1 → a slight negative correlation: Milk and Bread appear together a little less often than expected if they were independent.
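These numbers can be double-checked with a short, self-contained Python sketch; the five transactions are hard-coded here purely for illustration:
transactions = [
    {"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"}, {"Bread"},
]
n = len(transactions)
def support(itemset):
    # Fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n
conf = support({"Milk", "Bread"}) / support({"Milk"})   # 0.4 / 0.6 ≈ 0.67
lift = conf / support({"Bread"})                        # 0.67 / 0.80 ≈ 0.83
print(round(conf, 2), round(lift, 2))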
Apriori Algorithm
Apriori is the classical algorithm used to mine frequent itemsets.
Intuition Behind Apriori
If an itemset is frequent, all its subsets must also be frequent.
Example:
If {Milk, Bread, Butter} is frequent → {Milk, Bread} must also be frequent.
Equivalently, if {Milk, Bread} is not frequent, no superset such as {Milk, Bread, Butter} can be frequent, and this is exactly what lets Apriori prune candidates early.
Step-by-Step Working of Apriori
STEP 1: Generate C1 (initial candidates)
Count individual item frequencies.
STEP 2: Generate L1 (frequent 1-itemsets)
Keep only items whose support ≥ min support.
STEP 3: Generate C2 (candidate pairs)
Pair items in L1.
STEP 4: Prune to get L2 (frequent 2-itemsets)
Remove pairs below min support.
STEP 5: Repeat for C3, L3, …
Process continues until no more frequent itemsets can be generated.
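The level-wise procedure can be sketched in plain Python. This is a simplified illustration (the transactions and min_support value are made up, and a full implementation would also prune candidates that contain an infrequent subset):
transactions = [
    {"Milk", "Bread"}, {"Milk", "Butter"}, {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"}, {"Bread"},
]
min_support = 0.4
n = len(transactions)
def support(itemset):
    # Fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n
# STEP 1-2: candidate 1-itemsets (C1) -> frequent 1-itemsets (L1)
items = {i for t in transactions for i in t}
level = [frozenset([i]) for i in items if support({i}) >= min_support]
frequent = list(level)
# STEP 3-5: join the previous level to form Ck, prune by min_support, repeat
k = 2
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)
    k += 1
for itemset in frequent:
    print(set(itemset), round(support(itemset), 2))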
Apriori Lattice
Level 1: {A} {B} {C} {D}
Level 2: {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
Level 3: {A,B,C} {A,B,D} {A,C,D} {B,C,D}
Level 4: {A,B,C,D}
FP-Growth Algorithm
Apriori can become slow on large databases because it repeatedly scans the data and generates many candidate itemsets. FP-Growth was designed to solve that problem.
Why FP-Growth is Faster
- No candidate generation
- Uses tree compression
- Mines frequent patterns directly
Steps in FP-Growth
1. Build FP-Tree
- Count item frequency
- Order items
- Insert transactions into a tree
2. Mine FP-Tree
Extract frequent itemsets using tree paths.
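A bare-bones Python sketch of the FP-Tree construction step follows; the class and variable names are only illustrative, and the mining step (extracting patterns from conditional pattern bases) is left out:
from collections import Counter
class FPNode:
    # One node of the FP-Tree: an item, its count, and links to parent and children
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
transactions = [
    ["Milk", "Bread"], ["Milk", "Butter"], ["Bread", "Butter"],
    ["Milk", "Bread", "Butter"], ["Bread"],
]
min_count = 2
# Count item frequency and keep only items meeting the minimum count
counts = Counter(i for t in transactions for i in t)
keep = {i for i, c in counts.items() if c >= min_count}
# Order each transaction by descending frequency and insert it into the tree
root = FPNode(None, None)
for t in transactions:
    ordered = sorted((i for i in t if i in keep), key=lambda i: (-counts[i], i))
    node = root
    for item in ordered:
        if item not in node.children:
            node.children[item] = FPNode(item, node)
        node = node.children[item]
        node.count += 1
def show(node, depth=0):
    # Pre-order print of the compressed tree
    if node.item is not None:
        print("  " * depth + f"{node.item}: {node.count}")
    for child in node.children.values():
        show(child, depth + 1)
show(root)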
Apriori vs FP-Growth (Comparison Table)
| Feature | Apriori | FP-Growth |
|---|---|---|
| Candidate generation | Yes | No |
| Speed | Slow | Fast |
| Memory usage | High | Low |
| Best suited for | Small to medium datasets | Large datasets |
| Implementation | Easy | Complex |
Real-World Applications
Retail & E-commerce
- Amazon recommendations
- Product bundling
- Store layout optimization
Healthcare
- Symptoms → Disease relationships
- Drug interaction patterns
Cybersecurity
- Detect suspicious user patterns
Web Usage Mining
- Clickstream → Page recommendation
Python Example (Apriori)
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd
# transactions.csv is assumed to be one-hot encoded:
# one True/False (or 0/1) column per item, one row per transaction
df = pd.read_csv("transactions.csv")
frequent = apriori(df, min_support=0.2, use_colnames=True)   # itemsets with support >= 0.2
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)   # rules with confidence >= 0.5
print(rules.head())
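Note that mlxtend's apriori (and fpgrowth below) expects a one-hot encoded DataFrame with one True/False column per item. If the data is stored as raw item lists instead, it can first be converted with mlxtend's TransactionEncoder; the small list here is only an illustration:
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
# Raw transactions as lists of item names (illustrative data)
raw = [["Milk", "Bread"], ["Milk", "Butter"], ["Bread", "Butter"],
       ["Milk", "Bread", "Butter"], ["Bread"]]
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(raw), columns=te.columns_)
print(df.head())
The resulting df can then be passed directly to apriori or fpgrowth.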
Python Example (FP-Growth)
from mlxtend.frequent_patterns import fpgrowth, association_rules
# Reuses the same one-hot encoded df as in the Apriori example above
frequent = fpgrowth(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)   # keep only rules with lift >= 1
print(rules)
Common Mistakes in Association Mining
- Choosing a support threshold that is too low (floods you with mostly uninteresting patterns)
- Choosing a support threshold that is too high (rare but useful patterns are missed)
- Misinterpreting lift (lift above 1 signals correlation, not causation)
- Ignoring negative patterns
- Applying Apriori to very large datasets (FP-Growth scales better)
Summary
Lecture 4 covered Association Rule Mining in-depth with complete explanations of Apriori, FP-Growth, support, confidence, lift, and conviction. You learned how frequent itemsets are generated, how rules are formed, how Market Basket Analysis works, how to interpret relationships, and how to implement everything using Python.
Next: Lecture 5 – Classification: Decision Trees, Naive Bayes, KNN & Logistic Regression
People also ask:
What does Association Rule Mining do?
It finds relationships between items in transactional datasets.
What is support?
Support measures how frequently an itemset appears.
What is confidence?
The probability that Y appears when X appears.
What is lift?
Lift shows the strength of a rule compared to random chance.
Which is better, Apriori or FP-Growth?
FP-Growth is faster for large datasets; Apriori is easier to understand.