Lecture 8 explains Web Mining and Social Network Mining, including content mining, structure mining, usage mining, link analysis, PageRank, HITS, graph theory, community detection, sentiment analysis, examples, diagrams, and Python code.
The modern web contains billions of interconnected documents, pages, videos, and social posts. Every search query, click, comment, reaction, and share generates data. Organizations use this data to understand user behavior, improve search engines, detect spam, analyze trends, personalize recommendations, and strengthen cybersecurity.
This lecture explores how Data Mining techniques are applied specifically to the web and social networks.
Introduction to Web & Social Network Mining
What Is Web Mining?
Web Mining refers to extracting useful information from the web using:
- Data Mining
- Machine Learning
- Natural Language Processing
- Graph analysis
It covers content, structure, and user behavior on the web.
Why Web Mining Is Important
Web Mining helps:
- Improve search engine results
- Detect fake reviews
- Recommend products
- Understand user behavior
- Track trends
- Analyze website performance
What Is Social Network Mining?
Social Network Mining analyzes:
- Relationships
- User interactions
- Network structure
- Influence flow
- Community formation
Examples of platforms commonly analyzed include:
- Twitter/X
- TikTok
Types of Web Mining
Web Mining is divided into three main categories.
1. Web Content Mining
Extracting useful content from:
- HTML pages
- Blogs
- Articles
- Images
- Audio
- Video
2. Web Structure Mining
Analyzing link structures:
- Hyperlinks
- Inbound & outbound links
- Site architecture
- Graph representation
3. Web Usage Mining
Understanding how users interact with websites:
- Clickstream
- Session logs
- Browsing patterns
Web Content Mining
Web content mining focuses on extracting meaningful data from web pages.
Text Extraction & Web Scraping
Web scraping tools:
- BeautifulSoup
- Scrapy
- Selenium
Example HTML extraction:
<h1>Data Mining Course</h1>
<p>Learn classification, clustering, and more.</p>
Python example:
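from bs4 import BeautifulSoup
import requests
# Fetch the page; fail fast on network problems or HTTP errors
page = requests.get("https://example.com", timeout=10)
page.raise_for_status()
# Parse the HTML and print the text of the first <h1>, if there is one
soup = BeautifulSoup(page.text, "html.parser")
heading = soup.find("h1")
print(heading.get_text(strip=True) if heading else "No <h1> found")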
NLP in Web Content Mining
Natural Language Processing techniques:
- Tokenization
- Named-Entity Recognition (NER)
- Topic Modeling
- Sentiment Analysis
Applications:
- Summarizing news articles
- Extracting keywords
- Detecting spam comments
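As a rough illustration of tokenization and keyword extraction, the sketch below uses only the Python standard library. The stop-word list, the sample sentence, and the function names `tokenize` and `top_keywords` are assumptions made for this example; real projects typically rely on an NLP library with a much larger stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "from"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_keywords(text, k=3):
    """Return the k most frequent tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(k)]

sample = "Data mining finds patterns in data. Web mining applies data mining to the web."
print(tokenize(sample)[:4])
print(top_keywords(sample, k=2))
```

Frequency counting is the simplest possible keyword heuristic; weighting schemes such as TF-IDF are a common next step.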
Multimedia Mining
Extracting features from:
- Images
- Videos
- Audio
Tools used:
- CNNs for image classification
- OCR for image text extraction
- Video content summarization
Example:
A product image is analyzed to classify color, size, or type.
Web Structure Mining
Web structure mining views the internet as a graph.
Nodes = web pages
Edges = hyperlinks
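One way to hold such a graph in memory is a pair of adjacency maps, one for outbound and one for inbound links. The sketch below is illustrative only: the page names and the links between them are made up for this example.

```python
from collections import defaultdict

# Hypothetical hyperlinks as (source page, target page) pairs.
links = [
    ("home", "products"),
    ("home", "about"),
    ("products", "home"),
    ("blog", "products"),
]

outbound = defaultdict(set)  # page -> pages it links to
inbound = defaultdict(set)   # page -> pages that link to it
for src, dst in links:
    outbound[src].add(dst)
    inbound[dst].add(src)

# Every page that appears as a source or a target is a node.
pages = sorted(set(outbound) | set(inbound))
for page in pages:
    print(page, "out:", len(outbound[page]), "in:", len(inbound[page]))
```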
Link Analysis
Link analysis estimates a page's importance from the number and quality of the links pointing to it.
PageRank Algorithm
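Developed by Google's founders to rank web pages by importance.
Formula (simplified, ignoring damping):
PR(A) = PR(B)/links(B) + PR(C)/links(C) + ...
Here B, C, ... are the pages that link to A, and links(X) is the number of outbound links on page X. In practice a damping factor d (commonly 0.85) is included: PR(A) = (1 - d)/N + d * (PR(B)/links(B) + PR(C)/links(C) + ...), where N is the total number of pages.
High PageRank = important page.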
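The formula can be evaluated by repeated substitution (power iteration). The sketch below is a minimal version under stated assumptions: it adds the commonly used damping factor of 0.85, spreads the rank of pages with no outbound links evenly over all pages, and runs a fixed number of iterations instead of testing for convergence. The three-page graph and the function name `pagerank` are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key of `graph`.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the baseline (1 - d) / N share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # Split this page's rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (assumption): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

Page C collects links from both A and B, so it ends up with a higher score than B, which receives only half of A's rank.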
HITS Algorithm
Hyperlink-Induced Topic Search
Divides pages into:
- Authorities (pages with valuable content that many good hubs link to)
- Hubs (pages that link to many good authorities)
Example:
- A university homepage → hub
- Research papers → authorities
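Hub and authority scores reinforce each other, which suggests an iterative computation. The following is a minimal sketch, assuming a fixed number of iterations and L2 normalization after each round; the page names (`university_home`, `lab_page`, `paper_1`, ...) are invented to mirror the example above.

```python
import math

def hits(graph, iterations=20):
    """Iterative HITS on a dict {page: [pages it links to]}."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link structure: two pages pointing at research papers.
graph = {
    "university_home": ["paper_1", "paper_2", "paper_3"],
    "lab_page": ["paper_1", "paper_2"],
}
hub, auth = hits(graph)
print("best hub:", max(hub, key=hub.get))
print("authority scores:", {p: round(s, 3) for p, s in sorted(auth.items())})
```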
Web Usage Mining
Web usage mining discovers patterns in user interactions.
Log File Sources
- Web server logs
- Browser logs
- Cookies
- Session history
Example log:
IP: 192.168.0.1
Page Visited: /products
Time: 10:23 AM
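A record like this can be turned into a structured object before analysis. The sketch below parses the simplified "key: value" layout shown above; real web server logs usually follow other formats (for example the Common Log Format), so the parser and the name `parse_record` are assumptions for illustration.

```python
from datetime import datetime

raw_record = """IP: 192.168.0.1
Page Visited: /products
Time: 10:23 AM"""

def parse_record(text):
    """Split each 'key: value' line at the first colon into a dict entry."""
    record = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record

record = parse_record(raw_record)
# Convert the 12-hour clock string into a time object for later comparisons.
visit_time = datetime.strptime(record["Time"], "%I:%M %p").time()
print(record["IP"], record["Page Visited"], visit_time)
```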
Clickstream Analysis
Sequence of pages visited by a user.
Example:
Homepage → Category → Product → Checkout
Useful for:
- Improving website design
- Personalized recommendations
- Identifying drop-off pages
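To make the drop-off idea concrete, the sketch below counts how many sessions reach each funnel step and where the unfinished sessions end. The five sessions are fabricated sample data for illustration, not real traffic.

```python
from collections import Counter

FUNNEL = ["Homepage", "Category", "Product", "Checkout"]

# Hypothetical sessions: each list is the ordered pages one user visited.
sessions = [
    ["Homepage", "Category", "Product", "Checkout"],
    ["Homepage", "Category", "Product"],
    ["Homepage", "Category"],
    ["Homepage", "Category", "Product"],
    ["Homepage"],
]

# How many sessions reached each funnel step at least once.
reached = Counter()
for session in sessions:
    for step in FUNNEL:
        if step in session:
            reached[step] += 1

# Last page of every session that never reached checkout.
drop_offs = Counter(s[-1] for s in sessions if s[-1] != "Checkout")

for step in FUNNEL:
    print(f"{step}: reached by {reached[step]} of {len(sessions)} sessions")
print("most common drop-off page:", drop_offs.most_common(1)[0])
```

In this sample the Product page loses the most visitors, which would make it the first candidate for a design review.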
User Behavior Tracking
Heatmaps, scroll depth, and session duration show how visitors engage with a page and are used to improve UX.
Introduction to Social Network Mining
Social networks are represented as graphs.
Graph Theory Basics
- Node: A user
- Edge: Relationship between two users
Example:
Alice — Bob
Bob — Charlie
Charlie — Alice
Degree
Number of connections a node has.
Centrality
Importance of a node:
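- Degree Centrality: how many direct connections a node has
- Betweenness Centrality: how often a node lies on the shortest paths between other nodes
- Closeness Centrality: how short the paths are from a node to all other nodes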
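Two of these measures are easy to compute by hand on a small graph. The sketch below reuses the Alice, Bob, Charlie triangle and adds a hypothetical fourth user, Dana, connected only to Charlie, so that the scores differ between nodes. It covers degree and closeness centrality only, and it assumes an undirected, connected graph; betweenness is left out because it needs a longer shortest-path algorithm.

```python
from collections import deque

# Triangle from the example plus a hypothetical user "Dana".
edges = [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice"), ("Charlie", "Dana")]

# Build an undirected adjacency map.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    n = len(adj)
    result = {}
    for start in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[start] = (n - 1) / total if total else 0.0
    return result

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
for node in sorted(adj):
    print(node, round(degree[node], 2), round(closeness[node], 2))
```

Charlie is directly connected to everyone, so Charlie scores highest on both measures.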
Social Network Analysis Techniques
Community Detection
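Finding groups of nodes that are more densely connected to each other than to the rest of the network; members of such groups often share interests.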
Algorithms:
- Louvain Method
- Girvan-Newman Algorithm
Used in:
- Marketing segmentation
- Political campaign analysis
- Recommendation systems
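The Louvain method searches for a partition with high modularity, a score that compares the edges inside each community with what a random graph of the same degrees would give. The sketch below does not implement Louvain or Girvan-Newman; it only computes the modularity of a given partition, which is the quantity such algorithms evaluate. The six-node graph and both partitions are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q = sum over communities of L_c / m - (d_c / (2m))^2.

    L_c: edges inside community c, d_c: total degree of its nodes,
    m: total number of edges. Assumes an undirected, unweighted graph.
    """
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Hypothetical graph: two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
good = [{"a", "b", "c"}, {"d", "e", "f"}]  # split along the bridge
bad = [{"a", "b", "d"}, {"c", "e", "f"}]   # split through the triangles

q_good = modularity(edges, good)
q_bad = modularity(edges, bad)
print(round(q_good, 3), round(q_bad, 3))
```

The partition that follows the two triangles scores higher, which matches the intuition that communities are densely connected inside and sparsely connected to each other.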
Influence Propagation
How ideas spread in networks.
Examples:
- Viral tweets
- Trends
- News spread
Mathematical models:
- Independent Cascade
- Linear Threshold
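In the Independent Cascade model, each newly activated node gets a single chance to activate each of its inactive neighbors with some probability. The simulation below is a minimal sketch, assuming one shared probability for every edge and a fixed random seed; the follower graph and the function name `independent_cascade` are hypothetical.

```python
import random

def independent_cascade(graph, seeds, prob=0.3, rng=None):
    """Simulate one cascade on a dict {node: [nodes it can influence]}."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, []):
                # One activation attempt per (newly active node, inactive neighbor).
                if nbr not in active and rng.random() < prob:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active

# Hypothetical influence graph: an edge u -> v means u's posts reach v.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
    "frank": ["alice"],
}

print(sorted(independent_cascade(graph, ["alice"], prob=0.5)))
```

Running the simulation many times with different seeds and averaging the size of the activated set gives an estimate of a seed user's expected reach.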
Sentiment Analysis on Social Data
Sentiment analysis classifies the opinion or emotion expressed in a post, typically as positive, negative, or neutral.
Lexicon-Based Approach
Uses predefined dictionaries:
- Positive words
- Negative words
Example:
Happy, excited → positive
Angry, terrible → negative
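A lexicon-based scorer can be sketched in a few lines. The version below uses only the four example words as its dictionaries, which is far smaller than any practical lexicon, and it ignores negation ("not happy") and intensity. The function name `lexicon_sentiment` is an assumption for this example.

```python
import re

# Tiny illustrative lexicons taken from the example words above.
POSITIVE = {"happy", "excited"}
NEGATIVE = {"angry", "terrible"}

def lexicon_sentiment(text):
    """Label text by counting positive and negative dictionary words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for post in ["I am so happy and excited!", "Terrible service, I am angry.", "The package arrived."]:
    print(post, "->", lexicon_sentiment(post))
```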
Machine Learning Approach
Algorithms:
- SVM
- Naive Bayes
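To show what the machine learning approach looks like in miniature, here is a sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. The four training sentences are invented, the function names `train` and `predict` are assumptions, and words unseen in training are simply skipped. In practice a library implementation and far more labeled data would be used; SVMs are not shown.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled posts.
training_data = [
    ("I love this product", "positive"),
    ("Amazing quality, very happy", "positive"),
    ("Terrible experience", "negative"),
    ("I hate the slow delivery", "negative"),
]
model = train(training_data)
print(predict(model, "Happy with the amazing product"))
print(predict(model, "Terrible slow delivery"))
```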
Deep Learning Approach
Models:
- LSTMs
- Transformers (for example, BERT)
Used for:
- Brand monitoring
- Hate-speech detection
- Customer feedback analysis
Case Studies of Web & Social Mining
1. E-Commerce Personalization
E-commerce platforms such as Amazon draw on:
- Clickstream
- Customer behavior
- PageRank-style link analysis
To recommend:
- Products
- Bundles
- Similar items
2. Security & Threat Detection
Social networks reveal:
- Spam accounts
- Fake news spread
- Bot networks
Machine learning helps detect:
- Fake profiles
- Suspicious activity
3. Political & Social Trend Analysis
Twitter/X trends are used to gauge:
- Public opinion
- Likely election outcomes
- Movement growth
Sentiment models analyze millions of tweets.
Python Examples
Web Scraping
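import requests
from bs4 import BeautifulSoup
# Download the page and stop on HTTP errors
response = requests.get("https://example.com", timeout=10)
response.raise_for_status()
# Parse the HTML and print the contents of the <title> tag
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.text)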
PageRank (simplified)
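import networkx as nx
# Directed graph: each edge is a hyperlink from one page to another
G = nx.DiGraph()
G.add_edges_from([("A","B"), ("B","C"), ("C","A")])
# pagerank() returns a dict of node -> score (default damping alpha=0.85);
# in this 3-page cycle every page gets the same score
pr = nx.pagerank(G)
print(pr)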
Sentiment Analysis
from textblob import TextBlob
text = "This product is amazing!"
# polarity ranges from -1.0 (most negative) to 1.0 (most positive)
print(TextBlob(text).sentiment.polarity)
Common Challenges & Mistakes
- Scraping dynamic websites incorrectly
- Ignoring robots.txt rules
- Misinterpreting sentiment polarity
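- Incorrect PageRank assumptions (for example, ignoring the damping factor or pages with no outbound links)
- Biased sampling of social graphs, which over- or under-represents parts of the network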
- Incomplete log data
Summary
Lecture 8 covered Web Mining and Social Network Mining comprehensively. It explained web content, structure, and usage mining, link analysis algorithms, social graph analysis, sentiment analysis, and influence modeling. You also learned practical Python examples, real-world case studies, and important techniques for analyzing large-scale web and social data.
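People also ask:
What is Web Mining?
Web Mining is extracting useful patterns from web content, structure, and user interactions.
What is Social Network Mining?
It analyzes relationships and structures in social media networks using graph theory.
Which algorithms are used for link analysis?
PageRank and HITS.
Which approaches are used for sentiment analysis?
Lexicon-based, machine learning, and deep learning models.
What is clickstream analysis?
Tracking the sequence of pages visited by a user.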




