Lecture 2 explains the types of data in Data Mining, including structured, semi-structured, unstructured, spatial, temporal, and multimedia data. It also covers data patterns, practical examples, case studies, diagrams, and tools used in industry, making it ideal for BS CS, BS AI, BS IT, and Data Science learners.
Data is the fuel of modern analytics, machine learning, artificial intelligence, and decision-making. But not all data is equal. Some data is neatly organized, while other data exists in raw, chaotic forms like tweets, images, logs, or sensor signals. Lecture 2 explores the types of data used in Data Mining and the patterns that help uncover meaningful insights.
Understanding the nature of data is essential before applying any mining algorithm. This lecture prepares you to handle complex datasets in real industrial environments such as healthcare, banking, retail, cybersecurity, and IoT.
Introduction to Data Types
Why Understanding Data Types Matters
If you apply a method that does not suit the data type, the results become inaccurate.
For example:
- Classification works well with structured data
- NLP techniques are required for text data
- CNNs work best for images
How Data Types Influence Algorithms
The right algorithm depends on how the data is structured.
Example:
- SQL queries → Structured data
- Deep learning → Unstructured data
Real-World Motivation
Every industry uses a mix of structured and unstructured data:
- Banks store transactional data
- Hospitals store medical records & X-rays
- Social media generates text, images, and videos
- IoT generates streaming sensor data
Structured Data
Structured data is organized in rows and columns, making it easy to search, analyze, and model.
Characteristics
- Tabular format
- Clearly defined schema
- Fast processing
Examples
- SQL databases
- Attendance sheets
- Inventory systems
Advantages
- Easy to manage
- Supports fast queries
- Ideal for statistical analysis
Limitations
- Cannot store video, images, or text efficiently
- Lacks flexibility for modern AI applications
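To make the idea of a fixed schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and student records are hypothetical. Because every row shares the same columns, a single SQL query can filter and aggregate the whole table.

```python
import sqlite3

# In-memory database holding a hypothetical attendance table with a fixed schema
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE attendance (
           student_id INTEGER PRIMARY KEY,
           name       TEXT NOT NULL,
           present    INTEGER NOT NULL  -- number of days present
       )"""
)
rows = [(1, "Ali", 18), (2, "Sara", 20), (3, "Usman", 12)]
conn.executemany("INSERT INTO attendance VALUES (?, ?, ?)", rows)

# Every row has the same columns, so one declarative query can filter and sort
low_attendance = conn.execute(
    "SELECT name, present FROM attendance WHERE present < 15 ORDER BY present"
).fetchall()
print(low_attendance)  # [('Usman', 12)]

# Aggregations are equally direct on structured data
avg_present = conn.execute("SELECT AVG(present) FROM attendance").fetchone()[0]
print(round(avg_present, 2))  # 16.67
```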
Semi-Structured Data
Semi-structured data contains tags or markers but does not follow a strict schema.
XML & JSON
Used extensively in APIs, web applications, and configuration files.
Example (JSON):
{
  "name": "Ali",
  "age": 25,
  "skills": ["Python", "Data Mining"]
}
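Because semi-structured records do not have to share the same fields, code that reads them must tolerate missing or extra keys. The sketch below parses the JSON record from the example above together with a second, hypothetical record, using Python's standard json module. It then flattens both records into uniform rows, which is a common step before tabular mining.

```python
import json

# The record from the example above, plus a hypothetical second record
# that adds "city" and omits "skills", as semi-structured data often does
raw = """[
  {"name": "Ali", "age": 25, "skills": ["Python", "Data Mining"]},
  {"name": "Sara", "age": 23, "city": "Lahore"}
]"""

records = json.loads(raw)

for rec in records:
    # .get() tolerates missing keys instead of assuming a fixed schema
    skills = rec.get("skills", [])
    print(rec["name"], rec["age"], len(skills), sorted(rec.keys()))

# Flatten into uniform rows: the union of all keys becomes the column set,
# and absent fields are filled with None
columns = sorted({key for rec in records for key in rec})
table = [[rec.get(col) for col in columns] for rec in records]
print(columns)  # ['age', 'city', 'name', 'skills']
```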
Log Files
- Server logs
- Application logs
- Security logs
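Log lines are semi-structured in a typical way: a few fields (timestamp, severity level) sit in fixed positions, while the rest is free-form. The following is a minimal parsing sketch. The log format, user names, and key=value convention are assumptions for illustration only, because real server and application logs vary widely by system.

```python
import re
from collections import Counter

# Hypothetical application log lines; real formats differ between systems
log_lines = [
    "2024-05-01 10:15:02 INFO  user=ali action=login",
    "2024-05-01 10:15:09 ERROR user=sara action=payment msg=timeout",
    "2024-05-01 10:16:44 WARN  user=ali action=upload",
    "2024-05-01 10:17:01 ERROR user=ali action=payment msg=declined",
]

# Fixed part: "date time LEVEL", followed by free-form key=value pairs
pattern = re.compile(r"^(\S+ \S+) (\w+)\s+(.*)$")

events = []
for line in log_lines:
    m = pattern.match(line)
    if m is None:
        continue  # skip lines that do not fit the assumed format
    timestamp, level, rest = m.groups()
    fields = dict(pair.split("=", 1) for pair in rest.split())
    events.append({"time": timestamp, "level": level, **fields})

# Once parsed, logs can be summarised like structured data
level_counts = Counter(e["level"] for e in events)
errors_by_action = Counter(e["action"] for e in events if e["level"] == "ERROR")
print(level_counts["ERROR"])  # 2
print(errors_by_action)       # Counter({'payment': 2})
```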
Use Cases
- System monitoring
- Web analytics
- Big data pipelines
Unstructured Data
Unstructured data has no predefined data model or schema, and it makes up the largest share of the world's data.
Text Data
Tweets, comments, blogs, WhatsApp messages.
Techniques required:
- Tokenization
- Stemming
- Sentiment analysis
- Topic modeling
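The first steps of text mining can be sketched in plain Python. In this minimal example, tokenization is a lowercase letters-only split. The "stemmer" is a deliberately crude suffix stripper written for illustration, not the Porter stemmer that libraries such as NLTK provide. Sentiment is scored against a tiny hypothetical word list. Topic modeling is beyond a short sketch.

```python
import re

# Hypothetical tiny sentiment lexicon; real systems use large lexicons or trained models
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "bad", "crash"}

def tokenize(text):
    # Tokenization: lowercase, then keep runs of letters as tokens
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    # Simplified suffix stripping for illustration only (NOT the Porter stemmer);
    # stems need not be real words
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def sentiment(tokens):
    # Lexicon-based score: +1 per positive token, -1 per negative token
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

tweet = "Love the new app! Loading is fast, but it crashes sometimes."
tokens = [crude_stem(t) for t in tokenize(tweet)]
print(tokens)
print(sentiment(tokens))  # 1  (love +1, fast +1, crash -1)
```

Stemming is what lets "crashes" match the lexicon entry "crash". Without it, the negative word would be missed.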
Multimedia Data
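Images, video, and audio typically require specialized techniques such as:
- CNNs (convolutional neural networks) for images and video frames
- Speech recognition for audio
- Face detection in images and video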
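The core operation inside a CNN is sliding a small kernel over the image and computing a weighted sum at each position. This pure-Python sketch assumes a tiny hand-made grayscale image and a hand-picked vertical-edge kernel. In a real CNN the kernel weights are learned during training. As in common deep learning libraries, the code computes cross-correlation (no kernel flip), which is what CNN layers call "convolution".

```python
# A tiny grayscale "image" (0 = dark, 9 = bright) with a vertical edge in the middle
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked vertical-edge kernel; a trained CNN would learn these weights
kernel = [
    [-1, 1],
    [-1, 1],
]

def conv2d(img, k):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            out[i][j] = sum(
                img[i + di][j + dj] * k[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)  # [0, 18, 0] on each row: the filter fires only at the edge
```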
Big Data Challenges
Unstructured data requires:
- Large storage capacity
- Specialized algorithms
- GPU computing (especially for deep learning)
Specialized Data Types
Transactional Data
Each row represents a transaction.
Used in:
- Market basket analysis
- Fraud detection
Example:
Transaction ID | Items
001 | Milk, Bread, Eggs
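Before market basket analysis, transactions are usually converted from "ID plus list of items" into a binary (one-hot) table with one column per item. This is the input shape that many association-rule tools expect. In the sketch below, transaction 001 comes from the table above, and 002 to 004 are hypothetical additions.

```python
from collections import Counter

# Transaction 001 is from the table above; 002-004 are hypothetical
transactions = [
    ("001", ["Milk", "Bread", "Eggs"]),
    ("002", ["Milk", "Bread"]),
    ("003", ["Bread", "Butter"]),
    ("004", ["Milk", "Eggs"]),
]

# Number of transactions that contain each item (set() avoids double counting)
item_counts = Counter(item for _, items in transactions for item in set(items))

# One-hot matrix: one row per transaction, one 0/1 column per item
all_items = sorted(item_counts)
matrix = {tid: [int(item in items) for item in all_items] for tid, items in transactions}

print(all_items)  # ['Bread', 'Butter', 'Eggs', 'Milk']
for tid, row in matrix.items():
    print(tid, row)
```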
Temporal (Time-Series) Data
Data recorded with timestamps, where the order of observations matters:
- Stock prices
- Weather readings
- IoT sensor data
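Time-series methods depend on the order of observations. For that reason, a typical first step is to parse and sort timestamps and then derive features such as moving averages and period-to-period changes. The readings below are hypothetical hourly temperatures, and the 3-step window is an arbitrary choice for illustration.

```python
from datetime import datetime

# Hypothetical hourly temperature readings: (timestamp, value in °C)
readings = [
    ("2024-06-01 00:00", 30.0),
    ("2024-06-01 01:00", 29.0),
    ("2024-06-01 02:00", 28.0),
    ("2024-06-01 03:00", 31.0),
    ("2024-06-01 04:00", 33.0),
]

# Parse timestamps and sort chronologically
series = sorted(
    (datetime.strptime(ts, "%Y-%m-%d %H:%M"), value) for ts, value in readings
)
values = [v for _, v in series]

def moving_average(vals, window):
    # Mean of each run of `window` consecutive values (smooths short-term fluctuation)
    return [sum(vals[i : i + window]) / window for i in range(len(vals) - window + 1)]

# Change from the previous reading, a common time-series feature
diffs = [b - a for a, b in zip(values, values[1:])]

print(moving_average(values, 3))
print(diffs)  # [-1.0, -1.0, 3.0, 2.0]
```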
Spatial & Geographic Data
Used in:
- Google Maps
- GIS systems
- Disease mapping
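Spatial data often stores positions as latitude and longitude. Straight-line (Euclidean) distance on those raw degrees is misleading, so a common choice is the great-circle distance computed with the haversine formula. This sketch assumes a spherical Earth with a mean radius of 6371 km, which is accurate enough for many mapping tasks. The two city coordinates are approximate, illustrative values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates of Lahore and Karachi (illustrative values)
print(round(haversine_km(31.55, 74.34, 24.86, 67.01)))
```

One degree of longitude along the equator comes out at roughly 111 km, which is a quick sanity check for the formula.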
Streaming Data
Continuous real-time data:
- Live temperature sensors
- Social media streams
- Network intrusion logs
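Streaming data cannot be stored in full and then analysed, so stream algorithms typically keep only a bounded window of recent values. The minimal sketch below uses a `collections.deque` with `maxlen`, which discards the oldest value automatically. It raises an alert when the window average crosses a threshold. The window size, threshold, and sensor values are hypothetical.

```python
from collections import deque

def stream_alerts(readings, window=3, threshold=40.0):
    """Yield (time index, window average) whenever the full window's average exceeds threshold.

    Only the last `window` values are held in memory, so `readings` can be an endless iterator.
    """
    recent = deque(maxlen=window)  # oldest value drops off automatically
    for t, value in enumerate(readings):
        recent.append(value)
        avg = sum(recent) / len(recent)
        if len(recent) == window and avg > threshold:
            yield t, round(avg, 2)

# Hypothetical live temperature feed
feed = iter([35.0, 38.0, 41.0, 44.0, 46.0, 39.0, 36.0])
for t, avg in stream_alerts(feed):
    print(f"t={t}: window average {avg} exceeds threshold")
```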
Data Patterns in Data Mining
Frequent Patterns
Patterns that appear frequently in datasets.
Example:
Customers who buy rice often buy oil.
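The statement "customers who buy rice often buy oil" is usually quantified with support (how often items appear together) and confidence (how often oil appears given rice). This sketch computes both on hypothetical baskets. It then finds all frequent item pairs by brute-force enumeration, which is only practical for tiny data. Algorithms such as Apriori or FP-Growth exist to scale this search.

```python
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"rice", "oil", "salt"},
    {"rice", "oil"},
    {"rice", "sugar"},
    {"oil", "tea"},
    {"rice", "oil", "tea"},
]

def count(itemset):
    # Number of baskets containing every item in `itemset`
    return sum(1 for b in baskets if itemset <= b)

def support(itemset):
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    # Estimated P(consequent | antecedent) = count(A ∪ C) / count(A)
    return count(antecedent | consequent) / count(antecedent)

print(support({"rice", "oil"}))       # 0.6
print(confidence({"rice"}, {"oil"}))  # 0.75

# Brute-force search for frequent pairs (not Apriori; fine only for tiny data)
MIN_SUPPORT = 0.4
items = sorted(set().union(*baskets))
frequent_pairs = [
    pair for pair in combinations(items, 2) if support(set(pair)) >= MIN_SUPPORT
]
print(frequent_pairs)  # [('oil', 'rice'), ('oil', 'tea')]
```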
Sequential Patterns
Patterns in which events occur in a particular order.
Example:
Visit → Add to cart → Purchase
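A sequential pattern such as Visit → Add to cart → Purchase matches a session if its steps appear in that order, even when other events occur in between. This sketch checks that condition with a consuming iterator (each `in` test advances past the matched step) and counts how many hypothetical clickstream sessions follow the funnel.

```python
def contains_sequence(session, pattern):
    """True if `pattern` occurs in `session` in order (gaps allowed)."""
    it = iter(session)
    # Each `step in it` consumes the iterator up to the match, enforcing order
    return all(step in it for step in pattern)

# Hypothetical clickstream sessions
sessions = [
    ["Visit", "Search", "Add to cart", "Purchase"],
    ["Visit", "Add to cart", "Visit"],
    ["Visit", "Purchase"],
    ["Visit", "Add to cart", "Checkout", "Purchase"],
]

funnel = ["Visit", "Add to cart", "Purchase"]
matches = sum(contains_sequence(s, funnel) for s in sessions)
print(f"{matches}/{len(sessions)} sessions follow the funnel")  # 2/4
```

Order matters here. A session containing the same three events in a different order would not count as a match.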
Graph Patterns
Used in:
- Social networks
- Knowledge graphs
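In a social network, one of the simplest graph patterns is a triangle: three people who are all connected to each other. This sketch stores a small, hypothetical friendship graph as adjacency sets, finds all triangles by checking every trio of nodes (brute force, suitable only for small graphs), and identifies the most connected node.

```python
from itertools import combinations

# Hypothetical undirected friendship network
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("C", "E")]

# Adjacency sets: node -> set of neighbours
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# A trio is a triangle if every pair inside it is connected
triangles = [
    trio
    for trio in combinations(sorted(graph), 3)
    if all(b in graph[a] for a, b in combinations(trio, 2))
]
print(triangles)  # [('A', 'B', 'C'), ('C', 'D', 'E')]

# Highest-degree node, a simple "hub" pattern
hub = max(graph, key=lambda node: len(graph[node]))
print(hub, len(graph[hub]))  # C 4
```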
Spatial Patterns
Used in:
- Earthquake analysis
- Weather prediction
- Location-based services
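A basic way to surface spatial patterns such as earthquake hotspots is to bin events into a geographic grid and flag dense cells. In this sketch the event coordinates are hypothetical, and both the 0.5-degree cell size and the "at least 2 events" rule are assumed, tunable parameters. One known weakness of a fixed grid is that a cluster lying on a cell boundary can be split across cells, which is why density-based clustering methods are often preferred in practice.

```python
from collections import Counter

# Hypothetical event locations (latitude, longitude)
events = [
    (33.71, 73.05), (33.74, 73.09), (33.69, 73.02),  # tight cluster
    (24.88, 67.05), (24.90, 67.01),                  # second cluster
    (30.20, 71.45),                                  # isolated event
]

CELL_DEG = 0.5    # grid cell size in degrees (assumed)
MIN_EVENTS = 2    # minimum events for a cell to count as a hotspot (assumed)

def cell_of(lat, lon):
    # Integer index of the grid cell that contains the point
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

counts = Counter(cell_of(lat, lon) for lat, lon in events)
hotspots = {cell: n for cell, n in counts.items() if n >= MIN_EVENTS}
print(hotspots)  # {(67, 146): 3, (49, 134): 2}
```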
Data Characteristics That Affect Mining
Dimensionality
The more dimensions (features) a dataset has, the harder it is to analyze.
Example:
Image data can have thousands of pixels, and each pixel is treated as a feature.
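The arithmetic behind "thousands of pixels" is easy to check. Once an image is flattened into a vector, each pixel (and each colour channel) becomes one feature. The last loop illustrates one reason high dimensions are hard: if every feature is split into 10 bins, the number of cells needed to cover the space is 10 to the power of d, so any fixed amount of data becomes sparse very quickly. The image sizes are common examples chosen for illustration.

```python
def flatten(image):
    # Row-major flattening: each pixel becomes one feature
    return [pixel for row in image for pixel in row]

def num_features(height, width, channels=1):
    return height * width * channels

tiny = [[0, 255], [128, 64]]        # a 2x2 grayscale image
print(flatten(tiny))                # [0, 255, 128, 64] -> 4 features
print(num_features(28, 28))         # 784 (28x28 grayscale, the MNIST digit size)
print(num_features(224, 224, 3))    # 150528 (224x224 RGB, a common CNN input size)

# With 10 bins per feature, covering the space needs 10**d cells
for d in (1, 2, 3, 10):
    print(d, 10 ** d)
```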
Sparsity
Sparse datasets contain mostly zero (or empty) values.
Examples:
- Recommendation systems
- Text mining
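In a recommendation system, each user rates only a few of the available items, so a full user-item matrix is mostly zeros. This sketch measures that sparsity on hypothetical ratings. It also shows that storing only the non-zero entries (here, a dict of dicts) takes far less space than the dense matrix.

```python
# Hypothetical user-item ratings: only non-zero entries are stored (a sparse format)
items = ["item1", "item2", "item3", "item4", "item5", "item6"]
ratings = {
    "u1": {"item1": 5, "item4": 3},
    "u2": {"item2": 4},
    "u3": {"item1": 2, "item6": 5},
}

# Dense view: every user x item cell, with unrated cells filled as 0
dense = [[ratings[u].get(i, 0) for i in items] for u in ratings]

total_cells = len(dense) * len(items)
nonzero = sum(1 for row in dense for x in row if x != 0)
sparsity = 1 - nonzero / total_cells

print(f"{nonzero} of {total_cells} cells are non-zero")  # 5 of 18
print(f"{sparsity:.0%} of the matrix is zero")           # 72%
```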
Distribution & Noise
Noisy data reduces model accuracy.
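To see how noise distorts results and how it can be reduced, consider sensor readings with a few spikes. This sketch applies a simple 3-point median filter, one common smoothing technique among several, and compares the mean before and after. The readings and the spike values are hypothetical. The first and last values are left unchanged because they lack a full neighbourhood.

```python
from statistics import mean, median

# Hypothetical sensor readings with two noise spikes (95.0 and -40.0)
signal = [21.0, 21.5, 95.0, 22.0, 22.5, -40.0, 23.0]

def median_filter(values, k=3):
    """Replace each interior value with the median of its k-wide neighbourhood."""
    half = k // 2
    out = list(values)
    for i in range(half, len(values) - half):
        out[i] = median(values[i - half : i + half + 1])
    return out

cleaned = median_filter(signal)
print(cleaned)  # [21.0, 21.5, 22.0, 22.5, 22.0, 22.5, 23.0]
print(round(mean(signal), 2), round(mean(cleaned), 2))  # 23.57 22.07
```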
Examples & Case Studies
Healthcare
- Patient vitals (temporal data)
- MRI images (unstructured data)
- Disease prediction (patterns)
Banking
- Credit score data (structured)
- Fraud detection (outlier detection)
E-Commerce
- User behavior (sequential patterns)
- Purchase history (transactional data)
Tools for Handling Different Data Types
Python Libraries
- Pandas → Structured
- NumPy → Numeric
- Matplotlib → Visualization
- NLTK, spaCy → Text
- OpenCV → Image
NoSQL Databases
- MongoDB
- Cassandra
- Firebase
Cloud Storage
- AWS S3
- Google Cloud Storage
- Azure Blob
Common Mistakes When Handling Data Types
- Treating text as structured data
- Ignoring multimedia data
- Using wrong algorithms
- Not preprocessing properly
- Mixing inconsistent data formats
Summary
Lecture 2 introduced the essential types of data in Data Mining: structured, semi-structured, unstructured, temporal, spatial, multimedia, and transactional data. You explored how different data types require different tools, algorithms, and preprocessing steps. Finally, you learned how patterns such as frequent, sequential, graph, and spatial patterns help uncover hidden knowledge.
Next: Lecture 3 – Data Preprocessing in Data Mining: Cleaning, Transformation & Integration