A complete introduction to parallel and distributed computing covering system evolution, parallel vs distributed systems, Flynn’s taxonomy (SIMD, MIMD), and key performance motivations.
Parallel and Distributed Computing is the foundation of modern high-performance systems. Whether you’re running apps on a smartphone, training AI models, streaming videos, simulating climate systems, or serving millions of web users, you’re depending on computation happening simultaneously either inside one machine (parallel) or across many machines (distributed).
This lecture builds the core understanding you need before learning OpenMP, MPI, GPU programming, and performance tuning. We’ll go deep into the evolution of computing, parallel vs distributed systems, Flynn’s taxonomy (SIMD, MIMD), and the main motivations: performance, scalability, and energy efficiency.
1) Evolution of Computing Systems
1.1 Early Era: Single Processor (Sequential Computing)
Initially, computers were designed to execute one instruction at a time. The classical model is the von Neumann architecture, where:
- Instructions and data share memory.
- A single CPU fetches and executes instructions sequentially.
- One “instruction stream” processes one “data stream.”
This approach shaped early programming:
- One CPU core
- One control flow
- One program counter
1.2 Performance Growth Through Frequency Scaling
For many years, CPUs improved primarily by:
- Increasing clock speed (MHz → GHz)
- Improving instruction pipelines
- Using caches
- Adding instruction-level parallelism (ILP), such as pipelining and superscalar execution
But this approach hit limits.
1.3 Why Clock Speed Couldn’t Keep Increasing (The Wall)
Increasing frequency increases:
- Power consumption
- Heat production
- Leakage current in transistors
This led to major barriers:
- Power wall: power becomes too high to cool economically.
- Thermal wall: chips overheat.
- Memory wall: CPU speeds grew faster than memory speeds, so the CPU stalls waiting for data.
- ILP wall: compilers and hardware can’t extract unlimited parallelism from single instruction streams.
So instead of pushing frequency, industry shifted to parallelism.
1.4 Multicore Revolution
Modern CPUs evolved into:
- Dual-core, quad-core, 8-core, 16-core, 64-core…
- Each core can run independent threads.
This created a new reality:
- Performance improvements require parallel programs, not just faster hardware.
1.5 Distributed Computing Evolution (From Clusters to Cloud)
As problems grew larger:
- One machine was not enough (memory capacity, compute power, storage).
- Multiple machines connected via networks formed:
- Clusters (co-located machines, high-speed interconnects)
- Grids (sharing resources across domains)
- Clouds (virtualized, scalable, on-demand infrastructure)
Distributed computing became essential for:
- Big data
- Global web services
- Resilience and fault tolerance
- Large-scale scientific computation
2) The Problem with Purely Sequential Computing
Even the fastest single-core machine has fundamental limitations.
2.1 Execution Time Bottleneck
A sequential program must do:
- Step 1 → Step 2 → Step 3 → Step 4
No overlap.
For large tasks (e.g., video encoding, deep learning training), sequential execution time becomes impractical.
2.2 Memory and I/O Bottlenecks
A CPU can compute quickly, but if data isn’t available:
- It stalls waiting for memory.
- Cache misses become expensive.
- Disk/network I/O delays dominate performance.
Parallel and distributed systems mitigate these bottlenecks using:
- Caches
- Multiple memory channels
- Overlapped communication and computation
- Data partitioning
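Overlapping communication and computation can be sketched in plain Python (a conceptual stand-in for the asynchronous I/O and prefetching that real systems use): while one chunk is being processed, the remaining fetches run concurrently in background threads instead of serializing fetch-then-compute. The chunk sizes and the `time.sleep` standing in for I/O latency are illustrative assumptions.

```python
import concurrent.futures
import time

def fetch_chunk(i):
    """Simulate a slow I/O read (e.g., disk or network) for chunk i."""
    time.sleep(0.01)  # stand-in for I/O latency
    return list(range(i * 4, i * 4 + 4))

def process_chunk(chunk):
    """CPU work on one chunk: sum of squares."""
    return sum(x * x for x in chunk)

# Overlap: all fetches are submitted up front and proceed in background
# threads while the main thread consumes and processes finished chunks.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch_chunk, i) for i in range(4)]
    results = [process_chunk(f.result()) for f in futures]

print(results)  # [14, 126, 366, 734]
```

With serial fetch-then-compute the four 10 ms "reads" would add up; submitting them together hides most of that latency behind the computation.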
2.3 Limits of Hardware Optimization Alone
Even with:
- Pipelining
- Branch prediction
- Out-of-order execution
A single instruction stream can’t match the throughput of multiple cores/GPUs.
3) What is Parallel Computing?
3.1 Definition
Parallel computing means solving a problem by dividing it into parts and executing those parts simultaneously using multiple processing elements.
3.2 Where Parallelism Exists
Parallelism can happen at many levels:
(A) Bit-Level Parallelism
Early CPUs increased word size (8-bit → 16-bit → 32-bit → 64-bit).
Operating on more bits per instruction means more work completed per cycle.
(B) Instruction-Level Parallelism (ILP)
Inside a CPU core:
- Pipeline stages execute overlapping instructions.
- Superscalar CPUs execute multiple instructions per cycle.
You get parallelism without changing code much, but it has limits.
(C) Data Parallelism
Same operation on many data items:
- Add two large arrays element-by-element
- Apply filter to every pixel in an image
This is ideal for SIMD and GPUs.
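A minimal data-parallel sketch, using Python threads as a stand-in for the vector lanes or GPU threads described above (the chunking scheme and worker count are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def add_arrays(a, b, workers=4):
    """Data parallelism: the SAME '+' operation is applied to every
    element pair; each worker handles one contiguous chunk."""
    n = len(a)
    out = [0] * n
    chunk = (n + workers - 1) // workers  # ceiling division

    def work(start):
        for i in range(start, min(start + chunk, n)):
            out[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.map(work, range(0, n, chunk))  # waits on exit of the block
    return out

print(add_arrays([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The key property is that every chunk is independent: no worker needs another worker's result, which is exactly why this pattern maps so well onto SIMD units and GPUs.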
(D) Task Parallelism
Different tasks run concurrently:
- One thread handles user input
- One thread processes data
- One thread writes output
This is common in MIMD multicore CPUs.
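A small task-parallel sketch: three *different* functions run concurrently, in contrast with the data-parallel case where the same operation runs on different data. The three task bodies here are hypothetical placeholders:

```python
import threading
import queue

results = queue.Queue()

def read_input():      # task 1: produce data
    results.put(("input", [3, 1, 2]))

def compute_stats():   # task 2: an unrelated computation
    results.put(("stats", sum(range(10))))

def write_log():       # task 3: unrelated output work
    results.put(("log", "done"))

# Task parallelism: each thread runs a DIFFERENT task concurrently.
threads = [threading.Thread(target=f)
           for f in (read_input, compute_stats, write_log)]
for t in threads: t.start()
for t in threads: t.join()

collected = dict(results.get() for _ in range(3))
print(collected["stats"])  # 45
```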
3.3 Shared Memory Parallel Systems
Many parallel machines use shared memory:
- Threads share address space.
- Communication occurs via shared variables.
Advantages:
- Easy to share data
- Faster than networking
Challenges:
- Race conditions
- Synchronization overhead
- Cache coherence complexity
Example: A typical multicore CPU running OpenMP threads.
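The race-condition and synchronization challenges above can be sketched with a shared counter guarded by a lock (Python's `threading.Lock` standing in for an OpenMP critical section; the iteration counts are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()

def deposit(times):
    global counter
    for _ in range(times):
        # Read-modify-write on shared memory: without the lock, two
        # threads could read the same old value and one update would
        # be lost (a race condition).
        with lock:
            counter += 1

threads = [threading.Thread(target=deposit, args=(10_000,))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(counter)  # 40000 with the lock; possibly less without it
```

The lock restores correctness but adds exactly the synchronization overhead listed above: threads serialize at the critical section instead of running freely.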
4) What is Distributed Computing?
4.1 Definition
Distributed computing uses multiple independent computers connected via a network to solve a problem.
Each machine (node):
- Has its own CPU(s)
- Has its own memory
- Has its own operating system instance
- Communicates using messages
4.2 Message Passing and Communication
Since memory is not shared:
- Node A can’t directly read Node B’s memory.
- Data must be sent via:
- sockets
- RPC (remote procedure calls)
- message passing frameworks (MPI)
- distributed data engines (e.g., MapReduce model)
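The message-passing model can be sketched with two "nodes" that share nothing and communicate only through mailboxes. Threads and queues here are a single-process stand-in for MPI sends/receives or sockets; the request format is a made-up illustration:

```python
import queue
import threading

# Each "node" keeps its own private state and owns a mailbox (queue).
# The ONLY way to share data is to send a message.
mailbox_a, mailbox_b = queue.Queue(), queue.Queue()
answer = None

def node_a():
    global answer
    mailbox_b.put({"op": "square", "value": 7})  # send request to B
    answer = mailbox_a.get()                     # block until B replies
    mailbox_b.put({"op": "stop"})                # tell B to shut down

def node_b():
    while True:
        msg = mailbox_b.get()
        if msg["op"] == "stop":
            break
        mailbox_a.put(msg["value"] ** 2)         # send reply to A

threads = [threading.Thread(target=f) for f in (node_a, node_b)]
for t in threads: t.start()
for t in threads: t.join()

print("node A received:", answer)  # node A received: 49
```

Note that node A never touches node B's variables; everything crosses an explicit message boundary, which is exactly what makes latency and message count first-class costs in distributed algorithms.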
4.3 Key Properties of Distributed Systems
Distributed systems add complexity but provide huge benefits:
Concurrency
Many nodes run at the same time.
No Global Clock
Different machines have different clocks, leading to challenges in ordering events.
Partial Failures
In a distributed system:
- one node can fail while others remain active
- network links can fail
- messages can be delayed or lost
This requires fault tolerance mechanisms.
5) Parallel vs Distributed Systems
5.1 Core Difference: Memory and Communication
Parallel system (shared memory):
- Communication through memory reads/writes
- Very fast inter-core communication
Distributed system (distributed memory):
- Communication through network messages
- Higher latency, lower bandwidth than memory
5.2 Latency and Bandwidth
- Memory access latency: nanoseconds
- Network latency: microseconds to milliseconds
This makes distributed computing harder:
- Communication cost is high
- Algorithms must minimize messaging
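The gap can be made concrete with the standard linear cost model for moving one message, time = latency + size / bandwidth. The latency and bandwidth numbers below are illustrative ballpark values, not measurements:

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model for one transfer:
    time = latency + message_size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s

# Illustrative numbers: DRAM access vs. a datacenter network.
mem = transfer_time(64, 100e-9, 50e9)   # ~100 ns latency, 50 GB/s
net = transfer_time(64, 50e-6, 1.25e9)  # ~50 us latency, 10 Gbit/s

# For small messages, latency dominates: the network moves the same
# 64 bytes hundreds of times slower than memory does.
print(f"memory: {mem*1e6:.3f} us, network: {net*1e6:.3f} us")
```

This is why distributed algorithms batch work into few large messages rather than many small ones: the fixed latency term is paid per message, regardless of size.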
5.3 Fault Tolerance
Parallel computing typically assumes:
- one machine, fewer failures
Distributed computing must assume:
- failures are normal
- systems must recover automatically
5.4 Scaling
Parallel systems scale up to:
- tens/hundreds of cores per machine
Distributed systems scale to:
- thousands/millions of machines globally
5.5 Examples
Parallel example:
- 16-core CPU rendering a video frame using threads
Distributed example:
- A cloud cluster processing petabytes of logs using many machines
6) Flynn’s Taxonomy (SIMD, MIMD)
Flynn’s taxonomy classifies systems by:
- Instruction stream: how many instruction sequences are executed
- Data stream: how many data sequences are processed
6.1 SISD (Single Instruction, Single Data)
- Classic sequential machine
- One core, one instruction flow
Example:
- Simple embedded processors
- Basic single-thread execution
6.2 SIMD (Single Instruction, Multiple Data)
- One instruction applied to many data items simultaneously
- Great for vectorizable computations
Modern SIMD appears in:
- CPU vector extensions (SSE, AVX)
- GPUs executing groups of threads in lockstep (SIMT-style)
Best suited for:
- Image processing (apply filter to all pixels)
- Linear algebra
- Physics simulations
- Audio/video encoding
Limitations:
- Works best when operations are uniform across data
- Branch divergence reduces efficiency (especially on GPUs)
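The SIMD idea and its branch-divergence limitation can be sketched conceptually (this is a model of the execution style, not actual vector hardware; `simd_apply` is a hypothetical helper name):

```python
def simd_apply(op, lanes):
    """Conceptual SIMD: ONE operation is issued, and every 'lane'
    (data element) executes it in lockstep."""
    return [op(x) for x in lanes]

# One instruction ("multiply by 2"), eight data elements at once.
doubled = simd_apply(lambda x: x * 2, [1, 2, 3, 4, 5, 6, 7, 8])
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16]

# Branch divergence: lanes failing the condition are masked off and
# sit idle while the active lanes execute, wasting lane throughput.
masked = [x * 2 if x % 2 == 0 else x for x in [1, 2, 3, 4]]
print(masked)   # [1, 4, 3, 8]
```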
6.3 MISD (Multiple Instruction, Single Data)
Rare in practice.
Sometimes appears in:
- Fault-tolerant systems (multiple computations verifying same data)
6.4 MIMD (Multiple Instruction, Multiple Data)
Most common modern architecture:
- Multiple cores execute different programs or threads
- Each can operate on different data
Examples:
- Multicore CPUs
- Multi-socket servers
- Distributed clusters (each node is independent)
MIMD is flexible, but requires:
- synchronization tools (locks, barriers)
- careful algorithm design
7) Motivation: Why Parallel and Distributed Computing?
7.1 Motivation 1: Performance
The simplest motivation: finish tasks faster.
If a computation can be split into 8 equal independent parts:
- 1 core: 80 seconds
- 8 cores (ideal): 10 seconds
But real speedup is limited by:
- communication overhead
- synchronization delays
- serial parts of code
- load imbalance
This leads to a core principle:
Parallelism is powerful, but overhead and serial code limit gains.
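This principle is formalized by Amdahl's law: if a fraction p of the program is parallelizable, speedup on n cores is at most 1 / ((1 - p) + p / n). A short sketch of the formula:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n),
    where p is the parallelizable fraction of the program."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_cores)

# Even with 95% parallel code, the 5% serial part caps speedup at 20x,
# no matter how many cores are added:
for n in (8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Running this shows diminishing returns (roughly 5.9x at 8 cores, 15.4x at 64, 19.6x at 1024), which is exactly why minimizing the serial portion matters as much as adding cores.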
7.2 Motivation 2: Scalability (Handle Bigger Problems)
Some problems cannot fit on one machine due to:
- memory limits
- compute limits
- storage limits
Distributed systems scale by adding more machines:
- more memory
- more compute
- more storage
Example:
A machine learning pipeline on massive datasets needs:
- distributed storage
- distributed processing
- distributed training
7.3 Motivation 3: Energy Efficiency (Performance per Watt)
Higher performance used to mean:
- higher clock speed
- much higher power consumption
Now the goal is:
- maximize performance per watt
Using multiple slower cores can be more energy efficient than one very fast core:
- lower voltage
- less heat
- better throughput
This is critical for:
- mobile devices
- data centers (electricity cost)
- sustainability goals
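The energy argument follows from the first-order CMOS dynamic power model, P ≈ C · V² · f: running slower permits a lower supply voltage, and power falls with the *square* of voltage. The capacitance, voltage, and frequency values below are made-up illustrative numbers:

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order CMOS dynamic power model: P ~ C * V^2 * f."""
    return capacitance * voltage**2 * frequency

# Illustrative values: halving frequency typically allows a lower
# supply voltage, so power drops faster than performance.
fast_core  = dynamic_power(1.0, 1.2, 3.0e9)      # one 3 GHz core
slow_cores = 2 * dynamic_power(1.0, 0.9, 1.5e9)  # two 1.5 GHz cores

# Same aggregate 3 GHz of compute, but noticeably less total power:
print(f"one fast core: {fast_core:.2e}  two slow cores: {slow_cores:.2e}")
```

In these (hypothetical) units the two slow cores draw roughly 44% less power for the same aggregate clock throughput, which is the performance-per-watt case for multicore in a nutshell.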
8) Key Terms You Must Know (Glossary)
- Concurrency: multiple tasks in progress (not always simultaneous)
- Parallelism: tasks executed simultaneously
- Thread: lightweight execution unit within a process
- Process: program instance with separate memory space
- Shared memory: multiple threads access same memory
- Distributed memory: each node has its own memory
- Speedup: performance improvement compared to sequential execution
- Overhead: extra cost due to coordination/communication
- Synchronization: coordination to avoid incorrect memory access conflicts
Summary
Parallel and Distributed Computing evolved to overcome the limitations of single-core, sequential systems. Parallel computing uses multiple processors within one machine to execute tasks simultaneously, while distributed computing connects multiple machines over a network to work together. Flynn’s taxonomy classifies systems like SIMD and MIMD, which form the basis of modern multicore CPUs and GPUs. The main goals of these approaches are improved performance, scalability, and energy efficiency, making them essential for today’s high-performance and cloud-based applications.
Multiple Choice Questions (MCQs)
1. Which architecture executes one instruction on one data stream?
A) SIMD
B) MISD
C) SISD
D) MIMD
Answer: C) SISD
2. The main reason for shifting from single-core to multicore processors was:
A) Software limitations
B) Power and thermal constraints
C) Lack of memory
D) Network latency
Answer: B) Power and thermal constraints
3. In parallel computing, processors typically communicate through:
A) Internet
B) Message brokers
C) Shared memory
D) Satellites
Answer: C) Shared memory
4. In distributed computing, nodes communicate using:
A) Shared registers
B) Message passing
C) Cache coherence
D) CPU pipeline
Answer: B) Message passing
5. SIMD stands for:
A) Single Instruction Multiple Data
B) Sequential Instruction Multiple Data
C) Single Integrated Memory Device
D) System Integrated Multi Device
Answer: A) Single Instruction Multiple Data
6. Which architecture is most common in modern multicore systems?
A) SISD
B) SIMD only
C) MIMD
D) MISD
Answer: C) MIMD
7. GPUs are best categorized under:
A) SISD
B) SIMD
C) MISD
D) None
Answer: B) SIMD
8. Which is NOT a motivation for parallel computing?
A) Performance
B) Scalability
C) Energy efficiency
D) Reducing storage permanently
Answer: D) Reducing storage permanently
9. Distributed systems are typically:
A) Tightly coupled
B) Loosely coupled
C) Single-core
D) Cache dependent
Answer: B) Loosely coupled
10. The term “speedup” refers to:
A) Increasing clock frequency
B) Performance improvement compared to sequential execution
C) Reducing RAM
D) Increasing voltage
Answer: B) Performance improvement compared to sequential execution
11. A cluster of computers connected via a network is an example of:
A) Sequential system
B) Parallel shared memory system
C) Distributed system
D) Single-core system
Answer: C) Distributed system
12. Which of the following is rare in practical systems?
A) SISD
B) SIMD
C) MISD
D) MIMD
Answer: C) MISD
13. The “power wall” refers to:
A) Lack of RAM
B) Increasing power consumption limiting CPU speed
C) Slow networks
D) Low storage
Answer: B) Increasing power consumption limiting CPU speed
14. In distributed systems, partial failure means:
A) Entire system crashes
B) One node fails while others continue
C) CPU overheats
D) Cache misses
Answer: B) One node fails while others continue
15. Applying the same operation to all pixels of an image is an example of:
A) Task parallelism
B) Data parallelism
C) Sequential processing
D) MISD
Answer: B) Data parallelism
Answers to Short Answer Questions
1. Define parallel computing.
Parallel computing is a computing model in which multiple processors or cores execute different parts of a program simultaneously to reduce execution time and improve performance.
2. Define distributed computing.
Distributed computing is a system where multiple independent computers communicate over a network and coordinate their actions by passing messages to solve a common problem.
3. What is Flynn’s taxonomy?
Flynn’s taxonomy is a classification system for computer architectures based on the number of instruction streams and data streams. It includes four categories: SISD, SIMD, MISD, and MIMD.
4. State two differences between parallel and distributed systems.
- Parallel systems often share memory, while distributed systems have separate memory for each node.
- Parallel systems are tightly coupled within a single machine, while distributed systems are loosely coupled across multiple machines connected via a network.
5. What is scalability in computing systems?
Scalability is the ability of a computing system to handle increased workload or problem size efficiently by adding more resources such as processors, memory, or machines.
Answers to Long Answer Questions
1. Explain the evolution of computing systems from sequential to parallel and distributed systems.
Initially, computing systems followed a sequential model based on the Von Neumann architecture, where a single processor executed one instruction at a time. Performance improvements were achieved mainly by increasing clock speed and improving instruction-level parallelism. However, increasing frequency led to higher power consumption and heat generation, creating physical limitations known as the power wall and thermal wall.
To overcome these limitations, the industry shifted toward multicore processors, where multiple cores execute tasks simultaneously within the same system. This marked the beginning of widespread parallel computing. As computational demands continued to grow beyond the capacity of a single machine, distributed computing emerged. Distributed systems connect multiple independent machines through networks, allowing them to collaborate on large-scale problems. This evolution enabled higher performance, improved scalability, and better resource utilization.
2. Describe Flynn’s taxonomy in detail.
Flynn’s taxonomy classifies computer architectures based on instruction and data streams:
- SISD (Single Instruction, Single Data): A traditional sequential computer with one instruction stream operating on one data stream. Example: Single-core processor running one program.
- SIMD (Single Instruction, Multiple Data): A single instruction is applied to multiple data elements simultaneously. This architecture is efficient for data-parallel tasks such as image processing and scientific computations. GPUs are a common example.
- MISD (Multiple Instruction, Single Data): Multiple instructions operate on the same data stream. This architecture is rare and mainly used in fault-tolerant systems.
- MIMD (Multiple Instruction, Multiple Data): Multiple processors execute different instructions on different data independently. This is the most common architecture today and is used in multicore CPUs and distributed systems.
MIMD is widely used because it provides flexibility and supports both task parallelism and data parallelism.
3. Discuss the motivation for parallel and distributed computing.
The primary motivations for parallel and distributed computing are performance, scalability, and energy efficiency.
Performance: Parallel execution reduces execution time by dividing tasks among multiple processors. This improves speedup and throughput for computationally intensive applications.
Scalability: Distributed systems allow workloads to grow by adding more machines. This enables large-scale data processing, cloud computing, and big data analytics.
Energy Efficiency: Instead of increasing clock speeds, using multiple lower-frequency cores improves performance per watt. This reduces power consumption and heat generation, making systems more sustainable and cost-effective.
Modern applications such as artificial intelligence, weather forecasting, financial modeling, and cloud services depend heavily on parallel and distributed computing to meet performance and scalability requirements.
Next Lecture: Parallel Computer Architectures


