Lecture 2: Instruction Execution, Caches & Memory Hierarchy

CPU fetch–decode–execute cycle, caches and the memory hierarchy, virtual memory, the OS's role, and how systems communicate over networks.

Instruction Execution: Fetch → Decode → Execute

  1. Fetch: The Program Counter (PC) holds the address of the next instruction. The CPU requests that instruction from L1 I-cache/memory.
  2. Decode: The control unit translates the instruction into internal control signals (or micro-ops).
  3. Execute: The ALU, load/store unit, or branch unit performs the operation, possibly reading/writing registers or memory.
  4. Write-back & PC update: Results go to registers/memory; the PC advances or is changed by a branch/jump.

Modern CPUs overlap these stages with pipelining and may issue multiple instructions per cycle (superscalar, out-of-order).
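
As a concrete (if highly simplified) illustration, here is a minimal sketch of that loop in Python for a made-up three-instruction toy ISA; the opcodes and memory layout are invented for illustration, not taken from any real architecture.

# Toy fetch-decode-execute loop for a hypothetical 3-instruction ISA.
MEM = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]   # "instruction memory"
pc, acc = 0, 0                                 # program counter, accumulator
while True:
    op, arg = MEM[pc]      # 1) fetch the instruction the PC points at
    pc += 1                # 4) default PC update (a branch would overwrite pc)
    if op == "LOAD":       # 2) decode, then 3) execute
        acc = arg
    elif op == "ADD":
        acc += arg
    elif op == "HALT":
        break
print("acc =", acc)        # acc = 8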

Why Caches Matter?

DRAM is far slower than the CPU. Caches keep recently or nearby data close to the core.

  • Locality
    • Temporal: recently used data likely used again.
    • Spatial: nearby addresses likely accessed next.
  • Cache anatomy
    • Line (block): e.g., 64 bytes fetched on a miss.
    • Mapping: direct-mapped, N-way set-associative, or fully associative.
    • Replacement: LRU/PLRU/random.
    • Write policy: write-back vs write-through; write-allocate vs no-write-allocate.
  • Miss types: compulsory (cold), capacity, conflict.

Example: a loop that scans an array linearly benefits from spatial locality, since each 64-byte line fetched on a miss serves several subsequent elements; once the stride reaches the typical 64 B line size, every access touches a new line, and misses dominate.
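
To make the mapping bullet above concrete, here is a sketch of how a set-associative cache would split an address, assuming a hypothetical 32 KiB, 8-way cache with 64-byte lines (all sizes are illustrative):

LINE = 64                             # bytes per line -> low 6 bits are the offset
WAYS = 8
SETS = (32 * 1024) // (LINE * WAYS)   # 64 sets -> next 6 bits are the index

def split(addr):
    offset = addr % LINE
    index = (addr // LINE) % SETS
    tag = addr // (LINE * SETS)
    return tag, index, offset

for addr in (0x0000, 0x0040, 0x1040):
    tag, index, offset = split(addr)
    print(f"addr={addr:#06x} tag={tag} set={index} offset={offset}")

Note that 0x0040 and 0x1040 land in the same set with different tags: in a direct-mapped cache they would conflict, which is exactly the "conflict miss" category above.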

The Memory & Storage Hierarchy (conceptual latencies)

Registers (≈1 cycle) → L1 (≈4 cycles) → L2 (≈12 cycles) → L3 (≈30–50 cycles) → DRAM (≈100–200 ns) → NVMe SSD (≈100 µs) → HDD (ms) → Network/Cloud (ms–100s ms).
Hierarchy trades speed vs capacity vs cost; software and the OS try to keep working sets near the top.
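
Those figures allow a quick back-of-envelope estimate of average memory access time (AMAT); the miss rates below are assumed values for illustration, not measurements:

l1_hit, l2_hit, dram = 4, 12, 200   # cycles, rounded from the hierarchy above
m1, m2 = 0.05, 0.30                 # assumed L1 miss rate and L2 local miss rate
amat = l1_hit + m1 * (l2_hit + m2 * dram)
print(f"AMAT = {amat:.1f} cycles")  # 4 + 0.05*(12 + 0.3*200) = 7.6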

[Figure: instruction execution cycle, caches, and the TLB]


Virtual Memory in One Page

  • Pages/Frames: virtual address space is split into pages (e.g., 4 KiB), mapped to physical frames.
  • Page Table: per-process mapping maintained by the OS; enables isolation and shared pages (e.g., shared libraries).
  • TLB: a small cache of recent page translations. A TLB miss forces a page-table walk.
  • Page Faults: on access to a page not resident in memory, the OS loads it from disk (slow) and retries the instruction; on a protection violation, it typically signals or kills the process (see the translation sketch below).
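
A minimal sketch of address translation under those definitions, assuming 4 KiB pages and an invented single-entry page table:

PAGE = 4096                          # 4 KiB -> low 12 bits are the page offset
page_table = {0x7f3a1: 0x1a2b3}      # VPN -> PFN, hypothetical mapping

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE)
    pfn = page_table.get(vpn)        # a real TLB would cache this lookup
    if pfn is None:
        raise RuntimeError(f"page fault: VPN {vpn:#x} not resident")
    return pfn * PAGE + offset

print(hex(translate(0x7f3a1123)))    # 0x1a2b3123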

OS Responsibilities You Touch in Assembly

  • Process & Thread Scheduling (time slices, priorities).
  • System Calls (trap/int/syscall instructions): file I/O, memory management (mmap, brk), networking, timers; see the mmap sketch after this list.
  • Interrupts & Exceptions: device signals and error conditions; the OS dispatches handlers.
  • DMA: devices transfer data to memory with minimal CPU intervention.
  • Filesystems: logical file view over blocks on storage.
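
To see one of these system calls from user code, here is a small sketch using Python's mmap module, which wraps the mmap(2) call on Linux:

import mmap
buf = mmap.mmap(-1, 4096)       # anonymous mapping, one 4 KiB page
buf[:5] = b"hello"
print(buf[:5])                  # b'hello'
buf.close()
# On Linux, running the script under `strace -e trace=mmap python3 ...`
# shows the underlying mmap syscalls, including this mapping.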

Devices, I/O & Networks

  • Device controller ↔ driver: registers (often memory-mapped I/O), queues, interrupts.
  • Networking path: user process → socket API → kernel networking stack (TCP/IP) → NIC → wire → remote host (see the sketch after this list).
  • Latency vs throughput: small messages pay handshake/RTT costs; large transfers saturate bandwidth.
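
A minimal sketch of the user-space end of that path via the socket API; example.com and the plain-HTTP request are purely illustrative:

import socket
# create_connection does the DNS lookup and TCP handshake (the RTT cost
# mentioned above); sendall/recv then move data through the kernel stack.
with socket.create_connection(("example.com", 80), timeout=5) as s:
    s.sendall(b"HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    print(s.recv(200).decode(errors="replace"))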

Mini-Lab

# 1) Observe CPU + caches
lscpu | grep -E 'Model name|cache'
# Linux: cat /proc/cpuinfo | less  (for more detail)

# 2) Measure memory locality (Python quick test)
python3 - << 'PY'
import numpy as np, time
N = 64*1024*1024//8                  # 64 MiB of int64, larger than typical L3
a = np.zeros(N, dtype=np.int64)
for s in [1, 8, 64, 512, 4096]:
    t = time.time()
    x = 0
    for i in range(0, N, s):
        x += a[i]
    dt = time.time() - t
    n = (N + s - 1)//s               # number of accesses at this stride
    # Raw time falls with stride simply because fewer elements are touched;
    # per-access time is the fair locality metric (expect it to rise).
    print(f"stride={s:4d} total={dt:.3f}s per-access={dt/n*1e9:6.1f}ns")
PY

# 3) Basic network path
ping -c 3 8.8.8.8
traceroute google.com  # Windows: tracert google.com

Quick Check (with answers)

  1. What architectural principle makes caches effective? Temporal and spatial locality.
  2. What does the TLB cache? Virtual→physical address translations.
  3. Which policy delays writes to DRAM until eviction? Write-back.
  4. What causes a page fault? Access to a virtual page not currently in physical memory.
  5. Where do system calls execute? In kernel mode via a trap/syscall mechanism.


People also ask:

How do caches pick which line to evict?

Common policies include LRU, pseudo-LRU, or random; the choice balances hardware cost and hit rate.
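
As a toy illustration of LRU on a single set (real hardware usually settles for cheaper approximations such as pseudo-LRU):

from collections import OrderedDict

class LRUSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = OrderedDict()          # tag -> True, oldest first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)     # hit: now most recently used
            return "hit"
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[tag] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in "ABACB"])       # ['miss', 'miss', 'hit', 'miss', 'miss']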

What is the difference between a cache miss and a page fault?

A cache miss fetches from the next lower level (often DRAM) and is fast relative to disk; a page fault requires OS intervention and may load from storage, which is much slower.

Does endianness affect instruction execution speed?

Not typically; it affects byte ordering in memory and data interchange. We’ll cover details in the ISA/endian lecture.
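
Byte ordering itself is easy to observe with Python's struct module:

import struct
print(struct.pack("<I", 1).hex())  # little-endian 32-bit: 01000000
print(struct.pack(">I", 1).hex())  # big-endian 32-bit:    00000001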

Why does a linear array scan run faster than a scattered access pattern?

Linear scans exploit spatial locality and the hardware prefetcher, keeping misses low; scattered access wastes most of each fetched line and gives the prefetcher no pattern to follow.

What is DMA and why use it?

Direct Memory Access lets devices move data to and from RAM without tying up the CPU, freeing CPU cycles for other work and improving I/O throughput.
