
Computer Organization Principles

Preface

After building a CPU from transistors, how do computers form a complete system? In the previous chapter, we started from transistors and built adders, registers, arithmetic units, and finally assembled the CPU core. But a CPU alone is not enough — it needs to work in coordination with memory and I/O devices, requires buses to connect components, and needs an instruction set to drive operations. This chapter shifts our perspective from inside the CPU to the entire computer system, providing an in-depth understanding of the Von Neumann architecture, instruction sets, storage hierarchy, buses, and I/O principles.

What will you learn from this article?

After completing this chapter, you will gain:

  • System perspective: Understand how the CPU, memory, and I/O work together, rather than viewing each component in isolation
  • Hardware terminology: Master core concepts such as instruction cycles, pipelines, CPI, and cache hit rates
  • Performance thinking: Understand bottlenecks and optimization approaches in computer organization
  • Foundation for further learning: Build a professional foundation for operating systems, computer architecture, and embedded development

| Chapter | Content | Core Concepts |
| --- | --- | --- |
| Chapter 1 | Von Neumann Architecture | Stored program, five major components, data path |
| Chapter 2 | Instruction Set Architecture | Instruction format, addressing modes, CISC vs RISC |
| Chapter 3 | CPU Control Unit | Control unit, micro-operations, instruction cycle |
| Chapter 4 | Storage Hierarchy | Cache, main memory, virtual memory, paging |
| Chapter 5 | Bus and I/O | Bus arbitration, DMA, interrupt mechanism |

0. Big Picture: Computer Hardware System

In the previous chapter "From Transistors to CPU," we understood how the CPU works internally — from fetch, decode, execute to write-back. But the CPU itself is just an execution unit. To make a computer truly "usable," a series of peripheral components must work together.

[Interactive demo: detailed CPU instruction cycle. The CPU contains a control unit (CU) with the PC (initially 256, that is 0x100), IR, MAR, and MDR; an ALU with an accumulator (ACC); and general registers R0-R3. It connects to main memory over the address, data, and control buses, and each instruction steps through Fetch, Decode, Execute, and Write Back.]

Program in main memory:

0x100  LOAD R0, [0x200]
0x101  LOAD R1, #7
0x102  ADD R0, R1
0x103  STORE [0x201], R0

Data area:

0x200  42
0x201  0

Layer-by-Layer Breakdown: Computer Hardware System

  • Layer 1: CPU Core. Responsible for instruction execution, including the control unit (issuing control signals) and the arithmetic unit (performing arithmetic and logic operations)
  • Layer 2: Register File. High-speed storage units inside the CPU, including general-purpose registers and special-purpose registers (PC, IR, MAR, MDR, etc.)
  • Layer 3: Main Memory. Memory for storing programs and data, accessed by the CPU through the address and data buses
  • Layer 4: I/O Devices. Input and output devices connected to the system bus through I/O controllers
  • Layer 5: System Bus. Data channels connecting the CPU, memory, and I/O, including the address bus, data bus, and control bus


1. Von Neumann Architecture: The "Constitution" of Modern Computers

1.1 The Stored-Program Principle

In 1945, mathematician John von Neumann proposed the groundbreaking stored-program architecture concept. This idea laid the foundation for modern computers.

Core Concept

Stored Program: The program itself, as a special kind of data, is stored in memory just like ordinary data. The CPU can read and execute program instructions stored in memory the same way it reads and writes data.

This means:

  • Early computers: Programs were set up through plugboards, switches, and fixed wiring; changing a program meant physically rewiring the machine
  • Von Neumann architecture: Programs are stored in memory; changing a program only requires modifying the memory contents
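
The stored-program idea can be sketched with a toy machine. This is a minimal illustration under stated assumptions: the opcodes, the `run` function, and the memory layout are made up for this example and do not correspond to any real instruction set. Instructions and data sit in the same Python list, and overwriting one memory cell changes what the program does.

```python
# Toy stored-program machine (hypothetical encoding, for illustration only).
# Each instruction is a tuple (opcode, operand) stored in the same memory as data.
LOAD, ADD, SUB, STORE, HALT = "LOAD", "ADD", "SUB", "STORE", "HALT"

def run(memory):
    """Fetch, decode, and execute instructions until HALT; return the memory."""
    pc, acc = 0, 0
    while True:
        opcode, operand = memory[pc]   # fetch: instructions are read like data
        pc += 1
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == SUB:
            acc -= memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory

def make_memory():
    # Cells 0-3 hold the program, cells 4-6 hold data.
    return [(LOAD, 4), (ADD, 5), (STORE, 6), (HALT, 0), 42, 7, 0]

result_add = run(make_memory())[6]   # 42 + 7

patched = make_memory()
patched[1] = (SUB, 5)                # "reprogram" by overwriting one memory cell
result_sub = run(patched)[6]         # 42 - 7

print(result_add, result_sub)
```

No rewiring is involved in the second run: the only change is the content of memory cell 1.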

1.2 Five Major Components

The Von Neumann architecture divides a computer into five core components:


| Component | Term | Function | Main Composition |
| --- | --- | --- | --- |
| Arithmetic Unit | ALU (Arithmetic Logic Unit) | Performs arithmetic and logic operations | Adders, shifters, comparators |
| Control Unit | CU (Control Unit) | Directs and coordinates all components | Instruction register, decoder, timing generator |
| Memory | Memory | Stores programs and data | Memory Address Register (MAR), Memory Data Register (MDR) |
| Input Devices | Input | Information input | Keyboard, mouse, scanner |
| Output Devices | Output | Information output | Monitor, printer |

[Figure: CPU register file, the high-speed storage inside the CPU. Special registers: PC (program counter), IR (instruction register), MAR (memory address register), MDR (memory data register), ACC (accumulator). General-purpose registers: RAX (return value), RBX (base register), RCX (counter register), RDX (data register), RSI (source index), RDI (destination index), RBP (base pointer), RSP (stack pointer). Program Status Word (PSW / FLAGS): CF (carry), PF (parity), AF (auxiliary carry), ZF (zero), SF (sign), OF (overflow).]

Registers vs Memory

| Feature | Register | Memory (RAM) |
| --- | --- | --- |
| Location | Inside the CPU | Outside the CPU |
| Access speed | Fastest (< 1 ns) | Slower (50-100 ns) |
| Capacity | Tiny (bytes) | Large (GB) |
| Role | Hold instructions, operands, and results | Store programs and data |

1.3 Data Path

The Data Path is the route through which data flows between functional components. Inside the CPU, the data path connects:

  • Register file
  • Arithmetic Logic Unit (ALU)
  • Memory Data Register (MDR)

The width of the data path (how many bits can be transferred at once) directly affects computer performance.

1.4 The Von Neumann Bottleneck

The Von Neumann architecture has a famous performance bottleneck:

The data transfer speed between CPU and memory is far lower than the CPU's processing speed.

This causes the CPU to frequently be in an idle "waiting for data" state. Many optimization techniques in modern computers revolve around this problem:

| Optimization Technique | Principle |
| --- | --- |
| Cache | Place small, high-speed storage near the CPU |
| Instruction Pipeline | Allow multiple instructions to be in different stages simultaneously |
| Superscalar | Issue multiple instructions in the same clock cycle |
| Multi-core Parallelism | Multiple CPU cores share computing tasks |

2. Instruction Set Architecture: The Interface Between CPU and Software

In the previous section, we learned the core idea of the Von Neumann architecture: programs are stored in memory just like data. But this raises a key question — what does a "program" stored in memory actually look like? How does the CPU understand it?

The answer is the Instruction Set Architecture (ISA). If the CPU is a service, then the instruction set is its API documentation — it defines all the commands the CPU can understand, the format of each command, and the data range each command can operate on. Every line of code you write is ultimately translated by the compiler into a sequence of calls to this "API."

2.1 From Code to Instructions: A Line of Code's Translation Journey

First, let's establish a holistic understanding: the code you write in an editor and what the CPU actually executes are separated by several layers of translation.

From code to instructions: one line through the translation pipeline

1. Source code

int a = 10 + 5;

This is high-level code written in an editor. It is easy for humans to read, but the CPU does not understand int or the + operator directly.

2. Compiler emits assembly

MOV  R1, #10    ; put 10 into register R1
MOV  R2, #5     ; put 5 into register R2
ADD  R3, R1, R2 ; R3 = R1 + R2
STORE R3, [a]   ; store the result at variable a

3. Assembler emits machine code

0001 0001 0000 1010  → MOV R1, #10
0001 0010 0000 0101  → MOV R2, #5
0010 0011 0001 0010  → ADD R3, R1, R2
0100 0011 1000 0000  → STORE R3, [a]

4. CPU executes instructions

Instruction 1: fetch → decode → execute MOV R1, #10
Instruction 2: fetch → decode → execute MOV R2, #5
Instruction 3: fetch → decode → execute ADD R3, R1, R2
Instruction 4: fetch → decode → execute STORE R3, [a]

Each instruction passes through all of these phases, which usually takes several clock cycles rather than one.

Key idea: An instruction set is the CPU's API: it defines every command the CPU understands. A compiler translates your high-level language into calls to that API. Different CPUs, such as x86 and ARM, have different instruction sets, just as different services expose different APIs.

This translation chain is key to understanding instruction sets:

| Layer | Content | Who Can Understand It |
| --- | --- | --- |
| High-level language | int a = 10 + 5; | Humans |
| Assembly language | MOV R1, #10 / ADD R3, R1, R2 | Humans (with training) |
| Machine code | 0001 0001 0000 1010 | CPU |
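
The assembler step can be sketched as a few bit-packing functions. The field layout here is inferred from the 16-bit words in the listing above (4-bit opcode, 4-bit register, then either an 8-bit immediate or address, or two 4-bit source registers); the function names and the address assumed for variable a (0x80) are illustrative, not part of any real toolchain.

```python
# Assemble the example program into 16-bit words.
# Layout assumed from the listing: 4-bit opcode, 4-bit register, then either
# an 8-bit immediate/address or two 4-bit source registers.
OPCODES = {"MOV": 0b0001, "ADD": 0b0010, "STORE": 0b0100}

def encode_mov(rd, imm):
    return (OPCODES["MOV"] << 12) | (rd << 8) | (imm & 0xFF)

def encode_add(rd, rs1, rs2):
    return (OPCODES["ADD"] << 12) | (rd << 8) | (rs1 << 4) | rs2

def encode_store(rs, addr):
    return (OPCODES["STORE"] << 12) | (rs << 8) | (addr & 0xFF)

def bits(word):
    """Format a 16-bit word as four groups of four bits."""
    s = format(word, "016b")
    return " ".join(s[i:i + 4] for i in range(0, 16, 4))

ADDR_A = 0x80  # assumed address of variable a

program = [
    encode_mov(1, 10),        # MOV R1, #10
    encode_mov(2, 5),         # MOV R2, #5
    encode_add(3, 1, 2),      # ADD R3, R1, R2
    encode_store(3, ADDR_A),  # STORE R3, [a]
]
for word in program:
    print(bits(word))
```

Real assemblers also handle labels, relocation, and many more instruction formats; this sketch only shows that machine code is the same information as assembly, packed into fixed bit fields.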

Why Understand This Chain?

  • When you see a compilation error, you know it occurred at the "high-level language → assembly" step
  • When you see a runtime crash, you know the problem is at the CPU instruction execution stage
  • When understanding performance optimization, you know what optimizations the compiler makes during "translation"
  • When choosing a CPU architecture (x86 vs ARM), you know the difference is in the "instruction set API"

2.2 What Does an Instruction Look Like?

Now that we know code gets translated into instructions, the next question is: what is the internal structure of an instruction?

Each machine instruction is essentially a string of binary digits, but it has a strict internal format. The two most core parts are:

  • Opcode: Tells the CPU "what to do" — add? jump? or read memory?
  • Operand: Tells the CPU "what to do it to" — which register? which memory address? what constant?

Just as a sentence has a "verb + object" structure, an instruction has an "operation + target" structure:

Instruction:  ADD     R3, R1, R2
              ───     ──────────
              Opcode  Operands
              (add)   (R3 = R1 + R2)

Based on the number of operands, instruction formats range from simple to complex in four types:

Machine instruction format: opcode + operands = machine instruction

| Opcode | Destination | Source 1 | Source 2 |
| --- | --- | --- | --- |
| 8 bits | 8 bits | 8 bits | 8 bits |

Example instruction (ADD R1, R2, R3 in this encoding):

00000010 00000001 00000010 00000011

Three-address format: three addresses identify the destination and the two source operands separately. The result goes into the destination without modifying the sources.

Common examples

| Instruction | Meaning |
| --- | --- |
| ADD R1, R2, R3 | R1 = R2 + R3 |
| SUB R1, R2, R3 | R1 = R2 - R3 |
| MUL R1, R2, R3 | R1 = R2 × R3 |

Common opcodes

| Opcode | Mnemonic | Meaning |
| --- | --- | --- |
| 00000000 | NOP | No operation |
| 00000001 | MOV | Move data |
| 00000010 | ADD | Addition |
| 00000011 | SUB | Subtraction |
| 00000100 | MUL | Multiplication |
| 00000101 | DIV | Division |
| 00000110 | AND | Logical AND |
| 00000111 | OR | Logical OR |
| 00001000 | NOT | Logical NOT |
| 00001001 | XOR | Exclusive OR |
| 00001010 | SHL | Shift left |
| 00001011 | SHR | Shift right |
| 00001100 | JMP | Unconditional jump |
| 00001101 | JE | Jump if equal |
| 00001110 | JNE | Jump if not equal |
| 00001111 | CALL | Call subroutine |
| 00010000 | RET | Return |
| 00010001 | PUSH | Push stack |
| 00010010 | POP | Pop stack |
| 00010011 | LOAD | Load from memory |
| 00010100 | STORE | Store to memory |

| Format | Structure | Example | Use Case |
| --- | --- | --- | --- |
| Zero-address | Opcode only | RET (return) | Stack machines; operands implicit at stack top |
| One-address | Opcode + 1 address | INC R1 (increment R1 by 1) | Single-operand operations |
| Two-address | Opcode + 2 addresses | MOV R1, R2 | Most common; data transfer and operations |
| Three-address | Opcode + 3 addresses | ADD R3, R1, R2 | Preserves source operands |
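
Decoding is the reverse of encoding: the hardware slices the instruction word into fields. The sketch below assumes the 32-bit three-address layout described above (four 8-bit fields: opcode, destination, source 1, source 2) and a small subset of the opcode list; the `decode` and `disassemble` helpers are hypothetical names used only for illustration.

```python
# Decode a 32-bit three-address instruction: opcode | dest | src1 | src2,
# each field 8 bits wide.
MNEMONICS = {0x02: "ADD", 0x03: "SUB", 0x04: "MUL"}  # subset of the opcode list

def decode(word):
    """Split a 32-bit word into its four 8-bit fields."""
    opcode = (word >> 24) & 0xFF
    dest = (word >> 16) & 0xFF
    src1 = (word >> 8) & 0xFF
    src2 = word & 0xFF
    return opcode, dest, src1, src2

def disassemble(word):
    """Render the word as assembly text; unknown opcodes print as OP0x..."""
    opcode, dest, src1, src2 = decode(word)
    name = MNEMONICS.get(opcode, f"OP{opcode:#04x}")
    return f"{name} R{dest}, R{src1}, R{src2}"

word = 0b00000010_00000001_00000010_00000011
print(disassemble(word))
```

In hardware the same slicing is done by wiring: bits 31-24 feed the opcode decoder while the other fields select registers, so no shifting actually takes time.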

Why So Many Formats?

This is a trade-off between space and flexibility. Zero-address instructions are the shortest (saving memory) but require extra stack operations; three-address instructions are the most flexible (preserving source data) but occupy more bits. Different CPU architectures choose different combinations of instruction formats.

2.3 How Does the CPU Find Data? — Addressing Modes

An instruction tells the CPU to "do addition," but where are the two numbers for the addition? They might be written directly in the instruction, in a register, or at some memory address. Addressing modes are the rules that tell the CPU "where to find the operands."

Using a real-life analogy of "finding someone":

| Addressing Mode | Analogy | Instruction Example | Description |
| --- | --- | --- | --- |
| Immediate addressing | The person is standing right in front of you | MOV R1, #100 | Data is written directly in the instruction; fastest |
| Register addressing | Call an internal extension to reach a colleague | MOV R1, R2 | Data is in a CPU register; very fast |
| Direct addressing | Know the address, go directly to the door | MOV R1, [0x1000] | Memory address is written in the instruction |
| Indirect addressing | Ask the front desk "which room is Zhang San in?" | MOV R1, [R2] | The register contains an address; requires one extra lookup |
| Indexed addressing | "Building 3 + Floor 5" to calculate the room | MOV R1, [R2+10] | Base address + offset; used for array access |

Addressing modes: how an instruction finds operand locations

Immediate addressing

  • Definition: The operand is embedded directly in the instruction and is immediately available.
  • Instruction format: MOV R1, #100
  • Example: MOV R1, #100 ; R1 = 100. The immediate value 100 is stored directly in the instruction, so no register or memory lookup is needed.
  • Execution process:
    1. The CPU reads the immediate value 100 directly from the instruction
    2. The immediate value is written into target register R1
    3. Execution completes without extra memory access
  • Characteristics: speed is fast; flexibility is low

Addressing mode comparison

| Addressing mode | Format | Speed | Use case |
| --- | --- | --- | --- |
| Immediate addressing | MOV R1, #100 | Fastest | Constant assignment and initialization |
| Register addressing | MOV R1, R2 | Fastest | Register-to-register data transfer |
| Direct addressing | MOV R1, [100] | Relatively fast | Accessing global variables |
| Indirect addressing | MOV R1, [R2] | Relatively fast | Pointers and array traversal |
| Indexed addressing | MOV R1, [R2 + R3] | Relatively fast | Array access and loops |
| Based addressing | MOV R1, [R2 + 100] | Relatively fast | Struct fields and function parameters |
| Relative addressing | JMP LABEL | Fastest | Loops and conditional branches |

Why So Many Addressing Modes?

Different scenarios require different "find data" strategies:

  • Constant assignment (x = 100) → Immediate addressing; data is right in the instruction
  • Variable operations (a + b) → Register addressing; data already loaded into registers
  • Array access (arr[i]) → Indexed addressing; base address + index offset
  • Pointer operations (*ptr) → Indirect addressing; register holds the address

When you write arr[i], you don't think about addressing modes, but the compiler automatically selects the most appropriate one.
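
The five modes from the analogy table can be sketched as one lookup function. Everything here is an illustrative assumption: the register and memory contents, and the `fetch_operand` helper, exist only to show where each mode looks for its operand.

```python
# Resolve an operand under different addressing modes.
# Register names, memory contents, and fetch_operand are illustrative only.
registers = {"R2": 0x1000, "R3": 2}
memory = {0x1000: 11, 0x1002: 33, 0x100A: 55}

def fetch_operand(mode, value=None, reg=None, offset=0):
    if mode == "immediate":   # MOV R1, #100     -> operand is in the instruction
        return value
    if mode == "register":    # MOV R1, R2       -> operand is in a register
        return registers[reg]
    if mode == "direct":      # MOV R1, [0x1000] -> address is in the instruction
        return memory[value]
    if mode == "indirect":    # MOV R1, [R2]     -> register holds the address
        return memory[registers[reg]]
    if mode == "indexed":     # MOV R1, [R2+10]  -> register plus offset
        return memory[registers[reg] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")

print(fetch_operand("immediate", value=100))
print(fetch_operand("register", reg="R2"))
print(fetch_operand("direct", value=0x1000))
print(fetch_operand("indirect", reg="R2"))
print(fetch_operand("indexed", reg="R2", offset=10))
```

Note how the immediate and register cases never touch `memory`, which is why they are the fastest modes, while the indirect and indexed cases need a register read before the memory access.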

2.4 The CPU's Capability List — Instruction Categories

Now that we know instruction formats and addressing modes, the last question is: what exactly can the CPU do?

All instructions can be grouped into six categories that cover everything a computer can do:

| Type | What It Does | Representative Instructions | Maps to Your Code |
| --- | --- | --- | --- |
| Data Transfer | Move data around | MOV, LOAD, STORE | let x = y, function parameter passing |
| Arithmetic | Addition, subtraction, multiplication, division | ADD, SUB, MUL, DIV | a + b, count++ |
| Logic | Bitwise operations | AND, OR, NOT, XOR | flags & 0xFF, permission checks |
| Shift | Shift left/right | SHL, SHR | x << 2 (equivalent to multiplying by 4) |
| Control Transfer | Jumps and calls | JMP, CALL, RET | if, for, function calls |
| Input/Output | Communicate with peripherals | IN, OUT | Read keyboard, write to screen |

A Key Insight

All the code you write — no matter how complex the business logic or how flashy the UI animations — is ultimately broken down into combinations of these six basic operation types. The CPU's "intelligence" lies not in how complex its individual operations are, but in its ability to execute these simple operations billions of times per second.

2.5 Two Design Philosophies: CISC vs RISC

There is a fundamental divide in instruction set design: should each instruction be as powerful as possible, or as simple as possible?

This divide created two camps that directly affect every device you use today:

| Dimension | CISC | RISC |
| --- | --- | --- |
| Instruction count | Thousands of complex instructions | Tens to hundreds of streamlined instructions |
| Single instruction | One instruction can do many things | One instruction does one thing |
| Instruction length | Variable length (1-15 bytes) | Fixed length, often 4 bytes |
| Execution speed | Complex instructions take multiple cycles | Most instructions complete in one cycle |
| Power use | Higher | Lower |
| Pipeline | Harder to optimize because lengths vary | Easier to optimize because instructions are regular |
| Compiler burden | Lighter because hardware does more | Heavier because software optimizes more |

Real-world choices

| Device | Architecture | Why |
| --- | --- | --- |
| Your computer | x86 (CISC) | Compatible with decades of software |
| Your phone | ARM (RISC) | Low power consumption and longer battery life |
| Apple Silicon | ARM (RISC) | High performance per watt reshaped laptops |
| RISC-V board | RISC-V (RISC) | Open and royalty-free for IoT and education |

An analogy to understand this:

  • CISC is like a Swiss Army knife: One tool integrates scissors, a bottle opener, a screwdriver... lots of functions but each one may not be the best
  • RISC is like a professional tool set: Each tool does only one thing, but does it fast and well

Why Does Your Phone Use ARM and Your Computer Use x86?

  • x86 (CISC) has dominated the PC and server market for roughly four decades, accumulating a massive software ecosystem. Switching architectures means existing software must be recompiled or run under emulation
  • ARM (RISC) dominates mobile devices thanks to its low-power advantage. Phone batteries are small; every milliwatt counts
  • Apple Silicon showed that a RISC design can also deliver high performance: the M-series chips compete with contemporary x86 laptop processors on speed while generally using less power
  • RISC-V is an open, royalty-free RISC instruction set that is rapidly gaining ground in IoT, education, and AI chips

Summary: The instruction set is the bridge between software and hardware. Your code is translated by the compiler into instructions, which tell the CPU what to do and to whom through opcodes and operands. Addressing modes determine where data comes from. Different instruction set designs (CISC/RISC) determine the CPU's performance characteristics and applicable scenarios.

Now we know the "static structure" of instructions — what they look like and what types exist. The next question is: how does the CPU execute these instructions step by step internally? That's the control unit's job.


3. Control Unit: The CPU's "Command Center"

3.1 Components of the Control Unit

The control unit is the "brain" of the CPU, responsible for coordinating all components to work according to instruction requirements:

[Interactive demo: how the control unit's signals coordinate CPU components. The control unit (CU) contains the instruction register (IR), the instruction decoder (ID), and the timing generator. It outputs control signals such as PC→MAR, MEM→MDR, MDR→IR, IR→ID, ALU→ACC, and ACC→MDR, which move data between the program counter (PC), memory address register (MAR), main memory, memory data register (MDR), IR, decoder, ALU, and accumulator (ACC).]

Core controller concepts

  • Control signals: Electrical signals emitted by the controller to control each component on the data path.
  • Timing: CPU operations advance by clock ticks; each tick performs specific micro-operations.
  • Hardwired vs microprogrammed: Hardwired controllers are fast but complex; microprogrammed controllers are flexible but slightly slower.

| Component | Function |
| --- | --- |
| Program Counter (PC) | Stores the address of the next instruction |
| Instruction Register (IR) | Stores the currently executing instruction |
| Instruction Decoder | Parses the instruction's opcode and operands |
| Timing Generator | Generates clock beat signals to control component timing |
| Micro-operation Sequence Generator | Generates the series of control signals needed to execute instructions |

Program Status Word (PSW): the CPU's status indicators

| Flag | Name | Initial value in the demo |
| --- | --- | --- |
| CF | Carry flag | 0 |
| PF | Parity flag | 0 |
| AF | Auxiliary carry | 0 |
| ZF | Zero flag | 0 |
| SF | Sign flag | 0 |
| TF | Trap flag | 0 |
| IF | Interrupt flag | 1 |
| DF | Direction flag | 0 |
| OF | Overflow flag | 0 |

Typical flag uses

  • Conditional jumps: JE, JNE, JG, JL and similar instructions decide jumps based on ZF, SF, and OF.
  • Arithmetic: Multi-word arithmetic uses CF for carry and OF for signed overflow.
  • Loop control: Loop instructions often use ZF to detect the loop ending condition.
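
How an operation's result sets the flags can be sketched for 8-bit addition. This is a simplified model, assuming both inputs are already in the range 0-255 and covering only CF, ZF, SF, and OF; the `add8` helper is a hypothetical name, not an instruction from any real ISA.

```python
def add8(a, b):
    """Add two 8-bit values (0-255) and return (result, flags)."""
    total = a + b
    result = total & 0xFF                # keep only the low 8 bits
    flags = {
        "CF": int(total > 0xFF),         # unsigned carry out of bit 7
        "ZF": int(result == 0),          # result is zero
        "SF": result >> 7,               # copy of the result's sign bit
        # Signed overflow: both inputs have the same sign bit,
        # and the result's sign bit differs from it.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(add8(0xFF, 0x01))   # 255 + 1 wraps to 0 with a carry
```

The two calls show why CF and OF are separate flags: 0x7F + 0x01 is fine as unsigned arithmetic but overflows as signed, while 0xFF + 0x01 carries as unsigned but is simply -1 + 1 = 0 as signed.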

3.2 Instruction Cycle

The CPU executes an instruction through a complete instruction cycle, typically including:

  1. Fetch Cycle: Read instruction from memory into IR
  2. Decode Cycle: Parse the instruction's meaning
  3. Execute Cycle: Perform the operation
  4. Memory Access Cycle: Access memory if needed
  5. Write-Back Cycle: Write the result back to a register or memory

3.3 Micro-Operations

Micro-operations are the most basic operations driven by control signals. For example, the "fetch" phase can be decomposed into the following micro-operations:

| Beat | Micro-Operation | Control Signals |
| --- | --- | --- |
| T1 | PC → MAR | PCout, MARin |
| T2 | MEM → MDR | MEMout, MDRin |
| T3 | MDR → IR | MDRout, IRin |
| T4 | PC + 1 → PC | PC+1, PCin |
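
The four beats of the fetch phase map directly onto register transfers. In this sketch the CPU state is a plain dictionary and memory holds instruction text instead of binary words; both are simplifications chosen for readability, and the `fetch` function is a hypothetical name.

```python
# Fetch phase as four micro-operations (T1-T4), one register transfer per beat.
def fetch(cpu, memory):
    cpu["MAR"] = cpu["PC"]            # T1: PC -> MAR
    cpu["MDR"] = memory[cpu["MAR"]]   # T2: MEM -> MDR (read the addressed cell)
    cpu["IR"] = cpu["MDR"]            # T3: MDR -> IR
    cpu["PC"] = cpu["PC"] + 1         # T4: PC + 1 -> PC
    return cpu

memory = {0x100: "LOAD R0, [0x200]", 0x101: "LOAD R1, #7"}
cpu = {"PC": 0x100, "MAR": 0, "MDR": None, "IR": None}

fetch(cpu, memory)
print(cpu)
```

After one call the IR holds the first instruction and the PC already points at the next one, which is exactly the state the decode phase expects.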

3.4 Hardwired vs. Microprogrammed Control

| Feature | Hardwired Control | Microprogrammed Control |
| --- | --- | --- |
| Implementation | Combinational logic circuits | Microinstruction sequences (firmware) |
| Speed | Fast | Slightly slower |
| Design difficulty | Complex | Simpler |
| Flexibility | Poor (changes require circuit redesign) | Good (just modify the microprogram) |
| Typical applications | RISC processors | Early CISC processors |

4. Storage Hierarchy: Why Do We Need Cache?

4.1 Storage Hierarchy Structure

A computer's storage devices form a pyramid structure:

[Figure: storage hierarchy pyramid, from fastest and smallest to slowest and largest: Registers (fastest, a few KB), Cache (very fast, MB), Memory (fast, GB), Disk (slow, TB), Network/Cloud (slowest, effectively unlimited).]

| Level | Storage Type | Access Time | Typical Capacity | Relative Cost | Location |
| --- | --- | --- | --- | --- | --- |
| Registers | Flip-flops / SRAM arrays | < 1 ns | A few KB | Highest | Inside CPU |
| L1 Cache | SRAM | ~1 ns | 32-64 KB | Very high | Near CPU core |
| L2 Cache | SRAM | ~3-10 ns | 256 KB-1 MB | High | On CPU chip |
| L3 Cache | SRAM | ~10-20 ns | 2-16 MB | Medium | On CPU chip / shared |
| Main Memory (RAM) | DRAM | ~50-100 ns | 8-64 GB | Medium-low | On motherboard |
| SSD | Flash | ~10-100 μs | 256 GB-2 TB | Low | M.2 slot or drive bay |
| HDD | Magnetic disk | ~5-10 ms | 1-10 TB | Lowest | Inside case |

Analogy for Speed Differences

If CPU accessing L1 cache is like grabbing a piece of paper from your desk:

  • Accessing main memory → Taking the elevator to a convenience store downstairs to buy paper
  • Accessing SSD → Driving to another city to buy paper
  • Accessing HDD → Flying to another country to buy paper

The speed difference can be millions of times!

4.2 Cache Principles

Cache is fast storage located between the CPU and main memory. Its core idea is based on two locality principles:

Locality Principles

  • Temporal locality: If data was just accessed, it's likely to be accessed again soon
  • Spatial locality: If data was accessed, nearby data is likely to be accessed too

How Cache Works

  1. Hit: The data the CPU needs is in the cache; read directly
  2. Miss: The data is not in the cache; must be loaded from main memory

Hit rate = Number of hits / Total number of accesses

Average access time = Hit rate × Cache time + (1 - Hit rate) × Memory time

[Interactive demo: cache as the bridge between CPU and memory. Example hierarchy: CPU core → L1 cache (64 KB, ~1 ns) → L2 cache (256 KB, ~5 ns) → L3 cache (8 MB, ~15 ns) → main memory (16 GB, ~100 ns).]

Why does cache work? The locality principle in practice:

  • Temporal locality: Recently accessed data is likely to be accessed again, for example variables inside loops.
  • Spatial locality: After one item is accessed, nearby data is likely to be accessed, for example array traversal and sequential instruction execution.

Direct-mapped cache (the mapping shown in the demo): each memory block maps to exactly one cache line. Lookup speed is the fastest, the hit rate is lower, and implementation complexity is the lowest.

Hit-rate calculation

With H as the hit rate, Tc as the cache access time, and Tm as the memory access time:

Average access time = H × Tc + (1 - H) × Tm

For Tc = 2 ns, Tm = 100 ns, and H = 90%: 0.9 × 2 + 0.1 × 100 = 11.8 ns, or about 12 ns.
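
The average-access-time formula is easy to check numerically. The function name below is an illustrative choice; the inputs are the demo's values (2 ns cache, 100 ns memory), which give 11.8 ns at a 90% hit rate, roughly the 12 ns shown.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average access time = H * Tc + (1 - H) * Tm, in nanoseconds."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

for h in (0.50, 0.90, 0.99):
    t = average_access_time(h, cache_ns=2, memory_ns=100)
    print(f"hit rate {h:.0%}: {t:.2f} ns")
```

Raising the hit rate from 90% to 99% cuts the average from 11.8 ns to 2.98 ns, which is why even small improvements in hit rate matter so much.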

4.3 Cache Mapping Methods

| Method | Principle | Advantage | Disadvantage |
| --- | --- | --- | --- |
| Direct-mapped | Each memory block can only go to one fixed location | Simple and fast | High conflict rate |
| Set-associative | Each memory block can go to N locations (N-way) | Balances speed and hit rate | More complex implementation |
| Fully associative | Any location | Lowest conflict rate | Hardest to implement (requires comparing all tags) |
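
The "high conflict rate" of direct mapping shows up in a small simulation. The cache size (4 lines of 16 bytes) and the access patterns below are arbitrary assumptions chosen to keep the numbers small; the `simulate_direct_mapped` function is a hypothetical helper.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=16):
    """Return (hits, misses) for a direct-mapped cache on a list of byte addresses."""
    lines = [None] * num_lines          # each entry stores the tag currently cached
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size      # which memory block the byte lives in
        index = block % num_lines       # the single line this block may occupy
        tag = block // num_lines        # identifies which block is in that line
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag          # evict whatever was there
    return hits, misses

sequential = simulate_direct_mapped(range(0, 64, 4))         # walk an array
conflicting = simulate_direct_mapped([0, 64, 0, 64, 0, 64])  # two blocks, same line
print(sequential, conflicting)
```

The sequential walk benefits from spatial locality (one miss per block, then hits), while addresses 0 and 64 map to the same line and evict each other on every access. A set-associative cache would let both blocks stay resident.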

4.4 Virtual Memory

Virtual memory is an important abstraction provided by the operating system:

  • Each process thinks it has a complete virtual address space
  • The memory management unit (MMU) translates virtual addresses to physical addresses, using page tables that the operating system maintains
  • Infrequently used pages can be swapped out to disk (swap space)

Virtual Memory Analogy

Think of virtual memory as a hotel managing rooms:

  • You (the process) think the entire building is yours
  • In reality, the hotel (OS) only assigns you the rooms you currently need
  • Unused rooms get "swapped out" to storage (disk)
  • Needed rooms can be "swapped in" at any time
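
Address translation with paging can be sketched in a few lines. The 4 KiB page size is a common choice but is assumed here, and the page table is modeled as a plain dictionary from virtual page number to physical frame number, with None standing for a page that is not resident; real page tables are multi-level structures walked by hardware.

```python
PAGE_SIZE = 4096  # 4 KiB pages (assumed)

class PageFault(Exception):
    """Raised when the requested page is not resident in physical memory."""

def translate(vaddr, page_table):
    """Map a virtual address to a physical address, or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, offset in page
    frame = page_table.get(vpn)              # None: swapped out or unmapped
    if frame is None:
        raise PageFault(f"page {vpn} not resident")
    return frame * PAGE_SIZE + offset        # the offset is unchanged

page_table = {0: 5, 1: None, 2: 9}           # hypothetical mapping
print(hex(translate(0x2010, page_table)))
```

On a page fault, the operating system would load the page from disk, update the table, and retry the access, which corresponds to the "swap in" step of the hotel analogy.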

5. Bus and I/O: The Computer's "Blood Vessels"

5.1 System Bus

A Bus is the data channel connecting computer components:

[Figure: computer bus system. The CPU (control unit and ALU) connects to main memory (cells 0x0-0x7 in the demo) over a 32-bit address bus, a 64-bit data bus, and a control bus.]

| Bus Type | Function | Direction | Typical Width |
| --- | --- | --- | --- |
| Address Bus | Transfers memory addresses | Unidirectional (CPU→Memory) | 32-bit/64-bit |
| Data Bus | Transfers the actual data | Bidirectional | 32-bit/64-bit |
| Control Bus | Transfers read/write and other control signals | Bidirectional | Multiple signal lines |

5.2 Bus Arbitration

When multiple devices simultaneously request bus access, an arbitration mechanism determines who goes first:

| Arbitration Method | Description |
| --- | --- |
| Centralized arbitration | A central arbiter makes the decision |
| Distributed arbitration | Devices negotiate among themselves |
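
One simple policy a centralized arbiter might use is fixed priority. The text above does not specify a policy, so this is only one possible sketch; the device names and their ordering are hypothetical.

```python
def arbitrate(requests, priority):
    """Grant the bus to the highest-priority requesting device, or None."""
    for device in priority:        # earlier in the list = higher priority
        if device in requests:
            return device
    return None

PRIORITY = ["DMA", "CPU", "NIC"]   # hypothetical fixed ordering

print(arbitrate({"CPU", "NIC"}, PRIORITY))
print(arbitrate({"NIC", "DMA"}, PRIORITY))
print(arbitrate(set(), PRIORITY))
```

Fixed priority is easy to build but can starve low-priority devices, which is one reason real arbiters often rotate priorities or use fairness counters.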

5.3 I/O Device Access Methods

| Method | Principle | Advantage | Disadvantage |
| --- | --- | --- | --- |
| Programmed I/O (Polling) | CPU polls I/O status | Simple | Low CPU utilization |
| Interrupt-driven I/O | I/O device actively notifies CPU when done | CPU can work in parallel | Interrupt handling has overhead |
| DMA | I/O device accesses memory directly | CPU is not involved in the data transfer itself | Requires a DMA controller |

I/O method comparison: Programmed I/O · Interrupt-driven I/O · DMA

Programmed I/O workflow

  1. CPU polls the I/O device status
  2. Device busy? Keep waiting
  3. Device ready, send read/write command
  4. CPU reads or writes data byte by byte
  5. Check whether transfer is complete
  6. If incomplete, keep polling

CPU involvement is high, speed is slow, and complexity is low.

Three I/O methods compared

| Feature | Programmed I/O | Interrupt-driven I/O | DMA |
| --- | --- | --- | --- |
| CPU involvement | Involved throughout | Only handles interrupts | Almost uninvolved |
| Data transfer | CPU moves each byte | CPU moves each word | Device transfers directly to memory |
| Pros | Simple and flexible control | High CPU efficiency | CPU is free during the transfer |
| Cons | Low CPU utilization | Interrupt overhead | Complex hardware |
| Best for | Simple or low-speed devices | Low/medium-speed devices | High-speed bulk transfer |

5.4 DMA Principles

DMA (Direct Memory Access) allows I/O devices to exchange data directly with memory:

  • Without DMA: The CPU participates in the entire data transfer process and can't do anything else
  • With DMA: The CPU tells the DMA controller "transfer from where to where, how much," then goes to do other tasks. The DMA notifies the CPU when complete

DMA Analogy

This is like ordering food delivery:

  • Without DMA: You go to the supermarket yourself, buy groceries, go home, wash vegetables, and cook (involved in the entire process)
  • With DMA: You place an order by phone, and the delivery person brings it straight to your kitchen (someone else handles it; you just "receive the goods" at the end)
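
The division of labor with DMA can be sketched as follows. The `DMAController` class and its `transfer` method are hypothetical, and the model is deliberately simplified: a real controller runs concurrently with the CPU, whereas here the copy happens inside one call and the callback stands in for the completion interrupt.

```python
class DMAController:
    """Hypothetical DMA engine: copies a block, then signals completion."""

    def __init__(self, memory):
        self.memory = memory

    def transfer(self, device_buffer, dest, count, on_complete):
        # The controller, not the CPU, moves each word into memory.
        for i in range(count):
            self.memory[dest + i] = device_buffer[i]
        on_complete(count)          # stands in for the completion interrupt

memory = [0] * 16
events = []                         # records "interrupts" seen by the CPU
dma = DMAController(memory)

# The CPU only describes the transfer: source, destination, and length.
dma.transfer([10, 20, 30, 40], dest=4, count=4, on_complete=events.append)
print(memory[4:8], events)
```

The CPU's part is limited to the single `transfer` call and the final notification, which mirrors "transfer from where to where, how much" followed by the completion signal.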

5.5 Interrupt Mechanism

Interrupts are a very important mechanism in computer systems:

  1. After an I/O device completes an operation, it sends an interrupt request to the CPU
  2. The CPU, currently executing an instruction, responds to the interrupt after completing the current instruction
  3. The CPU saves its current state and jumps to the interrupt handler
  4. After handling is complete, it restores the state and continues execution
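
The four steps above can be sketched as a loop that checks for pending interrupts only at instruction boundaries. Instructions are modeled as string labels and the saved state is just the program counter; both are simplifying assumptions, and `run_with_interrupt` is a hypothetical name.

```python
def run_with_interrupt(program, interrupt_at, handler_log):
    """Execute `program` (a list of labels). An interrupt request arrives while
    instruction index `interrupt_at` is executing and is serviced afterwards."""
    trace = []
    pc = 0
    pending = False
    while pc < len(program):
        trace.append(program[pc])   # execute the current instruction
        if pc == interrupt_at:
            pending = True          # step 1: device raises its request
        pc += 1
        if pending:                 # step 2: checked after the instruction ends
            saved_pc = pc           # step 3: save state ...
            trace.append("ISR")     # ... and run the interrupt handler
            handler_log.append(saved_pc)
            pc = saved_pc           # step 4: restore state and resume
            pending = False
    return trace

log = []
trace = run_with_interrupt(["I0", "I1", "I2"], interrupt_at=1, handler_log=log)
print(trace, log)
```

The handler runs between I1 and I2, never in the middle of an instruction, and the program resumes at exactly the saved address.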

6. CPU Performance Optimization: Pipeline Technology

6.1 Instruction Pipeline

Instruction pipelining overlaps the stages of consecutive instructions so that several instructions are in flight at once, which raises instruction throughput:

[Interactive demo: five-stage CPU instruction pipeline, Fetch (IF) → Decode (ID) → Execute (EX) → Memory (MEM) → Write Back (WB). It runs ADD R1,R2,R3; SUB R4,R1,R5; LOAD R6,[R4]; STORE R6,[R7]; AND R8,R1,R6 while counting total cycles, completed instructions, and CPI.]

Pipeline principle

Sequential execution: each instruction finishes before the next starts, so N instructions require N × 5 cycles.

Pipeline execution: multiple instructions occupy different stages at once; ideally CPI ≈ 1.

How Pipelining Works

Sequential execution (5 instructions, 25 cycles):
Instr 1: IF→ID→EX→MEM→WB
Instr 2:                IF→ID→EX→MEM→WB
Instr 3:                               IF→ID→EX→MEM→WB
...

Pipeline execution (5 instructions, 9 cycles):
Instr 1: IF→ID→EX→MEM→WB
Instr 2:    IF→ID→EX→MEM→WB
Instr 3:       IF→ID→EX→MEM→WB
...

With k stages, N instructions finish in k + (N - 1) cycles, so CPI (cycles per instruction) = (k + N - 1) / N, which approaches 1 as N grows.
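
The cycle counts can be checked with a short calculation. It assumes an ideal five-stage pipeline with no stalls; the function names are illustrative. With five stages, five instructions take 5 × 5 = 25 cycles sequentially and 5 + 4 = 9 cycles pipelined.

```python
def sequential_cycles(n, stages=5):
    """Each instruction runs all stages before the next one starts."""
    return n * stages

def pipelined_cycles(n, stages=5):
    """The first instruction takes `stages` cycles; each later one
    completes one cycle after its predecessor (ideal case, no stalls)."""
    return stages + (n - 1) if n > 0 else 0

def cpi(n, stages=5):
    """Cycles per instruction for an ideal pipeline."""
    return pipelined_cycles(n, stages) / n

for n in (5, 100, 1000):
    print(n, sequential_cycles(n), pipelined_cycles(n), round(cpi(n), 3))
```

For 5 instructions the CPI is 1.8, and it falls toward 1 as the instruction count grows, because the fixed cost of filling the pipeline is spread over more instructions.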

6.2 Pipeline Hazards

While pipelining improves performance, it also introduces hazard problems:

| Type | Cause | Solution |
| --- | --- | --- |
| Structural hazard | Hardware resource conflict | Add hardware / stagger execution |
| Data hazard | Later instruction needs the result of an earlier one | Data forwarding / bubbles / scheduling |
| Control hazard | Branch instructions change execution flow | Delay slots / branch prediction |
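
Data hazards of the read-after-write kind can be found by scanning for a register that is written and then read shortly afterwards. The sketch below uses a short five-instruction sequence written out by hand as (text, destination, sources) tuples; the two-instruction window is an assumption about a five-stage pipeline without forwarding, and `raw_hazards` is a hypothetical helper.

```python
# Each instruction: (text, destination register or None, source registers)
program = [
    ("ADD R1,R2,R3", "R1", {"R2", "R3"}),
    ("SUB R4,R1,R5", "R4", {"R1", "R5"}),
    ("LOAD R6,[R4]", "R6", {"R4"}),
    ("STORE R6,[R7]", None, {"R6", "R7"}),
    ("AND R8,R1,R6", "R8", {"R1", "R6"}),
]

def raw_hazards(instrs, window=2):
    """Return (producer, consumer, register) index triples where the consumer
    reads a register written at most `window` instructions earlier."""
    found = []
    for i, (_, dest, _) in enumerate(instrs):
        if dest is None:              # e.g. STORE writes memory, not a register
            continue
        for j in range(i + 1, min(i + 1 + window, len(instrs))):
            if dest in instrs[j][2]:
                found.append((i, j, dest))
    return found

for producer, consumer, reg in raw_hazards(program):
    print(f"{program[consumer][0]} needs {reg} from {program[producer][0]}")
```

Each reported pair is a place where the hardware must forward the result or insert a bubble, or where a compiler could reorder independent instructions to fill the gap.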

7. Summary: How Does a Computer "Run"?

Let's connect the entire process using professional terminology:

After a program starts, the operating system loads the executable file from disk into memory. In the fetch stage (IF), the CPU places the instruction's address on the address bus and reads the instruction back over the data bus into the instruction register (IR). The control unit decodes the instruction (ID) and, after identifying the operation type, generates the corresponding control signals. The arithmetic unit (EX) performs arithmetic and logic operations. If memory access is needed, the CPU accesses memory (MEM) over the address and data buses, and finally the result is written back (WB) to a register or memory. The entire process is driven by the clock, with micro-operation sequences generated by the control unit coordinating all components to work in an orderly manner.


Further Reading

| Topic | Recommended Resources |
| --- | --- |
| Computer Architecture | Computer Organization and Design: The Hardware/Software Interface - Patterson & Hennessy |
| CPU Microarchitecture | Computer Systems: A Programmer's Perspective - Bryant & O'Hallaron |
| Instruction Set Architecture | ARMv8 Architecture Manual, Intel x64 Manual |
| Cache Principles | Cache Coherence Protocol (MESI), Cache Write Policies |
| Operating Systems | Next chapter: "Operating Systems" |

Next Steps

Now that you've mastered the professional knowledge of computer organization, you can continue learning: