Computer Organization Principles

Preface

After building a CPU from transistors, how do computers form a complete system? In the previous chapter, we started from transistors and built adders, registers, arithmetic units, and finally assembled the CPU core. But a CPU alone is not enough — it needs to work in coordination with memory and I/O devices, requires buses to connect components, and needs an instruction set to drive operations. This chapter shifts our perspective from inside the CPU to the entire computer system, providing an in-depth understanding of the Von Neumann architecture, instruction sets, storage hierarchy, buses, and I/O principles.

What will you learn from this article?

After completing this chapter, you will gain:

System perspective: Understand how the CPU, memory, and I/O work together, rather than viewing each component in isolation
Hardware terminology: Master hardcore concepts like instruction cycles, pipelines, CPI, and cache hit rates
Performance thinking: Understand bottlenecks and optimization approaches in computer organization
Foundation for further learning: Build a professional foundation for operating systems, computer architecture, and embedded development

Section	Content	Core Concepts
Section 1	Von Neumann Architecture	Stored program, five major components, data path
Section 2	Instruction Set Architecture	Instruction format, addressing modes, CISC vs RISC
Section 3	CPU Control Unit	Control unit, micro-operations, instruction cycle
Section 4	Storage Hierarchy	Cache, main memory, virtual memory, paging
Section 5	Bus and I/O	Bus arbitration, DMA, interrupt mechanism
Section 6	Pipeline Technology	Instruction pipeline, CPI, pipeline hazards

0. Big Picture: Computer Hardware System

In the previous chapter "From Transistors to CPU," we understood how the CPU works internally — from fetch, decode, execute to write-back. But the CPU itself is just an execution unit. To make a computer truly "usable," a series of peripheral components must work together.

Detailed CPU Instruction Cycle Demo (interactive demo; static summary)

CPU: Control Unit (CU) with PC (Program Counter), IR (Instruction Register), MAR (Memory Address Register), and MDR (Memory Data Register); Arithmetic Logic Unit (ALU) with ACC (Accumulator); General Register File R0-R3.

Buses to main memory: Address Bus (CPU → memory), Data Bus (CPU ↔ memory), Control Bus (CPU → memory).

Initial state: PC = 256 (0x100); ACC = 0; R0 = R1 = R2 = R3 = 0.

Main Memory (program)

Address	Instruction
0x100	LOAD R0, [0x200]
0x101	LOAD R1, #7
0x102	ADD R0, R1
0x103	STORE [0x201], R0

Data Area

Address	Value
512 (0x200)	42
513 (0x201)	0

Each instruction passes through four phases: Fetch, Decode, Execute, Write Back.

Layer-by-Layer Breakdown: Computer Hardware System

Layer 1: CPU Core. Responsible for instruction execution, including the control unit (issuing control signals) and the arithmetic unit (performing arithmetic and logic operations).
Layer 2: Register File. High-speed storage units inside the CPU, including general-purpose registers and special-purpose registers (PC, IR, MAR, MDR, etc.).
Layer 3: Main Memory. Memory for storing programs and data, accessed by the CPU through address and data buses.
Layer 4: I/O Devices. Input and output devices connected to the system bus through I/O controllers.
Layer 5: System Bus. Data channels connecting CPU, memory, and I/O, including address bus, data bus, and control bus.

1. Von Neumann Architecture: The "Constitution" of Modern Computers

1.1 The Stored-Program Principle

In 1945, mathematician John von Neumann proposed the groundbreaking stored-program architecture concept. This idea laid the foundation for modern computers.

Core Concept

Stored Program: The program itself, as a special kind of data, is stored in memory just like ordinary data. The CPU can read and execute program instructions stored in memory the same way it reads and writes data.

This means:

Early computers: Programs were set up through physical wiring and switches; changing a program meant rewiring the machine by hand
Von Neumann architecture: Programs are stored in memory; changing a program only requires modifying the memory contents
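
The contrast above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical accumulator machine: the opcode numbers, the `opcode * 100 + address` encoding, and the memory layout are all invented for this sketch and do not correspond to a real CPU.

```python
# Hypothetical toy machine: one accumulator, memory is a flat list of integers.
# Instruction encoding (an assumption for this sketch): opcode * 100 + address.
LOAD, ADD, SUB, HALT = 1, 2, 3, 0

def run(memory):
    """Execute instructions starting at address 0 until HALT; return the accumulator."""
    acc, pc = 0, 0
    while True:
        opcode, addr = divmod(memory[pc], 100)  # an instruction is read like any other data
        pc += 1
        if opcode == HALT:
            return acc
        if opcode == LOAD:
            acc = memory[addr]
        elif opcode == ADD:
            acc += memory[addr]
        elif opcode == SUB:
            acc -= memory[addr]
        else:
            raise ValueError(f"unknown opcode {opcode}")

# Cells 0-2 hold the program, cells 10-11 hold data: same memory, same format.
memory = [0] * 12
memory[0] = LOAD * 100 + 10   # acc = memory[10]
memory[1] = ADD * 100 + 11    # acc += memory[11]
memory[2] = HALT * 100
memory[10], memory[11] = 10, 5

result_add = run(memory)

# "Reprogramming" is just a memory write: turn the ADD into a SUB.
memory[1] = SUB * 100 + 11
result_sub = run(memory)

print(result_add, result_sub)
```

The only difference between the two runs is one integer written into `memory[1]`, which is the point of the stored-program principle: no wiring changes, only memory contents.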

1.2 Five Major Components

The Von Neumann architecture divides a computer into five core components:

CPU Register File: high-speed storage inside the CPU

Special Registers

Register	Example Value	Role
PC	0x00401000	Program counter
IR	0x8B450008	Instruction register
MAR	0x00401000	Memory address register
MDR	0x00000000	Memory data register
ACC	0x0000001A	Accumulator

General Purpose Registers

Register	Example Value	Role
RAX	0x00000000	Return value
RBX	0x00000000	Base register
RCX	0x00000000	Counter register
RDX	0x00000000	Data register
RSI	0x00000000	Source index
RDI	0x00000000	Destination index
RBP	0x00000000	Base pointer
RSP	0x7FFDE000	Stack pointer

Program Status Word (PSW / FLAGS)

Flag	Example Value	Meaning
CF	0	Carry flag
PF	0	Parity flag
AF	0	Auxiliary carry
ZF	0	Zero flag
SF	0	Sign flag
OF	0	Overflow flag

Registers vs Memory

Feature	Register	Memory (RAM)
Location	Inside the CPU	Outside the CPU
Access speed	Fastest (< 1ns)	Slower (50-100ns)
Capacity	Tiny (bytes)	Large (GB)
Role	Hold instructions, operands, and results	Store programs and data

Component	English	Function	Main Composition
Arithmetic Unit	ALU (Arithmetic Logic Unit)	Performs arithmetic and logic operations	Adders, shifters, comparators
Control Unit	CU (Control Unit)	Directs and coordinates all components	Instruction register, decoder, timing generator
Memory	Memory	Stores programs and data	Memory Address Register (MAR), Memory Data Register (MDR)
Input Devices	Input	Information input	Keyboard, mouse, scanner
Output Devices	Output	Information output	Monitor, printer

1.3 Data Path

The Data Path is the route through which data flows between functional components. Inside the CPU, the data path connects:

Register file
Arithmetic Logic Unit (ALU)
Memory Data Register (MDR)

The width of the data path (how many bits can be transferred at once) directly affects computer performance.

1.4 The Von Neumann Bottleneck

The Von Neumann architecture has a famous performance bottleneck:

The data transfer speed between CPU and memory is far lower than the CPU's processing speed.

This causes the CPU to frequently be in an idle "waiting for data" state. Many optimization techniques in modern computers revolve around this problem:

Optimization Technique	Principle
Cache	Place small, high-speed storage near the CPU
Instruction Pipeline	Allow multiple instructions to be in different stages simultaneously
Superscalar	Issue multiple instructions in the same clock cycle
Multi-core Parallelism	Multiple CPU cores share computing tasks

2. Instruction Set Architecture: The Interface Between CPU and Software

In the previous section, we learned the core idea of the Von Neumann architecture: programs are stored in memory just like data. But this raises a key question — what does a "program" stored in memory actually look like? How does the CPU understand it?

The answer is the Instruction Set Architecture (ISA). If the CPU is a service, then the instruction set is its API documentation — it defines all the commands the CPU can understand, the format of each command, and the data range each command can operate on. Every line of code you write is ultimately translated by the compiler into a sequence of calls to this "API."

2.1 From Code to Instructions: A Line of Code's Translation Journey

First, let's establish a holistic understanding: the code you write in an editor and what the CPU actually executes are separated by several layers of translation.

🔗 From Code to Instructions: One Line Through the Translation Pipeline

1. Source code

int a = 10 + 5;

This is high-level code written in an editor. It is easy for humans to read, but the CPU does not understand int or the + operator directly.

2. Compiler emits assembly

MOV  R1, #10    ; put 10 into register R1
MOV  R2, #5     ; put 5 into register R2
ADD  R3, R1, R2 ; R3 = R1 + R2
STORE R3, [a]   ; store the result at variable a

3. Assembler emits machine code

0001 0001 0000 1010  → MOV R1, #10
0001 0010 0000 0101  → MOV R2, #5
0010 0011 0001 0010  → ADD R3, R1, R2
0100 0011 1000 0000  → STORE R3, [a]

4. CPU executes instructions

Step 1: fetch → decode → execute MOV R1, #10
Step 2: fetch → decode → execute MOV R2, #5
Step 3: fetch → decode → execute ADD R3, R1, R2
Step 4: fetch → decode → execute STORE R3, [a]

💡 Key idea

An instruction set is the CPU API: it defines every command the CPU understands. A compiler translates your high-level language into calls to that API. Different CPUs, such as x86 and ARM, have different instruction sets, just as different services expose different APIs.

This translation chain is key to understanding instruction sets:

Layer	Content	Who Can Understand It
High-level language	`int a = 10 + 5;`	Humans
Assembly language	`MOV R1, #10` / `ADD R3, R1, R2`	Humans (with training)
Machine code	`0001 0001 0000 1010`	CPU
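
The assembler step can be sketched as a small encoder. The field layout below (4-bit opcode, 4-bit register, then 8 bits holding an immediate, an address, or two 4-bit source registers) is inferred from the 16-bit example encodings in this section and is an assumption for illustration, not a real ISA. The function names and the value `0x80` for the address of `a` are likewise read off the example rather than taken from any toolchain.

```python
# Opcode values taken from the example encodings in this section.
OPCODES = {"MOV": 0b0001, "ADD": 0b0010, "STORE": 0b0100}

def encode_mov(rd, imm):
    """MOV rd, #imm -> opcode | rd | 8-bit immediate."""
    return (OPCODES["MOV"] << 12) | (rd << 8) | (imm & 0xFF)

def encode_add(rd, rs1, rs2):
    """ADD rd, rs1, rs2 -> opcode | rd | rs1 | rs2."""
    return (OPCODES["ADD"] << 12) | (rd << 8) | (rs1 << 4) | rs2

def encode_store(rs, addr):
    """STORE rs, [addr] -> opcode | rs | 8-bit address."""
    return (OPCODES["STORE"] << 12) | (rs << 8) | (addr & 0xFF)

def bits(word):
    """Format a 16-bit word as four space-separated nibbles."""
    s = format(word, "016b")
    return " ".join(s[i:i + 4] for i in range(0, 16, 4))

ADDR_A = 0x80  # address of variable a, as it appears in the STORE example

program = [
    encode_mov(1, 10),        # MOV R1, #10
    encode_mov(2, 5),         # MOV R2, #5
    encode_add(3, 1, 2),      # ADD R3, R1, R2
    encode_store(3, ADDR_A),  # STORE R3, [a]
]
for word in program:
    print(bits(word))
```

Running it reproduces the four bit patterns listed in the machine-code stage, which shows that "assembling" is mostly a matter of packing fields into fixed bit positions.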

Why Understand This Chain?

When you see a compilation error, you know it occurred at the "high-level language → assembly" step
When you see a runtime crash, you know the problem is at the CPU instruction execution stage
When understanding performance optimization, you know what optimizations the compiler makes during "translation"
When choosing a CPU architecture (x86 vs ARM), you know the difference is in the "instruction set API"

2.2 What Does an Instruction Look Like?

Now that we know code gets translated into instructions, the next question is: what is the internal structure of an instruction?

Each machine instruction is essentially a string of binary digits, but it has a strict internal format. The two most core parts are:

Opcode: Tells the CPU "what to do" — add? jump? or read memory?
Operand: Tells the CPU "what to do it to" — which register? which memory address? what constant?

Just as a sentence has a "verb + object" structure, an instruction has an "operation + target" structure:

Instruction:  ADD  R3, R1, R2
              ───  ──────────
              Opcode  Operands
              (do addition) (R3 = R1 + R2)

Based on the number of operands, instruction formats range from simple to complex in four types:

Machine Instruction Format: opcode + operands = machine instruction

Field	Width
Opcode	8 bits
Destination	8 bits
Source 1	8 bits
Source 2	8 bits

Example instruction: 01101100 00000001 00000010 00000011

Three-address format

Three addresses identify the destination and two source operands separately. The result goes into the destination without modifying the sources.

Common examples

Instruction	Meaning
ADD R1, R2, R3	R1 = R2 + R3
SUB R1, R2, R3	R1 = R2 - R3
MUL R1, R2, R3	R1 = R2 × R3

Common opcodes

Opcode	Mnemonic	Meaning
00000000	NOP	No operation
00000001	MOV	Move data
00000010	ADD	Addition
00000011	SUB	Subtraction
00000100	MUL	Multiplication
00000101	DIV	Division
00000110	AND	Logical AND
00000111	OR	Logical OR
00001000	NOT	Logical NOT
00001001	XOR	Exclusive OR
00001010	SHL	Shift left
00001011	SHR	Shift right
00001100	JMP	Unconditional jump
00001101	JE	Jump if equal
00001110	JNE	Jump if not equal
00001111	CALL	Call subroutine
00010000	RET	Return
00010001	PUSH	Push stack
00010010	POP	Pop stack
00010011	LOAD	Load from memory
00010100	STORE	Store to memory

Format	Structure	Example	Use Case
Zero-address	Opcode only	`RET` (return)	Stack machines; operands implicit at stack top
One-address	Opcode + 1 address	`INC R1` (increment R1 by 1)	Single-operand operations
Two-address	Opcode + 2 addresses	`MOV R1, R2`	Most common; data transfer and operations
Three-address	Opcode + 3 addresses	`ADD R3, R1, R2`	Preserves source operands

Why So Many Formats?

This is a trade-off between space and flexibility. Zero-address instructions are the shortest (saving memory) but require extra stack operations; three-address instructions are the most flexible (preserving source data) but occupy more bits. Different CPU architectures choose different combinations of instruction formats.

2.3 How Does the CPU Find Data? — Addressing Modes

An instruction tells the CPU to "do addition," but where are the two numbers for the addition? They might be written directly in the instruction, in a register, or at some memory address. Addressing modes are the rules that tell the CPU "where to find the operands."

Using a real-life analogy of "finding someone":

Addressing Mode	Analogy	Instruction Example	Description
Immediate addressing	The person is standing right in front of you	`MOV R1, #100`	Data is written directly in the instruction; fastest
Register addressing	Call an internal extension to reach a colleague	`MOV R1, R2`	Data is in a CPU register; very fast
Direct addressing	Know the address, go directly to the door	`MOV R1, [0x1000]`	Memory address is written in the instruction
Indirect addressing	Ask the front desk "which room is Zhang San in?"	`MOV R1, [R2]`	The register contains an address; requires one extra lookup
Indexed addressing	"Building 3 + Floor 5" to calculate the room	`MOV R1, [R2+10]`	Base address + offset; used for array access

Addressing Modes: how an instruction finds operand locations

Immediate Addressing

Definition: The operand is embedded directly in the instruction and is immediately available.

Instruction format: MOV R1, #100

Example: MOV R1, #100 ; R1 = 100

Immediate value 100 is stored directly in the instruction, so no register or memory lookup is needed.

Execution process

1. CPU reads immediate value 100 directly from the instruction
2. Write the immediate value into target register R1
3. Execution completes without extra memory access

Characteristics

Speed: Fast
Flexibility: Low

Addressing mode comparison

Addressing mode	Format	Speed	Use case
Immediate addressing	`MOV R1, #100`	Fastest	Constant assignment and initialization
Register addressing	`MOV R1, R2`	Fastest	Register-to-register data transfer
Direct addressing	`MOV R1, [100]`	Relatively fast	Accessing global variables
Indirect addressing	`MOV R1, [R2]`	Relatively fast	Pointers and array traversal
Indexed addressing	`MOV R1, [R2 + R3]`	Relatively fast	Array access and loops
Based addressing	`MOV R1, [R2 + 100]`	Relatively fast	Struct fields and function parameters
Relative addressing	`JMP LABEL`	Fastest	Loops and conditional branches

Why So Many Addressing Modes?

Different scenarios require different "find data" strategies:

Constant assignment (x = 100) → Immediate addressing; data is right in the instruction
Variable operations (a + b) → Register addressing; data already loaded into registers
Array access (arr[i]) → Indexed addressing; base address + index offset
Pointer operations (*ptr) → Indirect addressing; register holds the address

When you write arr[i], you don't think about addressing modes, but the compiler automatically selects the most appropriate one.
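
To make the lookup rules concrete, here is a minimal sketch that resolves an operand under each mode from the comparison table. The register contents, memory contents, and the `fetch_operand` helper are all hypothetical and exist only for this illustration.

```python
# Toy machine state; register names and memory contents are made up for illustration.
registers = {"R2": 0x1000, "R3": 2}
memory = {100: 11, 0x1000: 22, 0x1002: 33, 0x1000 + 100: 44}

def fetch_operand(mode, value=None, base=None, index=None, offset=0):
    """Return the operand selected by one addressing mode."""
    if mode == "immediate":      # MOV R1, #100       operand is in the instruction
        return value
    if mode == "register":       # MOV R1, R2         operand is in a register
        return registers[base]
    if mode == "direct":         # MOV R1, [100]      instruction holds the address
        return memory[value]
    if mode == "indirect":       # MOV R1, [R2]       register holds the address
        return memory[registers[base]]
    if mode == "indexed":        # MOV R1, [R2 + R3]  base register + index register
        return memory[registers[base] + registers[index]]
    if mode == "based":          # MOV R1, [R2 + 100] base register + constant offset
        return memory[registers[base] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")

print(fetch_operand("immediate", value=100))
print(fetch_operand("indirect", base="R2"))
print(fetch_operand("indexed", base="R2", index="R3"))
```

Note how the immediate and register modes never touch `memory`, while every other mode performs exactly one memory read after computing an address. That difference is what the "Speed" column in the table reflects.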

2.4 The CPU's Capability List — Instruction Categories

Now that we know instruction formats and addressing modes, the last question is: what exactly can the CPU do?

All instructions can be grouped into six categories that cover everything a computer can do:

Type	What It Does	Representative Instructions	Maps to Your Code
Data Transfer	Move data around	MOV, LOAD, STORE	`let x = y`, function parameter passing
Arithmetic	Addition, subtraction, multiplication, division	ADD, SUB, MUL, DIV	`a + b`, `count++`
Logic	Bitwise operations	AND, OR, NOT, XOR	`flags & 0xFF`, permission checks
Shift	Shift left/right	SHL, SHR	`x << 2` (equivalent to multiplying by 4)
Control Transfer	Jumps and calls	JMP, CALL, RET	`if`, `for`, function calls
Input/Output	Communicate with peripherals	IN, OUT	Read keyboard, write to screen

A Key Insight

All the code you write — no matter how complex the business logic or how flashy the UI animations — is ultimately broken down into combinations of these six basic operation types. The CPU's "intelligence" lies not in how complex its individual operations are, but in its ability to execute these simple operations billions of times per second.

2.5 Two Design Philosophies: CISC vs RISC

There is a fundamental divide in instruction set design: should each instruction be as powerful as possible, or as simple as possible?

This divide created two camps that directly affect every device you use today:

⚔️ Two Design Philosophies: CISC vs RISC

Dimension	CISC	RISC
Instruction count	Thousands of complex instructions	Tens to hundreds of streamlined instructions
Single instruction	One instruction can do many things	One instruction does one thing
Instruction length	Variable length (1-15 bytes)	Fixed length, often 4 bytes
Execution speed	Complex instructions take multiple cycles	Most instructions complete in one cycle
Power use	Higher	Lower
Pipeline	Harder to optimize because lengths vary	Easier to optimize because instructions are regular
Compiler burden	Lighter because hardware does more	Heavier because software optimizes more

Real-world choices

Device	Architecture	Why
Your computer	x86 (CISC)	Compatible with decades of software
Your phone	ARM (RISC)	Low power consumption and longer battery life
Apple Silicon	ARM (RISC)	High performance per watt reshaped laptops
RISC-V board	RISC-V (RISC)	Open and royalty-free for IoT and education

An analogy to understand this:

CISC is like a Swiss Army knife: One tool integrates scissors, a bottle opener, a screwdriver... lots of functions but each one may not be the best
RISC is like a professional tool set: Each tool does only one thing, but does it fast and well

Why Does Your Phone Use ARM and Your Computer Use x86?

x86 (CISC) has dominated the PC and server market for about 40 years, accumulating a massive software ecosystem. Switching architectures means software must be recompiled or run through an emulation layer
ARM (RISC) dominates mobile devices thanks to its low-power advantage. Phone batteries are small; every milliwatt counts
Apple Silicon showed that a RISC design can also deliver high performance: the M-series chips are competitive with x86 laptop processors on performance while using noticeably less power
RISC-V is an open, royalty-free RISC instruction set that is growing quickly in IoT, education, and AI chips

Summary: The instruction set is the bridge between software and hardware. Your code is translated by the compiler into instructions, which tell the CPU what to do and to whom through opcodes and operands. Addressing modes determine where data comes from. Different instruction set designs (CISC/RISC) determine the CPU's performance characteristics and applicable scenarios.
Now we know the "static structure" of instructions — what they look like and what types exist. The next question is: how does the CPU execute these instructions step by step internally? That's the control unit's job.

3. Control Unit: The CPU's "Command Center"

3.1 Components of the Control Unit

The control unit is the "brain" of the CPU, responsible for coordinating all components to work according to instruction requirements:

How the Controller Works: how control signals coordinate CPU components

Control Unit (CU): Instruction Register (IR), Instruction Decoder, Timing Generator

Output control signals: PC→MAR, MEM→MDR, MDR→IR, IR→ID, ALU→ACC, ACC→MDR

Data path: Program Counter → MAR (Address Register) → Main Memory → MDR (Data Register) → Instruction Register → Decoder; ALU (Arithmetic Logic Unit) ↔ ACC (Accumulator)

Core controller concepts

Control signals: Electrical signals emitted by the controller to control each component on the data path.
Timing: CPU operations advance by clock ticks; each tick performs specific micro-operations.
Hardwired vs microprogrammed: Hardwired controllers are fast but complex; microprogrammed controllers are flexible but slightly slower.

Component	Function
Program Counter (PC)	Stores the address of the next instruction
Instruction Register (IR)	Stores the currently executing instruction
Instruction Decoder	Parses the instruction's opcode and operands
Timing Generator	Generates clock beat signals to control component timing
Micro-operation Sequence Generator	Generates the series of control signals needed to execute instructions

Program Status Word (PSW): the CPU status indicators

Flag	Meaning
CF	Carry flag
PF	Parity flag
AF	Auxiliary carry
ZF	Zero flag
SF	Sign flag
TF	Trap flag
IF	Interrupt flag
DF	Direction flag
OF	Overflow flag

Example state: CF=0, PF=0, AF=0, ZF=0, SF=0, TF=0, IF=1, DF=0, OF=0

Typical flag uses

Conditional jumps

JE, JNE, JG, JL and similar instructions decide jumps based on ZF, SF, and OF.

Arithmetic

Multi-word arithmetic uses CF for carry and OF for signed overflow.

Loop control

Loop instructions often use ZF to detect the loop ending condition.
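
How an operation result sets these flags can be sketched for an 8-bit addition. The `add8` helper below is hypothetical and covers only CF, ZF, SF, and OF; it assumes both inputs are already in the range 0-255.

```python
def add8(a, b):
    """Add two 8-bit values and return (result, flags) the way a simple ALU might."""
    total = a + b
    result = total & 0xFF                 # keep only the low 8 bits
    flags = {
        "CF": int(total > 0xFF),          # unsigned carry out of bit 7
        "ZF": int(result == 0),           # result is zero
        "SF": (result >> 7) & 1,          # sign bit of the result
        # Signed overflow: both operands have a sign bit that differs from the result's.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(add8(0xFF, 0x01))   # 255 + 1 wraps around in the unsigned range
```

The two calls show why CF and OF are separate flags: `0x7F + 0x01` sets OF but not CF (a signed overflow with no unsigned carry), while `0xFF + 0x01` sets CF and ZF but not OF.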

3.2 Instruction Cycle

The CPU executes an instruction through a complete instruction cycle, typically including:

Fetch Cycle: Read instruction from memory into IR
Decode Cycle: Parse the instruction's meaning
Execute Cycle: Perform the operation
Memory Access Cycle: Access memory if needed
Write-Back Cycle: Write the result back to a register or memory
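
The cycle above can be sketched as a loop. The program and data follow the demo at the start of the chapter (load 42, add 7, store the sum); the tuple-based instruction representation, the opcode names `LOAD_MEM` and `LOAD_IMM`, and the trailing `HALT` are assumptions added so the sketch can run and stop.

```python
memory = {
    0x100: ("LOAD_MEM", "R0", 0x200),   # LOAD R0, [0x200]
    0x101: ("LOAD_IMM", "R1", 7),       # LOAD R1, #7
    0x102: ("ADD", "R0", "R1"),         # ADD R0, R1
    0x103: ("STORE", 0x201, "R0"),      # STORE [0x201], R0
    0x104: ("HALT",),                   # not in the demo; added so the loop can stop
    0x200: 42,
    0x201: 0,
}
regs = {"R0": 0, "R1": 0, "R2": 0, "R3": 0}
pc = 0x100

while True:
    instr = memory[pc]          # fetch: read the instruction at PC (the role of IR)
    pc += 1                     # PC now points at the next instruction
    op, *args = instr           # decode: split the opcode from its operands
    if op == "HALT":
        break
    if op == "LOAD_MEM":        # execute, memory access, and write-back
        reg, addr = args
        regs[reg] = memory[addr]
    elif op == "LOAD_IMM":
        reg, value = args
        regs[reg] = value
    elif op == "ADD":
        dst, src = args
        regs[dst] = regs[dst] + regs[src]
    elif op == "STORE":
        addr, reg = args
        memory[addr] = regs[reg]
    else:
        raise ValueError(f"unknown opcode {op}")

print(regs["R0"], memory[0x201])
```

A real CPU separates these phases in hardware and time; the sketch collapses execute, memory access, and write-back into one branch per opcode to keep the control flow visible.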

3.3 Micro-Operations

Micro-operations are the most basic operations driven by control signals. For example, the "fetch" phase can be decomposed into the following micro-operations:

Beat	Micro-Operation	Control Signals
T1	PC → MAR	PCout, MARin
T2	MEM → MDR	MEMout, MDRin
T3	MDR → IR	MDRout, IRin
T4	PC + 1 → PC	PC+1, PCin
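
The four beats in the table can be traced as register transfers. This is a minimal sketch: the dictionary-based CPU state and the memory contents are illustrative assumptions, and the control signals appear only as comments.

```python
memory = {0x100: "LOAD R0, [0x200]"}
cpu = {"PC": 0x100, "MAR": None, "MDR": None, "IR": None}

def fetch(cpu, memory):
    """Run beats T1-T4 of the fetch phase and return a snapshot after each beat."""
    trace = []
    cpu["MAR"] = cpu["PC"]            # T1: PC -> MAR      (PCout, MARin)
    trace.append(("T1", dict(cpu)))
    cpu["MDR"] = memory[cpu["MAR"]]   # T2: MEM -> MDR     (MEMout, MDRin)
    trace.append(("T2", dict(cpu)))
    cpu["IR"] = cpu["MDR"]            # T3: MDR -> IR      (MDRout, IRin)
    trace.append(("T3", dict(cpu)))
    cpu["PC"] = cpu["PC"] + 1         # T4: PC + 1 -> PC   (PC+1, PCin)
    trace.append(("T4", dict(cpu)))
    return trace

for beat, state in fetch(cpu, memory):
    print(beat, state)
```

Each beat changes exactly one register, which is why the control unit only needs to assert a small set of "out" and "in" signals per clock tick.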

3.4 Hardwired vs. Microprogrammed Control

Feature	Hardwired Control	Microprogrammed Control
Implementation	Combinational logic circuits	Microinstruction sequences (firmware)
Speed	Fast	Slightly slower
Design difficulty	Complex	Simpler
Flexibility	Poor (changes require circuit redesign)	Good (just modify the microprogram)
Typical applications	RISC processors	Early CISC processors

4. Storage Hierarchy: Why Do We Need Cache?

4.1 Storage Hierarchy Structure

A computer's storage devices form a pyramid structure:

Storage Hierarchy: from fastest to slowest, smallest to largest

Level	Speed	Capacity
Registers	Fastest	Smallest (KB)
Cache	Very fast	Small (MB)
Memory	Fast	Medium (GB)
Disk	Slow	Large (TB)
Network/Cloud	Slowest	Unlimited

Detailed comparison

Storage level	Access time	Typical capacity	Cost
Registers	< 1 ns	A few KB	Highest
L1 cache	~1 ns	64 KB	Very high
L2 cache	~3 ns	256 KB	High
L3 cache	~10 ns	8 MB	Medium
Memory	~100 ns	8-32 GB	Medium-low
SSD	~100 μs	256 GB-2 TB	Low
HDD	~10 ms	1-10 TB	Lowest

Locality principle

Programs tend to access recently accessed locations (temporal locality) and nearby locations (spatial locality)

By exploiting locality, caches can significantly improve performance.

Level	Storage Type	Access Time	Typical Capacity	Location
Registers	SRAM	<1ns	A few KB	Inside CPU
L1 Cache	SRAM	~1ns	32-64KB	Near CPU core
L2 Cache	SRAM	~3-10ns	256KB-1MB	On CPU chip
L3 Cache	SRAM	~10-20ns	2-16MB	On CPU chip / shared
Main Memory (RAM)	DRAM	~50-100ns	8-64GB	On motherboard
SSD	Flash	~10-100μs	256GB-2TB	On motherboard
HDD	Magnetic disk	~5-10ms	1-10TB	Inside case

Analogy for Speed Differences

If CPU accessing L1 cache is like grabbing a piece of paper from your desk:

Accessing main memory → Taking the elevator to a convenience store downstairs to buy paper
Accessing SSD → Driving to another city to buy paper
Accessing HDD → Flying to another country to buy paper

The speed difference can be millions of times!

4.2 Cache Principles

Cache is fast storage located between the CPU and main memory. Its core idea is based on two locality principles:

Locality Principles

Temporal locality: If data was just accessed, it's likely to be accessed again soon
Spatial locality: If data was accessed, nearby data is likely to be accessed too

How Cache Works

Hit: The data the CPU needs is in the cache; read directly
Miss: The data is not in the cache; must be loaded from main memory

Hit rate = Number of hits / Total number of accesses
Average access time = Hit rate × Cache time + (1 - Hit rate) × Memory time
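
A quick calculation shows how sensitive the average is to the hit rate. The function names are hypothetical, and the sample numbers (2 ns cache, 100 ns memory) are illustrative values rather than measurements of a specific machine.

```python
def hit_rate(hits, accesses):
    """Hit rate = number of hits / total number of accesses."""
    return hits / accesses

def average_access_time(h, cache_ns, memory_ns):
    """Average access time = H * Tc + (1 - H) * Tm, matching the formula above."""
    return h * cache_ns + (1 - h) * memory_ns

t90 = average_access_time(hit_rate(90, 100), cache_ns=2, memory_ns=100)
t99 = average_access_time(hit_rate(99, 100), cache_ns=2, memory_ns=100)
print(f"90% hit rate: {t90:.2f} ns")
print(f"99% hit rate: {t99:.2f} ns")
```

Raising the hit rate from 90% to 99% cuts the average from about 11.8 ns to about 2.98 ns, because the miss penalty dominates the sum. This is why even small improvements in hit rate matter so much.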

Cache Principles: the bridge between CPU and memory

CPU core → L1 cache (64 KB, ~1 ns) → L2 cache (256 KB, ~5 ns) → L3 cache (8 MB, ~15 ns) → Main memory (16 GB, ~100 ns)

Why does cache work? Locality principle

Temporal locality: Recently accessed data is likely to be accessed again. Example: variables inside loops.
Spatial locality: After one item is accessed, nearby data is likely to be accessed. Example: array traversal and sequential execution.

Cache mapping methods

Direct-mapped: Each memory block maps to exactly one cache line. Speed: fastest. Hit rate: lower. Implementation complexity: lowest.

Hit-rate calculation

Average access time = H × Tc + (1 - H) × Tm

Cache access time (Tc): 2 ns
Memory access time (Tm): 100 ns
Hit rate (H): 90%

Average access time = 0.9 × 2 ns + 0.1 × 100 ns = 11.8 ns ≈ 12 ns

4.3 Cache Mapping Methods

Method	Principle	Advantage	Disadvantage
Direct-mapped	Each memory block can only go to one fixed location	Simple and fast	High conflict rate
Set-associative	Each memory block can go to N locations (N-way)	Balances speed and hit rate	More complex implementation
Fully associative	Any location	Lowest conflict rate	Hardest to implement (requires comparing all tags)
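
The trade-off in the table can be observed with a small simulator. This is a sketch under stated assumptions: the cache is modeled at block granularity, replacement within a set is LRU (the table does not specify a policy), and `count_hits` is a hypothetical helper.

```python
from collections import OrderedDict

def count_hits(block_addresses, num_lines, ways):
    """Count hits for an N-way set-associative cache with LRU replacement.

    ways == 1 is direct-mapped; ways == num_lines is fully associative.
    """
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in block_addresses:
        lines = sets[block % num_sets]      # the set this block maps to
        if block in lines:
            hits += 1
            lines.move_to_end(block)        # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
    return hits

# Blocks 0 and 4 collide in a 4-line direct-mapped cache (0 % 4 == 4 % 4).
accesses = [0, 4, 0, 4, 0, 4, 0, 4]
print("direct-mapped:    ", count_hits(accesses, num_lines=4, ways=1))
print("2-way associative:", count_hits(accesses, num_lines=4, ways=2))
print("fully associative:", count_hits(accesses, num_lines=4, ways=4))
```

With this access pattern the direct-mapped cache never hits because the two blocks keep evicting each other, while the 2-way and fully associative caches hit on every access after the first two. That is the "high conflict rate" from the table in action.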

4.4 Virtual Memory

Virtual memory is an important abstraction provided by the operating system:

Each process thinks it has a complete virtual address space
The operating system maintains page tables, and the memory management unit (MMU) uses them to translate virtual addresses to physical addresses
Infrequently used pages can be swapped out to disk (swap space)

Virtual Memory Analogy

Think of virtual memory as a hotel managing rooms:

You (the process) think the entire building is yours
In reality, the hotel (OS) only assigns you the rooms you currently need
Unused rooms get "swapped out" to storage (disk)
Needed rooms can be "swapped in" at any time
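
In code, the translation step looks roughly like this. The 4 KB page size, the page-table contents, and the `translate` and `PageFault` names are assumptions for this sketch; real systems use multi-level page tables and hardware support.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

class PageFault(Exception):
    """Raised when a page is not resident in physical memory."""

# Hypothetical page table: virtual page number -> physical frame number,
# or None if the page has been swapped out to disk.
page_table = {0: 2, 1: 5, 2: None}

def translate(vaddr):
    """Translate a virtual address to a physical address, or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number and offset
    frame = page_table.get(vpn)
    if frame is None:
        raise PageFault(f"page {vpn} is not in memory")
    return frame * PAGE_SIZE + offset        # the offset is unchanged by translation

print(hex(translate(0x1234)))
```

Only the page number is translated; the offset within the page carries over unchanged. On a `PageFault`, an operating system would load the page from disk, update the table, and retry the access.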

5. Bus and I/O: The Computer's "Blood Vessels"

5.1 System Bus

A Bus is the data channel connecting computer components:

Computer Bus System: address bus, data bus, and control bus

The CPU (control unit and ALU) connects to main memory (cells 0x0 through 0x7 shown) over three buses:

Address bus (32 bits): CPU sends memory addresses over a one-way path.
Data bus (64 bits): Transfers actual data in both directions.
Control bus: Transfers read/write and other control signals.

Bus Type	Function	Direction	Typical Width
Address Bus	Transfers memory addresses	Unidirectional (CPU→Memory)	32-bit/64-bit
Data Bus	Transfers data	Bidirectional	32-bit/64-bit
Control Bus	Transfers control signals	Bidirectional	Multiple signal lines

5.2 Bus Arbitration

When multiple devices simultaneously request bus access, an arbitration mechanism determines who goes first:

Arbitration Method	Description
Centralized arbitration	A central arbiter makes the decision
Distributed arbitration	Devices negotiate among themselves

5.3 I/O Device Access Methods

Method	Principle	Advantage	Disadvantage
Programmed I/O (Polling)	CPU polls I/O status	Simple	Low CPU utilization
Interrupt-driven I/O	I/O device actively notifies CPU when done	CPU can work in parallel	Interrupt handling has overhead
DMA	I/O device accesses memory directly	CPU only sets up the transfer and handles completion	Requires a DMA controller

I/O Method Comparison: Programmed I/O · Interrupt-driven I/O · DMA

Programmed I/O

Workflow

1. CPU polls the I/O device status
2. Device busy? Keep waiting
3. Device ready, send read/write command
4. CPU reads or writes data byte by byte
5. Check whether transfer is complete
6. If incomplete, keep polling

CPU involvement: High
Speed: Slow
Complexity: Low

Three I/O methods compared

Feature	Programmed I/O	Interrupt-driven I/O	DMA
CPU involvement	Involved throughout	Only handles interrupts	Almost uninvolved
Data transfer	CPU moves each byte	CPU moves each word	Device transfers directly to memory
Pros	Simple and flexible control	High CPU efficiency	CPU is fully freed
Cons	Low CPU utilization	Interrupt overhead	Complex hardware
Best for	Simple or low-speed devices	Low/medium-speed devices	High-speed bulk transfer

5.4 DMA Principles

DMA (Direct Memory Access) allows I/O devices to exchange data directly with memory:


Without DMA: The CPU participates in the entire data transfer process and can't do anything else
With DMA: The CPU tells the DMA controller "transfer from where to where, how much," then goes to do other tasks. The DMA notifies the CPU when complete

DMA Analogy

This is like ordering food delivery:

Without DMA: You go to the supermarket yourself, buy groceries, go home, wash vegetables, and cook (involved in the entire process)
With DMA: You place an order by phone, and the delivery person brings it straight to your kitchen (someone else handles it; you just "receive the goods" at the end)

5.5 Interrupt Mechanism

Interrupts are a very important mechanism in computer systems:

After an I/O device completes an operation, it sends an interrupt request to the CPU
The CPU, currently executing an instruction, responds to the interrupt after completing the current instruction
The CPU saves its current state and jumps to the interrupt handler
After handling is complete, it restores the state and continues execution
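
The four steps can be sketched as a loop that checks for pending interrupts only between instructions. Everything here is a simplified assumption: the `run` function, the string "instructions", and a saved state that consists of the program counter alone.

```python
def run(program, interrupt_at):
    """Run `program` (a list of instruction names); an interrupt request arrives
    while the instruction at index `interrupt_at` is executing."""
    log = []
    pc = 0
    pending = False
    while pc < len(program):
        if pc == interrupt_at:
            pending = True                   # device raises the request mid-instruction
        log.append(f"exec {program[pc]}")    # the current instruction always completes
        pc += 1
        if pending:                          # requests are checked between instructions
            saved_pc = pc                    # save state (only the PC in this sketch)
            log.append("enter handler")      # jump to the interrupt handler
            log.append("exit handler")
            pc = saved_pc                    # restore state and resume
            pending = False
    return log

print(run(["A", "B", "C"], interrupt_at=1))
```

The log shows instruction B finishing before the handler runs and C resuming afterwards, which matches the ordering described in the list above. Real CPUs also save the status flags and registers, not only the program counter.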

6. CPU Performance Optimization: Pipeline Technology

6.1 Instruction Pipeline

Instruction pipelining is a parallel technique that maximizes CPU efficiency:

CPU Instruction Pipeline: five stages, Fetch (IF) → Decode (ID) → Execute (EX) → Memory (MEM) → Write Back (WB)

Example instruction stream:

ADD R1,R2,R3
SUB R4,R1,R5
LOAD R6,[R4]
STORE R6,[R7]
AND R8,R1,R6

Pipeline principle

Sequential execution: each instruction finishes before the next starts, so N instructions require N × 5 cycles.

Pipeline execution: multiple instructions occupy different stages at once; ideally CPI ≈ 1.

How Pipelining Works

Sequential execution (5 instructions, 25 cycles):
Instr 1: IF→ID→EX→MEM→WB
Instr 2:            IF→ID→EX→MEM→WB
Instr 3:                         IF→ID→EX→MEM→WB
...

Pipeline execution (5 instructions, 9 cycles):
Instr 1: IF→ID→EX→MEM→WB
Instr 2:    IF→ID→EX→MEM→WB
Instr 3:       IF→ID→EX→MEM→WB
...

Ideally, CPI (cycles per instruction) for N instructions ≈ 1
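
The cycle counts follow from a simple formula: a k-stage pipeline needs k cycles to fill and then, ideally, completes one instruction per cycle. The sketch below assumes an ideal pipeline with no hazards or stalls; the function names are hypothetical.

```python
def sequential_cycles(n, stages=5):
    """Each instruction runs all stages before the next one starts."""
    return n * stages

def pipelined_cycles(n, stages=5):
    """Ideal pipeline: fill the pipe once, then retire one instruction per cycle."""
    return stages + (n - 1)

def cpi(n, stages=5):
    """Cycles per instruction for n instructions on the ideal pipeline."""
    return pipelined_cycles(n, stages) / n

for n in (5, 100, 1000):
    print(n, sequential_cycles(n), pipelined_cycles(n), round(cpi(n), 3))
```

For 5 instructions this gives 25 cycles sequentially versus 9 pipelined, and the CPI falls from 1.8 toward 1 as N grows, because the one-time cost of filling the pipeline is spread over more instructions.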

6.2 Pipeline Hazards

While pipelining improves performance, it also introduces hazard problems:

Type	Cause	Solution
Structural hazard	Hardware resource conflict	Add hardware / stagger execution
Data hazard	Later instruction needs the result of an earlier one	Data forwarding / bubbles / scheduling
Control hazard	Branch instructions change execution flow	Delay slots / branch prediction
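
Data hazards can be found mechanically by comparing each instruction's destination register with the sources of the instructions that follow it. The sketch below uses the instruction stream from the pipeline demo above. The hand-written (destination, sources) annotations and the window of two following instructions are assumptions; the real distance that matters depends on the pipeline design and on whether forwarding is available.

```python
# (text, destination, sources) per instruction; None means no register is written.
program = [
    ("ADD R1,R2,R3",  "R1", ["R2", "R3"]),
    ("SUB R4,R1,R5",  "R4", ["R1", "R5"]),
    ("LOAD R6,[R4]",  "R6", ["R4"]),
    ("STORE R6,[R7]", None, ["R6", "R7"]),
    ("AND R8,R1,R6",  "R8", ["R1", "R6"]),
]

def raw_hazards(program, window=2):
    """List read-after-write dependencies between instructions at most `window` apart."""
    hazards = []
    for i, (text_i, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            text_j, _, sources = program[j]
            if dest in sources:              # a later instruction reads what i writes
                hazards.append((text_i, text_j, dest))
    return hazards

for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} needs {reg} from {producer}")
```

Each reported pair is a place where the pipeline must forward the result, insert a bubble, or rely on the compiler to reorder instructions, which are the solutions listed for data hazards in the table.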

7. Summary: How Does a Computer "Run"?

Let's connect the entire process using professional terminology:

After a program starts, the operating system loads the executable file from disk into memory. In the fetch stage (IF), the CPU places the instruction's address on the address bus and reads the instruction from memory over the data bus into the instruction register (IR). The control unit decodes the instruction (ID) and, after identifying the operation type, generates the corresponding control signals. The arithmetic unit performs arithmetic and logic operations in the execute stage (EX). If memory access is needed, the CPU reads or writes memory in the memory stage (MEM), and finally the result is written back (WB) to a register or memory. The entire process is driven by the clock, with micro-operation sequences generated by the control unit coordinating all components to work in an orderly manner.

Next Steps

Now that you've mastered the professional knowledge of computer organization, you can continue learning:

Operating Systems: Understand how programs run on an operating system, and how processes, threads, and memory management are implemented
Data Encoding, Storage, and Transmission: Deepen your understanding of how data is represented in computers

Topic	Recommended Resources
Computer Architecture	Computer Organization and Design: The Hardware/Software Interface - Patterson & Hennessy
CPU Microarchitecture	Computer Systems: A Programmer's Perspective - Bryant & O'Hallaron
Instruction Set Architecture	ARMv8 Architecture Manual, Intel x64 Manual
Cache Principles	Cache Coherence Protocol (MESI), Cache Write Policies
Operating Systems	Next chapter: "Operating Systems"
