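Principles of Computer Organization

Preface

Once we have built a CPU out of transistors, how does a computer become a complete system? In the previous chapter we started from the transistor, built adders, registers, and arithmetic units, and finally assembled a CPU core. But a CPU alone is not enough: it has to work together with memory and I/O devices, it needs buses to connect the parts, and it needs an instruction set to drive it. In this chapter we shift from the inside of the CPU to the whole computer system and look closely at the von Neumann architecture, instruction sets, the memory hierarchy, buses, and I/O.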

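What will this article teach you?

After this chapter you will have:

A system-level view: how the CPU, memory, and I/O work together
Hardware vocabulary: core concepts such as the instruction cycle, pipelining, CPI, and cache hit rate
A performance mindset: where the bottlenecks in a computer's organization lie and how they are mitigated
A foundation for further study: groundwork for operating systems, computer architecture, and embedded development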

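Section	Content	Core concepts
Section 1	The von Neumann architecture	Stored program, five major components, data path
Section 2	The instruction set	Instruction formats, addressing modes, CISC vs RISC
Section 3	The CPU controller	Control unit, micro-operations, instruction cycle
Section 4	The memory hierarchy	Cache, main memory, virtual memory, paging
Section 5	Buses and I/O	Bus arbitration, DMA, interrupts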

0. The Big Picture: The Computer Hardware System

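Figure: Detailed CPU instruction cycle demo (interactive in the original page).

CPU: the control unit (CU) holds the program counter (PC, initially 256, that is 0x100), the instruction register (IR), the memory address register (MAR), and the memory data register (MDR). The arithmetic logic unit (ALU) holds the accumulator (ACC, initially 0). The general register file holds R0 to R3, all initially 0. The address bus, data bus, and control bus connect the CPU to main memory.

Main memory, program area:

0x100	LOAD R0, [0x200]
0x101	LOAD R1, #7
0x102	ADD R0, R1
0x103	STORE [0x201], R0

Main memory, data area: the cell the program reads, 0x200 (decimal 512), holds 42; the cell it writes, 0x201 (decimal 513), holds 0.

Each instruction steps through four stages: Fetch, Decode, Execute, Write Back. The full demo runs 32 steps.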

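Layer by layer: the computer hardware system

Layer 1: CPU core, which executes instructions
Layer 2: Register file, the high-speed storage inside the CPU
Layer 3: Main memory, which holds programs and data
Layer 4: I/O devices, for input and output
Layer 5: System bus, the data channel connecting the CPU, memory, and I/O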

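1. The von Neumann Architecture: The "Constitution" of Modern Computers

1.1 The Stored-Program Principle

In 1945 the mathematician John von Neumann set out the landmark stored-program architecture.

Core concept

Stored program: a program is itself a special kind of data, kept in memory alongside ordinary data. The CPU fetches and executes the instructions stored in memory through the same path it uses to read and write data.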
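The stored-program idea can be sketched in a few lines of Python. This is a toy model, not a real ISA: the tuple encoding, the opcode names, and the added HALT instruction are assumptions made for illustration. The program and data values are the ones shown in the instruction cycle demo in section 0.

```python
# Toy stored-program machine: instructions and data share one memory.
# The tuple encoding, opcode names, and HALT are illustrative assumptions.

def run(memory, pc):
    """Fetch and execute instructions from `memory` starting at `pc` until HALT."""
    regs = {"R0": 0, "R1": 0}
    while True:
        op, *args = memory[pc]      # fetch: an instruction is just a memory cell
        pc += 1
        if op == "LOAD_MEM":        # LOAD Rd, [addr]
            regs[args[0]] = memory[args[1]]
        elif op == "LOAD_IMM":      # LOAD Rd, #value
            regs[args[0]] = args[1]
        elif op == "ADD":           # ADD Rd, Rs  ->  Rd = Rd + Rs
            regs[args[0]] += regs[args[1]]
        elif op == "STORE":         # STORE [addr], Rs
            memory[args[0]] = regs[args[1]]
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")

memory = {
    # program area
    0x100: ("LOAD_MEM", "R0", 0x200),
    0x101: ("LOAD_IMM", "R1", 7),
    0x102: ("ADD", "R0", "R1"),
    0x103: ("STORE", 0x201, "R0"),
    0x104: ("HALT",),
    # data area, in the same memory
    0x200: 42,
    0x201: 0,
}

final_regs = run(memory, pc=0x100)
print(final_regs, memory[0x201])
```

Because the program lives in the same dictionary as the data, nothing but the program counter distinguishes an instruction from an operand. After the run, R0 and the cell at 0x201 both hold 49.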

1.2 The Five Major Components

Figure: CPU register file (high-speed storage inside the CPU). The values are a snapshot from the interactive demo.

Special registers

Register	Value	Role
PC	0x00401000	Program counter
IR	0x8B450008	Instruction register
MAR	0x00401000	Memory address register
MDR	0x00000000	Memory data register
ACC	0x0000001A	Accumulator

General-purpose registers

Register	Value	Role
RAX	0x00000000	Return value
RBX	0x00000000	Base register
RCX	0x00000000	Counter register
RDX	0x00000000	Data register
RSI	0x00000000	Source index
RDI	0x00000000	Destination index
RBP	0x00000000	Base pointer
RSP	0x7FFDE000	Stack pointer

Program status word (PSW / FLAGS)

Flag	Value	Meaning
CF	0	Carry flag
PF	0	Parity flag
AF	0	Auxiliary carry
ZF	0	Zero flag
SF	0	Sign flag
OF	0	Overflow flag

Registers vs Memory

Feature	Register	Memory (RAM)
Location	Inside the CPU	Outside the CPU
Access speed	Fastest (< 1ns)	Slower (50-100ns)
Capacity	Tiny (bytes)	Large (GB)
Role	Hold instructions, operands, and results	Store programs and data

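Component	Short name	Function	Main parts or examples
Arithmetic unit	ALU	Performs arithmetic and logic operations	Adder, shifter, comparator
Controller	CU	Directs and coordinates the other components	Instruction register, decoder, timing generator
Memory	Memory	Stores programs and data	Memory address register (MAR), memory data register (MDR)
Input devices	Input	Bring information in	Keyboard, mouse, scanner
Output devices	Output	Send information out	Display, printer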

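1.3 The von Neumann Bottleneck

The von Neumann architecture has a well-known performance bottleneck:

Data moves between the CPU and memory far more slowly than the CPU can process it.

Optimization	Principle
Cache	Place a small, fast store close to the CPU
Instruction pipelining	Keep several instructions in different stages at once
Superscalar execution	Issue multiple instructions in the same clock cycle
Multicore parallelism	Spread the work across multiple CPU cores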

2. The Instruction Set: The Interface Between the CPU and Software

The instruction set architecture (ISA) is the CPU's API documentation.

2.1 From Source Code to Instructions

From code to instructions: one line through the translation pipeline

Stage 1: Source code

int a = 10 + 5;

This is high-level code written in an editor. It is easy for humans to read, but the CPU does not understand int or the + operator directly.

Stage 2: The compiler emits assembly

MOV  R1, #10    ; put 10 into register R1
MOV  R2, #5     ; put 5 into register R2
ADD  R3, R1, R2 ; R3 = R1 + R2
STORE R3, [a]   ; store the result at variable a

(An optimizing compiler would normally fold 10 + 5 into the constant 15 at compile time; the unoptimized form is shown here so that each step stays visible.)

Stage 3: The assembler emits machine code

0001 0001 0000 1010  → MOV R1, #10
0001 0010 0000 0101  → MOV R2, #5
0010 0011 0001 0010  → ADD R3, R1, R2
0100 0011 1000 0000  → STORE R3, [a]

Stage 4: The CPU executes the instructions

Instruction 1: fetch → decode → execute MOV R1, #10
Instruction 2: fetch → decode → execute MOV R2, #5
Instruction 3: fetch → decode → execute ADD R3, R1, R2
Instruction 4: fetch → decode → execute STORE R3, [a]

Key idea

An instruction set is the CPU's API: it defines every command the CPU understands. A compiler translates your high-level language into calls to that API. Different CPUs, such as x86 and ARM, have different instruction sets, just as different services expose different APIs.
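The assembler step can be reproduced with a few bit shifts. The sketch below is a minimal illustration of the 16-bit toy format used in the listing, not a real assembler. The field layout (a 4-bit opcode followed by 4-bit register fields or an 8-bit immediate or address) is read off the bit patterns shown, and the address 0x80 for the variable a is likewise inferred from the STORE encoding; the function names are hypothetical.

```python
# Toy assembler for the 16-bit encodings in the listing.
# Field layout and the address of `a` are inferred from the bit patterns shown.

OPCODES = {"MOV": 0b0001, "ADD": 0b0010, "STORE": 0b0100}

def encode_reg_imm(mnemonic, reg, value):
    """Layout: opcode(4) | register(4) | 8-bit immediate or address."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    return (OPCODES[mnemonic] << 12) | (reg << 8) | value

def encode_three_reg(mnemonic, rd, rs1, rs2):
    """Layout: opcode(4) | rd(4) | rs1(4) | rs2(4)."""
    return (OPCODES[mnemonic] << 12) | (rd << 8) | (rs1 << 4) | rs2

def as_bits(word):
    """Format a 16-bit word as four space-separated nibbles."""
    bits = f"{word:016b}"
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

ADDR_A = 0x80  # assumed address of variable `a`

program = [
    encode_reg_imm("MOV", 1, 10),       # MOV R1, #10
    encode_reg_imm("MOV", 2, 5),        # MOV R2, #5
    encode_three_reg("ADD", 3, 1, 2),   # ADD R3, R1, R2
    encode_reg_imm("STORE", 3, ADDR_A), # STORE R3, [a]
]
for word in program:
    print(as_bits(word))
```

The four printed words match the four machine-code lines in the listing, which is a quick way to check that the assumed field layout is the one the listing uses.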

2.2 What Does an Instruction Look Like?

Every machine instruction has a strict internal format:

Opcode: tells the CPU what to do
Operand: tells the CPU what to do it to

Machine instruction format: opcode + operands = machine instruction

Field	Width
Opcode	8 bits
Destination	8 bits
Source 1	8 bits
Source 2	8 bits

Example instruction

01101100 00000001 00000010 00000011

Three-address format

Three addresses identify the destination and the two source operands separately. The result goes into the destination without modifying the sources.

Common examples

Instruction	Meaning
ADD R1, R2, R3	R1 = R2 + R3
SUB R1, R2, R3	R1 = R2 - R3
MUL R1, R2, R3	R1 = R2 × R3
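Decoding the three-address format is the reverse of packing it: shift each 8-bit field down and mask it off. The sketch below assumes the four fields are stored most significant first (opcode, destination, source 1, source 2), as the field order suggests; the `Instruction` type and `decode` function are illustrative names.

```python
from typing import NamedTuple

class Instruction(NamedTuple):
    opcode: int
    dest: int
    src1: int
    src2: int

def decode(word: int) -> Instruction:
    """Split a 32-bit word into four 8-bit fields, most significant first."""
    return Instruction(
        opcode=(word >> 24) & 0xFF,
        dest=(word >> 16) & 0xFF,
        src1=(word >> 8) & 0xFF,
        src2=word & 0xFF,
    )

# The example instruction from the text, with the spaces removed.
word = int("01101100 00000001 00000010 00000011".replace(" ", ""), 2)
fields = decode(word)
print(fields)
```

For the example word this yields opcode 0x6C with destination register 1 and source registers 2 and 3, the same register pattern as `ADD R1, R2, R3`.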

Common opcodes

Opcode	Mnemonic	Meaning
00000000	NOP	No operation
00000001	MOV	Move data
00000010	ADD	Addition
00000011	SUB	Subtraction
00000100	MUL	Multiplication
00000101	DIV	Division
00000110	AND	Logical AND
00000111	OR	Logical OR
00001000	NOT	Logical NOT
00001001	XOR	Exclusive OR
00001010	SHL	Shift left
00001011	SHR	Shift right
00001100	JMP	Unconditional jump
00001101	JE	Jump if equal
00001110	JNE	Jump if not equal
00001111	CALL	Call subroutine
00010000	RET	Return
00010001	PUSH	Push stack
00010010	POP	Pop stack
00010011	LOAD	Load from memory
00010100	STORE	Store to memory

2.3 How Does the CPU Find Its Data? Addressing Modes

Addressing modes: how an instruction finds its operands

Immediate addressing

Definition: the operand is embedded directly in the instruction and is immediately available.

Instruction format: MOV R1, #100

Example: MOV R1, #100 ; R1 = 100

The immediate value 100 is stored directly in the instruction, so no register or memory lookup is needed.

Execution process

1. The CPU reads the immediate value 100 directly from the instruction
2. It writes the immediate value into the target register R1
3. Execution completes without an extra memory access

Characteristics: speed is fast; flexibility is low.

Addressing mode comparison

Addressing mode	Format	Speed	Use case
Immediate addressing	`MOV R1, #100`	Fastest	Constant assignment and initialization
Register addressing	`MOV R1, R2`	Fastest	Register-to-register data transfer
Direct addressing	`MOV R1, [100]`	Relatively fast	Accessing global variables
Indirect addressing	`MOV R1, [R2]`	Relatively fast	Pointers and array traversal
Indexed addressing	`MOV R1, [R2 + R3]`	Relatively fast	Array access and loops
Based addressing	`MOV R1, [R2 + 100]`	Relatively fast	Struct fields and function parameters
Relative addressing	`JMP LABEL`	Fastest	Loops and conditional branches
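The difference between the modes in the table is simply where the operand comes from. The sketch below shows one way to model that, assuming a dictionary of registers and a dictionary for memory; the mode labels, register contents, and memory contents are made up for illustration. Relative addressing is left out because it computes a jump target from the program counter rather than fetching an operand.

```python
# Toy operand fetch for the addressing modes in the table.
# Register names, memory contents, and mode labels are illustrative assumptions.

def fetch_operand(mode, regs, memory, *, imm=None, reg=None, addr=None,
                  base=None, index=None, offset=0):
    if mode == "immediate":      # MOV R1, #100
        return imm
    if mode == "register":       # MOV R1, R2
        return regs[reg]
    if mode == "direct":         # MOV R1, [100]
        return memory[addr]
    if mode == "indirect":       # MOV R1, [R2]
        return memory[regs[reg]]
    if mode == "indexed":        # MOV R1, [R2 + R3]
        return memory[regs[base] + regs[index]]
    if mode == "based":          # MOV R1, [R2 + 100]
        return memory[regs[base] + offset]
    raise ValueError(f"unsupported mode: {mode}")

regs = {"R2": 200, "R3": 4}
memory = {100: 11, 200: 22, 204: 33, 300: 44}

results = {
    "immediate": fetch_operand("immediate", regs, memory, imm=100),
    "register": fetch_operand("register", regs, memory, reg="R2"),
    "direct": fetch_operand("direct", regs, memory, addr=100),
    "indirect": fetch_operand("indirect", regs, memory, reg="R2"),
    "indexed": fetch_operand("indexed", regs, memory, base="R2", index="R3"),
    "based": fetch_operand("based", regs, memory, base="R2", offset=100),
}
print(results)
```

Immediate and register modes never touch `memory`, which is why the table rates them fastest; every other mode performs at least one memory lookup after computing the effective address.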

2.4 Two Design Philosophies: CISC vs RISC

Dimension	CISC	RISC
Instruction count	Thousands of complex instructions	Tens to hundreds of streamlined instructions
Single instruction	One instruction can do many things	One instruction does one thing
Instruction length	Variable length (1-15 bytes)	Fixed length, often 4 bytes
Execution speed	Complex instructions take multiple cycles	Most instructions complete in one cycle
Power use	Higher	Lower
Pipeline	Harder to optimize because lengths vary	Easier to optimize because instructions are regular
Compiler burden	Lighter because hardware does more	Heavier because software optimizes more

Real-world choices

Device	ISA	Why
Your computer	x86 (CISC)	Compatible with decades of software
Your phone	ARM (RISC)	Low power consumption and longer battery life
Apple Silicon	ARM (RISC)	High performance per watt reshaped laptops
RISC-V board	RISC-V (RISC)	Open and royalty-free for IoT and education

Why does your phone use ARM while your computer uses x86?

x86 (CISC) has dominated the PC and server markets for 40 years
ARM (RISC) came to dominate mobile devices on the strength of its low power consumption
Apple Silicon showed that RISC can also deliver high performance
RISC-V is an open RISC architecture that is rising quickly

3. The Controller: The CPU's "Command Center"

Figure: How the controller works (how control signals coordinate CPU components).

The control unit (CU) contains the instruction register (IR), the instruction decoder, and the timing generator. Its output control signals include PC→MAR, MEM→MDR, MDR→IR, IR→ID, ALU→ACC, and ACC→MDR.

Data path: program counter → MAR (address register) → main memory; MDR (data register) → instruction register → decoder; ALU ↔ ACC (accumulator).

Core controller concepts

Control signals: electrical signals emitted by the controller to control each component on the data path.

Timing: CPU operations advance by clock ticks; each tick performs specific micro-operations.

Hardwired vs microprogrammed: hardwired controllers are fast but complex; microprogrammed controllers are flexible but slightly slower.

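3.1 The Instruction Cycle

Fetch: read the instruction from memory into the IR
Decode: work out what the instruction means
Execute: perform the operation
Memory access: read or write memory, if the instruction requires it
Write back: write the result back to a register or memory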
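The phases above can be written out as register transfers, using the control signals named earlier in this section (PC→MAR, MEM→MDR, MDR→IR, and so on). The sketch below is a simplified accumulator machine: the dictionary-based CPU state, the tuple instruction encoding, and the `ADDI` opcode are assumptions for illustration, not the demo's actual implementation.

```python
# Register-transfer sketch of the fetch cycle plus a minimal decode/execute step.
# The CPU state layout, tuple instructions, and ADDI are illustrative assumptions.

def fetch(cpu, memory):
    cpu["MAR"] = cpu["PC"]            # PC -> MAR
    cpu["MDR"] = memory[cpu["MAR"]]   # MEM -> MDR
    cpu["IR"] = cpu["MDR"]            # MDR -> IR
    cpu["PC"] += 1                    # point at the next instruction

def decode_execute(cpu, memory):
    op, operand = cpu["IR"]           # IR -> instruction decoder
    if op == "LOAD":                  # memory access, then write back to ACC
        cpu["MAR"] = operand
        cpu["MDR"] = memory[cpu["MAR"]]
        cpu["ACC"] = cpu["MDR"]
    elif op == "ADDI":                # ALU -> ACC
        cpu["ACC"] += operand
    elif op == "STORE":               # ACC -> MDR, then MDR -> memory
        cpu["MAR"] = operand
        cpu["MDR"] = cpu["ACC"]
        memory[cpu["MAR"]] = cpu["MDR"]
    else:
        raise ValueError(f"unknown opcode: {op}")

cpu = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "ACC": 0}
memory = {0: ("LOAD", 10), 1: ("ADDI", 7), 2: ("STORE", 11), 10: 42, 11: 0}

for _ in range(3):                    # three instructions, one cycle each
    fetch(cpu, memory)
    decode_execute(cpu, memory)

print(cpu["ACC"], memory[11], cpu["PC"])
```

Every memory access, whether for an instruction or for data, goes through the same MAR and MDR pair, which is the data path the controller's signals are sequencing.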

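3.2 Hardwired vs Microprogrammed Controllers

Property	Hardwired controller	Microprogrammed controller
Implementation	Combinational logic circuits	Sequences of microinstructions (firmware)
Speed	Fast	Slightly slower
Flexibility	Poor	Good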

4. The Memory Hierarchy: Why Do We Need Caches?

4.1 The Storage Hierarchy

Storage hierarchy: from fastest to slowest, smallest to largest

Level	Speed	Capacity
Registers	Fastest	Smallest (KB)
Cache	Very fast	Small (MB)
Memory	Fast	Medium (GB)
Disk	Slow	Large (TB)
Network/Cloud	Slowest	Unlimited

Detailed comparison

Storage level	Access time	Typical capacity	Cost
Registers	< 1 ns	A few KB	Highest
L1 cache	~1 ns	64 KB	Very high
L2 cache	~3 ns	256 KB	High
L3 cache	~10 ns	8 MB	Medium
Memory	~100 ns	8-32 GB	Medium-low
SSD	~100 μs	256 GB-2 TB	Low
HDD	~10 ms	1-10 TB	Lowest

Locality principle

Programs tend to access recently accessed locations (temporal locality) and nearby locations (spatial locality)

By exploiting locality, caches can significantly improve performance.

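An analogy for the speed gap

If a CPU access to the L1 cache is like picking up a sheet of paper from your desk:

Accessing main memory → taking the elevator down to the convenience store to buy paper
Accessing an SSD → driving to another city to buy paper
Accessing an HDD → flying to another country to buy paper

The gap can reach a factor of millions: going by the table above, an HDD access (~10 ms) is about ten million times slower than an L1 hit (~1 ns).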

4.2 How Caches Work

Cache principles: the bridge between the CPU and memory

Level	Capacity	Latency
L1 cache	64 KB	~1 ns
L2 cache	256 KB	~5 ns
L3 cache	8 MB	~15 ns
Main memory	16 GB	~100 ns

Lookup order: CPU core → L1 cache → L2 cache → L3 cache → main memory

Why do caches work? The locality principle

Temporal locality: recently accessed data is likely to be accessed again. Example: variables inside loops.

Spatial locality: after one item is accessed, nearby data is likely to be accessed. Examples: array traversal and sequential execution.

Cache mapping methods

Direct mapping: each memory block maps to exactly one cache line.

Property	Direct mapping
Speed	Fastest
Hit rate	Lower
Implementation complexity	Lowest

Hit-rate calculation

Average access time = H × Tc + (1 - H) × Tm

Cache access time (Tc): 2 ns

Memory access time (Tm): 100 ns

Hit rate (H): 90%

Average access time = 0.9 × 2 ns + 0.1 × 100 ns = 11.8 ns (about 12 ns)
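The formula is easy to explore numerically. The short sketch below evaluates it for a few hit rates, using the same Tc = 2 ns and Tm = 100 ns as the example; the function name and the extra hit rates of 80% and 99% are chosen here for illustration.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average access time = H * Tc + (1 - H) * Tm."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

for h in (0.80, 0.90, 0.99):
    print(f"H = {h:.2f}: {average_access_time(h, 2, 100):.1f} ns")
```

At a 90% hit rate the result is about 11.8 ns, and the 10% of accesses that miss account for 10 ns of it. Raising the hit rate to 99% brings the average down to roughly 3 ns, which is why small gains in hit rate matter so much.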

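The locality principle

Temporal locality: if a piece of data was just accessed, it is likely to be accessed again soon
Spatial locality: if a piece of data is accessed, the data near it is likely to be accessed too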
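Spatial locality can be seen in a small simulation of a direct-mapped cache, where each memory block maps to exactly one cache line. The model below only tracks tags, hits, and misses; the line count, line size, and the two access patterns are assumptions picked to make the contrast obvious, not measurements of real hardware.

```python
# Minimal direct-mapped cache model that only tracks hits and misses.
# Line size, line count, and the access patterns are illustrative assumptions.

class DirectMappedCache:
    def __init__(self, num_lines=8, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size    # which memory block
        index = block % self.num_lines       # the only line it may occupy
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag           # load the block, evicting the old one

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

sequential = DirectMappedCache()
for addr in range(0, 256):               # walk 256 consecutive bytes
    sequential.access(addr)

strided = DirectMappedCache()
for addr in range(0, 256 * 128, 128):    # 256 accesses, 128 bytes apart
    strided.access(addr)

print(sequential.hit_rate(), strided.hit_rate())
```

The sequential walk misses once per 16-byte line and then hits on the remaining 15 bytes, for a hit rate of 0.9375. The strided walk lands on a new block every time, and with this geometry every block maps to line 0, so it never hits at all.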

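4.3 Virtual Memory

An analogy for virtual memory

Think of virtual memory as a hotel managing its rooms:

You (the process) believe the whole building is yours
In reality, the hotel (the OS) gives you only the rooms you currently need
Rooms you are not staying in are "swapped out" to the warehouse (the disk)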
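In terms of the analogy, the page table is the hotel's front-desk ledger: it records which physical room, if any, currently backs each room the guest believes it owns. The sketch below is a deliberately simplified single-level page table. The 4 KB page size, the table contents, and the `PageFault` exception are assumptions for illustration; real systems use multi-level tables and hardware support.

```python
# Toy address translation with a single-level page table.
# PAGE_SIZE, the table contents, and PageFault are illustrative assumptions.

PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when the page is not resident in physical memory."""

def translate(virtual_addr, page_table):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table.get(page)        # None means swapped out or unmapped
    if frame is None:
        raise PageFault(f"page {page} is not in memory")
    return frame * PAGE_SIZE + offset

# virtual page -> physical frame; page 2 is currently "in the warehouse"
page_table = {0: 5, 1: 9, 2: None}

physical = translate(1 * PAGE_SIZE + 0x10, page_table)
print(hex(physical))

try:
    translate(2 * PAGE_SIZE, page_table)
except PageFault as fault:
    print("page fault:", fault)
```

The first lookup maps virtual page 1 to physical frame 9 and keeps the offset unchanged. The second hits a page that has been swapped out, which is the point at which the operating system would bring the page back from disk and retry.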

5. Buses and I/O: The Computer's "Blood Vessels"

Figure: Computer bus system (address bus, data bus, and control bus). In the demo, a CPU (control unit and ALU) is connected to a main memory of eight cells (0x0 to 0x7) by a 32-bit address bus, a 64-bit data bus, and a control bus.

Bus concepts

Address bus: the CPU sends memory addresses over a one-way path.

Data bus: transfers the actual data in both directions.

Control bus: transfers read/write and other control signals.

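5.1 I/O Access Methods

Method	Principle	Advantage	Disadvantage
Programmed I/O (polling)	The CPU repeatedly checks the I/O device's status	Simple	Low CPU utilization
Interrupt-driven I/O	The I/O device notifies the CPU when it is done	The CPU can do other work in the meantime	Interrupt handling has overhead
DMA	The I/O device accesses memory directly	The CPU takes no part in the data transfer itself	Requires a DMA controller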

I/O method comparison: programmed I/O, interrupt-driven I/O, DMA

Programmed I/O

Workflow

1. The CPU polls the I/O device status
2. Device busy? Keep waiting
3. Device ready: send the read/write command
4. The CPU reads or writes the data byte by byte
5. Check whether the transfer is complete
6. If incomplete, keep polling

Characteristics: CPU involvement is high; speed is slow; complexity is low.

Three I/O methods compared

Feature	Programmed I/O	Interrupt-driven I/O	DMA
CPU involvement	Involved throughout	Only handles interrupts	Almost uninvolved
Data transfer	CPU moves each byte	CPU moves each word	Device transfers directly to memory
Pros	Simple and flexible control	High CPU efficiency	CPU is fully freed
Cons	Low CPU utilization	Interrupt overhead	Complex hardware
Best for	Simple or low-speed devices	Low/medium-speed devices	High-speed bulk transfer
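A back-of-the-envelope model makes the "CPU involvement" row concrete. The sketch below counts abstract CPU steps for transferring a block of data under each method. All of the costs (polls per byte, handler cost, setup cost) are made-up numbers chosen only to show the shape of the comparison, and the function names are hypothetical.

```python
# Rough model of CPU work per I/O method, in abstract "steps".
# Every cost constant here is a made-up assumption, not a measurement.

def programmed_io_steps(num_bytes, polls_per_byte=10):
    """The CPU polls the status register, then copies each byte itself."""
    return num_bytes * (polls_per_byte + 1)

def interrupt_io_steps(num_bytes, handler_cost=5):
    """The CPU runs one interrupt handler per transferred unit."""
    return num_bytes * handler_cost

def dma_steps(num_bytes, setup_cost=20, completion_cost=5):
    """The CPU programs the DMA controller once and handles one completion
    interrupt; in this model the cost does not depend on num_bytes."""
    return setup_cost + completion_cost

size = 4096
steps = {
    "programmed": programmed_io_steps(size),
    "interrupt": interrupt_io_steps(size),
    "dma": dma_steps(size),
}
print(steps)
```

Under these assumptions the CPU cost of programmed and interrupt-driven I/O grows with the size of the transfer, while the DMA cost stays constant, which is why DMA is the usual choice for high-speed bulk transfers.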

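An analogy for DMA

It is like ordering delivery:

Without DMA: you go to the supermarket yourself, bring the groceries home, wash them, and cook (you are involved the whole way)
With DMA: you phone in the order and the courier delivers straight to the kitchen (someone else handles it; you only "sign for the delivery" at the end)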

6. CPU Performance Optimization: Pipelining

CPU instruction pipeline: five stages, Fetch (IF) → Decode (ID) → Execute (EX) → Memory (MEM) → Write Back (WB)

Sample instruction sequence from the demo:

ADD R1,R2,R3
SUB R4,R1,R5
LOAD R6,[R4]
STORE R6,[R7]
AND R8,R1,R6

The demo tracks total cycles, completed instructions, and CPI as the instructions advance.

Pipeline principle

Sequential execution: each instruction finishes before the next starts, so N instructions require N × 5 cycles.

Pipeline execution: multiple instructions occupy different stages at once; ideally CPI ≈ 1.
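The two cases can be compared with a small calculation. The sketch below assumes an ideal five-stage pipeline with no stalls: the first instruction takes five cycles to fill the pipeline and each later instruction completes one cycle after the previous one. The function names are illustrative.

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages=5):
    """Ideal pipeline: fill once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

def cpi(cycles, n_instructions):
    """Cycles per instruction."""
    return cycles / n_instructions

for n in (5, 1000):
    seq, pipe = sequential_cycles(n), pipelined_cycles(n)
    print(n, seq, pipe, round(cpi(pipe, n), 3), round(seq / pipe, 2))
```

For five instructions the pipeline needs 9 cycles instead of 25, a CPI of 1.8. For 1000 instructions the fill cost is amortized, CPI drops to 1.004, and the speedup approaches the stage count of 5. Hazards push real pipelines above this ideal.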

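6.1 Pipeline Hazards

Type	Cause	Solutions
Structural hazard	Conflict over a hardware resource	Add hardware, or stagger execution
Data hazard	A later instruction needs an earlier instruction's result	Forwarding, stalls (bubbles), instruction scheduling
Control hazard	A jump instruction changes the flow of execution	Delay slots, branch prediction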
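Data hazards can be found mechanically by looking for a register that one instruction writes and a nearby later instruction reads (a read-after-write dependence). The sketch below applies that check to the five sample instructions from the pipeline figure in section 6. The `(text, destination, sources)` tuples are a hand-written description of those instructions, and the window of two following instructions is an assumption about how far apart a dependence can be and still cause trouble in a five-stage pipeline without forwarding.

```python
# Find read-after-write (RAW) dependences between nearby instructions.
# The (text, dest, sources) tuples and the window of 2 are assumptions.

program = [
    ("ADD R1,R2,R3", "R1", {"R2", "R3"}),
    ("SUB R4,R1,R5", "R4", {"R1", "R5"}),
    ("LOAD R6,[R4]", "R6", {"R4"}),
    ("STORE R6,[R7]", None, {"R6", "R7"}),   # writes memory, not a register
    ("AND R8,R1,R6", "R8", {"R1", "R6"}),
]

def raw_hazards(instructions, window=2):
    """Return (producer, consumer, register) for dependences within `window`."""
    found = []
    for i, (producer, dest, _) in enumerate(instructions):
        if dest is None:
            continue
        for consumer, _, sources in instructions[i + 1:i + 1 + window]:
            if dest in sources:
                found.append((producer, consumer, dest))
    return found

hazards = raw_hazards(program)
for producer, consumer, reg in hazards:
    print(f"{consumer} reads {reg} written by {producer}")
```

The check reports four dependences: SUB on ADD through R1, LOAD on SUB through R4, and both STORE and AND on LOAD through R6. AND also reads R1, but ADD wrote it four instructions earlier, so by then the value is already in the register file.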

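7. Summary: How Does a Computer Actually "Run"?

After a program is launched, the operating system loads the executable file from disk into memory. The CPU's instruction fetch stage (IF) places the instruction's address on the address bus and reads the instruction from memory into the instruction register (IR). The controller decodes the instruction (ID), identifies the operation type, and generates the corresponding control signals. The execute stage (EX) performs the arithmetic or logic operation; if memory access is needed, the data bus carries the data to or from memory (MEM); finally the result is written back (WB) to a register or to memory.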

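Further Reading

Topic	Recommended material
Computer architecture	Computer Organization and Design: The Hardware/Software Interface, by Patterson & Hennessy
CPU microarchitecture	Computer Systems: A Programmer's Perspective, by Bryant & O'Hallaron
Instruction set architecture	The ARMv8 architecture manual and the Intel x64 manual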

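Next Steps

Operating systems: learn how programs run on top of an operating system
Data encoding, storage, and transmission: understand in depth how data is represented inside a computer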
