編譯原理入門

前言

當你按下「執行」按鈕，程式碼是怎麼變成螢幕上的結果的？ 你寫的每一行程式碼，電腦其實都「看不懂」——它只認識 0 和 1。編譯器就是那個把人類語言翻譯成機器語言的「翻譯官」。理解編譯原理，你就能理解報錯資訊從哪來、為什麼有些語言快有些慢、以及程式碼最佳化的底層邏輯。

這篇文章會帶你學什麼？

學完這章後，你將獲得：

全域視野：掌握從原始碼到可執行程式的完整編譯流水線
詞法分析：理解編譯器如何把程式碼拆成一個個 Token
語法分析：理解 AST（抽象語法樹）的建構過程
AST 視覺化：直觀看到程式碼的樹形結構
語義分析與最佳化：理解型別檢查和程式碼最佳化的原理
最佳化技術實戰：掌握常數摺疊、死程式碼消除等核心最佳化手段
執行模型：區分編譯型、直譯型和 JIT 三種執行方式

章节	內容	核心概念
第 1 章	編譯器是什麼	翻譯官類比、編譯流水線
第 2 章	詞法分析	Token、詞法規則
第 3 章	語法分析	AST、語法樹、優先級
第 4 章	AST 視覺化	互動式語法樹、節點型別
第 5 章	語義分析與最佳化	型別檢查、常數摺疊、死程式碼消除
第 6 章	最佳化技術實戰	函式內聯、迴圈外提、常數傳播
第 7 章	編譯型 vs 直譯型 vs JIT	三種執行模型對比

0. 全景圖：程式碼的「翻譯之旅」

想象你是一個翻譯官，要把一本中文小說翻譯成英文。你不會一個字一個字地直譯，而是：

辨識詞語 — 把句子拆成一個個詞（詞法分析）
理解句法 — 判斷句子結構是否正確（語法分析）
理解語義 — 確保意思通順、沒有矛盾（語義分析）
潤色最佳化 — 讓譯文更地道流暢（程式碼最佳化）
輸出譯文 — 寫出最終的英文版本（程式碼產生）

編譯器做的事情完全一樣，只不過它翻譯的是程式語言。

Compiler Principles: The Art of TranslationHow code becomes machine instructions

A compiler is like a translator, turning human-readable code into machine-readable instructions

The Complete Code Translation Pipeline

Lexical analysis

Break code into individual words called tokens

int age = 25 → [int, age, =, 25]

→

Syntax analysis

Check grammar rules and build a syntax tree

Validate whether statement structure is correct

→

Semantic analysis

Check whether the meaning of the code is valid

Check variable definitions and type compatibility

→

Intermediate code generation

Generate a machine-independent intermediate representation

Generate bytecode or intermediate representation

→

Optimization

Improve code so it runs more efficiently

Constant folding and dead-code elimination

→

Target code generation

Generate machine code or target code

Generate x86 or ARM machine instructions

Lexical analysis: tokenization

int age = 25;

↓

Keywordint

Identifierage

Operator=

Number25

Separator;

Syntax analysis: build a tree

Assignment statement

Variableage

Operator=

Number25

Compilation vs Interpretation

Compiled languages

Source code → Compiler → Machine code

C, Go, Rust

✓ Fast execution

✓ Compile once, run many times

✗ Slow compile step

Interpreted languages

Source code → Interpreter → Line-by-line execution

Python, JavaScript, PHP

✓ Fast development

✓ Cross-platform

✗ Slower execution

Compiler Optimization

Before:

x = 5 + 3 + 2

⬇️

After:

x = 10

The compiler can optimize code automatically and improve runtime efficiency

1. 編譯器的六步流水線

編譯器的工作可以分為六個階段，像工廠流水線一樣，每個階段處理完交給下一個階段。

How a Compiler WorksA six-step journey from source code to machine code

Input code:

Lexical analysis→ Token stream

Syntax analysis→ AST syntax tree

Semantic analysis→ Typed AST

Intermediate code generation→ IR (intermediate representation)

Code optimization→ Optimized IR

Target code generation→ Machine code

1Lexical analysisOutput: Token stream

Split source code into individual words called tokens, like recognizing each word in a sentence.

Recognize keywordsRecognize identifiersRecognize numbersRecognize operatorsFilter whitespace

int x = 10 + 5;
→ [int] [x] [=] [10] [+] [5] [;]
    keyword identifier operator number operator number separator

Live lexical analysis

intKeyword

xIdentifier

=Operator

10Number

+Operator

5Number

;Separator

Three Execution Models Compared

Compiled

Source →Compiler →Machine code →CPU execution

Fast executionMust wait for compilation

C, C++, Rust, Go

Interpreted

Source →Interpreter →Line-by-line execution

Run immediately while writingSlower execution

Python, Ruby, PHP

JIT

Source →Bytecode →JIT hot path compilation →Execution

Balances performance and flexibilitySlower startup

Java, JavaScript (V8)

Core idea:A compiler is like a translator: it gradually turns human-readable code into instructions the machine can run. The six stages each do one job: identify words → understand syntax → check meaning → generate IR → optimize → generate machine code.

編譯流水線

詞法分析（Lexical Analysis）：把原始碼拆成一個個 Token（單詞）
語法分析（Syntax Analysis）：把 Token 組織成語法樹（AST）
語義分析（Semantic Analysis）：檢查型別是否正確、變數是否宣告
中間程式碼產生（IR Generation）：產生與平台無關的中間表示
程式碼最佳化（Optimization）：讓中間程式碼更高效
程式碼產生（Code Generation）：產生目標平台的機器碼

階段	輸入	輸出	類比
詞法分析	原始碼字元流	Token 流	把句子拆成單詞
語法分析	Token 流	AST（語法樹）	分析句子結構
語義分析	AST	帶型別的 AST	檢查意思是否通順
中間程式碼	帶型別的 AST	IR	寫出初稿
程式碼最佳化	IR	最佳化後的 IR	潤色刪減
程式碼產生	最佳化後的 IR	機器碼	輸出終稿

2. 詞法分析：把程式碼拆成「單詞」

詞法分析是編譯的第一步。編譯器從左到右掃描原始碼的每個字元，把它們組合成有意義的 Token（詞法單元）。

🔤 Lexer: Split Code into Tokens

Enter a line of code and see lexical analysis results in real time

就像讀英文句子時，你的大腦會自動把字母組合成單詞一樣，詞法分析器把字元組合成 Token：

原始碼: let x = 10 + 5;

Token 流:
[let]   → 關鍵字（語言保留字）
[x]     → 識別符（變數名）
[=]     → 運算子（賦值）
[10]    → 數字字面量
[+]     → 運算子（加法）
[5]     → 數字字面量
[;]     → 分隔符（敘述結束）

Token 的五大型別

關鍵字：語言保留的特殊單詞，如 let、if、return、function
識別符：程式設計師定義的名稱，如變數名、函式名
字面量：直接寫在程式碼裡的值，如數字 42、字串 "hello"
運算子：執行運算的符號，如 +、-、=、===
分隔符：分隔程式碼結構的符號，如 ;、,、(、)

3. 語法分析：建構語法樹（AST）

詞法分析把程式碼拆成了 Token，但 Token 只是一個個孤立的「單詞」。語法分析的任務是把這些 Token 按照語法規則組織成一棵 抽象語法樹（Abstract Syntax Tree, AST）——它反映了程式碼的結構和運算優先級。

運算式: 1 + 2 * 3

語法樹:        為什麼這樣？
       +       因為 * 的優先級
      / \      高於 +，所以
     1   *     2 * 3 先結合
        / \    成為一個子樹
       2   3

AST 的重要性

AST 是編譯器的「核心資料結構」，後續的語義分析、最佳化、程式碼產生都基於它進行。現代開發工具也大量使用 AST：

ESLint：解析程式碼為 AST，檢查是否違反規則
Prettier：解析為 AST 後重新格式化輸出
Babel：解析 AST → 轉換 → 產生相容程式碼
IDE 重構：基於 AST 進行安全的變數重新命名、函式提取

語法結構	Token 序列	AST 節點
變數宣告	`let` `x` `=` `10`	VariableDeclaration → Identifier + Literal
函式呼叫	`add` `(` `1` `,` `2` `)`	CallExpression → Identifier + Arguments
條件敘述	`if` `(` `a` `>` `b` `)`	IfStatement → BinaryExpression + Block

4. AST 視覺化：看見程式碼的「骨架」

上面我們用文字描述了 AST 的結構，但「看到」比「讀到」更直觀。下面的互動元件讓你選擇不同的運算式，即時觀察它們的語法樹長什麼樣。

🌳 AST Visualizer: See the Skeleton of Code

Choose an expression and inspect its abstract syntax tree

Syntax tree

BinaryExpression+

NumericLiteral1

BinaryExpression*

NumericLiteral2

NumericLiteral3

Parse notes

1* has higher precedence than +, so 2 * 3 groups first

22 * 3 forms a BinaryExpression subtree

31 and that subtree become the left and right operands of +

4The final + node is the root, showing the evaluation order

💡 Try AST Explorer — inspect ASTs for arbitrary code online

透過視覺化你會發現，AST 的核心規律其實很簡單：

程式碼結構	AST 根節點	子節點
`1 + 2 * 3`	BinaryExpression (+)	左: NumericLiteral(1)，右: BinaryExpression(*)
`let x = 10`	VariableDeclaration	VariableDeclarator → Identifier(x) + NumericLiteral(10)
`add(a, b)`	CallExpression	Identifier(add) + Arguments(a, b)

AST 在日常開發中的應用

你可能沒直接寫過編譯器，但你每天都在用基於 AST 的工具：

ESLint / Prettier：解析程式碼為 AST，檢查規則或重新格式化
Babel / SWC：解析 AST → 轉換語法 → 產生相容程式碼
IDE 重構：基於 AST 做安全的重新命名、提取函式
Tree-shaking：分析 AST 中的 import/export，刪除未使用的程式碼

5. 語義分析與程式碼最佳化

語法分析確保程式碼「結構正確」，但結構正確不代表「意思正確」。語義分析負責檢查程式碼的含義是否合法，程式碼最佳化則讓程式跑得更快。

Compilation PracticeFrom code to executable file

Input code

#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

Compilation steps

Preprocess

gcc -E hello.c -o hello.i

Process #include and expand macros

Compile

gcc -S hello.i -o hello.s

Generate assembly code

Assemble

gcc -c hello.s -o hello.o

Generate object file

Link

gcc hello.o -o hello

Generate executable file

Generated files

📄

hello.c

Source code file

📝

hello.i

Preprocessed file

⚙️

hello.s

Assembly code file

📦

hello.o

Object file

🚀

hello

Executable file

Common compiler tools

GCC

GNU Compiler Collection

Clang

LLVM C/C++ compiler

MSVC

Microsoft Visual C++

4.1 語義分析：檢查「意思」對不對

檢查內容	範例	結果
型別檢查	`int x = "hello"`	❌ 型別不匹配
作用域檢查	使用未宣告的變數 `y`	❌ 變數不存在
型別推斷	`1 + 2.0`	✅ 推斷結果為 float
參數檢查	`add(1, 2, 3)` 但函式只接受 2 個參數	❌ 參數數量不匹配

你見過的報錯，大多來自語義分析

TypeError: Cannot read properties of undefined — 型別檢查
ReferenceError: x is not defined — 作用域檢查
Expected 2 arguments, but got 3 — 參數檢查

4.2 程式碼最佳化：讓程式更快

編譯器在產生最終程式碼前，會對中間程式碼做各種最佳化。這些最佳化對程式設計師透明，但能顯著提升效能。

最佳化技術	最佳化前	最佳化後	原理
常數摺疊	`x = 10 + 5`	`x = 15`	編譯時直接算出結果
死程式碼消除	`if (false) { ... }`	直接刪除	永遠不會執行的程式碼
常數傳播	`x = 15; y = x * 2`	`y = 30`	已知值直接替換
迴圈不變量外提	迴圈內重複計算 `len = arr.length`	提到迴圈外	避免重複計算

6. 最佳化技術實戰：編譯器如何讓程式碼更快

上面我們提到了幾種最佳化技術名稱，現在來深入看看編譯器具體是怎麼做的。下面的互動元件展示了 5 種最常見的編譯器最佳化，你可以直觀對比最佳化前後的程式碼差異。

⚡ Compiler Optimization: Make Code Faster Automatically

Choose an optimization technique and see how the compiler improves code

📝 Before optimization

const width = 10
const height = 20
const area = width * height  // computed at runtime
console.log(area)

→Compiler optimization

🚀 After optimization

const area = 200  // computed during compilation
console.log(200)

How Constant folding works

The compiler sees that width and height are constants, so it computes 10 * 20 = 200 during compilation. Runtime no longer needs a multiplication.

Performance gain:

30%

現代編譯器和 JIT 引擎（如 V8、GCC、LLVM）會自動應用數十種最佳化。作為開發者，你不需要手動做這些最佳化，但理解它們能幫你：

寫出更容易被最佳化的程式碼：比如用 const 而不是 let，編譯器更容易做常數摺疊
理解效能差異：為什麼小函式比大函式快？因為編譯器能內聯它們
避免「反最佳化」：某些寫法會阻止編譯器最佳化，比如 eval() 和 with

最佳化技術	觸發條件	效能影響	開發者能做什麼
常數摺疊	運算式中全是常數	消除執行時計算	多用 const 宣告
死程式碼消除	程式碼不可達或結果未使用	減小程式碼體積	及時清理無用程式碼
迴圈不變量外提	迴圈內有不變的計算	減少重複計算	手動提取也是好習慣
函式內聯	小函式被頻繁呼叫	消除呼叫開銷	保持函式小而專注
常數傳播	變數值在編譯時可確定	整條計算鏈被消除	用常數代替魔法數字

7. 編譯型 vs 直譯型 vs JIT

程式碼寫完後，有三種「翻譯方式」讓它執行起來。這三種方式各有優劣，直接決定了語言的效能特徵和使用場景。

🔄 Compiled vs Interpreted vs JIT

Click an execution mode to see how code moves from source to running program

📝

Source code

main.c

→

⚙️

Compiler

Full compilation

→

📦

Machine code

Binary executable

→

🚀

Run directly

CPU runs it directly

Run speed

Very fast

Startup

Slow; compile first

Portability

Recompile required

Representative languages:CC++RustGo

維度	編譯型	直譯型	JIT 即時編譯
過程	先全量編譯成機器碼，再執行	邊讀邊執行，逐行翻譯	先直譯執行，熱點程式碼再編譯
執行速度	最快	最慢	中等（熱點接近編譯型）
啟動速度	慢（需要編譯）	快（直接執行）	中等（需要預熱）
跨平台	需要重新編譯	天然跨平台	跨平台
代表語言	C, Rust, Go	Python, Ruby	JavaScript (V8), Java

為什麼 JavaScript 這麼快？

V8 引擎的 JIT 編譯器會監測哪些程式碼被頻繁執行（熱點程式碼），然後把它們編譯成高度最佳化的機器碼。所以雖然 JavaScript 是「直譯型語言」，但在 V8 中它的效能可以接近編譯型語言。這也是 Node.js 能做伺服器端的底氣。

總結

編譯原理不是只有編譯器開發者才需要了解的知識。理解編譯流程，能幫你更好地理解報錯資訊、選擇合適的語言、寫出更高效的程式碼。

回顧本章的關鍵要點：

編譯器是翻譯官：把人類可讀的程式碼翻譯成機器可執行的指令
六步流水線：詞法分析 → 語法分析 → 語義分析 → 中間程式碼 → 最佳化 → 程式碼產生
詞法分析拆 Token：把字元流拆成關鍵字、識別符、運算子等有意義的單元
語法分析建 AST：按語法規則把 Token 組織成樹形結構，反映運算優先級
語義分析保正確：型別檢查、作用域檢查，你見過的大多數報錯都來自這裡
編譯器自動最佳化：常數摺疊、死程式碼消除、函式內聯等技術讓程式碼自動變快
三種執行模型：編譯型最快、直譯型最靈活、JIT 兼顧兩者

編譯原理入門 ​

0. 全景圖：程式碼的「翻譯之旅」 ​

1. 編譯器的六步流水線 ​

2. 詞法分析：把程式碼拆成「單詞」 ​

🔤 Lexer: Split Code into Tokens

3. 語法分析：建構語法樹（AST） ​

4. AST 視覺化：看見程式碼的「骨架」 ​

🌳 AST Visualizer: See the Skeleton of Code

5. 語義分析與程式碼最佳化 ​

4.1 語義分析：檢查「意思」對不對 ​

4.2 程式碼最佳化：讓程式更快 ​

6. 最佳化技術實戰：編譯器如何讓程式碼更快 ​