
Introduction to Compiler Principles

Preface

When you press "Run", how does your code become output on the screen? The computer does not actually "understand" any line of code you write; it only recognizes 0s and 1s. A compiler is the "translator" that converts human-oriented language into machine language. Understanding compiler principles helps you see where error messages come from, why some languages are fast and others slow, and the basic logic behind code optimization.

What will this article teach you?

After finishing this chapter, you will have:

  • The big picture: a grasp of the complete compilation pipeline, from source code to executable program
  • Lexical analysis: an understanding of how a compiler splits code into tokens
  • Syntax analysis: an understanding of how the AST (abstract syntax tree) is built
  • AST visualization: a direct view of the tree structure of code
  • Semantic analysis and optimization: an understanding of how type checking and code optimization work
  • Optimization techniques in practice: command of core techniques such as constant folding and dead-code elimination
  • Execution models: the ability to distinguish compilation, interpretation, and JIT

| Chapter | Content | Core concepts |
|---|---|---|
| Chapter 1 | What is a compiler | The translator analogy, the compilation pipeline |
| Chapter 2 | Lexical analysis | Tokens, lexical rules |
| Chapter 3 | Syntax analysis | AST, syntax tree, precedence |
| Chapter 4 | AST visualization | Interactive syntax tree, node types |
| Chapter 5 | Semantic analysis and optimization | Type checking, constant folding, dead-code elimination |
| Chapter 6 | Optimization techniques in practice | Function inlining, loop-invariant code motion, constant propagation |
| Chapter 7 | Compilation vs interpretation vs JIT | Comparing the three execution models |

0. The Big Picture: Your Code's "Translation Journey"

Imagine you are a translator who has to render a Chinese novel into English. You would not translate mechanically, word by word. Instead you would:

  1. Recognize the vocabulary: split each sentence into individual words (lexical analysis)
  2. Understand the grammar: determine whether the sentence structure is correct (syntax analysis)
  3. Understand the meaning: make sure the meaning is coherent and free of contradictions (semantic analysis)
  4. Polish the draft: make the translation read more naturally and fluently (code optimization)
  5. Deliver the translation: write the final English version (code generation)

A compiler does exactly these things; the only difference is that it translates programming languages.

Interactive figure: "Compiler Principles: The Art of Translation" (how code becomes machine instructions). A compiler is like a translator, turning human-readable code into machine-readable instructions. The figure walks through the complete code translation pipeline:

  1. Lexical analysis: break code into individual words called tokens, e.g. int age = 25 → [int, age, =, 25]
  2. Syntax analysis: check the grammar rules, validate that the statement structure is correct, and build a syntax tree
  3. Semantic analysis: check whether the meaning of the code is valid, such as variable definitions and type compatibility
  4. Intermediate code generation: produce a machine-independent intermediate representation, such as bytecode
  5. Optimization: improve the code so it runs more efficiently, e.g. constant folding and dead-code elimination
  6. Target code generation: produce machine code for the target, such as x86 or ARM instructions

For int age = 25; the lexer yields keyword int, identifier age, operator =, number 25, and separator ;. The parser then groups these into an assignment statement made of the variable age, the operator =, and the number 25.

  • Compiled languages (C, Go, Rust): source code → compiler → machine code. Fast execution and compile once, run many times, but the compile step is slow.
  • Interpreted languages (Python, JavaScript, PHP): source code → interpreter → line-by-line execution. Fast development and cross-platform, but slower execution.

Compiler optimization example: before, x = 5 + 3 + 2; after, x = 10. The compiler can optimize code like this automatically and improve runtime efficiency.

1. The Compiler's Six-Step Pipeline

A compiler's work can be divided into six stages. Like an assembly line in a factory, each stage finishes its processing and hands the result to the next one.

Interactive figure: "How a Compiler Works", a six-step journey from source code to machine code. Each stage and its output:

  1. Lexical analysis → token stream
  2. Syntax analysis → AST (syntax tree)
  3. Semantic analysis → typed AST
  4. Intermediate code generation → IR (intermediate representation)
  5. Code optimization → optimized IR
  6. Target code generation → machine code

Stage 1 splits source code into tokens, like recognizing each word in a sentence: it recognizes keywords, identifiers, numbers, and operators, and filters out whitespace. For example, int x = 10 + 5; becomes [int] [x] [=] [10] [+] [5] [;], that is: keyword, identifier, operator, number, operator, number, separator.

The figure also compares three execution models:

  • Compiled (C, C++, Rust, Go): source → compiler → machine code → CPU execution. Fast execution, but you must wait for compilation.
  • Interpreted (Python, Ruby, PHP): source → interpreter → line-by-line execution. Runs immediately while you write, but executes more slowly.
  • JIT (Java, JavaScript on V8): source → bytecode → JIT compilation of hot paths → execution. Balances performance and flexibility, at the cost of slower startup.

Core idea: a compiler is like a translator that gradually turns human-readable code into instructions the machine can run. Each of the six stages does one job: identify words → understand syntax → check meaning → generate IR → optimize → generate machine code.

The compilation pipeline

  1. Lexical analysis: split the source code into tokens (words)
  2. Syntax analysis: organize the tokens into a syntax tree (AST)
  3. Semantic analysis: check that types are correct and that variables have been declared
  4. Intermediate representation (IR) generation: produce a platform-independent intermediate representation
  5. Code optimization: make the intermediate code more efficient
  6. Code generation: produce machine code for the target platform
| Stage | Input | Output | Translation analogy |
|---|---|---|---|
| Lexical analysis | Source character stream | Token stream | Splitting a sentence into words |
| Syntax analysis | Token stream | AST (syntax tree) | Analyzing the sentence structure |
| Semantic analysis | AST | Typed AST | Checking that the meaning is right |
| Intermediate code | Typed AST | IR | Writing a draft |
| Code optimization | IR | Optimized IR | Editing and trimming |
| Code generation | Optimized IR | Machine code | Publishing the final version |
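CPython, the reference Python implementation, runs through a similar pipeline, and its standard library exposes several of the stages. The sketch below assumes CPython 3.8 or later: the tokenize module performs lexical analysis, ast.parse performs syntax analysis, and the built-in compile() covers the remaining stages. One difference from the table: CPython's final output is bytecode for its own virtual machine, not native machine code.

```python
import ast
import dis
import io
import token
import tokenize

SOURCE = "x = 5 + 3 + 2\n"

# Stage 1, lexical analysis: tokenize turns the character stream into tokens.
tokens = [
    (token.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type not in (token.NEWLINE, token.ENDMARKER)
]
print(tokens)

# Stage 2, syntax analysis: ast.parse builds the abstract syntax tree.
tree = ast.parse(SOURCE)
print(ast.dump(tree.body[0]))

# Remaining stages: compile() checks the tree, optimizes it, and emits bytecode.
code = compile(tree, "<example>", "exec")
dis.dis(code)  # on CPython, 5 + 3 + 2 is typically folded into a single constant
```

Reading the disassembly next to the source is a quick way to see constant folding happen without writing any compiler code yourself.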

2. Lexical Analysis: Splitting Code into "Words"

Lexical analysis is the first step of compilation. The compiler scans the source characters from left to right and groups them into meaningful tokens (lexical units).

Just as your brain automatically groups letters into words when you read an English sentence, lexical analysis groups characters into tokens:

Source code: let x = 10 + 5;

Token stream:
[let]   → Keyword (a word reserved by the language)
[x]     → Identifier (variable name)
[=]     → Operator (assignment)
[10]    → Integer literal
[+]     → Operator (addition)
[5]     → Integer literal
[;]     → Separator (ends the statement)

Five token types

  • Keywords: special words reserved by the language, such as let, if, return, function
  • Identifiers: names defined by the programmer, such as variable and function names
  • Literals: values written directly in the code, such as the number 42 or the string "hello"
  • Operators: symbols that perform operations, such as +, -, =, ===
  • Separators: symbols that delimit the structure of code, such as ;, ,, (, )
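The five token types above map naturally onto a lexer driven by regular expressions. Below is a minimal sketch for a small JavaScript-like subset; the keyword set, the token-type names, and the tokenize function are assumptions made for illustration, and real lexers also handle escaped strings, comments, and line/column tracking.

```python
import re
from typing import List, Tuple

# Hypothetical keyword set for a tiny JavaScript-like language.
KEYWORDS = {"let", "const", "if", "else", "return", "function"}

# Order matters: === must be tried before ==, and == before =.
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"[^"\n]*"'),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"===|==|[+\-*/=<>]"),
    ("SEPARATOR", r"[;,(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Scan left to right and return (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WHITESPACE":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "NAME":
            # Keywords and identifiers look alike; a lookup tells them apart.
            kind = "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
        elif kind in ("NUMBER", "STRING"):
            kind = "LITERAL"
        tokens.append((kind, text))
    return tokens


print(tokenize("let x = 10 + 5;"))
```

For let x = 10 + 5; this yields the same seven tokens listed earlier: a keyword, an identifier, an operator, a literal, an operator, a literal, and a separator.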

3. Syntax Analysis: Building the Syntax Tree (AST)

Lexical analysis splits code into tokens, but tokens are only separate "words". The job of syntax analysis is to organize these tokens, according to the grammar rules, into an Abstract Syntax Tree (AST) that reflects the structure of the code and the precedence of its operations.

Expression: 1 + 2 * 3

Syntax tree:
       +
      / \
     1   *
        / \
       2   3

Why this shape? Because * has higher precedence than +, 2 * 3 is grouped first into its own subtree.
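One common way to get this shape is a recursive-descent parser with one function per precedence level: the function for + and - calls the function for * and /, so multiplication binds tighter. The sketch below assumes tokens are plain strings and represents AST nodes as tuples; the class and method names are illustrative and not taken from any particular compiler.

```python
import re
from typing import List, Tuple, Union

# An AST node is either an integer or a tuple (operator, left, right).
Node = Union[int, Tuple[str, "Node", "Node"]]


class Parser:
    """Recursive-descent parser for integers, + - * / and parentheses."""

    def __init__(self, source: str) -> None:
        # A crude lexer: characters other than digits, operators and
        # parentheses (including whitespace) are simply skipped.
        self.tokens: List[str] = re.findall(r"\d+|[-+*/()]", source)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def take(self) -> str:
        token = self.peek()
        self.pos += 1
        return token

    def parse(self) -> Node:
        node = self.expression()
        if self.peek():
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expression(self) -> Node:
        # Lowest precedence level: terms joined by + or -, grouped left to right.
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self) -> Node:
        # Higher precedence level: factors joined by * or /, so they bind tighter.
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self) -> Node:
        # Highest precedence: a number or a parenthesized sub-expression.
        token = self.take()
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser("1 + 2 * 3").parse())  # ('+', 1, ('*', 2, 3))
```

Because term is called from inside expression, a * is always consumed before control returns to the + loop, which is exactly why 2 * 3 ends up as a subtree.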

Why the AST matters

The AST is the compiler's core data structure: semantic analysis, optimization, and code generation are all built on top of it. Modern development tools also rely heavily on ASTs:

  • ESLint: parses code into an AST and checks it for rule violations
  • Prettier: parses code into an AST and then prints it back out with consistent formatting
  • Babel: parses into an AST → transforms it → generates compatible code
  • IDE refactoring: safe variable renaming and function extraction are based on the AST
| Syntax construct | Token sequence | AST nodes |
|---|---|---|
| Variable declaration | let x = 10 | VariableDeclaration → Identifier + Literal |
| Function call | add ( 1 , 2 ) | CallExpression → Identifier + Arguments |
| Conditional | if ( a > b ) | IfStatement → BinaryExpression + Block |

4. AST Visualization: Seeing the "Skeleton" of Code

Above we described the AST structure in words, but seeing it is more intuitive than reading about it. The interactive component below lets you pick different expressions and observe their syntax trees directly.

Interactive figure: "AST Visualizer: See the Skeleton of Code" (choose an expression and inspect its abstract syntax tree). For 1 + 2 * 3 it shows:

BinaryExpression (+)
  NumericLiteral 1
  BinaryExpression (*)
    NumericLiteral 2
    NumericLiteral 3

Parse notes:

  1. * has higher precedence than +, so 2 * 3 groups first
  2. 2 * 3 forms a BinaryExpression subtree
  3. 1 and that subtree become the left and right operands of +
  4. The final + node is the root, showing the evaluation order

Tip: AST Explorer lets you inspect the AST of arbitrary code online.

Visualizing a few trees shows that the core rules of the AST are actually quite simple:

| Code | AST root node | Child nodes |
|---|---|---|
| 1 + 2 * 3 | BinaryExpression (+) | Left: NumericLiteral(1), right: BinaryExpression(*) |
| let x = 10 | VariableDeclaration | VariableDeclarator → Identifier(x) + NumericLiteral(10) |
| add(a, b) | CallExpression | Identifier(add) + Arguments(a, b) |

Using ASTs in everyday development

You may never have written a compiler yourself, but you use AST-based tools every day:

  • ESLint / Prettier: parse code into an AST to check rules or reformat it
  • Babel / SWC: parse into an AST → transform the syntax → generate compatible code
  • IDE refactoring: safe renaming and function extraction based on the AST
  • Tree-shaking: analyze import/export in the AST and remove unused code
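The parse → transform → generate flow behind Babel and IDE renames can be tried with Python's standard ast module (ast.unparse needs Python 3.9 or later). This sketch renames a variable by rewriting Name nodes in the tree instead of doing a textual search-and-replace. RenameVariable and rename are hypothetical helpers, and unlike a real IDE they ignore scoping, so every reference with the old name is renamed.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every variable reference (ast.Name) called old_name.

    Hypothetical helper: unlike an IDE, it ignores scopes, function
    parameters and attributes, so it is only safe for simple snippets.
    """

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old_name:
            node.id = self.new_name
        return node


def rename(source: str, old_name: str, new_name: str) -> str:
    tree = ast.parse(source)                               # 1. parse into an AST
    tree = RenameVariable(old_name, new_name).visit(tree)  # 2. transform the tree
    return ast.unparse(tree)                               # 3. generate code again


print(rename('total = price * qty\nprint("price:", price)', "price", "unit_price"))
```

The string "price:" is left untouched because the transformer only visits Name nodes; that is the advantage of working on the AST rather than on raw text.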

5. Semantic Analysis and Code Optimization

Syntax analysis ensures that code is structurally correct, but correct structure does not guarantee correct meaning. Semantic analysis checks whether the code's meaning is valid, while code optimization makes the program run faster.

Interactive figure: "Compilation Practice: From code to executable file". With GCC, a C source file goes through four steps:

  1. Preprocess: gcc -E hello.c -o hello.i processes #include and expands macros
  2. Compile: gcc -S hello.i -o hello.s generates assembly code
  3. Assemble: gcc -c hello.s -o hello.o generates an object file
  4. Link: gcc hello.o -o hello generates the executable

Generated files: hello.c (source code), hello.i (preprocessed file), hello.s (assembly code), hello.o (object file), hello (executable).

Common compiler tools: GCC (GNU Compiler Collection), Clang (the LLVM C/C++ compiler), MSVC (Microsoft Visual C++).

5.1 Semantic Analysis: Checking Whether the "Meaning" Is Right

| Check | Example | Result |
|---|---|---|
| Type checking | int x = "hello" | Type mismatch |
| Scope checking | Using an undeclared variable y | Variable does not exist |
| Type inference | 1 + 2.0 | Result inferred as float |
| Argument checking | add(1, 2, 3) when the function takes only 2 parameters | Argument count mismatch |

Many of the errors you run into correspond to these semantic checks (in dynamically typed languages such as JavaScript, some of them are only caught at runtime):

  • TypeError: Cannot read properties of undefined (type checking)
  • ReferenceError: x is not defined (scope checking)
  • Expected 2 arguments, but got 3 (argument checking)
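The checks in the table above can be sketched as a pass over the AST with a symbol table that records each declared variable's type. The example below is a hypothetical mini-checker: it assumes a tiny language whose AST nodes are plain tuples and a made-up function table, and it covers type checking, scope checking, int/float type inference, and argument-count checking by raising an exception on the first problem.

```python
from typing import Dict, List, Tuple

# Hypothetical AST nodes (plain tuples):
#   ("num", value)  ("str", text)  ("var", name)
#   ("add", left, right)  ("call", function_name, [arguments])
Expr = Tuple

# Hypothetical function table: name -> (parameter types, return type).
FUNCTIONS: Dict[str, Tuple[List[str], str]] = {"add": (["int", "int"], "int")}


class SemanticError(Exception):
    """Raised when code is syntactically valid but meaningless."""


def type_of(expr: Expr, scope: Dict[str, str]) -> str:
    kind = expr[0]
    if kind == "num":
        return "float" if isinstance(expr[1], float) else "int"
    if kind == "str":
        return "string"
    if kind == "var":
        # Scope check: the variable must have been declared.
        if expr[1] not in scope:
            raise SemanticError(f"{expr[1]} is not defined")
        return scope[expr[1]]
    if kind == "add":
        left, right = type_of(expr[1], scope), type_of(expr[2], scope)
        if {left, right} <= {"int", "float"}:
            # Type inference: int + float is widened to float.
            return "float" if "float" in (left, right) else "int"
        raise SemanticError(f"cannot add {left} and {right}")
    if kind == "call":
        if expr[1] not in FUNCTIONS:
            raise SemanticError(f"{expr[1]} is not defined")
        params, result = FUNCTIONS[expr[1]]
        args = expr[2]
        # Argument check: the number of arguments must match the signature.
        if len(args) != len(params):
            raise SemanticError(f"Expected {len(params)} arguments, but got {len(args)}")
        for arg, expected in zip(args, params):
            if type_of(arg, scope) != expected:
                raise SemanticError(f"argument of {expr[1]} must be {expected}")
        return result
    raise SemanticError(f"unknown node {kind!r}")


def declare(declared_type: str, name: str, init: Expr, scope: Dict[str, str]) -> None:
    """Type check for a declaration such as: int x = "hello"."""
    actual = type_of(init, scope)
    if actual != declared_type:
        raise SemanticError(f"cannot assign {actual} to {declared_type} {name}")
    scope[name] = declared_type


scope: Dict[str, str] = {}
declare("int", "x", ("num", 1), scope)
print(type_of(("add", ("var", "x"), ("num", 2.0)), scope))  # float
```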

5.2 Code Optimization: Making Programs Faster

Before generating the final code, the compiler applies a variety of optimizations to the intermediate code. These optimizations are invisible to the programmer, yet they can improve performance significantly.

| Technique | Before | After | Principle |
|---|---|---|---|
| Constant folding | x = 10 + 5 | x = 15 | Compute the result at compile time |
| Dead-code elimination | if (false) { ... } | Removed entirely | The code can never execute |
| Constant propagation | x = 15; y = x * 2 | y = 30 | Replace variables with their known values |
| Loop-invariant code motion | len = arr.length recomputed inside the loop | Hoisted out of the loop | Avoid repeating the same computation |
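Constant folding and constant propagation are often run together over the intermediate representation. The sketch below assumes a made-up three-address IR where each instruction is (target, operator, operand1, operand2) and operands are integer literals or variable names. It walks the instructions once, substitutes variables whose values are already known, and folds operations whose operands are all constants. It only handles straight-line code; with branches and loops, real compilers rely on data-flow analysis or SSA form.

```python
import operator
from typing import Dict, List, Tuple, Union

Operand = Union[int, str, None]
# Hypothetical three-address IR: (target, op, a, b).
# ("x", "+", 10, 5) means x = 10 + 5; ("x", "=", 15, None) means x = 15.
Instr = Tuple[str, str, Operand, Operand]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_and_propagate(program: List[Instr]) -> List[Instr]:
    known: Dict[str, int] = {}  # variables whose value is known at compile time
    result: List[Instr] = []
    for target, op, a, b in program:
        # Constant propagation: replace variables with their known values.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if op == "=" and isinstance(a, int):
            known[target] = a
            result.append((target, "=", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            # Constant folding: compute the result now instead of at runtime.
            value = OPS[op](a, b)
            known[target] = value
            result.append((target, "=", value, None))
        else:
            known.pop(target, None)  # target now depends on a runtime value
            result.append((target, op, a, b))
    return result


# x = 10 + 5; y = x * 2; z = y + n   (n is only known at runtime)
program = [("x", "+", 10, 5), ("y", "*", "x", 2), ("z", "+", "y", "n")]
print(fold_and_propagate(program))
# [('x', '=', 15, None), ('y', '=', 30, None), ('z', '+', 30, 'n')]
```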

6. Optimization Techniques in Practice: How the Compiler Makes Code Faster

Above we named a few optimization techniques; now let's look in detail at what the compiler actually does. The interactive component below shows the five most common compiler optimizations, so you can compare the code before and after each one side by side.

Interactive figure: "Compiler Optimization: Make Code Faster Automatically" (choose an optimization technique and see how the compiler improves the code). For constant folding it shows:

Before optimization:
const width = 10
const height = 20
const area = width * height  // computed at runtime
console.log(area)

After optimization:
const area = 200  // computed during compilation
console.log(200)

How constant folding works: the compiler sees that width and height are constants, so it computes 10 * 20 = 200 during compilation, and the program no longer needs a multiplication at runtime. The widget reports a performance gain of 30%; treat this as illustrative, since the real gain depends on the program.

Modern compilers and JIT engines (such as V8, GCC, and LLVM) automatically apply dozens of optimization techniques. As a programmer you do not need to perform these optimizations by hand, but understanding them helps you:

  • Write code that is easier to optimize: for example, using const instead of let makes it easier for the compiler to fold constants
  • Understand performance differences: small functions are good candidates for inlining, which removes the cost of calling them
  • Avoid "deoptimization": some constructs, such as eval() and with, prevent the compiler from optimizing
| Technique | Trigger condition | Performance effect | What the programmer can do |
|---|---|---|---|
| Constant folding | Expressions whose operands are all constants | Removes runtime computation | Prefer const declarations |
| Dead-code elimination | Unreachable code or unused results | Smaller code | Remove unused code promptly |
| Loop-invariant code motion | Computations inside a loop whose result does not change | Less repeated work | Hoisting by hand is also a good habit |
| Function inlining | Small, frequently called functions | Removes call overhead | Keep functions small and focused |
| Constant propagation | Variable values known at compile time | Entire chains of computation removed | Prefer constants over variables that get reassigned |
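Dead-code elimination can be sketched on a made-up three-address IR, where each instruction is (target, operator, operand1, operand2): walk the instructions backwards, keep the set of variables whose values are still needed (the live set), and drop any assignment whose target is not live. After constant folding, the width and height assignments from the earlier constant-folding example are no longer read by anything, so this pass deletes them. The sketch assumes straight-line code and treats only the variables passed in as outputs as observable; real compilers must also preserve calls, stores, and other side effects.

```python
from typing import List, Set, Tuple, Union

Operand = Union[int, str, None]
# Hypothetical three-address IR: (target, op, a, b); string operands are variables.
Instr = Tuple[str, str, Operand, Operand]


def eliminate_dead_code(program: List[Instr], outputs: Set[str]) -> List[Instr]:
    """Drop assignments whose results are never read (straight-line code only)."""
    live: Set[str] = set(outputs)  # variables whose current value is still needed
    kept: List[Instr] = []
    for target, op, a, b in reversed(program):
        if target not in live:
            continue  # dead: nothing later reads this value
        live.discard(target)  # this instruction produces target...
        live.update(x for x in (a, b) if isinstance(x, str))  # ...and reads its operands
        kept.append((target, op, a, b))
    kept.reverse()
    return kept


# The width/height example after constant folding: only area is still used.
folded = [("width", "=", 10, None), ("height", "=", 20, None), ("area", "=", 200, None)]
print(eliminate_dead_code(folded, {"area"}))  # [('area', '=', 200, None)]
```

Walking backwards is the key design choice: by the time an assignment is reached, every later reader has already been seen, so a single pass is enough for straight-line code.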

7. Compilation vs Interpretation vs JIT

Once your code is written, there are three ways to "translate" it so it can run. Each has its own strengths and weaknesses, and the choice directly shapes a language's performance characteristics and typical use cases.

Interactive figure: "Compiled vs Interpreted vs JIT" (click an execution mode to see how code moves from source to a running program). For the compiled mode: source code (main.c) → compiler (full compilation) → machine code (binary executable) → run directly on the CPU. Run speed: very fast. Startup: slow, because you must compile first. Portability: recompilation required. Representative languages: C, C++, Rust, Go.
| Aspect | Compiled | Interpreted | JIT (just-in-time compilation) |
|---|---|---|---|
| Process | Compile everything to machine code first, then run | Read and execute line by line, translating on the fly | Interpret first, then compile hot code |
| Run speed | Fastest | Slowest | Medium (hot code approaches compiled speed) |
| Time to first run | Slow (must compile first) | Fast (runs directly) | Medium (needs warm-up) |
| Portability | Must recompile for each platform | Naturally cross-platform | Cross-platform |
| Representative languages | C, Rust, Go | Python, Ruby | JavaScript (V8), Java |
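The difference between the compiled and interpreted columns can be shown with one tiny expression executed two ways. The sketch below assumes a hypothetical tuple-based AST (integers, variable names, and (operator, left, right) nodes). interpret walks the tree again on every run, while compile_expr translates the tree once into nested Python closures that can then be called many times. The closures stand in for machine code, so this illustrates the idea rather than a real compiler.

```python
import operator
from typing import Callable, Dict, Tuple, Union

# A node is a number, a variable name, or (operator, left, right).
Node = Union[int, str, Tuple[str, "Node", "Node"]]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(node: Node, env: Dict[str, int]) -> int:
    """Interpretation: re-examine the tree every time the code runs."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_expr(node: Node) -> Callable[[Dict[str, int]], int]:
    """Compilation: translate the tree once into a ready-to-run function."""
    if isinstance(node, int):
        return lambda env: node
    if isinstance(node, str):
        return lambda env: env[node]
    op, left, right = node
    fn, left_code, right_code = OPS[op], compile_expr(left), compile_expr(right)
    # All tree inspection happened above; the returned closure only computes.
    return lambda env: fn(left_code(env), right_code(env))


expr = ("+", "x", ("*", 2, 3))  # x + 2 * 3
compiled = compile_expr(expr)   # the translation cost is paid once, up front
for x in range(3):
    assert interpret(expr, {"x": x}) == compiled({"x": x}) == x + 6
```

Both produce the same results; the difference is when the work of understanding the tree happens, which mirrors the "compile first, then run" versus "translate on the fly" rows of the table.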

Why is JavaScript so fast?

The V8 engine's JIT compiler monitors which code runs frequently (hot code) and compiles it into highly optimized machine code. So although JavaScript is often described as an "interpreted language", in V8 its hot code can approach the performance of compiled languages. This is also part of what makes Node.js practical as a server platform.
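The hot-code idea can be sketched as a toy tiered runner: it interprets a snippet by walking its AST, counts how many times each snippet runs, and once the count reaches a threshold it compiles the snippet and reuses the result. This is a hypothetical simplification, assuming a threshold of 3 and using Python's built-in compile() (which produces bytecode) as a stand-in for a machine-code compiler. V8 itself uses the Ignition interpreter and optimizing compilers such as TurboFan, guided by runtime type feedback.

```python
import ast
from types import CodeType
from typing import Dict

_BINARY_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
}


def walk(node: ast.AST, env: Dict[str, int]) -> int:
    """Cold tier: a slow tree-walking interpreter for + - * expressions."""
    if isinstance(node, ast.Expression):
        return walk(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](walk(node.left, env), walk(node.right, env))
    raise NotImplementedError(ast.dump(node))


class TieredRunner:
    """Toy JIT: interpret a snippet until it becomes hot, then compile it once.

    Only run trusted, hard-coded snippets: the hot tier uses eval().
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold               # hypothetical "hot" threshold
        self.counts: Dict[str, int] = {}         # executions seen per snippet
        self.compiled: Dict[str, CodeType] = {}  # cache of compiled hot snippets

    def run(self, source: str, env: Dict[str, int]) -> int:
        if source in self.compiled:
            # Hot tier: reuse the code object produced once by compile().
            return eval(self.compiled[source], {}, env)
        self.counts[source] = self.counts.get(source, 0) + 1
        tree = ast.parse(source, mode="eval")
        if self.counts[source] >= self.threshold:
            # The snippet just became hot: compile it for all later calls.
            self.compiled[source] = compile(tree, "<hot>", "eval")
        return walk(tree, env)


runner = TieredRunner(threshold=3)
print([runner.run("x * x + 1", {"x": i}) for i in range(5)])  # [1, 2, 5, 10, 17]
```

The first calls pay the slower interpretation cost and the later ones run the compiled version, which is the same trade-off as the "needs warm-up" entry for JIT in the table.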


Summary

Compiler principles are not knowledge reserved for people who build compilers. Understanding the compilation process helps you make sense of error messages, choose the right language, and write more efficient code.

Key points of this chapter:

  1. A compiler is a translator: it turns human-readable code into instructions the machine can execute
  2. The six-step pipeline: lexical analysis → syntax analysis → semantic analysis → intermediate code → optimization → code generation
  3. Lexical analysis produces tokens: it splits the character stream into meaningful units such as keywords, identifiers, and operators
  4. Syntax analysis builds the AST: it organizes tokens into a tree according to the grammar rules, reflecting operator precedence
  5. Semantic analysis ensures correctness: type checking and scope checking; many of the errors you encounter originate here
  6. Compilers optimize automatically: techniques such as constant folding, dead-code elimination, and function inlining make code faster without extra effort from you
  7. Three execution models: compilation is the fastest, interpretation the most flexible, and JIT combines the two

Further reading