
From Transistors to CPU

Preface

How does a computer "think"? You probably know the CPU is the computer's "brain," but how does this brain actually work? How does it go from a pile of metal and plastic to a smart device that can execute programs and process data? This chapter takes you from the most fundamental transistors, step by step, to understand the principles behind CPU construction.

What will you learn from this article?

After completing this chapter, you will gain:

  • Terminology understanding: When hearing "CPU clock speed," "multi-core," or "instruction set," you won't be confused — you'll understand the physical principles behind them
  • Code execution perspective: See how a line of code goes through fetch, decode, execute, and write-back to ultimately become pixels on screen
  • Abstraction layer thinking: Understand how each layer provides services to the layer above while hiding the complexity of the layer below
  • Foundation for further learning: Build a foundation for computer architecture, embedded development, and performance optimization
Chapter | Content | Core Concepts
Chapter 1 | Transistors | Switches of the digital world
Chapter 2 | Logic Gates | Physical implementation of Boolean operations
Chapter 3 | Functional Units | Adders, registers, multiplexers
Chapter 4 | CPU Core | Fetch, decode, execute, write-back

0. Big Picture: From Sand to Intelligence

In exploring the low-level workings of computers, one fundamental question often arises: where does the "thinking" ability of modern computers actually come from?

If we peel away a computer's shiny casing, we typically see just a pile of metal, plastic, and silicon wafers. They have no life of their own, understand no mathematics, and know nothing of intelligence. But when electrical current flows through them, everything begins to operate. Ultimately, it all stems from one deceptively simple physical abstraction: the switch.

Imagine a switch that controls a light bulb. Press it and the light turns on, representing "1"; release it and the light turns off, representing "0." If we had billions of such switches and could make the output of one switch control another switch, thereby combining them into an incredibly complex logical network, what would happen?

The answer is a universal computing platform capable of executing arbitrary logic. The key to understanding computer systems lies in "abstraction." Like building blocks, we control underlying complexity through layered encapsulation. Here are the four core levels from sand to intelligence:

Layer-by-Layer Breakdown: From Sand to Intelligence

  • Layer 1: Transistors (tens of billions) These are the lowest-level "switches." Modern CPUs primarily use MOSFETs (Metal-Oxide-Semiconductor Field-Effect Transistors). Applying voltage to the gate allows current to flow between source and drain. This is the physical starting point of "using electricity to control electricity." The core problem it solves: how to use one electrical signal to control another electrical signal?

  • Layer 2: Logic Gates (billions) When we connect specific transistors in series or parallel, a remarkable transformation occurs — circuits become mathematics. For example, an AND gate requires both inputs to be 1 for the output to be 1. This constitutes a mapping of Boolean algebra onto physical circuits. The core problem it solves: how to transform physical on/off states into logical operations based on 0s and 1s?

  • Layer 3: Functional Units (hundreds) By assembling basic logic gates, we can build computing modules with specific purposes. Adders handle arithmetic, multiplexers control data flow, and registers give circuits the ability to remember. The core problem it solves: how to construct machines capable of performing addition calculations and remembering states?

  • Layer 4: CPU Core (1-128 cores) This is the command center of the entire microarchitecture. When you write a line of code, the various components inside the CPU work in coordination billions of times per second, executing the entire process of fetch, decode, execute, and write-back. The core problem it solves: how to make all modules work in unison to automatically execute a specified program sequence?


1. Transistors: Switches of the Digital World

Let's start from the microscopic world. The component below demonstrates the basic principles of a transistor. Try operating it and observe how current flows:

[Interactive demo: MOSFET transistor diagram with Source, Gate, and Drain. Click to toggle the gate voltage. With Gate = 0 the switch is open and the output is 0.]

1.1 What Is a Transistor?

Concept Introduction

In engineering, a transistor is a semiconductor device that changed the course of human history. In the context of digital circuits, we can directly abstract it as a perfect "switch."

Why do we need transistors? Think of a water faucet in daily life. You turn the valve with your hand, and water flows out. A transistor is essentially a nanoscale water faucet:

  • The source and drain are like the two ends of a water pipe.
  • The gate is the valve that controls the water flow.

The key difference is: we don't turn the switch by hand, but with voltage signals. When one switch can be controlled by electrical signals produced by another switch, we've crossed the enormous chasm from "manual intervention" to "automatic computation."

1.2 How Do Transistors Represent 0 and 1?

You might ask: what does the computer's so-called "only understanding 0s and 1s" actually look like in the physical world? Are there really tiny 0s and 1s flowing through chips?

Of course not. It all relies on human-defined abstraction conventions. We abandon the obsession with continuous analog signals and set two extreme thresholds:

  • We define high voltage (e.g., 3.3V or 1.0V) as logical 1 (True).
  • We define low voltage (near 0V) as logical 0 (False).

This is the so-called digital abstraction: we take the noisy analog world and cleanly slice it into crisp 0s and 1s. When the gate receives high voltage, the transistor conducts — the switch is closed; when the gate receives low voltage, the switch opens.
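The digital abstraction can be sketched in a few lines of code. This is a minimal illustration, not a model of any real chip: the threshold values `V_LOW_MAX` and `V_HIGH_MIN` are hypothetical numbers chosen for a 1.0 V supply, and `nmos_conducts` assumes an idealized switch that conducts when its gate is high.

```python
# Hypothetical thresholds for a 1.0 V supply; real chips specify their own.
V_LOW_MAX = 0.3   # at or below this voltage, the signal counts as logical 0
V_HIGH_MIN = 0.7  # at or above this voltage, the signal counts as logical 1


def to_logic_level(voltage):
    """Map an analog voltage to 0, 1, or None (the undefined middle region)."""
    if voltage <= V_LOW_MAX:
        return 0
    if voltage >= V_HIGH_MIN:
        return 1
    return None  # between the thresholds the signal is not a valid logic level


def nmos_conducts(gate_voltage):
    """Idealized switch: closed (conducting) only when the gate reads as 1."""
    return to_logic_level(gate_voltage) == 1


if __name__ == "__main__":
    for v in (0.02, 0.25, 0.5, 0.81, 1.0):
        print(f"{v:.2f} V -> logic {to_logic_level(v)}, conducts: {nmos_conducts(v)}")
```

The gap between the two thresholds is what lets a slightly noisy voltage still be read as a clean 0 or 1.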

1.3 The Evolution of Transistor Counts

A single transistor can only control on/off, which seems incredibly insignificant. But what if we combine billions of such switches? Look at the table below reflecting Moore's Law to understand the development of modern chips.

Year | Processor | Transistor Count | Process Node | Historical Significance
1971 | Intel 4004 | 2,300 | 10 μm | Dawn of the microprocessor
1993 | Intel Pentium | 3.1 million | 800 nm | PCs become universal
2006 | Intel Core 2 Duo | 291 million | 65 nm | Multi-core architecture goes mainstream
2020 | Apple M1 | 16 billion | 5 nm | Mobile architecture revolution feeds back to desktop
2023 | Apple M3 Max | 92 billion | 3 nm | Approaching the physical limits of atoms
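To get a feel for the growth rate in the table, one can estimate the average doubling time between its first and last rows. The sketch below uses only those two data points; the function name `doubling_time_years` is made up for illustration.

```python
import math

# Transistor counts taken from the first and last rows of the table above.
first_year, first_count = 1971, 2_300           # Intel 4004
last_year, last_count = 2023, 92_000_000_000    # Apple M3 Max


def doubling_time_years(y0, n0, y1, n1):
    """Average number of years per doubling between two data points."""
    doublings = math.log2(n1 / n0)
    return (y1 - y0) / doublings


if __name__ == "__main__":
    growth = last_count / first_count
    years = doubling_time_years(first_year, first_count, last_year, last_count)
    print(f"Growth factor: {growth:,.0f}x")
    print(f"Average doubling time: {years:.2f} years")
```

The result comes out at roughly two years per doubling, which is in line with the usual statement of Moore's Law.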

Deep dive: What is "3nm"? When we hear about 5nm or 3nm in the news, it helps to picture the scale. A silicon atom has a diameter of about 0.2 nanometers. Node names such as "3nm" are labels for a process generation rather than a literal measurement of any one feature, but the most critical structures of a transistor at these nodes really are only a few nanometers across, which is a few dozen atoms wide! This means we're building humanity's largest computing fortresses at the very edge of where quantum mechanics takes effect.


2. Logic Gates: Computing with Switches

2.1 From Transistors to Logic Gates

As mentioned earlier, a single transistor is just a simple control over current flow. But when you arrange multiple transistors in specific patterns, physics transforms into mathematical logic. In this new dimension, we no longer talk about cumbersome voltages and currents, but directly about pure logical "true" (1) and "false" (0).

Please experience the effect of combining switches through the logic gate demo below:

Four Basic Logic Gates: the building blocks of all digital computing

AND gate (operation: A ∧ B)
Outputs 1 only when both inputs are 1. Series switches: both switches must be closed.
A | B | Output
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

OR gate (operation: A ∨ B)
Outputs 1 when at least one input is 1. Parallel switches: either switch can close the circuit.
A | B | Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1

NOT gate (operation: ¬A)
Inverts the input: 0 becomes 1, 1 becomes 0. Inverter: on becomes off, off becomes on.
A | Output
0 | 1
1 | 0

XOR gate (operation: A ⊕ B)
Outputs 1 only when the two inputs are different. Difference detector: different means true.
A | B | Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

Core idea: Logic gates turn physical circuit on/off states into mathematical true/false operations. They are the bridge from hardware to software logic.

2.2 Basic Logic Gates

In our computer architecture, there are several fundamental logic gates. All supercomputers are built from these building blocks:

  • AND Gate:

    • Rule: Output is 1 only when all inputs are 1.
    • Intuitive understanding: Connect two transistors in series. Current can only pass through if both barriers are opened simultaneously. Like opening a bank vault — both the manager and supervisor must insert their keys at the same time.
  • OR Gate:

    • Rule: Output is 1 if at least one input is 1.
    • Intuitive understanding: Connect two transistors in parallel. With multiple parallel channels, as long as one path is open, current can flow through.
  • NOT Gate (Inverter):

    • Rule: Input 1 always outputs 0; input 0 always outputs 1.
    • Intuitive understanding: A gate whose only job is to flip a state. Inverters are also used throughout circuit design to restore and reshape degraded signals.
  • XOR Gate (Exclusive OR):

    • Rule: Output is 1 exactly when the two inputs are different.
    • Intuitive understanding: You can think of it as a precision machine for "detecting differences." This is our secret weapon for performing binary addition in circuits.
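The four gates above can be written as one-line functions on the bits 0 and 1. This is only a software sketch of their truth tables, not a description of how the transistors are wired. XOR is deliberately composed from AND, OR, and NOT to show that gates can be built from other gates.

```python
from itertools import product


def AND(a, b):
    # Two switches in series: current flows only if both are closed.
    return a & b


def OR(a, b):
    # Two switches in parallel: current flows if either one is closed.
    return a | b


def NOT(a):
    # Inverter: 0 becomes 1, 1 becomes 0.
    return 1 - a


def XOR(a, b):
    # "Different means true", composed from the three gates above.
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


if __name__ == "__main__":
    print("A B | AND OR XOR")
    for a, b in product((0, 1), repeat=2):
        print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {XOR(a, b)}")
    print("NOT 0 =", NOT(0), "| NOT 1 =", NOT(1))
```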

2.3 Implementing Addition with Logic Gates

If the logic gates introduced above can only perform simple conditional judgments, how does a computer actually do mathematical operations?

From Hand Addition to Logic Gates: how can computers do math with only 0 and 1? Follow the pattern.

Step 1: Recall carrying in decimal addition

    7
  + 5
  ----
   12   (a 1 is carried into the tens column)

Because 7 + 5 = 12, the result is larger than the largest single digit (9). We split 12 into "one full 10" and "the remaining 2":

  • The remaining 2 stays in the current column. This is the sum bit.
  • The full 10 carries a 1 into the tens column. This is the carry.

Step 2: The four binary addition cases

  • 0 + 0 = 0. Write 0 in this column, with no carry.
  • 0 + 1 = 1. Write 1 in this column, with no carry.
  • 1 + 0 = 1. Write 1 in this column, with no carry.
  • 1 + 1 = 10 in binary. Write 0 in this column and carry 1.

Step 3: Name the patterns as circuits

A | B | Carry | Sum
0 | 0 | 0 | 0
0 | 1 | 0 | 1
1 | 0 | 0 | 1
1 | 1 | 1 | 0

Sum pattern: The sum is 1 only for inputs (0,1) or (1,0). It is 1 only when the two inputs are different. In circuits, this pattern is called XOR.

Carry pattern: The carry is 1 only for inputs (1,1). It is 1 only when both inputs are 1. In circuits, this pattern is called AND.

Therefore, by combining an XOR gate (responsible for computing the current bit) and an AND gate (responsible for computing the carry), we get a circuit that can add two single bits. This is the most basic adder circuit, the half adder.
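As a minimal sketch, the half adder is just those two gates applied to the same pair of inputs. The function name `half_adder` and the `(sum, carry)` return order are choices made here for illustration.

```python
def half_adder(a, b):
    """Add two one-bit inputs; return (sum_bit, carry_bit)."""
    sum_bit = a ^ b     # XOR: 1 when the inputs differ
    carry_bit = a & b   # AND: 1 only when both inputs are 1
    return sum_bit, carry_bit


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a} + {b} -> sum {s}, carry {c}")
```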

Half Adder (interactive demo): click inputs A and B to see the result for one binary column. The carry is the 1 passed to the column on the left; the sum is the digit written in this column. Example state: 0 + 0 = 0. Write 0 in this column, with no carry.

All possible cases

A | B | Write (sum) | Carry
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1

Look closely at the table and two patterns appear:

  • The sum column is 1 only when A and B are different. This is XOR.
  • The carry column is 1 only when A and B are both 1. This is AND.

The circuit is connected like this: inputs A and B both feed an XOR gate (different -> 1), whose output is the Sum, and an AND gate (all 1 -> 1), whose output is the Carry. With A = 0 and B = 0, both outputs are 0.

But the half adder has a fatal flaw: physically, it has only two input ports (A and B).

Imagine performing decimal column addition (e.g., 19 + 22):

  • Ones digit: 9 + 2 = 11. Only two numbers need to be added; write 1, carry 1. This is exactly two inputs — a half adder handles it perfectly.
  • Tens digit: Not only do we need to compute 1 + 2, but we also need to add the "carry 1" passed from the ones digit (i.e., 1 + 2 + 1 = 4). This means in multi-digit addition, except for the rightmost digit, all other digits are actually adding three numbers together!

Because the half adder has no third input port to accept the "carry-in from the lower digit," it can't be used for any digit except the rightmost one. To solve this problem, we need a full adder that can accept three signals:

Full Adder (interactive demo): a full adder adds one more input, the carry-in (Cin) from the lower bit. Click the three inputs to try it. Example state: A = 1, B = 0, Carry-in = 0 gives Carry = 0, Sum = 1. That is, 1 + 0 + 0 = 1: write 1 in this column, with no carry.

Compared with a half adder: A full adder adds a third input: carry-in (Cin). In multi-bit addition, each column adds A, B, and the carry from the column on the right.

All 8 cases (3 inputs -> 2³ = 8)

A | B | Cin | Sum | Carry
0 | 0 | 0 | 0 | 0
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 1 | 0
0 | 1 | 1 | 0 | 1
1 | 0 | 0 | 1 | 0
1 | 0 | 1 | 0 | 1
1 | 1 | 0 | 0 | 1
1 | 1 | 1 | 1 | 1

Inside a full adder: two half adders in series, followed by a step that merges their carries

  • Step 1: Half adder 1. First calculate A + B. With A = 1 and B = 0: intermediate sum = 1, carry 1 = 0.
  • Step 2: Half adder 2. Add the intermediate sum and the carry-in. With intermediate sum = 1 and Cin = 0: sum = 1, carry 2 = 0.
  • Step 3: Merge carries. If either carry path is 1, carry 1 into the next higher bit. With carry 1 = 0 and carry 2 = 0: final carry = 0.
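The three steps above (two half adders, then merging the carries) can be sketched directly in code. The helper names are illustrative; the merge step is modeled as an OR, matching "if either carry path is 1".

```python
def half_adder(a, b):
    """Return (sum_bit, carry_bit) for two one-bit inputs."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Add three one-bit inputs; return (sum_bit, carry_out)."""
    partial_sum, carry1 = half_adder(a, b)               # Step 1: A + B
    sum_bit, carry2 = half_adder(partial_sum, carry_in)  # Step 2: add the carry-in
    carry_out = carry1 | carry2                          # Step 3: merge the carries
    return sum_bit, carry_out


if __name__ == "__main__":
    print("A B Cin | Sum Carry")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, c = full_adder(a, b, cin)
                print(f"{a} {b}  {cin}  |  {s}    {c}")
```

A quick sanity check is that `a + b + carry_in` always equals `sum_bit + 2 * carry_out`, which is exactly what "write one digit, carry the other" means in binary.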

By cascading multiple full adders, multi-bit addition can be accomplished:

Ripple Carry Adder (interactive demo): cascade multiple full adders to perform multi-bit binary addition.

  • Cascade: the lower bit's Cout connects to the higher bit's Cin
  • Ripple: the carry propagates bit by bit like a wave
  • Overflow: the highest bit produces a carry beyond the range

Example (4 bits):

A = 0111 (7)
B = 0110 (6)
Sum = 1101 (13)

Adder cascade, bit by bit:

Bit | Adder | A | B | Cin | Sum | Cout
0 | Half adder | 1 | 0 | none | 1 | 0
1 | Full adder | 1 | 1 | 0 | 0 | 1
2 | Full adder | 1 | 1 | 1 | 1 | 1
3 | Full adder | 0 | 0 | 1 | 1 | 0

Overall calculation

  • Input: A = 7 (0111), B = 6 (0110)
  • Process: Start at bit 0, compute each sum and carry, and propagate carries toward higher bits.
  • Result: 1101 = 13

Core idea: Carry ripples from the lowest bit to the highest bit, which is why this circuit is called a ripple carry adder. More bits increase delay, but the circuit stays simple.

Core Analysis: Decomposing the Adder

To handle more complex numbers in the real world, adders need to be assembled like building blocks:

  1. Half Adder: It can add two single-bit numbers (the combination of XOR and AND gates described above). It computes the sum and carry bits but can't accept a carry-in from a lower position.

  2. Full Adder: In multi-bit calculations, every position except the lowest needs to add A and B together with the carry-in from the lower position. Incorporating the lower carry into the logic creates a full adder.

  3. Ripple Carry Adder: To handle 32-bit or 64-bit numbers, simply cascade dozens of full adders together. The carry signal ripples from the lower bits to the higher bits like a wave, completing addition of any width.
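The cascade described above can be sketched as a loop that feeds each full adder's carry-out into the next bit's carry-in. This is an illustrative software model: the function names and the `width` parameter are assumptions, and bit 0 uses a full adder with a carry-in of 0, which behaves the same as a half adder.

```python
def full_adder(a, b, carry_in):
    """Add three one-bit inputs; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | ((a ^ b) & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=4):
    """Add two unsigned integers bit by bit; return (result, overflow_carry)."""
    carry = 0
    result = 0
    for i in range(width):           # bit 0 is the least significant bit
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples to the next bit
        result |= s << i
    return result, carry             # a final carry of 1 means overflow


if __name__ == "__main__":
    total, overflow = ripple_carry_add(7, 6)
    print(f"7 + 6 = {total} ({total:04b}), overflow carry = {overflow}")
    # prints: 7 + 6 = 13 (1101), overflow carry = 0
```

Because each bit has to wait for the carry from the bit below it, the loop also mirrors why a wider ripple carry adder is slower: the work is inherently sequential.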

Want to understand the complete process from logic gates to multi-digit addition all at once? Try this comprehensive demo:

Complete Adder Demo (interactive): from logic gates to multi-bit addition, one abstraction layer at a time.

Layer 1: Logic gates. The basic operation units. Each gate performs one Boolean operation.

Gate | Symbol | Rule | Outputs for inputs 00, 01, 10, 11
AND gate (A AND B) | & | Outputs 1 only when all inputs are 1 | 0, 0, 0, 1
OR gate (A OR B) | >=1 | Outputs 1 when any input is 1 | 0, 1, 1, 1
XOR gate (A XOR B) | =1 | Outputs 1 when inputs differ | 0, 1, 1, 0

NOT gate (NOT A, symbol 1): outputs 1, 0 for inputs 0, 1.

Core idea: Logic gates turn voltage levels (0/1) into Boolean operations (false/true). They are where hardware starts implementing math.

Abstraction layers: Logic gates → Half adder → Full adder → Multi-bit adder → ALU/CPU

3. Functional Units: Combining Logic Gates

Now, with our building blocks made of logic gates in hand, we can leap to a higher level of abstraction. Computing addition alone is not enough. We package groups of logic gates and assemble them into modules with specific functions. These modules are collectively called functional units.

3.1 Common Functional Modules

When designing a CPU, there are time-tested classic prefabricated modules:

Module | Core Mission | Internal Logic | Real-life Analogy
Adder | Engine for all types of arithmetic operations | High-level bit-wise cascading of full adders | A tireless abacus
Multiplexer (MUX) | Controls data flow paths, implementing one-of-many selection | Cleverly combines AND gates as switches, OR gates for aggregation | Precision railroad switch
Decoder | Decodes and translates external binary instructions | Gate arrays that precisely light up specific outputs based on input states | Code-breaking translator
Flip-Flop | Breaks through the ephemeral nature of electrical signals to record history | Subtle cross-feedback loops forming bistable modes | A seesaw that holds its state

To intuitively experience how these functional units work, you can operate the component below to view the internal logic of a multiplexer and a decoder:

Common Functional Units (interactive demo): switch modules to see how they work.

Multiplexer (MUX): like a railway switch, it uses the select signal (Sel) to decide which data input, Data 0 (D0) or Data 1 (D1), passes through to the output (Out). Example state: the select signal is 0, so the output equals Data 0 (D0), which is 0.
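A multiplexer and a decoder can each be sketched from AND, OR, and NOT operations on single bits. The function names `mux2` and `decoder2to4` and the output ordering are illustrative choices, not taken from the demo.

```python
def mux2(d0, d1, sel):
    """2-to-1 multiplexer: AND gates act as switches, an OR gate merges them."""
    not_sel = 1 - sel
    return (d0 & not_sel) | (d1 & sel)


def decoder2to4(a1, a0):
    """2-to-4 decoder: exactly one of the four outputs is 1.

    Output index i is 1 when the two-bit input (a1, a0) encodes the number i.
    """
    n1, n0 = 1 - a1, 1 - a0
    return [n1 & n0, n1 & a0, a1 & n0, a1 & a0]


if __name__ == "__main__":
    print("mux2(d0=0, d1=1, sel=0) ->", mux2(0, 1, 0))
    print("mux2(d0=0, d1=1, sel=1) ->", mux2(0, 1, 1))
    print("decoder2to4(1, 0) ->", decoder2to4(1, 0))
```

When `sel` is 0 the second AND term is forced to 0, so only `d0` can reach the output; when `sel` is 1 the roles swap. That is the railway-switch behavior expressed in gates.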

Now for the most fascinating part, memory. The component below shows where that memory lives inside a CPU, the register file:

CPU Register File: high-speed storage inside the CPU

Special Registers

Register | Value | Role
PC | 0x00401000 | Program counter
IR | 0x8B450008 | Instruction register
MAR | 0x00401000 | Memory address register
MDR | 0x00000000 | Memory data register
ACC | 0x0000001A | Accumulator

General Purpose Registers

Register | Value | Role
RAX | 0x00000000 | Return value
RBX | 0x00000000 | Base register
RCX | 0x00000000 | Counter register
RDX | 0x00000000 | Data register
RSI | 0x00000000 | Source index
RDI | 0x00000000 | Destination index
RBP | 0x00000000 | Base pointer
RSP | 0x7FFDE000 | Stack pointer

Program Status Word (PSW / FLAGS)

Flag | Value | Meaning
CF | 0 | Carry flag
PF | 0 | Parity flag
AF | 0 | Auxiliary carry
ZF | 0 | Zero flag
SF | 0 | Sign flag
OF | 0 | Overflow flag

Registers vs Memory

Feature | Register | Memory (RAM)
Location | Inside the CPU | Outside the CPU
Access speed | Fastest (< 1 ns) | Slower (50-100 ns)
Capacity | Tiny (bytes) | Large (GB)
Role | Hold instructions, operands, and results | Store programs and data

3.2 Registers: Data Storage Units

Beyond computation, computers also need the ability to remember data long-term or temporarily. If a computer loses memory of the previous second during computation, no complex calculation is possible. The computer must have some means of preserving past states. This ability primarily relies on a circuit structure called a flip-flop.

Deep Understanding: Memory Is Essentially a Loop

In most logic circuits, signal flow is forward (feedforward). To produce sustained "memory," early pioneers came up with a brilliant design: feed the output signal back to the input.

It's like an ingeniously balanced seesaw with two stable resting points. Without external disturbance, thanks to its closed-loop design, it stays in either the "left-high-right-low" state (e.g., representing 0) or the opposite state (representing 1) for as long as power is applied. Even a fleeting input pulse gets "locked in" by the closed loop.

When we arrange 32 or 64 of these flip-flops in a neat row and drive them with one shared clock signal so that they all update in unison, a register is born. It resides at the heart of the CPU, serving as ultra-fast "scratch paper" that holds the values the program is working on right now.
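The feedback loop can be simulated with two cross-coupled NOR gates, one common way to build a set/reset (SR) latch. This is a simplified sketch under stated assumptions: the class names are made up, the "set" and "reset" inputs are never both 1, and real registers typically use edge-triggered D flip-flops rather than bare SR latches.

```python
def nor(a, b):
    return 1 - (a | b)


class SRLatch:
    """Two cross-coupled NOR gates: each gate's output feeds the other's input."""

    def __init__(self):
        self.q, self.q_bar = 0, 1    # start in the stable "0" state

    def update(self, s, r):
        # Re-evaluate the loop until the outputs stop changing (settle).
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


class Register:
    """A row of latches that all load together on one clock pulse."""

    def __init__(self, width=4):
        self.bits = [SRLatch() for _ in range(width)]

    def clock_pulse(self, data):
        for latch, d in zip(self.bits, data):
            latch.update(s=d, r=1 - d)   # drive the latch toward the data bit
            latch.update(s=0, r=0)       # inputs released: the loop holds the value

    def read(self):
        return [latch.q for latch in self.bits]


if __name__ == "__main__":
    reg = Register(4)
    print("before pulse:", reg.read())
    reg.clock_pulse([1, 0, 1, 0])
    print("after pulse: ", reg.read())
```

Calling `update(s=0, r=0)` changes nothing, which is the point: with both inputs idle, each gate keeps re-creating the other's output, and the stored bit persists.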

Please experience the process of breaking and restoring the closed loop through the interactive demo below:

From Flip-Flops to Registers: The Feedback Loop of Memory (interactive demo). A 4-bit data bus (inputs shown as 1, 0, 1, 0) connects through a gate to a 4-bit register (stored state shown as 0, 0, 0, 0). Change the data and observe: without a clock signal, the gate stays locked, the output feeds back to the input, and the closed loop preserves the memory. Try changing the input on the left. The register value is locked while the feedback loop is closed.

4. CPU Architecture: From Functional Units to Processor

With various computing modules and memory components designed, it's now time for the core integration phase. How do we combine these modules to make them into a central processing unit (CPU) that can automatically execute instructions?

4.1 Core CPU Components

If we view the CPU as a machine with clear division of labor, each unit has its irreplaceable position:

  • Arithmetic Logic Unit (ALU): The "worker" unit responsible for executing addition, subtraction, multiplication, division, and various logical operations.
  • Register File: Temporary drawers on the workbench — very small capacity but extremely fast, used for storing urgent parameters currently being computed.
  • Internal Bus: The conveyor belt within the system, responsible for transporting data and signals between modules.
  • Control Unit: The commander-in-chief. Its mission is to read instructions composed of 0s and 1s from memory, parse what should be done, and transmit specific control signals to other modules, directing them to perform their duties.

CPU Internal Microarchitecture (interactive diagram): click a module to see its subcircuits and how it works.

CPU Core (Central Processing Unit), connected to the Address Bus, Data Bus, and Control Bus:

  • Control Unit: Program Counter (PC), Instruction Register (IR), Instruction Decoder, Clock Generator. It sends control signals to the other modules.
  • Register File: General Registers R0-R3, Accumulator (ACC)
  • Arithmetic Logic Unit (ALU): Adder Circuit, Status Flags

4.2 How Does the CPU Execute Instructions?

No matter how complex the high-level programming language you write, it ultimately becomes a series of low-level instructions in memory. Executing any instruction essentially repeats these four steps:

  1. Fetch: Using the address held in the program counter (PC), read the next binary instruction from memory (in practice, usually from a cache) into the core.
  2. Decode: The control unit works out what the instruction asks for: is it moving data, or calling the adder for a computation? It then activates the circuits required.
  3. Execute: The operation is dispatched to a functional unit such as the ALU, which carries out the arithmetic or logical work.
  4. Write Back: The newly computed result is written to a specific register or back to memory.

Click "Clock Pulse" below to observe how instructions are decomposed and executed step by step in this cycle, and which hardware modules are involved:

Detailed CPU Instruction Cycle Demo (initial state)

CPU

  • Control Unit (CU): PC = 256 (Program Counter), IR (Instruction Register), MAR (Memory Address Register), MDR (Memory Data Register)
  • Arithmetic Logic Unit (ALU): ACC = 0 (Accumulator)
  • General Register File: R0 = 0, R1 = 0, R2 = 0, R3 = 0

Buses: Address Bus, Data Bus, Control Bus

Main Memory

Address | Instruction
0x100 | LOAD R0, [0x200]
0x101 | LOAD R1, #7
0x102 | ADD R0, R1
0x103 | STORE [0x201], R0

Data Area

Address | Value
0x200 | 42
0x201 | 0

Stages: Fetch, Decode, Execute, Write Back
Step 0 / 32
Click "Clock Pulse" to step through execution, or "Auto Run" to play continuously.
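The four-instruction program from the demo can also be traced with a toy interpreter. This is a minimal sketch, not the demo's actual implementation: the tuple encoding of instructions and the opcode names `LOAD_MEM` and `LOAD_IMM` are invented here, and it assumes the data area holds 42 at address 0x200 and 0 at address 0x201.

```python
# Hypothetical encoding of the demo program: (opcode, destination, source).
PROGRAM = {
    0x100: ("LOAD_MEM", "R0", 0x200),  # LOAD R0, [0x200]
    0x101: ("LOAD_IMM", "R1", 7),      # LOAD R1, #7
    0x102: ("ADD", "R0", "R1"),        # ADD R0, R1
    0x103: ("STORE", 0x201, "R0"),     # STORE [0x201], R0
}


def run(program, memory, pc=0x100):
    """Run until the PC leaves the program; return (registers, memory)."""
    regs = {"R0": 0, "R1": 0, "R2": 0, "R3": 0}
    while pc in program:
        ir = program[pc]                 # 1. Fetch the instruction at PC
        pc += 1
        op, dst, src = ir                # 2. Decode into opcode and operands
        if op == "LOAD_MEM":             # 3. Execute
            result = memory[src]
        elif op == "LOAD_IMM":
            result = src
        elif op == "ADD":
            result = regs[dst] + regs[src]
        elif op == "STORE":
            result = regs[src]
        else:
            raise ValueError(f"unknown opcode: {op}")
        if op == "STORE":                # 4. Write back to memory or a register
            memory[dst] = result
        else:
            regs[dst] = result
    return regs, memory


if __name__ == "__main__":
    regs, memory = run(PROGRAM, {0x200: 42, 0x201: 0})
    print("R0 =", regs["R0"], "| memory[0x201] =", memory[0x201])
    # prints: R0 = 49 | memory[0x201] = 49
```

The loop body is the instruction cycle itself: every instruction, whatever it does, passes through the same four phases in the same order.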

The Pursuit of Extreme Efficiency: Pipelining

If we had to wait for one instruction to complete all four steps before starting the next, efficiency would clearly be too low.

Just like a factory assembly line, chip engineers introduced instruction pipelining. This means when the first circuit section is "executing" instruction A, the preceding circuits aren't idle — they're "decoding" instruction B, or even "fetching" instruction C in advance. Through this parallel overlapping approach, CPU execution efficiency is greatly improved.
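The gain from overlapping stages can be estimated with a little arithmetic. The sketch below assumes an idealized four-stage pipeline in which every stage takes one clock cycle and nothing ever stalls; real pipelines lose some of this benefit to hazards and branches. The function names are illustrative.

```python
def cycles_sequential(n_instructions, n_stages=4):
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=4):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 3, 100):
        seq = cycles_sequential(n)
        pipe = cycles_pipelined(n)
        print(f"{n:>3} instructions: {seq} cycles sequential, {pipe} cycles pipelined")
```

For three instructions (A, B, and C) the count drops from 12 cycles to 6, and as the instruction count grows the speedup approaches the number of stages.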


5. Summary: Crossing Abstraction Layers

Looking back on this journey, we've experienced the most important layers of abstraction in computer architecture. This is the complete path from raw physical materials to a universal computing platform:

  1. Macroscopic physics: Sand (silicon dioxide crystals) → After smelting, slicing, etching with toxic gases, and other demanding processes
  2. Microscopic physics: Massive arrays of transistor switches (using micro-electricity to control micro-electricity) → After painstaking wiring by engineers, held to the constraints of the digital abstraction
  3. Digital algebra: AND / OR / NOT logic gate systems → Eliminating ambiguity, with basic behaviors defined by exact truth tables
  4. Microarchitecture modules: Functional unit building block sets (adders and other components) → Adding a clock rhythm and memory, evolving into complete functional bodies
  5. Complex system architecture: Vast and exquisite CPU combined arrays → Opening the door to the virtual application world for developers worldwide
  6. Millions of application kingdoms: Algorithms, system-level software, and the blooming internet universe

The most fascinating aspect of computer science is that each layer of encapsulation perfectly hides the complex details of the layer below. As a software developer, when you write salary = base + bonus, you don't need to think about electron drift or current flow inside half adders. Similarly, chip hardware designers don't need to worry about what software their chip will run in the future.

It is this extreme layer decoupling and highly independent black-box encapsulation that together nurtured and paved the way for the carnival of modern technology.

Ultimate Reflection

Ultimately, so-called computing power is nothing more than an enormous number of switches flipping within a confined space, paced by the rhythm of the clock, to complete complex computations on a tiny silicon chip.

"Quantitative change ultimately triggers a qualitative leap" — this phrase is continuously validated in computer architecture. When we tap our keyboards and gaze at our screens, we can try to imagine: deep within the incredibly microscopic silicon substrate, billions of tiny transistors are at this very moment straining to perform precise coordination in the flash of electric light. This is perhaps the most unique beauty of computer science.


Further Reading

If you're curious about low-level technology, you can explore the following directions:

  • Classic textbook: Computer Organization and Design (The Hardware/Software Interface) is an excellent reference for studying architecture in depth.
  • Digital logic simulation: Try using logic simulation software or basic components to build a simple 8-bit adder or simulator hands-on.
  • Architecture frontiers: Learn about how multi-level caches mitigate the "memory wall" problem, the principles of out-of-order instruction execution, and GPU's special computing mechanisms.
  • Low-level and assembly language: Try learning some basic assembly language to understand how high-level languages are ultimately transformed into binary machine instructions (usually displayed in hexadecimal).