Concurrency, Async, and Multithreading
💡 Learning Guide: Concurrent programming is the "Achilles' heel" for many backend engineers — you get stumped in interviews, hit bugs in production, and have no clue where to start with performance tuning. This chapter revolves around one core question: When 100,000 users hit your service simultaneously, will your code collapse?
Before diving in, make sure you have these two foundational pieces:
- What are CPU, Memory, and I/O: If you're fuzzy on these basics, brush up on operating system fundamentals.
- What is blocking/non-blocking: If you're not yet comfortable with synchronous/asynchronous concepts, get a feel for them through hands-on programming first.
0. Introduction: Why Does Your Service "Freeze" During Peak Traffic?
Process / Thread / Coroutine Comparison
Task Queue
Each process has its own independent memory space, strong isolation but high overhead. Inter-process communication requires IPC mechanisms. Suitable for scenarios requiring strong isolation, such as browser tabs and sandbox programs.
Many developers encounter situations like these in practice:
- The service responds lightning-fast in local testing, but becomes a "slideshow" once deployed;
- You bought high-spec servers, yet CPU utilization never goes up;
- During promotional peaks, the service "avalanches," forcing you to degrade or circuit-break.
Intuitively, we think: "The server isn't powerful enough." But most of the time, the problem isn't that the hardware is "too slow" — it's that we didn't design the concurrency model properly.
The Core Dilemma:
- Without concurrent processing: requests queue up, user experience is terrible;
- With reckless multithreading: lock contention and context-switching overhead actually degrade performance.
Faced with these challenges, simply "adding more machines" is no longer enough. We need a systematic approach to concurrency design that ensures both performance and stability under high concurrency. That's exactly what this chapter aims to address.
1. Core Concepts: Processes, Threads, and Coroutines — What's the Difference?
1.1 A Restaurant Analogy
Imagine you run a restaurant and need to serve many customers at once:
| Concept | Restaurant Analogy | Technical Meaning |
|---|---|---|
| Process | An independent restaurant branch | Has its own memory space and resource allocation. It is the basic unit of OS resource allocation. A crash in one process does not affect others. |
| Thread | A chef within a branch | The basic unit of CPU scheduling, sharing memory space within a process. Threads in the same process can share data, but one thread crashing can bring down the entire process. |
| Coroutine | A chef's "cloning technique" | A user-space lightweight thread, scheduled by the program itself rather than the OS. Switching overhead is minimal; you can create millions of them. |
1.2 In-Depth Comparison: The Essential Differences
Process Memory Isolation Demo
Each process has its own independent virtual address space. A crash in one process does not affect other processes. Click "Create Process" to start the demo.
Process: The "Container" of Resource Isolation
Key Characteristics:
- Strong isolation: Each process has its own independent virtual address space
- High overhead: Creation/switching requires OS intervention, taking ~1-10ms
- Complex communication: Inter-Process Communication (IPC) requires special mechanisms (pipes, message queues, shared memory, etc.)
Use Cases:
- Services requiring strong isolation (e.g., browser tabs, sandboxed programs)
- Multi-language hybrid deployments
- Service units that need independent restart/upgrade
Thread: The "Light Cavalry" of Shared Memory
Thread Scheduling Demo
Current Scheduling Algorithm: Round Robin (Time Slice)
Each thread takes turns executing for a time slice. When the slice expires, it switches to the next thread. Good responsiveness, suitable for interactive systems.
Key Characteristics:
- Shared memory: Threads within the same process share code, data, and heap segments
- Independent stack space: Each thread has its own stack (typically ~1MB)
- Fast switching: Thread switching takes ~1-10μs, about 1000× faster than process switching
- Synchronization required: Shared data must be protected with locks
Use Cases:
- CPU-intensive tasks (computation, image processing)
- Concurrent tasks that need to share a lot of data
- Latency-sensitive background tasks
Coroutine: The User-Space "Green Thread"
Coroutine Lightweight Comparison Demo
Thread Model
Coroutine Model
Using coroutines can save -100% of memory (about -1000MB), with 10x faster creation speed.
Key Characteristics:
- User-space scheduling: Scheduled by the program/runtime library, not the OS
- Extremely lightweight: Coroutine stacks are typically only a few KB; you can create millions
- Extremely fast switching: Coroutine switching takes ~100ns, about 100× faster than thread switching
- Non-preemptive: Coroutines voluntarily yield the CPU (cooperative multitasking)
Use Cases:
- I/O-intensive high-concurrency services (web servers, gateways)
- Scenarios requiring many long-lived connections (IM, game servers)
- Streaming data processing, pipeline workflows
2. Case Study: An E-Commerce Promotion's "Concurrency Nightmare"
2.1 Lessons Learned: Evolution from "Single Machine" to "Distributed"
Let's look at the evolution story of a real e-commerce system:
Phase 1: The Single-Machine Era (1K DAU)
# A simple Flask application
from flask import Flask
app = Flask(__name__)
@app.route('/order')
def create_order():
# Query inventory
stock = db.query("SELECT stock FROM products WHERE id=1")
if stock > 0:
# Deduct inventory
db.execute("UPDATE products SET stock = stock - 1 WHERE id=1")
# Create order
db.execute("INSERT INTO orders ...")
return "Order created!"
return "Out of stock!"
# Start: flask runProblems:
- Single process, single thread — can only handle one request at a time
- Inventory deduction is not locked, leading to overselling under concurrency
- Limited database connections; the connection pool is quickly exhausted
Phase 2: The Multi-Process Era (10K DAU)
# Deploy with Gunicorn multi-process
gunicorn -w 4 -k sync app:app
# 4 worker processes, each handling requests independentlyNew Problems:
- 4 processes all query inventory simultaneously, all see stock=1, all deduct successfully — 3 oversold items!
- Need to introduce distributed locks
import redis
# Use Redis distributed lock
lock = redis_client.lock("stock_lock", timeout=10)
if lock.acquire():
try:
stock = db.query("SELECT stock FROM products WHERE id=1")
if stock > 0:
db.execute("UPDATE products SET stock = stock - 1 WHERE id=1")
finally:
lock.release()Phase 3: The Coroutine Era (100K DAU)
# Use FastAPI + asyncio
from fastapi import FastAPI
import asyncio
app = FastAPI()
async def check_stock(product_id: int) -> int:
# Async database query, non-blocking
result = await db.fetch_one(
"SELECT stock FROM products WHERE id = :id",
{"id": product_id}
)
return result["stock"]
@app.get("/order")
async def create_order(product_id: int):
# Concurrently check inventory and user info
stock_task = check_stock(product_id)
user_task = get_user_info(request.user_id)
stock, user = await asyncio.gather(stock_task, user_task)
if stock > 0:
# Async inventory deduction
await db.execute(
"UPDATE products SET stock = stock - 1 WHERE id = :id",
{"id": product_id}
)
return {"status": "success"}
return {"status": "out_of_stock"}
# Start: uvicorn main:app --workers 4
# Each worker can handle thousands of concurrent coroutinesAdvantages:
- Thousands of concurrent connections within a single thread
- Voluntarily yields CPU during I/O operations without blocking other requests
- Extremely low memory footprint, ideal for high-concurrency long-connection scenarios
2.2 Concurrency Model Evolution Comparison Table
| Phase | Concurrency Model | DAU Supported | Core Problem | Solution |
|---|---|---|---|---|
| Monolith | Single process, single thread | 1K | Cannot handle concurrency | Introduce multi-process |
| Multi-Process | Multi-process synchronous | 10K | Data races, overselling | Distributed locks |
| Multi-Threaded | Multi-threaded + locks | 50K | Context-switching overhead, deadlocks | Thread pools, lock-free queues |
| Coroutine | Async I/O | 100K+ | Code complexity, debugging difficulty | Framework encapsulation, distributed tracing |
| Hybrid | Multi-process + coroutines | 1M+ | Architectural complexity | Service governance, elastic scaling |
3. Deep Dive: How Various Concurrency Models Work
3.1 Process Model: Isolation and Communication
Memory Isolation Mechanism
Process Memory Isolation Demo
Each process has its own independent virtual address space. A crash in one process does not affect other processes. Click "Create Process" to start the demo.
Each process has its own independent virtual address space:
Process A Virtual Memory Process B Virtual Memory
+----------------+ +----------------+
| Kernel Space | | Kernel Space | <-- Shared (read-only)
| (shared) | | (shared) |
+----------------+ +----------------+
| Stack | | Stack | <-- Independent
| (grows down) | | (grows down) |
+----------------+ +----------------+
| Heap | | Heap | <-- Independent
| (grows up) | | (grows up) |
+----------------+ +----------------+
| Data Segment | | Data Segment | <-- Independent
| (.bss/.data) | | (.bss/.data) |
+----------------+ +----------------+
| Code Segment | | Code Segment | <-- Independent
| (.text) | | (.text) |
+----------------+ +----------------+Inter-Process Communication (IPC) Methods
| Method | Principle | Speed | Use Case |
|---|---|---|---|
| Pipe | Kernel buffer, unidirectional stream | Medium | Parent-child process communication |
| Message Queue | Kernel message linked list | Medium | Async message passing |
| Shared Memory | Same physical memory mapped | Fastest | Large data sharing |
| Semaphore | Kernel counter | - | Synchronization and mutual exclusion |
| Socket | Network protocol stack | Slower | Cross-machine communication |
| Signal | Soft interrupt | - | Event notification |
3.2 Thread Model: Scheduling and Synchronization
Thread Scheduling Principles
Thread Scheduling Demo
Current Scheduling Algorithm: Round Robin (Time Slice)
Each thread takes turns executing for a time slice. When the slice expires, it switches to the next thread. Good responsiveness, suitable for interactive systems.
How the OS thread scheduler works:
Ready Queue Running Waiting Queue
+--------+ +--------+ +--------+
| Thread B| <-- timeslice| Thread A| <-- I/O req | Thread C|
| Thread D| expires |(running)| | Thread E|
| Thread F| +--------+ |(blocked)|
+--------+ +--------+
| |
v v
Scheduler picks next to run by priority Moved back to ready queue when I/O completesCommon Thread Synchronization Mechanisms
| Mechanism | Principle | Pros | Cons |
|---|---|---|---|
| Mutex | Binary state, exclusive access | Simple to implement | Poor performance under heavy contention |
| RWLock | Shared for reads, exclusive for writes | Efficient for read-heavy workloads | Complex implementation, risk of write starvation |
| Spinlock | Busy-waiting, doesn't release CPU | Efficient when wait time is short | Wastes CPU when wait time is long |
| Condition Variable | Wait for a specific condition to be met | Avoids busy-waiting | Must be used with a lock |
| Semaphore | Counter controls access count | Can limit concurrency | Error-prone if misused |
| Atomic Operations | CPU instruction-level atomicity | Lock-free, highest performance | Only works with simple data types |
| Lock-Free Queue | Implemented via CAS operations | Excellent performance under high concurrency | Complex implementation, ABA problem |
3.3 Coroutine Model: User-Space Scheduling
Coroutine Lightweight Comparison Demo
Thread Model
Coroutine Model
Using coroutines can save -100% of memory (about -1000MB), with 10x faster creation speed.
Core Advantages of Coroutines
Traditional Multithreading vs Coroutine Model
+------------+ +------------+
| Thread 1 | | Event Loop |
| (1MB stack)| | (Scheduler)|
+------------+ +------------+
| |
v v
+------------+ +------------+
| Thread 2 | | Coroutine A|
| (1MB stack)| | (few KB) |
+------------+ +------------+
| |
v v
+------------+ +------------+
| Thread 3 | | Coroutine B|
| (1MB stack)| | (few KB) |
+------------+ +------------+
Overhead: N MB Overhead: N KB
Creation: ~10μs Creation: ~100ns
Switching: ~1μs Switching: ~100nsHow async/await Works
async/await Mechanism Demo
import asyncio
async def fetch_data(url):
# await suspends, yields CPU
response = await aiohttp.get(url)
# Continue after I/O completes
return response.json()
async def main():
# Concurrent execution
tasks = [fetch_data(url) for url in urls]
results = await asyncio.gather(*tasks)Execution Timeline
When a task encounters an I/O operation (such as a network request), await yields the CPU, and the event loop schedules other tasks to execute. After I/O completes, the task resumes from the suspension point. This approach allows a single thread to handle thousands of concurrent tasks.
import asyncio
async def fetch_data(url):
# When await is hit, the coroutine suspends and yields the CPU
response = await aiohttp.get(url)
# After I/O completes, the event loop wakes the coroutine, resuming from here
return response.json()
async def main():
# Create 3 coroutine tasks
tasks = [
fetch_data("https://api1.example.com"),
fetch_data("https://api2.example.com"),
fetch_data("https://api3.example.com")
]
# Execute concurrently; total time ≈ the slowest request
results = await asyncio.gather(*tasks)
return results
# Start the event loop
asyncio.run(main())Execution Flow:
Timeline -------------------------------------------------------------------->
Coroutine A: [Prepare]--[await suspend]=======[Response received]--[Process]
|
Coroutine B: [Prepare]--[await suspend]=======[Response received]--[Process]
|
Coroutine C: [Prepare]--[await suspend]=======[Response received]
|
↓
All I/O complete
Legend: [ ] = CPU execution, === = I/O waiting, | = coroutine switch3.4 Event Loop: The "Heart" of Coroutines
Event Loop Demo
The event loop is the core mechanism of coroutine scheduling:
import selectors
import heapq
class EventLoop:
def __init__(self):
self.selector = selectors.DefaultSelector()
self.ready = [] # Ready queue
self.scheduled = [] # Scheduled task queue
self.current = None
def run(self):
while True:
# 1. Process scheduled tasks
now = time.time()
while self.scheduled and self.scheduled[0][0] <= now:
_, callback = heapq.heappop(self.scheduled)
self.ready.append(callback)
# 2. Wait for I/O events
timeout = 0 if self.ready else 0.1
events = self.selector.select(timeout)
for key, mask in events:
callback = key.data
self.ready.append(callback)
# 3. Execute ready callbacks
while self.ready:
callback = self.ready.popleft()
callback()3.5 Concurrency vs. Parallelism: Not the Same Thing
Concurrency vs Parallelism Demo
| Concept | Meaning | Analogy | Requirements |
|---|---|---|---|
| Concurrency | Multiple tasks interleave execution, appearing to progress simultaneously | One person alternating between cooking multiple dishes | Single-core CPU is enough |
| Parallelism | Multiple tasks truly execute at the same time | Multiple people cooking different dishes simultaneously | Multi-core CPU or multiple machines |
Illustration:
Single-Core CPU - Concurrency
Time → 1 2 3 4 5 6 7 8
Task A: [Run ][Run ] [Run ][Run ]
Task B: [Run ][Run ] [Run ][Run ]
Two tasks interleave execution, appearing to progress "simultaneously"
========================================
Multi-Core CPU - Parallelism
Time → 1 2 3 4 5 6 7 8
Core 1: [Task A][Task A][Task A][Task A]
Core 2: [Task B][Task B][Task B][Task B]
Two tasks truly execute "simultaneously"
========================================
In reality, it's often: Concurrency + Parallelism
Time → 1 2 3 4 5 6 7 8
Core 1: [A1][A1][B1][B1][C1][C1][D1][D1]
Core 2: [A2][A2][B2][B2][C2][C2][D2][D2]
Multiple tasks are concurrently scheduled to different cores, then run in parallel on those cores4. In Practice: Go Goroutines and Green Threads
4.1 Go's Concurrency Philosophy
Go Goroutine & GMP Scheduling Demo
G (Goroutine): Tasks to be executed. M (Machine): OS threads that execute G. P (Processor): Logical processor providing execution context. G is first placed in P's local queue. After P binds to M, M fetches G from P for execution. When the local queue is empty, it steals tasks from the global queue or other P's.
Go's concurrency design philosophy: Don't communicate by sharing memory; share memory by communicating.
package main
import (
"fmt"
"time"
)
// Producer
func producer(ch chan<- int, id int) {
for i := 0; i < 5; i++ {
fmt.Printf("Producer %d sending: %d\n", id, i)
ch <- i // Send data to channel
time.Sleep(100 * time.Millisecond)
}
}
// Consumer
func consumer(ch <-chan int, id int) {
for val := range ch { // Receive data from channel
fmt.Printf("Consumer %d received: %d\n", id, val)
}
}
func main() {
// Create a buffered channel
ch := make(chan int, 10)
// Start 2 producer goroutines
for i := 0; i < 2; i++ {
go producer(ch, i)
}
// Start 2 consumer goroutines
for i := 0; i < 2; i++ {
go consumer(ch, i)
}
// Wait for a while
time.Sleep(3 * time.Second)
close(ch)
}4.2 Goroutine Scheduler: The GMP Model
Go's scheduler uses the GMP model:
| Component | Meaning | Role |
|---|---|---|
| G (Goroutine) | Coroutine | The task to execute, lightweight (2KB stack, dynamically resizable) |
| M (Machine) | OS Thread | The carrier that actually executes G, 1:1 mapping with kernel threads |
| P (Processor) | Logical Processor | Scheduling context containing a runnable G queue; count defaults to the number of CPU cores |
Scheduling Flow:
Global Queue
+----------------+
| G1 | G2 | G3 |
+----------------+
P0 Local Queue P1 Local Queue P2 Local Queue P3 Local Queue
+----------+ +----------+ +----------+ +----------+
| G4 | G5 | | G6 | G7 | | G8 | G9 | | G10| G11 |
+----------+ +----------+ +----------+ +----------+
| | | |
v v v v
+----------+ +----------+ +----------+ +----------+
| M0 | | M1 | | M2 | | M3 |
| (OS Thrd)| | (OS Thrd)| | (OS Thrd)| | (OS Thrd)|
+----------+ +----------+ +----------+ +----------+
Scheduling Strategy:
1. Each P maintains a local G queue to reduce lock contention
2. P takes G from its local queue and hands it to M for execution
3. When the local queue is empty, "steal" half the G's from another P (Work Stealing)
4. The global queue serves as a fallback, checked periodically5. Practical Code Templates
5.1 Python asyncio High-Concurrency Template
import asyncio
import aiohttp
from typing import List, Dict
import time
class AsyncHTTPClient:
"""High-performance HTTP client based on asyncio"""
def __init__(self, max_connections: int = 100, timeout: int = 30):
self.timeout = aiohttp.ClientTimeout(total=timeout)
# Limit concurrent connections to avoid overwhelming the target service
connector = aiohttp.TCPConnector(
limit=max_connections,
limit_per_host=10, # Connection limit per host
enable_cleanup_closed=True,
force_close=True,
)
self.session = aiohttp.ClientSession(
connector=connector,
timeout=self.timeout,
)
async def fetch(self, url: str, method: str = 'GET', **kwargs) -> Dict:
"""Send a single request"""
try:
async with self.session.request(method, url, **kwargs) as response:
return {
'url': url,
'status': response.status,
'data': await response.text(),
'error': None
}
except asyncio.TimeoutError:
return {'url': url, 'status': None, 'data': None, 'error': 'Timeout'}
except Exception as e:
return {'url': url, 'status': None, 'data': None, 'error': str(e)}
async def fetch_many(self, urls: List[str], concurrency: int = 10) -> List[Dict]:
"""Fetch multiple URLs concurrently, with a concurrency limit"""
semaphore = asyncio.Semaphore(concurrency)
async def fetch_with_limit(url):
async with semaphore:
return await self.fetch(url)
# Execute all requests concurrently
tasks = [fetch_with_limit(url) for url in urls]
return await asyncio.gather(*tasks, return_exceptions=True)
async def close(self):
await self.session.close()
# Usage example
async def main():
client = AsyncHTTPClient(max_connections=50)
# List of URLs to fetch
urls = [
"https://api.github.com/users/github",
"https://api.github.com/users/google",
"https://api.github.com/users/microsoft",
# ... more URLs
] * 10 # Simulate 300 requests
start = time.time()
results = await client.fetch_many(urls, concurrency=20)
elapsed = time.time() - start
# Summarize results
success = sum(1 for r in results if r.get('status') == 200)
failed = len(results) - success
print(f"Total requests: {len(results)}")
print(f"Success: {success}, Failed: {failed}")
print(f"Elapsed: {elapsed:.2f}s")
print(f"QPS: {len(results)/elapsed:.1f}")
await client.close()
if __name__ == "__main__":
asyncio.run(main())5.2 Go High-Concurrency Service Template
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"runtime"
"time"
"golang.org/x/sync/errgroup"
)
// Request/Response structures
type OrderRequest struct {
UserID int64 `json:"user_id"`
ProductID int64 `json:"product_id"`
Quantity int `json:"quantity"`
Price float64 `json:"price"`
}
type OrderResponse struct {
OrderID int64 `json:"order_id"`
Status string `json:"status"`
Total float64 `json:"total"`
CreatedAt string `json:"created_at"`
}
// Simulated database operations
type Database struct {
orders map[int64]*OrderResponse
mutex chan struct{}
}
func NewDatabase() *Database {
db := &Database{
orders: make(map[int64]*OrderResponse),
mutex: make(chan struct{}, 1), // Simulated mutex
}
return db
}
func (db *Database) CreateOrder(ctx context.Context, req *OrderRequest) (*OrderResponse, error) {
// Acquire lock
select {
case db.mutex <- struct{}{}:
defer func() { <-db.mutex }()
case <-ctx.Done():
return nil, ctx.Err()
}
// Simulate database operation latency
select {
case <-time.After(50 * time.Millisecond):
case <-ctx.Done():
return nil, ctx.Err()
}
order := &OrderResponse{
OrderID: time.Now().UnixNano(),
Status: "created",
Total: req.Price * float64(req.Quantity),
CreatedAt: time.Now().Format(time.RFC3339),
}
db.orders[order.OrderID] = order
return order, nil
}
// HTTP handler
type Handler struct {
db *Database
}
func NewHandler(db *Database) *Handler {
return &Handler{db: db}
}
func (h *Handler) CreateOrder(w http.ResponseWriter, r *http.Request) {
// Set request timeout
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()
var req OrderRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
order, err := h.db.CreateOrder(ctx, &req)
if err != nil {
if err == context.DeadlineExceeded {
http.Error(w, "Request timeout", http.StatusGatewayTimeout)
return
}
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(order)
}
func (h *Handler) Health(w http.ResponseWriter, r *http.Request) {
info := map[string]interface{}{
"status": "ok",
"goroutine": runtime.NumGoroutine(),
"cpu": runtime.NumCPU(),
"version": runtime.Version(),
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(info)
}
// Batch processing example
func BatchProcess(ctx context.Context, items []int) ([]int, error) {
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(10) // Limit concurrency to 10
results := make([]int, len(items))
for i, item := range items {
i, item := i, item // Avoid closure capture pitfall
g.Go(func() error {
select {
case <-ctx.Done():
return ctx.Err()
default:
// Simulate processing
time.Sleep(100 * time.Millisecond)
results[i] = item * 2
return nil
}
})
}
if err := g.Wait(); err != nil {
return nil, err
}
return results, nil
}
func main() {
// Initialize database
db := NewDatabase()
// Create handler
handler := NewHandler(db)
// Set up routes
mux := http.NewServeMux()
mux.HandleFunc("/order", handler.CreateOrder)
mux.HandleFunc("/health", handler.Health)
// Create server
server := &http.Server{
Addr: ":8080",
Handler: mux,
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 120 * time.Second,
}
fmt.Println("Server starting on :8080")
fmt.Printf("Go version: %s\n", runtime.Version())
fmt.Printf("CPU cores: %d\n", runtime.NumCPU())
if err := server.ListenAndServe(); err != nil {
log.Fatal(err)
}
}6. Summary Comparison Tables
6.1 Core Concept Comparison
| Feature | Process | Thread | Coroutine |
|---|---|---|---|
| Scheduler | OS | OS | User program / runtime |
| Switch Overhead | ~1-10ms | ~1-10μs | ~100ns |
| Memory Footprint | ~10MB+ | ~1MB | ~2KB |
| Communication | IPC | Shared memory | Shared memory / Channel |
| Sync Required | No | Locks needed | Locks / Cooperative |
| Crash Impact | This process only | Entire process | Controllable |
| Use Case | Strong isolation, multi-tenant | CPU-intensive | I/O-intensive |
| Typical Languages | All languages | All languages | Go, Python, JS, Rust |
6.2 Concurrency Model Selection Guide
| Scenario | Recommended Model | Rationale |
|---|---|---|
| Web service gateway | Coroutine + async I/O | High concurrent connections, low memory footprint |
| Real-time communication | Coroutine + long connections | Maintain many WebSocket connections |
| Data processing pipelines | Multi-process + coroutines | Utilize multi-core, I/O non-blocking |
| Scientific computing | Multi-threaded / multi-process | CPU-intensive, needs parallel computation |
| Microservice architecture | Multi-process + coroutines | Inter-service isolation, internal high concurrency |
| Embedded systems | Coroutine / single-threaded | Resource-constrained, deterministic scheduling |
6.3 Terminology Reference
| English Term | Chinese Translation | Explanation |
|---|---|---|
| Process | 进程 | Basic unit of OS resource allocation, with independent memory space |
| Thread | 线程 | Basic unit of CPU scheduling, shares process memory space |
| Coroutine | 协程 | User-space lightweight thread, scheduled by the program itself |
| Concurrency | 并发 | Multiple tasks interleave execution, appearing to progress simultaneously |
| Parallelism | 并行 | Multiple tasks truly execute simultaneously, requires multi-core support |
| Context Switch | 上下文切换 | The process of the CPU switching from one task to another |
| Blocking I/O | 阻塞 I/O | Initiates an I/O request and waits for completion; the thread is suspended |
| Non-blocking I/O | 非阻塞 I/O | Initiates an I/O request and returns immediately without waiting for the result |
| Async I/O | 异步 I/O | Notifies the caller via callback or notification mechanism when I/O completes |
| Event Loop | 事件循环 | Coroutine scheduling mechanism that continuously listens for events and dispatches them |
| Goroutine | Go 协程 | Go language's lightweight thread implementation |
| Channel | 通道 | Go's mechanism for communication between goroutines |
| Mutex | 互斥锁 | Synchronization primitive for protecting shared resources |
| Semaphore | 信号量 | Controls the number of threads concurrently accessing a resource |
| Deadlock | 死锁 | Multiple threads waiting for each other to release resources, causing permanent blocking |
| Race Condition | 竞态条件 | Multiple threads accessing shared data simultaneously, leading to non-deterministic results |
| Thread Pool | 线程池 | Pre-create a set of threads and reuse them to reduce creation/destruction overhead |
| Work Stealing | 工作窃取 | Idle threads "steal" tasks from busy threads' queues for execution |
| Zero-copy | 零拷贝 | Data transferred between kernel space and user space without CPU copy |
| C10K Problem | C10K 问题 | The challenge of handling 10,000 connections on a single machine |
| C10M Problem | C10M 问题 | The ultimate challenge of handling 10 million connections on a single machine |
7. Closing Thoughts
7.1 The Golden Rules of Concurrent Programming
- Don't optimize prematurely: Make the code work correctly first, then consider performance optimization
- Avoid shared state: "Don't communicate by sharing memory; share memory by communicating"
- Let errors surface early: Concurrency bugs are often hard to reproduce; expose them as much as possible during testing
- Limit concurrency: Unlimited concurrency is no protection at all; use semaphores or connection pools to cap it
- Monitor and observe: Concurrent systems must have comprehensive monitoring to quickly locate issues
7.2 Learning Roadmap
Phase 1: Fundamental Understanding
├── Understand basic concepts of processes and threads
├── Learn synchronization primitives (locks, semaphores, condition variables)
└── Write simple multi-threaded programs
Phase 2: Deep Dive into Principles
├── Understand memory models and visibility
├── Learn lock-free programming and atomic operations
├── Understand thread pools and work stealing
└── Analyze deadlocks and race conditions
Phase 3: Advanced Applications
├── Master coroutines and async programming
├── Learn Go/Python/Rust concurrency models
├── Understand concurrency in distributed systems
└── Performance tuning and capacity planning
Phase 4: Expert Level
├── Design high-concurrency system architectures
├── Solve complex concurrency bugs
├── Develop concurrency programming frameworks
└── Share and spread concurrency knowledgeI hope this guide helps you build a systematic understanding of concurrent programming. Remember, concurrency is not the goal — it's the means. The real objective is to build high-performance, highly available services. Understand the principles, choose the right model, write solid code, and you'll go far on the concurrency journey.