Skip to content

Database Fundamentals (Indexes / Transactions / Query Optimization)

Core Question

Why does your Excel query take 10 seconds, while Taobao's search takes only 0.01 seconds? When data grows from "a few thousand rows" to "a billion rows," and from "one person using it" to "tens of millions accessing it simultaneously," Excel is no longer enough. Databases were created to solve this problem — they are "super Excels" specifically designed to handle massive data and high-concurrency access. This chapter will take you from zero to understanding the core principles of databases.


1. Why "Databases"?

1.1 From a Small Bookstore to Taobao: The Evolution of Data Scale

Imagine you run a small bookstore that sells a few books each day. You casually jot down notes in a notebook:

2024-01-15: Zhang San bought "One Hundred Years of Solitude", 59 yuan
2024-01-16: Li Si bought "To Live", 39 yuan

At this point, a notebook works perfectly fine. But when your bookstore becomes the next "Amazon" with millions of orders pouring in daily, problems arise:

  • Data volume: Not dozens of rows, but hundreds of millions of rows
  • Concurrent access: Not one person querying, but tens of millions of simultaneous users
  • Data relationships: Orders linked to users, products, inventory, logistics... complex relationships need efficient management
  • Data security: A power outage shouldn't cause you to lose all orders

Excel / Notebook

  • Suitable for individuals or small teams
  • Data volume: a few thousand to tens of thousands of rows
  • Single user, sequential access
  • Manual lookup, slow speed

Database

  • Suitable for enterprise applications
  • Data volume: billions and above
  • Tens of millions of simultaneous online users
  • Millisecond-level query speed

This is the problem "databases" solve: how to efficiently store, quickly query, and securely manage massive amounts of data?

1.2 A Real Cautionary Tale: Why You Can't Use Excel for User Data

You might say: "My project only has tens of thousands of users — isn't Excel enough?" Let me tell you a true story.

Xiao Lin's Startup Mishap

Xiao Lin built a social app as a startup. Initially, there weren't many users, so he used Excel to store user information (name, phone, registration time, etc.). Exporting Excel daily to track user growth worked fine.

When users surpassed 100,000, problems started appearing:

  • Excel took 5 minutes to open
  • Filtering "users in Beijing" caused long freezes
  • One time, the Excel file got corrupted, and thousands of user records were permanently lost

The most critical issue was that he wanted to implement "view all orders for a specific user" — but user information and orders were in different Excel files. He had to manually copy and paste, taking 30 minutes each time.

He later asked a senior colleague for advice. The colleague took one look and laughed: "What you need isn't Excel — it's a database."

After switching to a database, everything changed:

  • Querying "users in Beijing" took only 0.01 seconds
  • Users and orders were automatically linked through "relationships" — one SQL statement did the job
  • Data was automatically backed up — no more fear of file corruption

Xiao Lin learned an important lesson: When data is small, anything works; but once data grows, Excel is a disaster.

Key Takeaway

A database isn't "a more complex Excel" — it's an entirely different design philosophy:

  • Excel: Designed for small data, single-user usage
  • Database: Designed for big data, high concurrency, and complex relationships

Choosing the right tool can improve your system performance by thousands of times.


2. Core Concepts: Tables, Rows, Columns, Primary Keys

How Do These Concepts Relate to Databases?

Tables, rows, columns, and primary keys are the "building blocks" of databases.

Imagine building a house:

  • Table = A room (stores one type of data)
  • Row = A box in the room (one complete record)
  • Column = A label on the box (name, age, etc.)
  • Primary Key = The box's unique serial number (never duplicated)

Understanding these foundational concepts will help you know how data is organized.

Before diving deeper into databases, we need to clarify these core concepts. To help you understand, we'll use a library analogy.

2.1 Understanding Database Structure Through a Library Analogy

Imagine walking into a library — its organization is strikingly similar to a database:

ConceptLibrary AnalogyActual FunctionConcrete Example
DatabaseThe entire libraryContainer that holds all dataAn e-commerce website's database
TableA bookshelfCollection of the same type of dataUsers table, products table, orders table
ColumnLabels on book spinesData attributes (fields)Name, age, phone number
RowEach book on the shelfOne specific data record"Zhang San, 25, Beijing"
Primary KeyEach book's ISBN numberUnique ID for each rowuser_id = 1001

A real example: Users table (users)

user_id (Primary Key)nameagecityemail
1001Zhang San25Beijingzhangsan@example.com
1002Li Si30Shanghailisi@example.com
1003Wang Wu28Beijingwangwu@example.com
  • Table: users (stores all user data)
  • Columns: user_id, name, age, city, email (attributes for each user)
  • Row: Each row is one user (e.g., "Zhang San, 25, Beijing")
  • Primary Key: user_id (1001, 1002, 1003 — never duplicated)

2.2 Primary Key: The "ID Number" for Data

What is a Primary Key?

A primary key is the unique identifier for each row in a table, just like an ID number.

Key Characteristics:

  • Uniqueness: Absolutely never duplicated (no two people have the same ID number)
  • Non-null: Must have a value (there's no such thing as a person "without an ID number")
  • Immutability: Once set, it doesn't change (your ID number doesn't change)

Common Approaches:

  • Use auto-incrementing integers: 1, 2, 3, 4...
  • Use UUID (Universally Unique Identifier): 550e8400-e29b-41d4-a716-446655440000

Why do we need primary keys? Imagine a world without them:

Scenario: You want to change "Zhang San's" age, but there are 3 "Zhang Sans" in the table. Which one should the system update?

sql
-- Without a primary key, this updates ALL people named "Zhang San"!
UPDATE users SET age = 26 WHERE name = 'Zhang San';

-- With a primary key, precise update
UPDATE users SET age = 26 WHERE user_id = 1001;

The Golden Rule of Primary Keys: Every table should have a primary key, and you should never modify it.

2.3 Foreign Key: The Bridge Connecting Tables

This is what makes databases more powerful than Excel — tables can establish relationships with each other.

What is a Foreign Key?

A foreign key is a column that points to another table's primary key, used to establish associations between tables.

Simple understanding:

  • Primary key = My own ID number
  • Foreign key = Someone else's ID number that I reference

Example: The user_id in the orders table is a foreign key that points to the primary key of the users table.

Let's look at a real example:

Users table (users):

user_id (Primary Key)namephone
1001Zhang San138xxxx
1002Li Si139xxxx

Orders table (orders):

order_id (Primary Key)product_namepriceuser_id (Foreign Key)
5001iPhone 1559991001
5002MacBook149991001
5003AirPods19991002

Key Understanding:

  • user_id = 1001 in the orders table points to user_id = 1001 in the users table (Zhang San)
  • When you want to check "who placed order 5001?", the database automatically looks up the user with user_id = 1001 in the users table

Benefits:

  • No data duplication: Even if Zhang San makes 100 purchases, his information is stored only once in the users table
  • Easy maintenance: If Zhang San changes his phone number, you only update the users table, and all orders automatically link to the new number
  • Flexible queries: You can easily answer complex questions like "what's each user's total spending?"
🔗外键关系演示理解表与表之间如何关联
想象你在管理一个家族谱系:有"家谱表"记录每个人,有"婚姻表"记录谁和谁结婚了。两张表通过"人名"关联起来,这就是外键的作用。
👥用户表 (users)主表
🔑 user_id
name
phone
address
101
张三
138xxxx
北京
102
李四
139xxxx
上海
103
王五
137xxxx
广州
user_id (外键) → user_id (主键)
📦订单表 (orders)从表
🔑 order_id
book_name
🔗 user_id
price
001
百年孤独
101
59
002
活着
101
39
003
三体
101
99
004
百年孤独
102
59
005
红楼梦
102
79
006
西游记
103
69
💡 核心概念

主键(Primary Key):用户表的 user_id 是主键,唯一标识每个用户。

外键(Foreign Key):订单表的 user_id 是外键,指向用户表的主键。

关联查询:通过外键,数据库可以快速找到"订单 001 是用户 101 买的",然后去用户表查到"用户 101 是张三"。

🎯核心优势:外键消除了数据冗余。张三的地址只存一次,无论他买多少本书。如果要修改地址,只需改用户表的一行,所有订单自动关联到新地址。

3. How to Talk to a Database? SQL Basics and Practice

You can't directly "click" a database with a mouse (although GUI tools exist, they essentially convert actions into commands). You need a special language to instruct the database to work.

That language is SQL (Structured Query Language).

The good news is: SQL is very close to natural English — it reads like you're talking.

3.1 Core SQL Operations: CRUD

Most of the time, you only need to master four operations, commonly known as CRUD:

OperationEnglishSQL KeywordPlain Understanding
CreateCreateINSERTAdd a new data record
ReadReadSELECTQuery data
UpdateUpdateUPDATEModify data
DeleteDeleteDELETEDelete data

What Can You See from This Table?

These four operations cover all data processing scenarios:

  • Create: When a user registers, insert a new user record
  • Read: When a user logs in, query their username and password
  • Update: When a user edits their profile, update the data in the table
  • Delete: When a user deletes their account, remove their user data

Master these four, and you've got 80% of everyday SQL operations.

3.2 Querying Data (SELECT): The Most Common Database Operation

Querying is the most important function of a database and the key to performance optimization.

Example 1: Find all users in Beijing

sql
SELECT name, age FROM users WHERE city = 'Beijing';

Word-by-word understanding:

  • SELECT name, age: Choose the name and age columns
  • FROM users: From the users table
  • WHERE city = 'Beijing': Where city equals 'Beijing'

Return result:

nameage
Zhang San25
Wang Wu28

Example 2: Find products priced between 5000 and 15000

sql
SELECT name, price FROM products
WHERE price BETWEEN 5000 AND 15000;

Example 3: Fuzzy search (find users whose names contain "Zhang")

sql
SELECT name FROM users WHERE name LIKE '%Zhang%';

Performance Trap: Using LIKE

LIKE '%Zhang%' causes a full table scan, which is very slow with large data volumes.

Optimization advice:

  • Don't use LIKE '%Zhang%' (wildcards on both sides)
  • You can use LIKE 'Zhang%' (wildcard only on the right side)

Because LIKE 'Zhang%' can use an index, while LIKE '%Zhang%' cannot.

3.3 Inserting Data (INSERT): Adding Records

Example: Add a new user

sql
INSERT INTO users (user_id, name, age, city, email)
VALUES (1004, 'Zhao Liu', 35, 'Guangzhou', 'zhaoliu@example.com');

Word-by-word understanding:

  • INSERT INTO users: Insert into the users table
  • (user_id, name, age, city, email): Specify the columns to insert
  • VALUES (1004, 'Zhao Liu', ...): The corresponding values

Batch insert (more efficient):

sql
INSERT INTO users (name, age, city) VALUES
('Xiao Ming', 25, 'Beijing'),
('Xiao Hong', 28, 'Shanghai'),
('Xiao Gang', 30, 'Guangzhou');

3.4 Updating Data (UPDATE): Modifying Records

Example: Add 1 to the age of all users in Beijing

sql
UPDATE users SET age = age + 1 WHERE city = 'Beijing';

Very Dangerous: Don't Forget WHERE!

If you forget to write the WHERE clause, you'll modify all rows!

sql
-- Dangerous! Changes ALL users' age to 26
UPDATE users SET age = 26;

-- Correct: Only modify user with user_id = 1001
UPDATE users SET age = 26 WHERE user_id = 1001;

Real lesson: In 2012, a well-known company had an engineer forget to write WHERE, causing millions of user records to be incorrectly updated in production. The system was down for 4 hours, resulting in massive losses.

3.5 Deleting Data (DELETE): Removing Records

Example: Delete the user with user_id = 1004

sql
DELETE FROM users WHERE user_id = 1004;

Double Danger: DELETE Needs WHERE Even More!

sql
-- Dangerous! Deletes ALL data in the table!
DELETE FROM users;

-- Correct: Only delete the specified row
DELETE FROM users WHERE user_id = 1004;

Best practices:

  1. Always use SELECT to confirm data before deleting
  2. In critical systems, use "soft delete" (add an is_deleted field to mark deletion)
  3. Back up data before operations in production environments

3.6 Multi-table Queries (JOIN): The Magic Moment of Databases

Remember the "foreign keys" we discussed? The most powerful aspect of SQL is the ability to query multiple related tables at once.

Scenario: Query "all products purchased by Zhang San"

Assume we have three tables:

Users table (users):

user_idname
1001Zhang San

Products table (products):

product_idnameprice
201iPhone 155999
202MacBook14999

Orders table (orders):

order_iduser_idproduct_idquantity
500110012011
500210012022

SQL Query:

sql
SELECT u.name, p.name AS product_name, p.price, o.quantity
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN products p ON o.product_id = p.product_id
WHERE u.name = 'Zhang San';

Return result:

nameproduct_namepricequantity
Zhang SaniPhone 1559991
Zhang SanMacBook149992

Understanding the JOIN process:

  1. FROM orders o: Start from the orders table
  2. JOIN users u ON o.user_id = u.user_id: Link to users table via user_id
  3. JOIN products p ON o.product_id = p.product_id: Link to products table via product_id
  4. WHERE u.name = 'Zhang San': Filter for Zhang San's orders
💻SQL 练习场体验 SQL 的 CRUD 操作
SQL 就像和数据库对话:你说"给我找所有年龄大于 25 的用户",数据库就会执行查询并返回结果。即使不会编程,也能很快上手。
📝 示例 SQL
SELECT name, age FROM users WHERE age > 25;
💡 逐词翻译
SELECT name, age选择 name 和 age 这两列
FROM users从 users 这张表
WHERE age > 25在 age 大于 25 的条件下
📊 返回结果
name
age
李四
30
王五
28
🎯核心概念:CRUD 涵盖了所有数据管理的基本需求。无论是淘宝、微信、抖音,它们的数据库操作本质上就是这四种:增、删、改、查。

4. Why Are Databases So Fast? Demystifying Indexes

This is the most fascinating part of databases and also the most commonly asked topic in interviews.

If you search for "everyone whose surname is Zhang" in Excel, Excel has to scan from the first row to the last. This is a full table scan — the more data, the slower it gets.

But in a database, even with 1 billion rows, lookups take only milliseconds.

The secret is: Indexes.

4.1 Intuitive Understanding: The Dictionary Revelation

Imagine you need to find a word in a 1,000-page book with no table of contents. What would you do?

You can only flip through page by page — this is a full table scan, averaging 500 pages to flip through.

But what if this book has a pinyin index?

You want to find the word "database":

  1. Turn to the index, find the section starting with "da"
  2. Within the "da" section, look for "ta"
  3. The index tells you: it's on page 256

You only needed 3 lookups to find it! This is index lookup.

A database index is like a book's table of contents:

  • Without an index: Row-by-row scan (1 billion rows = several minutes)
  • With an index: Direct jump (1 billion rows = 3 disk I/Os = milliseconds)

4.2 Full Table Scan vs. Index Lookup: Speed Comparison

Suppose we have a users table with 10 million records.

Scenario: Find the user with user_id = 5,555,555

MethodProcessRows to CheckEstimated Time
Full Table ScanStart from row 1, check one by oneAverage 5 million rows5-30 seconds
Index LookupSearch the index tree, jump directly to the target3-4 comparisons0.003 seconds

Speed difference: thousands of times!

Key Takeaway

Indexes are not a silver bullet — they have costs:

  • Storage overhead: Indexes require additional storage space
  • Slower writes: Every INSERT/UPDATE/DELETE must also update the index

When to create indexes?

  • Columns frequently used in queries (WHERE, JOIN conditions)
  • Large data volumes (indexes are unnecessary for tables with a few thousand rows or fewer)

When NOT to create indexes?

  • Rarely queried columns
  • Frequently updated columns
  • Small tables

4.3 Underlying Data Structure: B+ Trees

Real indexes aren't simple "alphabetical lists" — they're a carefully designed data structure called a B+ Tree.

What is a B+ Tree?

A B+ Tree is a "short and wide" tree data structure:

  • Short: From root to leaf is typically only 3-4 levels
  • Wide: Each node can store hundreds of key values

Why "short and wide"?

Because data is stored on disk, and every disk read (I/O) is extremely slow (thousands of times slower than memory). The B+ Tree's design goal is to minimize disk I/O operations.

  • 3-4 levels of height = at most 3-4 disk reads
  • Large amounts of data stored per level = ensures the tree doesn't grow too tall

A real example:

Suppose a B+ Tree where each node can store 1,000 key values:

  • Root node: 1,000 key values → points to 1,000 child nodes
  • Intermediate nodes: Each stores 1,000 key values → points to 1,000 leaf nodes
  • Leaf nodes: Each stores 1,000 actual data records

Total data = 1000 x 1000 x 1000 = 1 billion records

Tree height = 3 levels

This means: finding any single record among 1 billion records requires only 3 disk I/Os!

This is the secret behind lightning-fast database queries.

🌳B+ 树索引演示理解数据库如何快速查找数据
想象你要在字典里找一个字。你会先看目录,定位到首字母的区域,再在这个区域里找具体页码。B+ 树就是这样的多层目录,让数据库在 10 亿条数据中 3 次就能找到目标。
🐢全表扫描
001用户1
002用户2
003用户3
004用户4
005用户5
006用户6
007用户7
008用户8
009用户9
010用户10
011用户11
012用户12
013用户13
014用户14
015用户15
016用户16
017用户17
018用户18
019用户19
020用户20

👆 点击"开始查找"看全表扫描有多慢

索引查找
根节点
1-100
中间节点
1-10
叶子节点
1
2
3
4
5
6
7
8
9
10

👆 点击"开始查找"看索引有多快

数据量
100 万条
全表扫描
平均 50 万次比较
B+ 树索引
仅 3 次比较
速度提升
10 万倍+
💡核心原理:B+ 树通过"矮胖"的设计,让树的高度只有 3-4 层。每层可以存储成百上千个键值,所以 10 亿数据也只需要 3 次磁盘 I/O。这就是数据库查询飞快的秘密。

5. Transactions: How to Ensure Data Isn't Lost or Corrupted?

Imagine the scenario of snapping up train tickets during the Spring Festival travel rush:

  • Time T1: User A queries and finds "G1234 train still has 1 ticket remaining"
  • Time T2: User B also queries and finds "1 ticket remaining"
  • Time T3: User A clicks "buy," the system deducts inventory, ticket sold to A
  • Time T4: User B clicks "buy" — without a protection mechanism, the system would deduct inventory again, selling the same ticket to B!

This is a classic concurrency conflict problem.

5.1 What is a Transaction?

A transaction is a group of database operations that either all succeed or all fail — there's no "half-done" state.

A Real-Life Example

Bank transfer is a classic transaction:

  1. Deduct 100 yuan from Account A
  2. Add 100 yuan to Account B

If step 1 succeeds but step 2 fails (e.g., power outage), what happens?

  • Without transactions: Account A's money is gone, Account B never received it — money vanished into thin air
  • With transactions: The system detects step 2 failed and automatically rolls back step 1. Both accounts return to their original state

This is the atomicity of transactions: all or nothing.

5.2 The Four Properties of Transactions (ACID)

Transactions have four properties, abbreviated as ACID:

PropertyEnglishMeaningBank Transfer Example
AtomicityAtomicityAll or nothingDeduction and credit must both succeed; can't just deduct without crediting
ConsistencyConsistencyData always remains in a valid stateBefore and after transfer, the total amount in both accounts should remain unchanged
IsolationIsolationMultiple transactions don't affect each otherWhen A is transferring, B should see the balance "before transfer" or "after transfer," not an intermediate state
DurabilityDurabilityOnce committed, data is permanently savedAfter a successful transfer, even a power outage won't revert the account balance

What Can You See from This Table?

These four properties ensure data safety:

  • Atomicity: Prevents "half-done" operations (deducted but not credited)
  • Consistency: Prevents invalid data (total amount changed after transfer)
  • Isolation: Prevents concurrency conflicts (two people modifying the same data simultaneously)
  • Durability: Prevents data loss (committed data survives power outages)

Without these guarantees, banking systems could never function.

5.3 Transaction Isolation Levels: Balancing Safety and Performance

In theory, we want transactions to be fully isolated. But full isolation = very poor performance (because it requires extensive locking, causing other transactions to wait).

Therefore, databases provide four isolation levels:

Isolation LevelDirty ReadNon-Repeatable ReadPhantom ReadPerformanceUse Cases
Read UncommittedPossiblePossiblePossibleFastestAlmost never used (data may be incorrect)
Read CommittedNot possiblePossiblePossibleFastGeneral business (Oracle default)
Repeatable ReadNot possibleNot possiblePossibleMediumBank transfers (MySQL default)
SerializableNot possibleNot possibleNot possibleSlowestExtremely strict scenarios (rarely used)

What Do the Three "Reads" Mean?

  • Dirty Read: Reading data that another transaction hasn't committed yet (may be rolled back, so data is inaccurate)
  • Non-Repeatable Read: Within the same transaction, two reads of the same data return different results (modified by another transaction)
  • Phantom Read: Within the same transaction, two queries return different numbers of rows (another transaction inserted/deleted data)

Plain examples (checking bank balance):

  • Dirty Read: You see a balance of 1,000 yuan, but the other transaction was rolled back — the actual balance is only 100 yuan
  • Non-Repeatable Read: You check your balance the first time and see 1,000 yuan, the second time it's 800 yuan (a deduction occurred)
  • Phantom Read: You find 5 transactions the first time, 6 the second time (a new transaction was added)
🔒事务 ACID 特性演示理解事务如何保证数据安全
想象银行转账:A 转给 B 100 元。这个操作包含两步:从 A 扣 100,给 B 加 100。如果只扣了钱但没到账,就是灾难。事务保证这两步要么全成功,要么全失败
⚛️
A
原子性
Atomicity
⚖️
C
一致性
Consistency
🔒
I
隔离性
Isolation
💾
D
持久性
Durability
👆 点击上方任意特性,查看详细解释
🎯 12306 抢票场景

场景:用户 A 和 B 同时看到还剩 1 张票,同时点击购买。

没有事务:A 扣库存,B 也扣库存,同一张票卖给了两个人!

有事务(隔离性):A 的操作加锁,B 必须等待。A 买完后,库存变为 0,B 看到的是"已售罄"。

💡核心思想:ACID 四个特性共同保证了数据在高并发环境下的不丢、不乱、不冲突。这就是为什么所有涉及资金、订单的系统都必须使用数据库事务。

6. Performance Optimization: Practical Tips to Make Queries 1000x Faster

Now you understand core concepts like indexes and transactions. But in real projects, you may encounter various performance issues.

This section provides actionable optimization strategies.

6.1 Index Pitfall Guide

Common Mistakes: Index Invalidation

Often, you've created an index, but queries are still slow — because the index became invalid.

Common causes of index invalidation:

  1. Using functions on indexed columns
  2. Implicit type conversion
  3. LIKE queries starting with %
  4. OR conditions (in some cases)
  5. Composite indexes not satisfying the leftmost prefix rule

Pitfall 1: Using functions on indexed columns

sql
-- Wrong: Using a function on the indexed column, cannot use index
SELECT * FROM users WHERE YEAR(created_at) = 2024;

-- Correct: Rewrite as a range query, can use index
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

Pitfall 2: Implicit type conversion

sql
-- Assume user_id is int type
-- Wrong: Passing a string causes implicit conversion, cannot use index
SELECT * FROM users WHERE user_id = '123';

-- Correct: Pass the matching type
SELECT * FROM users WHERE user_id = 123;

Pitfall 3: LIKE starting with %

sql
-- Wrong: Starting with %, cannot use index
SELECT * FROM users WHERE name LIKE '%Zhang San%';

-- Correct: Starting with a fixed prefix, can use index
SELECT * FROM users WHERE name LIKE 'Zhang San%';

-- Or use full-text index (suitable for text search)
SELECT * FROM users WHERE MATCH(name) AGAINST('Zhang San');

6.2 SQL Optimization Practical Templates

Template 1: Pagination Optimization (Deep Pagination Problem)

View the Problem and Solution
sql
-- Problem: When OFFSET is large, queries get progressively slower
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 10 OFFSET 1000000;

-- Optimization 1: Use the last query's timestamp as a cursor
SELECT * FROM orders
WHERE created_at < '2024-01-15 12:00:00'
ORDER BY created_at DESC
LIMIT 10;

-- Optimization 2: Use primary key range query
SELECT * FROM orders
WHERE order_id > 1000000
ORDER BY order_id
LIMIT 10;

Template 2: Batch Insert Optimization

sql
-- Inefficient: Multiple single inserts (multiple network round trips)
INSERT INTO users (name, age) VALUES ('Zhang San', 25);
INSERT INTO users (name, age) VALUES ('Li Si', 30);
INSERT INTO users (name, age) VALUES ('Wang Wu', 28);

-- Efficient: Single SQL batch insert (only one network round trip)
INSERT INTO users (name, age) VALUES
('Zhang San', 25),
('Li Si', 30),
('Wang Wu', 28);

**Template 3: Avoid SELECT ***

sql
-- Inefficient: Returns all columns (including unnecessary large fields)
SELECT * FROM users WHERE user_id = 1;

-- Efficient: Only return needed columns
SELECT user_id, name, email FROM users WHERE user_id = 1;

6.3 Strategies for High-Concurrency Scenarios

ScenarioProblemSolution
Hot DataA single row is frequently read/written, causing lock contentionUse caching (Redis) + read-write separation
Flash SalesInstantaneous high-concurrency inventory deductionOptimistic locking + inventory pre-warming + message queue peak shaving
Slow QueriesComplex queries overwhelm the databaseIndex optimization + query splitting + read-write separation
Connection ExhaustionToo many concurrent requests deplete the connection poolConnection pool optimization + rate limiting + service degradation

Key Takeaway

Basic principles of performance optimization:

  1. Measure first, optimize later: Use EXPLAIN to analyze query plans and find the real bottleneck
  2. Indexes first: 80% of performance issues can be solved by optimizing indexes
  3. Reduce database pressure: Use caching where possible, make things async where possible
  4. Divide and conquer: Split large tables into small tables, split large queries into small queries
查询优化演示常见错误与正确做法对比
很多时候,查询慢不是因为数据库性能差,而是因为 SQL 写错了。下面这些错误,你可能每天都在犯。
1
在索引列上使用函数
SELECT * FROM users WHERE YEAR(created_at) = 2024;
⚠️ 索引失效,全表扫描
SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
💡 可以使用索引,查询速度提升 1000 倍
原理:当对列使用函数时,数据库必须先计算每一行的函数值,无法使用索引。把函数移到等号右边,或用范围查询代替。
2
隐式类型转换
3
LIKE 以 % 开头
4
SELECT * 返回所有列
📝 优化建议清单
为 WHERE、JOIN、ORDER BY 的列创建索引
避免在索引列上使用函数或表达式
用 EXPLAIN 分析查询执行计划
只查询需要的列,避免 SELECT *
批量操作代替单条操作
考虑使用覆盖索引减少回表
🎯核心原则:不要让数据库做"多余的工作"。索引失效、全表扫描、返回不必要的数据,这些都是最常见的性能杀手。写出高效 SQL 的关键,是理解数据库如何执行你的查询

7. Summary and Learning Path

Let's review the core concepts of databases with a table:

ConceptOne-Sentence ExplanationProblem SolvedKey Points
Tables, Rows, ColumnsHow data is organizedHow to store structured dataTable = Excel worksheet, Row = record, Column = field
Primary KeyUnique identifier for each rowHow to precisely find one row of dataUnique, non-null, immutable
Foreign KeyBridge connecting tablesHow to link data across different tablesPoints to another table's primary key
SQLLanguage for talking to databasesHow to CRUD dataSELECT, INSERT, UPDATE, DELETE
IndexData structure that speeds up queriesHow to quickly find dataB+ Tree, reduces disk I/O
TransactionMechanism that ensures data safetyHow to prevent concurrency conflicts and data lossACID: Atomicity, Consistency, Isolation, Durability

Final Words

Databases are a vast and deep subject — this article is just an introduction. If you want to continue learning in depth, we recommend following this path:

Next steps:

  1. Hands-on practice: Install MySQL or PostgreSQL, create tables, insert data, write SQL queries
  2. ORM frameworks: Learn how to use databases in code (e.g., SQLAlchemy, Prisma, TypeORM)
  3. Index optimization: Dive deeper into composite indexes, covering indexes, index condition pushdown, and other advanced topics
  4. Transaction principles: Learn about MVCC (Multi-Version Concurrency Control), lock mechanisms, isolation level implementations
  5. Distributed databases: Learn about table sharding, read-write separation, master-slave replication architectures

Remember: Theory + Practice = True Mastery.