Data Structures

Preface

Programs = Data Structures + Algorithms. Previously, we learned how the CPU executes instructions and how the operating system manages resources. But the core objects that programs handle are data — user information, product lists, social relationships... How this data is organized in memory directly determines whether a program is fast or slow. You may have wondered: why do some programs process tens of thousands of records quickly while others freeze with just a few hundred? The answer often lies in the choice of data structures.

What will you learn from this article?

After completing this chapter, you will gain:

Intuitive judgment: When you see a requirement, the right data structure automatically comes to mind
Performance analysis perspective: Determine whether a performance bottleneck is due to wrong data structure choice or inefficient algorithms
Trade-off thinking: Understand "trading space for time" vs "trading time for space" — know that there's no perfect data structure
Code reading ability: Terms like HashMap, Stack, Queue will no longer be foreign
Foundation for further learning: Build a foundation for database indexes, caching systems, search engines, and other technologies

Chapter	Content	Core Concepts
Chapter 1	Big Picture	Four major data structure categories, classification criteria
Chapter 2	Linear Structures	Arrays, linked lists, stacks, queues
Chapter 3	Hash Tables	Hash functions, collision handling, O(1) lookup
Chapter 4	Tree Structures	Binary trees, file system trees, DOM trees
Chapter 5	Graph Structures	Directed graphs, undirected graphs, traversal algorithms
Chapter 6	Performance Comparison	Time complexity, space complexity
Chapter 7	Selection Guide	Scenario analysis, decision flow

1. Big Picture: What Are Data Structures?

Imagine you need to organize a pile of books:

Piled on the floor: To find a book, you have to flip through one by one — this is the most primitive storage
Arranged by number on shelves: Go directly to the corresponding position to grab it — this is an array
Sorted by category into cabinets: First determine the cabinet, then find the book — this is a hash table
Sorted alphabetically on multi-level shelves: Eliminate half each time — this is a tree

Different organizational methods result in vastly different book-finding efficiency. A data structure is the "organization method" for data — it determines how data is stored, found, and modified.

Data Structure OverviewChoose a data organization model for each scenario

Data structures are like ways to organize a room: clothes in a wardrobe, books on shelves, and small items in drawers.

📚

Linear structures

Data arranged in order, like a row of books

ArrayLinked listStackQueue

🗂️

Hash structures

Fast lookup through a key

Hash tableDictionarySet

🌳

Tree structures

Hierarchy, like a family tree

Binary treeB-treeHeap

🕸️

Graph structures

Complex relationship networks

Directed graphUndirected graphNetwork graph

📚Linear structures

Features

✓One-to-one relationship between elements

✓A clear before-and-after order

✓Can use contiguous storage or linked storage

Use Cases

📝

Arrays: list data

Store ordered data such as student scores or product prices

🔄

Stacks: undo operations

Text editor undo history, last in first out

🎫

Queues: task scheduling

Print queues and task queues, first in first out

Operation Complexity

OperationAverage time

Access elementO(1)

Insert/deleteO(n)

Everyday Analogy

Like train cars connected in order

To find car 5, count directly to it; to insert a new car, you need to break and reconnect links.

All data structures can be categorized into four major types:

Type	Data Relationship	Typical Examples	Real-life Analogy
Linear	One-to-one, arranged in a line	Arrays, linked lists, stacks, queues	Train cars, checkout lines
Hash	Key→Value mapping	Hash tables, dictionaries, sets	Library index cards
Tree	One-to-many, hierarchical	Binary trees, B-trees, heaps	Family trees, folder structures
Graph	Many-to-many, networked	Directed graphs, undirected graphs	Subway maps, social networks

Why Learn So Many Types?

Because there is no universal data structure. Each one is a trade-off between "lookup speed," "insertion speed," and "memory usage." Just as you wouldn't use a backpack to move furniture or a truck to deliver a single letter — choosing the right tool makes all the difference.

2. Linear Structures: The Most Basic Organization

Linear structures are the most intuitive way to organize data — data items are arranged one after another, like train cars. But different "connection methods" and "operation endpoints" produce four variants, each with its own strengths.

Four Forms of Linear StructuresHow arrays, linked lists, stacks, and queues differ

ArrayContiguous memory, indexed access

✓ Contiguous memory | ✓ Fast access (O(1)) | ✗ Slow insert/delete (O(n))

Operation Comparison

Data structure	Access	Insert	Delete	Feature
📊 Array	O(1) fast	O(n) slow	O(n) slow	Fixed size
🔗 Linked list	O(n) slow	O(1) fast	O(1) fast	Variable size
🥞 Stack	O(n)	O(1) fast	O(1) fast	One-end operations
🚶 Queue	O(n)	O(1) fast	O(1) fast	Two-end operations

Real-World Uses

📋

List data

Student scores or product price lists

🖼️

Image processing

Pixel matrix storage

📈

Charts

Data ordered by time

2.1 Arrays vs Linked Lists: Two Fundamentally Different Storage Methods

Arrays and linked lists are the two most basic linear structures. Their core difference lies in memory layout:

Comparison	Array	Linked List
Memory layout	One continuous block	Scattered, connected by pointers
Access nth element	Calculate address directly, O(1)	Search from the head one by one, O(n)
Insert in the middle	Must shift all subsequent elements, O(n)	Just change two pointers, O(1)
Size	Fixed at creation	Can grow at any time
Real-life analogy	A row of numbered lockers	A chain of treasure hunt clues

When to Use Arrays? When to Use Linked Lists?

Known data volume, frequent access by position → Array (e.g., student grade tables, pixel matrices)
Unknown data volume, frequent insertion/deletion → Linked list (e.g., playlists, undo history)
Not sure? → Start with an array. In most scenarios, arrays' cache-friendly nature provides greater performance advantages

2.2 Stacks and Queues: Linear Structures with "Rules"

Stacks and queues are essentially arrays or linked lists, just with restricted operation methods. It may seem like reduced functionality, but this restriction gives them specific purposes:

Structure	Rule	Operations	Analogy	Where in Your Code?
Stack	Last In, First Out (LIFO)	push / pop	A stack of plates	Function call stack, browser back button, Ctrl+Z undo
Queue	First In, First Out (FIFO)	enqueue / dequeue	Waiting in line for tickets	Task scheduling, message queues, print queues

Why Is "Restriction" Actually a Good Thing?

Imagine a stack with only two operations — "place plate" and "remove plate." You'll never get the order wrong. Restriction brings certainty, and certainty brings reliability. The function call stack relies on "last in, first out" to ensure the most recently called function returns first. If random access to intermediate functions were allowed, programs would be chaotic.

3. Hash Tables: The Fastest Lookup

Linear structures aren't fast enough for lookups — arrays require O(n) traversal, and even sorted binary search is O(log n). Is there a structure that can achieve O(1) direct lookup? Yes — the hash table.

Hash Tables: Super-Fast LookupFind data directly through a key

📚

A hash table is like a library index card: instead of searching shelf by shelf, you use the index to find the book location directly.

Store data

Hashing process

Input key

apple

↓

Hash function

hash(key) % 10

↓

Array index

Hash table

apple: apple

Empty

orange: orange

Empty

banana: banana

Performance Comparison

Hash table lookup

O(1)

Found instantly

Array lookup

O(n)

Requires traversal

Binary search

O(log n)

Requires sorted data

Common Uses

👤

User table (user ID → profile)

🛒

Shopping cart (product ID → quantity)

📝

Cache system (URL → page content)

🔍

Dictionary (word → definition)

3.1 The Core Idea of Hash Tables

The principle of hash tables is actually quite simple:

You provide a key (e.g., "apple")
A hash function computes a number from the key (e.g., hash("apple") = 3)
Go directly to position 3 in the array — no traversal needed, one step and you're there

This is like a library's index system: instead of searching shelf by shelf, you check the index card to find the book's exact location.

3.2 Hash Collisions: What When Two Keys Collide?

Two different keys may compute the same index — this is called a hash collision. Like two books having the same index number pointing to the same location.

Resolution Method	Principle	Analogy
Chaining	Store multiple values at the same position using a linked list	Put multiple books in the same cabinet
Open addressing	If there's a collision, look for the next empty slot	If the cabinet is full, use the adjacent one

3.3 Hash Table Performance

Operation	Average Case	Worst Case (All Collisions)
Lookup	O(1)	O(n)
Insert	O(1)	O(n)
Delete	O(1)	O(n)

When Does It Degrade?

When all keys map to the same index, the hash table degrades into a linked list and all operations become O(n). Prevention: choose a good hash function + dynamic resizing (expand when the load factor exceeds a threshold).

Hash Tables Are Everywhere in Your Code

JavaScript {} objects and Map → Hash table
Python dict → Hash table
Java HashMap → Hash table
Database indexes → Also use hashing at the底层

Every time you write user["name"] or map.get("key"), a hash table is working behind the scenes.

4. Tree Structures: Expressing Hierarchical Relationships

Hash tables are fast for lookups, but data is unordered. If you need both fast lookup and ordered data, you need tree structures.

The core characteristic of a tree: each node can have multiple "children" but only one "parent" (except the root node). This one-to-many hierarchical relationship is everywhere in the real world.

Tree Structures: Representing HierarchiesAn organization model like a family tree

Choose a tree type:

Tree Structure Features

🌲

Hierarchy

Nodes have one-to-many parent-child relationships

🎯

Single root node

Except for the root, each node has exactly one parent

🔍

Efficient lookup

Binary search tree lookup is O(log n)

🔄

Multiple traversals

Preorder, inorder, postorder, and level-order traversal

Use Cases

📁

File systems

Hierarchical organization of folders and files

🌐

HTML DOM

Nested structure of web page elements

🏢

Organization charts

Company management hierarchy

🌲

Decision trees

Classification algorithms in machine learning

4.1 Binary Search Trees: Ordered Trees

A binary search tree has one simple but powerful rule: left is smaller, right is larger.

All values in the left subtree < root node
All values in the right subtree > root node

When searching, each comparison eliminates half the nodes, with time complexity O(log n). Like the number guessing game — "Is it bigger or smaller than 50?" "Bigger." "Bigger or smaller than 75?" — eliminating half each time.

4.2 Balanced Trees: Preventing Degradation

Binary search trees have a problem: if data is inserted in order (1, 2, 3, 4, 5), the tree degenerates into a linked list and lookups return to O(n). Balanced trees avoid this by automatically adjusting the structure:

Type	Balancing Strategy	Characteristics	Typical Applications
AVL Tree	Strict balance (height difference ≤ 1)	Fastest lookups, slightly slower insertions/deletions	Scenarios requiring frequent lookups
Red-Black Tree	Approximate balance	Good overall performance	Java TreeMap, Linux kernel
B-Tree	Multi-way balance; one node stores multiple values	Reduces disk I/O	Database indexes

Where Are Trees in Your Code?

File system: Nested folders are tree structures
HTML DOM: <html> → <body> → <div> → <p> is a tree
Database indexes: B+ trees enable lookups across millions of records with only 3-4 disk reads
JSON/XML: Nested data formats are essentially trees

5. Graph Structures: Networks of Complex Relationships

Trees can only represent "one-to-many" hierarchical relationships. But many real-world relationships are "many-to-many" — your friends also have friends, and there are multiple routes between cities. A structure where any node can potentially connect to any other node is a graph.

Graph Structures: Representing Complex RelationshipsA network of nodes and edges

Graph properties

Vertices (V)

Edges (E)

Degree

2.4

Use Cases

🗺️Map navigation (shortest path)

👥Social networks (friend relationships)

🌐Web links (PageRank)

🔗Dependencies (package management)

5.1 Three Types of Graphs

Type	Characteristics	Analogy	Typical Applications
Undirected graph	Edges have no direction; A→B equals B→A	WeChat friends (mutual)	Social networks, communication networks
Directed graph	Edges have direction; A→B is not the same as B→A	Weibo follows (one-way)	Web page links, dependency relationships
Weighted graph	Edges have weights (distance, cost, etc.)	Highways between cities (with mileages)	Map navigation, shortest path

5.2 Graph Traversal

Graph traversal is more complex than linear structures because there may be cycles (A→B→C→A), requiring tracking of "visited" nodes:

Traversal Method	Strategy	Analogy	Use Cases
BFS (Breadth-First)	Visit all neighbors first, then neighbors' neighbors	Ripples spreading in water	Shortest path, level-order traversal
DFS (Depth-First)	Go as deep as possible on one path, backtrack when stuck	Navigating a maze	Path search, connectivity checking

Graphs in the Real World

Map navigation: Cities are nodes, roads are edges; navigation finds the shortest path in the graph
Social networks: Users are nodes, follows/friendships are edges; "People you may know" is graph algorithm recommendations
Package managers: npm/pip dependency relationships are directed graphs; npm install performs topological sorting of the graph

6. Performance Comparison: One Table to See All Data Structures

After learning so many data structures, how do their performances actually compare? The interactive comparison below will help you build intuition:

Data Structures: Containers for DataChoose different storage models for different scenarios

ArrayContiguous memory and fast indexed access

010

120

230

340

450

560

670

780

Access arr[2] = O(1), insert/delete = O(n)

Time Complexity Comparison

Operation	Array	Linked list	Hash table	Tree
Access	O(1)	O(n)	O(1)	O(log n)
Search	O(n)	O(n)	O(1)	O(log n)
Insert	O(n)	O(1)	O(1)	O(log n)
Delete	O(n)	O(1)	O(1)	O(log n)

Core idea:Data structures are containers for data. Different containers have different tradeoffs. Choosing the right data structure can improve program efficiency by orders of magnitude.

Core Performance Comparison Table:

Data Structure	Access	Search	Insert	Delete	Space
Array	O(1)	O(n)	O(n)	O(n)	O(n)
Linked List	O(n)	O(n)	O(1)	O(1)	O(n)
Stack/Queue	O(n)	O(n)	O(1)	O(1)	O(n)
Hash Table	—	O(1)	O(1)	O(1)	O(n)
Binary Search Tree	—	O(log n)	O(log n)	O(log n)	O(n)
Graph	—	O(V+E)	O(1)	O(E)	O(V+E)

How to Read This Table

O(1): Regardless of data volume, operation time is constant — fastest
O(log n): Doubling the data adds only one step — very fast
O(n): Doubling the data doubles the time — average
O(V+E): Depends on the number of vertices and edges — specific to graphs

Note: These are all average cases. In the worst case, hash tables degrade to O(n), and binary search trees also degrade to O(n).

7. Selection Guide: Which Data Structure to Use?

After learning so many data structures, how do you choose when facing actual requirements? The key is to start from the requirements and ask yourself a few questions:

What's the most frequent operation? Lookup? Insertion? Deletion? Traversal?
What's the relationship between data items? One-to-one? One-to-many? Many-to-many?
How much data? The optimal choice for dozens vs. millions of records may be completely different
Does order matter? Do you need to traverse data in a specific order?

How Do You Choose the Right Data Structure?Make the best choice based on scenario needs

What is your use case?

🔍

Fast lookup

Find matching data quickly by key

📊

Preserve order

Data must stay in insertion order or a specific order

🥞

Last in, first out

The most recent item is processed first

🚶

First in, first out

Earlier items are processed first

🌳

Hierarchy

Data has parent-child relationships

🕸️

Complex relationships

Data has many-to-many connections

Quick Reference

Scenario need	Recommended structure	Time complexity
Random access	Array	O(1)
Fast lookup	Hash table	O(1)
Ordered lookup	Binary search tree	O(log n)
Frequent insertion/deletion	Linked list	O(1)
Undo operations	Stack	O(1)
Task scheduling	Queue	O(1)

Decision Flow

❓

Need fast element access?

Yes

Array / Hash table

Need frequent insertion and deletion?

Yes

Linked list

Need to preserve order?

Yes

Stack / Queue

Tree / Graph

Quick Decision Flow:

Your Need	Recommended Structure	Reason
Fast access by position	Array	O(1) random access
Frequent insertion/deletion in the middle	Linked list	O(1) insert/delete without moving elements
Last in, first out (undo, recursion)	Stack	LIFO semantics naturally match
First in, first out (task queue)	Queue	FIFO semantics naturally match
Fast lookup by key	Hash table	O(1) average lookup
Ordered data + fast lookup	Binary search tree	O(log n) lookup while maintaining order
Complex many-to-many relationships	Graph	Can express connections between any nodes

Rules of Thumb in Practice

80% of scenarios are fine with arrays and hash tables
Consider trees when you need ordering
Consider graphs when relationships are complex
Not sure? Start with the simplest and switch when you hit performance issues. Premature optimization is the root of all evil

Summary

Data structures are the skeleton of programs. Arrays are like a row of numbered lockers — fastest for retrieving items by position; linked lists are like a chain of treasure hunt clues — most flexible for insertions and deletions; hash tables are like a library index — fastest for finding things by name; trees are like family trees — expressing hierarchical relationships while maintaining order; graphs are like subway maps — expressing arbitrarily complex networked relationships. There's no best data structure, only the most appropriate one — the key is understanding the strengths and costs of each structure and making trade-offs based on actual requirements.

Next Steps

Now that you've mastered the core knowledge of data structures, you can continue learning:

Algorithmic Thinking: Learn to solve problems using sorting, searching, recursion, dynamic programming, and other algorithmic paradigms
Programming Languages: Understand how different programming languages implement these data structures

Topic	Recommended Resources
Data structure visualization	VisuAlgo - Animated demonstrations of various data structures and algorithms
Algorithms and data structures	Grokking Algorithms by Aditya Bhargava — illustrated and beginner-friendly
In-depth understanding	Data Structures and Algorithm Analysis by Mark Allen Weiss
Practice problems	LeetCode - Practice categorized by data structure

Data Structures ​

1. Big Picture: What Are Data Structures? ​

2. Linear Structures: The Most Basic Organization ​

2.1 Arrays vs Linked Lists: Two Fundamentally Different Storage Methods ​

2.2 Stacks and Queues: Linear Structures with "Rules" ​

3. Hash Tables: The Fastest Lookup ​

3.1 The Core Idea of Hash Tables ​

3.2 Hash Collisions: What When Two Keys Collide? ​

3.3 Hash Table Performance ​

4. Tree Structures: Expressing Hierarchical Relationships ​

4.1 Binary Search Trees: Ordered Trees ​

4.2 Balanced Trees: Preventing Degradation ​

5. Graph Structures: Networks of Complex Relationships ​

5.1 Three Types of Graphs ​

5.2 Graph Traversal ​

6. Performance Comparison: One Table to See All Data Structures ​

7. Selection Guide: Which Data Structure to Use? ​

Summary ​

Further Reading ​

Next Steps ​

Data Structures

1. Big Picture: What Are Data Structures?

2. Linear Structures: The Most Basic Organization

2.1 Arrays vs Linked Lists: Two Fundamentally Different Storage Methods

2.2 Stacks and Queues: Linear Structures with "Rules"

3. Hash Tables: The Fastest Lookup

3.1 The Core Idea of Hash Tables

3.2 Hash Collisions: What When Two Keys Collide?

3.3 Hash Table Performance

4. Tree Structures: Expressing Hierarchical Relationships

4.1 Binary Search Trees: Ordered Trees

4.2 Balanced Trees: Preventing Degradation

5. Graph Structures: Networks of Complex Relationships

5.1 Three Types of Graphs

5.2 Graph Traversal

6. Performance Comparison: One Table to See All Data Structures

7. Selection Guide: Which Data Structure to Use?

Summary

Further Reading

Next Steps