Skip to content

Data Structures

Preface

Programs = Data Structures + Algorithms. Previously, we learned how the CPU executes instructions and how the operating system manages resources. But the core objects that programs handle are data — user information, product lists, social relationships... How this data is organized in memory directly determines whether a program is fast or slow. You may have wondered: why do some programs process tens of thousands of records quickly while others freeze with just a few hundred? The answer often lies in the choice of data structures.

What will you learn from this article?

After completing this chapter, you will gain:

  • Intuitive judgment: When you see a requirement, the right data structure automatically comes to mind
  • Performance analysis perspective: Determine whether a performance bottleneck is due to wrong data structure choice or inefficient algorithms
  • Trade-off thinking: Understand "trading space for time" vs "trading time for space" — know that there's no perfect data structure
  • Code reading ability: Terms like HashMap, Stack, Queue will no longer be foreign
  • Foundation for further learning: Build a foundation for database indexes, caching systems, search engines, and other technologies
ChapterContentCore Concepts
Chapter 1Big PictureFour major data structure categories, classification criteria
Chapter 2Linear StructuresArrays, linked lists, stacks, queues
Chapter 3Hash TablesHash functions, collision handling, O(1) lookup
Chapter 4Tree StructuresBinary trees, file system trees, DOM trees
Chapter 5Graph StructuresDirected graphs, undirected graphs, traversal algorithms
Chapter 6Performance ComparisonTime complexity, space complexity
Chapter 7Selection GuideScenario analysis, decision flow

1. Big Picture: What Are Data Structures?

Imagine you need to organize a pile of books:

  • Piled on the floor: To find a book, you have to flip through one by one — this is the most primitive storage
  • Arranged by number on shelves: Go directly to the corresponding position to grab it — this is an array
  • Sorted by category into cabinets: First determine the cabinet, then find the book — this is a hash table
  • Sorted alphabetically on multi-level shelves: Eliminate half each time — this is a tree

Different organizational methods result in vastly different book-finding efficiency. A data structure is the "organization method" for data — it determines how data is stored, found, and modified.

Data Structure OverviewChoose a data organization model for each scenario
Data structures are like ways to organize a room: clothes in a wardrobe, books on shelves, and small items in drawers.
📚
Linear structures
Data arranged in order, like a row of books
ArrayLinked listStackQueue
🗂️
Hash structures
Fast lookup through a key
Hash tableDictionarySet
🌳
Tree structures
Hierarchy, like a family tree
Binary treeB-treeHeap
🕸️
Graph structures
Complex relationship networks
Directed graphUndirected graphNetwork graph
📚Linear structures
Features
One-to-one relationship between elements
A clear before-and-after order
Can use contiguous storage or linked storage
Use Cases
📝
Arrays: list data
Store ordered data such as student scores or product prices
🔄
Stacks: undo operations
Text editor undo history, last in first out
🎫
Queues: task scheduling
Print queues and task queues, first in first out
Operation Complexity
OperationAverage time
Access elementO(1)
Insert/deleteO(n)
Everyday Analogy
Like train cars connected in order
To find car 5, count directly to it; to insert a new car, you need to break and reconnect links.

All data structures can be categorized into four major types:

TypeData RelationshipTypical ExamplesReal-life Analogy
LinearOne-to-one, arranged in a lineArrays, linked lists, stacks, queuesTrain cars, checkout lines
HashKey→Value mappingHash tables, dictionaries, setsLibrary index cards
TreeOne-to-many, hierarchicalBinary trees, B-trees, heapsFamily trees, folder structures
GraphMany-to-many, networkedDirected graphs, undirected graphsSubway maps, social networks

Why Learn So Many Types?

Because there is no universal data structure. Each one is a trade-off between "lookup speed," "insertion speed," and "memory usage." Just as you wouldn't use a backpack to move furniture or a truck to deliver a single letter — choosing the right tool makes all the difference.


2. Linear Structures: The Most Basic Organization

Linear structures are the most intuitive way to organize data — data items are arranged one after another, like train cars. But different "connection methods" and "operation endpoints" produce four variants, each with its own strengths.

Four Forms of Linear StructuresHow arrays, linked lists, stacks, and queues differ
ArrayContiguous memory, indexed access
0
10
1
25
2
33
3
47
4
59
5
62
✓ Contiguous memory | ✓ Fast access (O(1)) | ✗ Slow insert/delete (O(n))
Operation Comparison
Data structureAccessInsertDeleteFeature
📊 ArrayO(1) fastO(n) slowO(n) slowFixed size
🔗 Linked listO(n) slowO(1) fastO(1) fastVariable size
🥞 StackO(n)O(1) fastO(1) fastOne-end operations
🚶 QueueO(n)O(1) fastO(1) fastTwo-end operations
Real-World Uses
📋
List data
Student scores or product price lists
🖼️
Image processing
Pixel matrix storage
📈
Charts
Data ordered by time

2.1 Arrays vs Linked Lists: Two Fundamentally Different Storage Methods

Arrays and linked lists are the two most basic linear structures. Their core difference lies in memory layout:

ComparisonArrayLinked List
Memory layoutOne continuous blockScattered, connected by pointers
Access nth elementCalculate address directly, O(1)Search from the head one by one, O(n)
Insert in the middleMust shift all subsequent elements, O(n)Just change two pointers, O(1)
SizeFixed at creationCan grow at any time
Real-life analogyA row of numbered lockersA chain of treasure hunt clues

When to Use Arrays? When to Use Linked Lists?

  • Known data volume, frequent access by position → Array (e.g., student grade tables, pixel matrices)
  • Unknown data volume, frequent insertion/deletion → Linked list (e.g., playlists, undo history)
  • Not sure? → Start with an array. In most scenarios, arrays' cache-friendly nature provides greater performance advantages

2.2 Stacks and Queues: Linear Structures with "Rules"

Stacks and queues are essentially arrays or linked lists, just with restricted operation methods. It may seem like reduced functionality, but this restriction gives them specific purposes:

StructureRuleOperationsAnalogyWhere in Your Code?
StackLast In, First Out (LIFO)push / popA stack of platesFunction call stack, browser back button, Ctrl+Z undo
QueueFirst In, First Out (FIFO)enqueue / dequeueWaiting in line for ticketsTask scheduling, message queues, print queues

Why Is "Restriction" Actually a Good Thing?

Imagine a stack with only two operations — "place plate" and "remove plate." You'll never get the order wrong. Restriction brings certainty, and certainty brings reliability. The function call stack relies on "last in, first out" to ensure the most recently called function returns first. If random access to intermediate functions were allowed, programs would be chaotic.


3. Hash Tables: The Fastest Lookup

Linear structures aren't fast enough for lookups — arrays require O(n) traversal, and even sorted binary search is O(log n). Is there a structure that can achieve O(1) direct lookup? Yes — the hash table.

Hash Tables: Super-Fast LookupFind data directly through a key
📚
A hash table is like a library index card: instead of searching shelf by shelf, you use the index to find the book location directly.
Store data
Hashing process
Input key
apple
Hash function
hash(key) % 10
Array index
0
Hash table
0
apple: apple
1
Empty
2
Empty
3
Empty
4
Empty
5
Empty
6
orange: orange
7
Empty
8
Empty
9
banana: banana
Performance Comparison
Hash table lookup
O(1)
Found instantly
Array lookup
O(n)
Requires traversal
Binary search
O(log n)
Requires sorted data
Common Uses
👤
User table (user ID → profile)
🛒
Shopping cart (product ID → quantity)
📝
Cache system (URL → page content)
🔍
Dictionary (word → definition)

3.1 The Core Idea of Hash Tables

The principle of hash tables is actually quite simple:

  1. You provide a key (e.g., "apple")
  2. A hash function computes a number from the key (e.g., hash("apple") = 3)
  3. Go directly to position 3 in the array — no traversal needed, one step and you're there

This is like a library's index system: instead of searching shelf by shelf, you check the index card to find the book's exact location.

3.2 Hash Collisions: What When Two Keys Collide?

Two different keys may compute the same index — this is called a hash collision. Like two books having the same index number pointing to the same location.

Resolution MethodPrincipleAnalogy
ChainingStore multiple values at the same position using a linked listPut multiple books in the same cabinet
Open addressingIf there's a collision, look for the next empty slotIf the cabinet is full, use the adjacent one

3.3 Hash Table Performance

OperationAverage CaseWorst Case (All Collisions)
LookupO(1)O(n)
InsertO(1)O(n)
DeleteO(1)O(n)

When Does It Degrade?

When all keys map to the same index, the hash table degrades into a linked list and all operations become O(n). Prevention: choose a good hash function + dynamic resizing (expand when the load factor exceeds a threshold).

Hash Tables Are Everywhere in Your Code

  • JavaScript {} objects and Map → Hash table
  • Python dict → Hash table
  • Java HashMap → Hash table
  • Database indexes → Also use hashing at the底层

Every time you write user["name"] or map.get("key"), a hash table is working behind the scenes.


4. Tree Structures: Expressing Hierarchical Relationships

Hash tables are fast for lookups, but data is unordered. If you need both fast lookup and ordered data, you need tree structures.

The core characteristic of a tree: each node can have multiple "children" but only one "parent" (except the root node). This one-to-many hierarchical relationship is everywhere in the real world.

Tree Structures: Representing HierarchiesAn organization model like a family tree
Choose a tree type:
50307020406080
Tree Structure Features
🌲
Hierarchy
Nodes have one-to-many parent-child relationships
🎯
Single root node
Except for the root, each node has exactly one parent
🔍
Efficient lookup
Binary search tree lookup is O(log n)
🔄
Multiple traversals
Preorder, inorder, postorder, and level-order traversal
Use Cases
📁
File systems
Hierarchical organization of folders and files
🌐
HTML DOM
Nested structure of web page elements
🏢
Organization charts
Company management hierarchy
🌲
Decision trees
Classification algorithms in machine learning

4.1 Binary Search Trees: Ordered Trees

A binary search tree has one simple but powerful rule: left is smaller, right is larger.

  • All values in the left subtree < root node
  • All values in the right subtree > root node

When searching, each comparison eliminates half the nodes, with time complexity O(log n). Like the number guessing game — "Is it bigger or smaller than 50?" "Bigger." "Bigger or smaller than 75?" — eliminating half each time.

4.2 Balanced Trees: Preventing Degradation

Binary search trees have a problem: if data is inserted in order (1, 2, 3, 4, 5), the tree degenerates into a linked list and lookups return to O(n). Balanced trees avoid this by automatically adjusting the structure:

TypeBalancing StrategyCharacteristicsTypical Applications
AVL TreeStrict balance (height difference ≤ 1)Fastest lookups, slightly slower insertions/deletionsScenarios requiring frequent lookups
Red-Black TreeApproximate balanceGood overall performanceJava TreeMap, Linux kernel
B-TreeMulti-way balance; one node stores multiple valuesReduces disk I/ODatabase indexes

Where Are Trees in Your Code?

  • File system: Nested folders are tree structures
  • HTML DOM: <html><body><div><p> is a tree
  • Database indexes: B+ trees enable lookups across millions of records with only 3-4 disk reads
  • JSON/XML: Nested data formats are essentially trees

5. Graph Structures: Networks of Complex Relationships

Trees can only represent "one-to-many" hierarchical relationships. But many real-world relationships are "many-to-many" — your friends also have friends, and there are multiple routes between cities. A structure where any node can potentially connect to any other node is a graph.

Graph Structures: Representing Complex RelationshipsA network of nodes and edges
ABCDE
Graph properties
Vertices (V)
5
Edges (E)
6
Degree
2.4
Use Cases
🗺️Map navigation (shortest path)
👥Social networks (friend relationships)
🌐Web links (PageRank)
🔗Dependencies (package management)

5.1 Three Types of Graphs

TypeCharacteristicsAnalogyTypical Applications
Undirected graphEdges have no direction; A→B equals B→AWeChat friends (mutual)Social networks, communication networks
Directed graphEdges have direction; A→B is not the same as B→AWeibo follows (one-way)Web page links, dependency relationships
Weighted graphEdges have weights (distance, cost, etc.)Highways between cities (with mileages)Map navigation, shortest path

5.2 Graph Traversal

Graph traversal is more complex than linear structures because there may be cycles (A→B→C→A), requiring tracking of "visited" nodes:

Traversal MethodStrategyAnalogyUse Cases
BFS (Breadth-First)Visit all neighbors first, then neighbors' neighborsRipples spreading in waterShortest path, level-order traversal
DFS (Depth-First)Go as deep as possible on one path, backtrack when stuckNavigating a mazePath search, connectivity checking

Graphs in the Real World

  • Map navigation: Cities are nodes, roads are edges; navigation finds the shortest path in the graph
  • Social networks: Users are nodes, follows/friendships are edges; "People you may know" is graph algorithm recommendations
  • Package managers: npm/pip dependency relationships are directed graphs; npm install performs topological sorting of the graph

6. Performance Comparison: One Table to See All Data Structures

After learning so many data structures, how do their performances actually compare? The interactive comparison below will help you build intuition:

Data Structures: Containers for DataChoose different storage models for different scenarios
ArrayContiguous memory and fast indexed access
010
120
230
340
450
560
670
780
Access arr[2] = O(1), insert/delete = O(n)
Time Complexity Comparison
OperationArrayLinked listHash tableTree
AccessO(1)O(n)O(1)O(log n)
SearchO(n)O(n)O(1)O(log n)
InsertO(n)O(1)O(1)O(log n)
DeleteO(n)O(1)O(1)O(log n)
Core idea:Data structures are containers for data. Different containers have different tradeoffs. Choosing the right data structure can improve program efficiency by orders of magnitude.

Core Performance Comparison Table:

Data StructureAccessSearchInsertDeleteSpace
ArrayO(1)O(n)O(n)O(n)O(n)
Linked ListO(n)O(n)O(1)O(1)O(n)
Stack/QueueO(n)O(n)O(1)O(1)O(n)
Hash TableO(1)O(1)O(1)O(n)
Binary Search TreeO(log n)O(log n)O(log n)O(n)
GraphO(V+E)O(1)O(E)O(V+E)

How to Read This Table

  • O(1): Regardless of data volume, operation time is constant — fastest
  • O(log n): Doubling the data adds only one step — very fast
  • O(n): Doubling the data doubles the time — average
  • O(V+E): Depends on the number of vertices and edges — specific to graphs

Note: These are all average cases. In the worst case, hash tables degrade to O(n), and binary search trees also degrade to O(n).


7. Selection Guide: Which Data Structure to Use?

After learning so many data structures, how do you choose when facing actual requirements? The key is to start from the requirements and ask yourself a few questions:

  1. What's the most frequent operation? Lookup? Insertion? Deletion? Traversal?
  2. What's the relationship between data items? One-to-one? One-to-many? Many-to-many?
  3. How much data? The optimal choice for dozens vs. millions of records may be completely different
  4. Does order matter? Do you need to traverse data in a specific order?
How Do You Choose the Right Data Structure?Make the best choice based on scenario needs
What is your use case?
🔍
Fast lookup
Find matching data quickly by key
📊
Preserve order
Data must stay in insertion order or a specific order
🥞
Last in, first out
The most recent item is processed first
🚶
First in, first out
Earlier items are processed first
🌳
Hierarchy
Data has parent-child relationships
🕸️
Complex relationships
Data has many-to-many connections
Quick Reference
Scenario needRecommended structureTime complexity
Random accessArrayO(1)
Fast lookupHash tableO(1)
Ordered lookupBinary search treeO(log n)
Frequent insertion/deletionLinked listO(1)
Undo operationsStackO(1)
Task schedulingQueueO(1)
Decision Flow
Need fast element access?
Yes
Array / Hash table
No
Need frequent insertion and deletion?
Yes
Linked list
No
Need to preserve order?
Yes
Stack / Queue
No
Tree / Graph

Quick Decision Flow:

Your NeedRecommended StructureReason
Fast access by positionArrayO(1) random access
Frequent insertion/deletion in the middleLinked listO(1) insert/delete without moving elements
Last in, first out (undo, recursion)StackLIFO semantics naturally match
First in, first out (task queue)QueueFIFO semantics naturally match
Fast lookup by keyHash tableO(1) average lookup
Ordered data + fast lookupBinary search treeO(log n) lookup while maintaining order
Complex many-to-many relationshipsGraphCan express connections between any nodes

Rules of Thumb in Practice

  • 80% of scenarios are fine with arrays and hash tables
  • Consider trees when you need ordering
  • Consider graphs when relationships are complex
  • Not sure? Start with the simplest and switch when you hit performance issues. Premature optimization is the root of all evil

Summary

Data structures are the skeleton of programs. Arrays are like a row of numbered lockers — fastest for retrieving items by position; linked lists are like a chain of treasure hunt clues — most flexible for insertions and deletions; hash tables are like a library index — fastest for finding things by name; trees are like family trees — expressing hierarchical relationships while maintaining order; graphs are like subway maps — expressing arbitrarily complex networked relationships. There's no best data structure, only the most appropriate one — the key is understanding the strengths and costs of each structure and making trade-offs based on actual requirements.


Further Reading

TopicRecommended Resources
Data structure visualizationVisuAlgo - Animated demonstrations of various data structures and algorithms
Algorithms and data structuresGrokking Algorithms by Aditya Bhargava — illustrated and beginner-friendly
In-depth understandingData Structures and Algorithm Analysis by Mark Allen Weiss
Practice problemsLeetCode - Practice categorized by data structure

Next Steps

Now that you've mastered the core knowledge of data structures, you can continue learning:

  • Algorithmic Thinking: Learn to solve problems using sorting, searching, recursion, dynamic programming, and other algorithmic paradigms
  • Programming Languages: Understand how different programming languages implement these data structures