Data Models Overview (Document / Graph / Time-Series / Vector)

Core Question

Why can't you just stuff all your data into MySQL tables? When the data is a social network graph, a stream of millions of sensor readings per second, or embedding vectors used for semantic search, relational tables fall short. Different data shapes call for different modeling approaches.

1. Beyond Relational: Why Do We Need Other Data Models?

Relational databases (MySQL, PostgreSQL) organize data with "tables + rows + columns," suitable for structured, well-defined business data. But real-world data comes in far more forms than just this:

Data Shape	Relational Pain Point	Better Model
User profiles (flexible fields, nested structures)	Frequent ALTER TABLE, many NULL columns	Document Model
Social networks (friends of friends of friends)	Each extra hop adds another self-JOIN, and cost grows rapidly with depth	Graph Model
Monitoring metrics (millions of writes per second)	Write bottlenecks, historical data bloat	Time-Series Model
AI semantic search ("similar meaning" content)	Cannot express semantic similarity	Vector Model

Core Insight

It's not about "replacing" relational databases, but "supplementing" them. Most systems still run their core business on MySQL/PostgreSQL, but introducing specialized data models for specific scenarios can yield orders-of-magnitude performance improvements.

2. Document Model

2.1 What is the Document Model?

The document model stores data as JSON/BSON documents, where each record is a self-contained document that can have different field structures.

json

{
  "_id": "user_1001",
  "name": "Zhang San",
  "tags": ["VIP", "Active"],
  "address": { "city": "Beijing", "district": "Chaoyang" },
  "orders": [
    { "id": "o1", "amount": 299 },
    { "id": "o2", "amount": 599 }
  ]
}

Key Features:

Flexible Schema: No table structure has to be predefined; fields can be added or removed per document at any time
Nested Structures: Addresses and orders are embedded directly in the document, so a single read returns all of it
Horizontal Scaling: Self-contained documents are straightforward to shard across nodes by key, which helps with very large data volumes
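
As a rough illustration of these features, the sketch below uses plain Python dictionaries as stand-ins for documents. The `collection` list, the `total_spent` helper, and the second user record are hypothetical and exist only for this example; no real database driver is involved.

```python
# Plain dicts stand in for documents; a list stands in for a collection.
# These names are illustrative and not part of any database API.
collection = [
    {
        "_id": "user_1001",
        "name": "Zhang San",
        "tags": ["VIP", "Active"],
        "address": {"city": "Beijing", "district": "Chaoyang"},
        "orders": [{"id": "o1", "amount": 299}, {"id": "o2", "amount": 599}],
    },
    # A second (made-up) document with a different set of fields:
    # no tags, no district, no orders.
    {"_id": "user_1002", "name": "Li Si", "address": {"city": "Shanghai"}},
]


def total_spent(doc: dict) -> int:
    """Sum the embedded order amounts; a missing 'orders' field counts as empty."""
    return sum(order["amount"] for order in doc.get("orders", []))


for doc in collection:
    # Nested data is available from the single document, with no JOIN step.
    print(doc["_id"], doc["address"]["city"], total_spent(doc))
```

Because fields may be absent, readers of a document have to tolerate missing keys (here via `dict.get`), which is the price of the flexible schema.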

2.2 Document vs. Relational

Comparison	Relational (MySQL)	Document (MongoDB)
Data Structure	Fixed Schema, ALTER TABLE to modify	Flexible Schema, add fields anytime
Nested Data	Requires multi-table JOINs	Embedded directly in the document
Cross-record Relationships	JOINs are powerful	Relationship queries are weaker
Best For	Structurally stable business data	Structurally variable content data

2.3 Typical Use Cases

CMS Content Management: Articles, comments, and tags with varying structures
User Profiles: Different users have different attribute fields
Product Catalogs: Phones have "screen size," food has "shelf life" — completely different fields
Configuration Centers: Each service's configuration structure is inconsistent

Common Misconception

"MongoDB doesn't need data structure design" — Wrong! The document model also requires careful design: nesting levels shouldn't be too deep, and frequently updated sub-documents should be split into separate collections.

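The advice about splitting frequently updated sub-documents can be sketched in plain Python. The `split_orders` helper and the `user_id` reference field are assumptions made for this example; a real application would write the two results to separate collections through its database driver.

```python
def split_orders(user_doc: dict):
    """Return (user document without orders, order documents referencing the user)."""
    # Copy every field except the embedded orders into the slimmer user document.
    user = {key: value for key, value in user_doc.items() if key != "orders"}
    # Each order becomes its own document and points back to its owner by id.
    orders = [
        {**order, "user_id": user_doc["_id"]}
        for order in user_doc.get("orders", [])
    ]
    return user, orders


user_doc = {
    "_id": "user_1001",
    "name": "Zhang San",
    "orders": [{"id": "o1", "amount": 299}, {"id": "o2", "amount": 599}],
}

user, orders = split_orders(user_doc)
print(user)
print(orders)
```

After the split, adding an order touches only a small order document instead of rewriting the whole user document, at the cost of a second lookup when both are needed.
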
3. Graph Model

3.1 What is the Graph Model?

The graph model uses Nodes and Edges to represent entities and their relationships. Each node is an entity, each edge is a relationship, and both nodes and edges can carry properties.

(Zhang San) --[follows]--> (Li Si) --[follows]--> (Wang Wu)
   |                                    |
   +--------[purchased]----> (iPhone) <--[purchased]--+

3.2 The Graph Model's Killer Feature: Multi-hop Queries

Scenario: Finding "friends of friends of friends" in a social network

Relational approach (3-level JOIN):

sql

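-- Assumes friends(user_id, friend_id) and users(id, name); adjust to your schema
SELECT DISTINCT u.name
FROM friends f1
JOIN friends f2 ON f2.user_id = f1.friend_id
JOIN friends f3 ON f3.user_id = f2.friend_id
JOIN users u ON u.id = f3.friend_id
WHERE f1.user_id = 1001;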

Graph database approach (Cypher query language):

cypher

// *3 matches exactly three hops; use *1..3 for anyone within three hops
MATCH (me)-[:FOLLOWS*3]->(target)
WHERE me.name = 'Zhang San'
RETURN DISTINCT target.name

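Each additional hop in the relational approach adds another self-JOIN, and the intermediate result grows roughly with the number of connections per user at every level, so deep traversals quickly become expensive. Graph databases such as Neo4j store each node's relationships as direct references, so the cost of a multi-hop query depends mainly on the part of the graph actually visited rather than on the total size of the data.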
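
To make the traversal idea concrete, here is a minimal Python sketch of a breadth-first walk over adjacency lists, using the "follows" edges from the diagram in section 3.1. The `follows` dictionary and the `reachable_within` function are hypothetical names for this example; this is not how any particular graph database is implemented.

```python
from collections import deque

# Adjacency lists: each node holds direct references to the nodes it follows.
follows = {
    "Zhang San": ["Li Si"],
    "Li Si": ["Wang Wu"],
    "Wang Wu": [],
}


def reachable_within(graph, start, max_hops):
    """Return nodes reachable from start in 1..max_hops hops, in visit order."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found


print(reachable_within(follows, "Zhang San", 3))  # ['Li Si', 'Wang Wu']
```

Each step only looks at the neighbours of nodes already reached, so the work is bounded by the visited neighbourhood rather than by the number of rows in a global table.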

3.3 Typical Use Cases

Social Networks: Friend recommendations, mutual follows, influence propagation
Knowledge Graphs: Entity relationship reasoning ("who is a student of whose teacher")
Fraud Detection: Discovering money loops, associated account networks
Recommendation Systems: Recommendations based on a user-product-tag relationship graph

4. Time-Series Model

4.1 What is the Time-Series Model?

The time-series model uses timestamps as the primary axis, specifically optimized for "write in chronological order, query by time range" scenarios.

timestamp            device      cpu_usage   memory
2024-01-15 10:00:01  server-01   45%         12.3GB
2024-01-15 10:00:02  server-01   67%         12.5GB
2024-01-15 10:00:03  server-01   92%         14.1GB

4.2 Why Not Use MySQL for Time-Series Data?

Issue	MySQL	Time-Series Database (InfluxDB)
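Write Speed	Roughly tens of thousands of rows/sec on a single node (workload-dependent)	Up to millions of points/sec on suitable hardware
Historical Data	Manual cleanup, tables keep growing	Automatic expiration through retention policies (TTL)
Aggregation Queries	Slow GROUP BY	Built-in downsampling (5 sec → 1 min average)
Storage Efficiency	General-purpose row storage with little compression	Columnar compression, which can save up to about 90% of space on regular data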
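
Downsampling and retention are simple to sketch in plain Python, which may help clarify what a time-series database automates. The function names `downsample_to_minutes` and `apply_retention` are hypothetical, and the fourth data point is made up to give the second bucket a value; the first three readings are the CPU values from the table in section 4.1.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downsample_to_minutes(points):
    """Average (timestamp, value) pairs into one value per minute."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its minute.
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}


def apply_retention(points, now, keep):
    """Drop points older than the retention window (a TTL in miniature)."""
    cutoff = now - keep
    return [(ts, value) for ts, value in points if ts >= cutoff]


cpu = [
    (datetime(2024, 1, 15, 10, 0, 1), 45.0),
    (datetime(2024, 1, 15, 10, 0, 2), 67.0),
    (datetime(2024, 1, 15, 10, 0, 3), 92.0),
    (datetime(2024, 1, 15, 10, 1, 10), 50.0),  # made-up extra reading
]

print(downsample_to_minutes(cpu))
print(apply_retention(cpu, now=datetime(2024, 1, 15, 10, 1, 30), keep=timedelta(seconds=60)))
```

A time-series database performs the equivalent work continuously and close to the storage layer, instead of in application code over rows fetched from a general-purpose table.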

4.3 Typical Use Cases

Server Monitoring: CPU, memory, disk collected every second
IoT Sensors: Temperature, humidity, GPS trajectories
Financial Markets: Stock prices, trading volume at second-level granularity
Log Analysis: Timeline aggregation of application logs

5. Vector Model

5.1 What is the Vector Model?

The vector model converts unstructured data like text, images, and audio into high-dimensional numerical vectors through an Embedding model, then measures semantic similarity by calculating the distance between vectors.
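
The distance calculation can be sketched in a few lines of Python. The vectors below are only the first three components shown in this section's illustration, so the scores will not reproduce its illustrative percentages; real embeddings typically have hundreds or thousands of dimensions. The `cosine_similarity` helper and the `candidates` dictionary are names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Truncated, illustrative vectors; a real system would get these from an embedding model.
query = [0.82, 0.15, 0.91]  # "delicious Japanese food"
candidates = {
    "Ginza sushi master": [0.80, 0.18, 0.89],
    "Italian pizza": [0.12, 0.85, 0.20],
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)  # ['Ginza sushi master', 'Italian pizza']
```

This brute-force comparison against every candidate is what vector databases replace with approximate nearest-neighbor indexes once the collection grows large.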

"delicious Japanese food" → Embedding → [0.82, 0.15, 0.91, 0.33, ...]
                                        ↓ Cosine similarity
"Ginza sushi master"    → [0.80, 0.18, 0.89, ...] → 96% similar
"Italian pizza"          → [0.12, 0.85, 0.20, ...] → 31% similar

5.2 Vector Search vs. Keyword Search

Comparison	Keyword Search (LIKE / Full-text Index)	Vector Search
Search Method	Literal term matching	Semantic similarity matching
"delicious Japanese food"	Can only match text containing "Japanese food"	Can find "sushi," "sashimi," "izakaya"
Multilingual	Needs separate handling per language	Cross-language matching, given a multilingual embedding model
Multimodal	Text only	Unified retrieval across text, images, and audio, given a multimodal embedding model

5.3 Typical Use Cases

RAG (Retrieval-Augmented Generation): Providing relevant knowledge fragments to LLMs
Semantic Search: Understanding user intent rather than keywords
Image Search: Upload an image to find visually similar images
Recommendation Systems: Recommendations based on the semantic similarity of content

Choosing a Vector Database

Standalone Vector Databases (Pinecone, Milvus, Weaviate): focused on vector retrieval, usually the strongest performance at large scale
Traditional Database Extensions (pgvector for PostgreSQL, Atlas Vector Search for MongoDB): keep vectors next to existing data and reduce architectural complexity
In-Memory Vector Libraries (FAISS, Annoy): embedded in the application rather than run as a database, suitable when the index fits on one machine and low latency matters

6. Selection Guide: How to Choose a Data Model?

What Does Your Data Look Like?	Recommended Model	Representative Products
Fixed structure, clear relationships (orders, users)	Relational	MySQL, PostgreSQL
Flexible structure, nested data (content, configs)	Document	MongoDB, DynamoDB
Complex relationships between entities, need multi-hop traversal	Graph	Neo4j, Amazon Neptune
Write in chronological order, query by time range	Time-Series	InfluxDB, TimescaleDB
Unstructured data, need semantic similarity search	Vector	Pinecone, Milvus, pgvector

Practical Advice

Modern systems typically use multiple models together:

Core business on PostgreSQL (relational)
User behavior logs on InfluxDB (time-series)
AI knowledge base on Milvus or pgvector (vector)
Recommendation engine on Neo4j (graph)

Don't try to find "one database to solve all problems" — instead, let each type of data find its most suitable home.
