Skip to content

Data Models Overview (Document / Graph / Time-Series / Vector)

Core Question

Why can't you just stuff all your data into MySQL tables? When your data is a social network graph, millions of sensor readings per second, or semantic vectors for AI to understand, relational tables fall short. Different data shapes require different modeling approaches.


1. Beyond Relational: Why Do We Need Other Data Models?

Relational databases (MySQL, PostgreSQL) organize data with "tables + rows + columns," suitable for structured, well-defined business data. But real-world data comes in far more forms than just this:

Data ShapeRelational Pain PointBetter Model
User profiles (flexible fields, nested structures)Frequent ALTER TABLE, many NULL columnsDocument Model
Social networks (friends of friends of friends)Multi-level JOIN performance degrades exponentiallyGraph Model
Monitoring metrics (millions of writes per second)Write bottlenecks, historical data bloatTime-Series Model
AI semantic search ("similar meaning" content)Cannot express semantic similarityVector Model

Core Insight

It's not about "replacing" relational databases, but "supplementing" them. Most systems still run their core business on MySQL/PostgreSQL, but introducing specialized data models for specific scenarios can yield orders-of-magnitude performance improvements.


2. Document Model

2.1 What is the Document Model?

The document model stores data as JSON/BSON documents, where each record is a self-contained document that can have different field structures.

json
{
  "_id": "user_1001",
  "name": "Zhang San",
  "tags": ["VIP", "Active"],
  "address": { "city": "Beijing", "district": "Chaoyang" },
  "orders": [
    { "id": "o1", "amount": 299 },
    { "id": "o2", "amount": 599 }
  ]
}

Key Features:

  • No Schema Constraints: No need to predefine table structure; fields can be added or removed at any time
  • Nested Structures: Addresses and orders are embedded directly in the document; one read gets all data
  • Horizontal Scaling: Naturally suited for sharding, easily handling massive data volumes

2.2 Document vs. Relational

ComparisonRelational (MySQL)Document (MongoDB)
Data StructureFixed Schema, ALTER TABLE to modifyFlexible Schema, add fields anytime
Nested DataRequires multi-table JOINsEmbedded directly in the document
Cross-record RelationshipsJOINs are powerfulRelationship queries are weaker
Best ForStructurally stable business dataStructurally variable content data

2.3 Typical Use Cases

  • CMS Content Management: Articles, comments, and tags with varying structures
  • User Profiles: Different users have different attribute fields
  • Product Catalogs: Phones have "screen size," food has "shelf life" — completely different fields
  • Configuration Centers: Each service's configuration structure is inconsistent

Common Misconception

"MongoDB doesn't need data structure design" — Wrong! The document model also requires careful design: nesting levels shouldn't be too deep, and frequently updated sub-documents should be split into separate collections.


3. Graph Model

3.1 What is the Graph Model?

The graph model uses Nodes and Edges to represent entities and their relationships. Each node is an entity, each edge is a relationship, and both nodes and edges can carry properties.

(Zhang San) --[follows]--> (Li Si) --[follows]--> (Wang Wu)
   |                                    |
   +--------[purchased]----> (iPhone) <--[purchased]--+

3.2 The Graph Model's Killer Feature: Multi-hop Queries

Scenario: Finding "friends of friends of friends" in a social network

Relational approach (3-level JOIN):

sql
SELECT DISTINCT f3.name
FROM friends f1
JOIN friends f2 ON f1.friend_id = f2.user_id
JOIN friends f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 1001;

Graph database approach (Cypher query language):

cypher
MATCH (me)-[:FOLLOWS*1..3]->(target)
WHERE me.name = 'Zhang San'
RETURN DISTINCT target.name

Each additional hop in the relational approach adds another JOIN, causing exponential performance degradation. Graph databases traverse relationships via pointers directly, so multi-hop query performance remains nearly unchanged.

3.3 Typical Use Cases

  • Social Networks: Friend recommendations, mutual follows, influence propagation
  • Knowledge Graphs: Entity relationship reasoning ("who is the student of who's teacher")
  • Fraud Detection: Discovering money loops, associated account networks
  • Recommendation Systems: User-product-tag relationship graph-based recommendations

4. Time-Series Model

4.1 What is the Time-Series Model?

The time-series model uses timestamps as the primary axis, specifically optimized for "write in chronological order, query by time range" scenarios.

timestamp            device      cpu_usage   memory
2024-01-15 10:00:01  server-01   45%         12.3GB
2024-01-15 10:00:02  server-01   67%         12.5GB
2024-01-15 10:00:03  server-01   92%         14.1GB

4.2 Why Not Use MySQL for Time-Series Data?

IssueMySQLTime-Series Database (InfluxDB)
Write SpeedTens of thousands/secMillions/sec
Historical DataManual cleanup, tables keep growingAutomatic expiration policy (TTL)
Aggregation QueriesSlow GROUP BYBuilt-in downsampling (5 sec → 1 min average)
Storage EfficiencyGeneral-purpose storage, wasted spaceColumnar compression, saving 90% space

4.3 Typical Use Cases

  • Server Monitoring: CPU, memory, disk collected every second
  • IoT Sensors: Temperature, humidity, GPS trajectories
  • Financial Markets: Stock prices, trading volume at second-level granularity
  • Log Analysis: Timeline aggregation of application logs

5. Vector Model

5.1 What is the Vector Model?

The vector model converts unstructured data like text, images, and audio into high-dimensional numerical vectors through an Embedding model, then measures semantic similarity by calculating the distance between vectors.

"delicious Japanese food" → Embedding → [0.82, 0.15, 0.91, 0.33, ...]
                                        ↓ Cosine similarity
"Ginza sushi master"    → [0.80, 0.18, 0.89, ...] → 96% similar
"Italian pizza"          → [0.12, 0.85, 0.20, ...] → 31% similar
ComparisonKeyword Search (LIKE / Full-text Index)Vector Search
Search MethodExact string matchingSemantic similarity matching
"delicious Japanese food"Can only match text containing "Japanese food"Can find "sushi," "sashimi," "izakaya"
MultilingualNeeds separate handlingCross-language semantic understanding
MultimodalText onlyUnified retrieval across text, images, and audio

5.3 Typical Use Cases

  • RAG (Retrieval-Augmented Generation): Providing relevant knowledge fragments to LLMs
  • Semantic Search: Understanding user intent rather than keywords
  • Image Search: Upload an image to find visually similar images
  • Recommendation Systems: Content semantic-based similarity recommendations

Choosing a Vector Database

  • Standalone Vector Databases: Pinecone, Milvus, Weaviate — focused on vector retrieval, best performance
  • Traditional Database Extensions: pgvector (PostgreSQL), Atlas Vector Search (MongoDB) — reduce architectural complexity
  • In-Memory Vector Libraries: FAISS, Annoy — suitable for small-scale, low-latency scenarios

6. Selection Guide: How to Choose a Data Model?

What Does Your Data Look Like?Recommended ModelRepresentative Products
Fixed structure, clear relationships (orders, users)RelationalMySQL, PostgreSQL
Flexible structure, deep nesting (content, configs)DocumentMongoDB, DynamoDB
Complex relationships between entities, need multi-hop traversalGraphNeo4j, Amazon Neptune
Write in chronological order, query by time rangeTime-SeriesInfluxDB, TimescaleDB
Unstructured data, need semantic similarity searchVectorPinecone, Milvus, pgvector

Practical Advice

Modern systems typically use multiple models together:

  • Core business on PostgreSQL (relational)
  • User behavior logs on InfluxDB (time-series)
  • AI knowledge base on Milvus + pgvector (vector)
  • Recommendation engine on Neo4j (graph)

Don't try to find "one database to solve all problems" — instead, let each type of data find its most suitable home.

🗂️数据模型全景四种主流数据模型对比
不是所有数据都适合塞进关系型表格。社交网络的人脉关系、IoT 设备的时间流水、AI 搜索的语义向量——不同的数据形态需要不同的建模方式
📄文档模型 (Document)MongoDB / DynamoDB
数据以 JSON 文档存储,每条记录可以有不同的字段结构,天然适合嵌套、半结构化数据。
{
  "_id": "user_1001",
  "name": "张三",
  "tags": ["VIP", "活跃"],
  "address": {
    "city": "北京",
    "district": "朝阳区"
  },
  "orders": [
    { "id": "o1", "amount": 299 },
    { "id": "o2", "amount": 599 }
  ]
}
无需预定义 Schema,字段随时扩展
嵌套数据一次读取,无需 JOIN
跨文档关联查询较弱
典型场景:用户画像CMS 内容商品目录配置中心
💡选型原则:没有万能数据库。关系型(MySQL/PostgreSQL)仍是大多数业务的基石,但当数据形态明确偏向文档、图、时序或向量时,选择专用模型能获得数量级的性能提升