Epilogue: Seeing
You have turned this book to its final page.
In 1687, Newton wrote
This arc is not a history lesson. This arc is an argument.
The argument's core is a single sentence: Learning, reasoning, and generation are all the motion of the same dynamical system along trajectories in different spaces.
You don't need to memorize this equation. You already understand it. From the wilderness hiker of Chapter 1, to Lyapunov in Chapter 6, to the cave diver in Chapter 7, to diffusion in Chapter 12—in every story you have seen the same motion: along the slope beneath your feet, step by step, toward the fixed point.
But this book is not only about mathematics.
In the preface, I told the story of Yonglin, a friend who pulled me back to myself through companionship, and whose name I wrote into this book through a theorem. Companionship matters more than proof, but proof itself can also be a form of companionship.
You may be reading this book late some night, perhaps in a dorm room, perhaps in a lab, perhaps on a commuter train. You might be a student, struggling between papers and exams. You might be an engineer, exhausted by models whose training won't converge. You might simply be a curious person, wanting to know what "learning" is really about.
I hope this book has walked with you for a stretch of road. I hope that when you close it, what you see is no longer a pile of formulas and code—but a wilderness. The model is the hiker. The terrain is the loss function. The step size is the learning rate. Inertia is momentum. The canyon is ResNet. The river is GPT. The fixed point is DEQ. The reasoning field is the gravitational net that every problem casts across belief space.
Force lets you calculate. Energy lets you understand. Geometry lets you see.
Newton gave force. Hamilton gave energy. Riemann gave curved space. Lyapunov gave the insight that you can know convergence without watching until the end. Banach gave the existence and uniqueness of fixed points. And we—the people of this era—have used this entire language to redescribe learning, reasoning, and generation.
But "seeing" is not only a matter of mathematics. Seeing is lifting your eyes from a formula, looking out the window—and realizing that the wilderness outside, and the energy terrain you just derived on paper, are the same.
The terrain of learning unfolds beneath your feet. It has always been there. You just needed someone to tell you—look up.
Zixi Li (Pallas's Cat Professor)
2026, Sun Yat-sen University
Geometric Concept Quick Reference
| Geometric Concept | Mathematical Object | Machine Learning Counterpart |
|---|---|---|
| Position | Point | Parameters |
| Terrain | Scalar function | Loss function, negative entropy function |
| Slope direction | Gradient | Parameter update direction, score function |
| Step size | | Learning rate, reasoning step size |
| Discrete motion | Euler method | Gradient descent, residual connection, CoT step |
| Continuous motion | Gradient flow | Continuous-depth models, probability flow ODE |
| Curvature | Hessian | Local curvature of loss terrain |
| Critical point | | Minimum, maximum, saddle point |
| Minimum | | Training convergence point |
| Saddle point | | Training bottleneck |
| Flatness | Magnitude of Hessian eigenvalues | Generalization ability |
| Non-Euclidean distance | Bregman divergence | KL divergence |
| Belief space metric | Fisher information matrix | Natural gradient |
| Energy descent | Lyapunov function | Loss descent, KL convergence |
| Fixed point | | DEQ output, belief solidification |
| Stability | Sign of real parts of Jacobian eigenvalues | Basin depth and width |
| Attractor | Asymptotically stable fixed point | Correct answer basin |
| Basin of attraction | Set of initial points converging to the same fixed point | Reasoning robustness |
| Bifurcation | Qualitative behavior change caused by parameter variation | Critical learning rate, emergent ability |
| Vector field | | Reasoning field |
| Contraction mapping | | Banach convergence guarantee |
| Diffusion | Stochastic differential equation | Forward noise injection |
| Reverse diffusion | Reverse SDE / probability flow ODE | Generation process |
| Data manifold | Low-dimensional submanifold | Structure of natural data distribution |
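
Several rows of this table can be checked numerically in a few lines. The sketch below is only an illustration under stated assumptions: the quadratic terrain, its constants `A` and `B`, the starting point, and all function names are hypothetical choices made here, not taken from the book. It treats a gradient descent update as one explicit Euler step of the gradient flow, uses the loss as a Lyapunov function that should never increase for a small enough step size, and shows the behavior change once the learning rate crosses the critical value for this terrain (`2 / B`).

```python
import math

# Hypothetical terrain: f(x, y) = 0.5 * (A*x^2 + B*y^2), with its minimum
# (the fixed point of the descent dynamics) at the origin.
A, B = 1.0, 4.0

def loss(p):
    x, y = p
    return 0.5 * (A * x * x + B * y * y)

def grad(p):
    x, y = p
    return (A * x, B * y)

def euler_step(p, lr):
    """One explicit Euler step of dp/dt = -grad f(p), i.e. one gradient descent update."""
    g = grad(p)
    return (p[0] - lr * g[0], p[1] - lr * g[1])

def descend(p0, lr, steps):
    """Discrete motion: repeat the Euler step and record the whole trajectory."""
    path = [p0]
    for _ in range(steps):
        path.append(euler_step(path[-1], lr))
    return path

def flow_exact(p0, t):
    """Continuous motion: closed-form gradient flow for this particular quadratic terrain."""
    return (p0[0] * math.exp(-A * t), p0[1] * math.exp(-B * t))

start = (2.0, 1.0)

# Small step size: the discrete path shadows the continuous flow.
path = descend(start, lr=0.01, steps=500)
losses = [loss(p) for p in path]
exact = flow_exact(start, t=500 * 0.01)

# Step size above the critical value 2 / B = 0.5: the steep direction diverges.
unstable = descend(start, lr=0.6, steps=50)

print("final point (Euler):", path[-1])
print("final point (flow): ", exact)
print("loss never increased:", all(b <= a for a, b in zip(losses, losses[1:])))
print("loss after unstable run:", loss(unstable[-1]))
```

The quadratic terrain is chosen only because its gradient flow has a closed form, which makes the comparison between discrete and continuous motion exact rather than approximate.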
Formula Family of the Book
The core thesis of the book can be captured in six formulas. They are not six independent formulas—they are six faces of the same formula in different spaces.
Core dynamical system:
ResNet = Explicit Euler:
DEQ = Fixed point:
Bregman divergence (the mother formula of KL):
Yonglin Limit (reasoning convergence criterion):
Score function (the vector field of diffusion):
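
As a numerical illustration of the Bregman entry in this list, the sketch below uses the standard textbook definition $D_\phi(p, q) = \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q \rangle$ and takes $\phi$ to be the negative entropy $\sum_i p_i \log p_i$. The notation, the function names, and the two example distributions are assumptions made for this sketch and may differ from the book's own symbols. For probability vectors with strictly positive entries, the resulting divergence should match the KL divergence up to floating-point error.

```python
import math

def neg_entropy(p):
    """phi(p) = sum_i p_i * log(p_i); assumes every entry is strictly positive."""
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    """Gradient of the negative entropy: d(phi)/d(p_i) = log(p_i) + 1."""
    return [math.log(pi) + 1.0 for pi in p]

def bregman(phi, grad_phi, p, q):
    """Standard Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_phi(q)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return phi(p) - phi(q) - inner

def kl(p, q):
    """KL divergence between two strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical example distributions on three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

d_bregman = bregman(neg_entropy, neg_entropy_grad, p, q)
d_kl = kl(p, q)

print("Bregman(neg. entropy):", d_bregman)
print("KL divergence:        ", d_kl)
print("reverse direction:    ", bregman(neg_entropy, neg_entropy_grad, q, p))
```

The reverse-direction value differs from the forward one, which is a reminder that this "distance" is not symmetric.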
Reading Roadmap
- If you want to quickly grasp the core ideas: Preface → Volume I Introduction → ch1 → ch3 → ch6 → ch7 → ch12 → Epilogue
- If you care about optimization and training: ch3 → ch4 → ch5 → ch6
- If you care about reasoning: ch5 → ch7 → ch8 → ch9
- If you care about architecture design: ch6 → ch11 → ch12
- If you have a math background and want to tackle the hardest material: ch5 → ch6 → ch8 (theorem proof chain)
- If you're a beginner and need to build geometric intuition first: ch1 → ch2 → ch3 → ch10
