
Epilogue: Seeing

You have reached the final page of this book.

In 1687, Newton wrote F=ma in his room at Cambridge. He did not know he was starting an arc—an arc that would pass through Hamilton, through Lyapunov, through Banach, through ResNet and GPT and diffusion models—and finally come to rest in your hands.

This arc is not a history lesson. This arc is an argument.


The argument's core is a single sentence: Learning, reasoning, and generation are all the motion of the same dynamical system along trajectories in different spaces.

$S_{t+1} = S_t + \eta F_\theta(S_t, x)$. When $S$ is parameters, it describes training. When $S$ is hidden states, it describes the forward pass of ResNet and GPT. When $S$ is a belief distribution, it describes reasoning. When $S$ is a data point, it describes the reverse process of diffusion models. One equation, four worlds. This is not an analogy: it is the same mathematical structure instantiated in four spaces.
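
To make "one equation, several worlds" concrete, here is a minimal sketch in plain Python. The names `step`, `iterate`, `negative_gradient`, and `residual_branch` are illustrative and not taken from the book, and both vector fields are toy assumptions: a quadratic loss for the parameter world and a linear residual branch for the hidden-state world.

```python
from typing import Callable, List

Vector = List[float]


def step(state: Vector, field: Callable[[Vector], Vector], eta: float) -> Vector:
    """One update S_{t+1} = S_t + eta * F(S_t)."""
    return [s + eta * f for s, f in zip(state, field(state))]


def iterate(state: Vector, field: Callable[[Vector], Vector],
            eta: float, n_steps: int) -> Vector:
    """Apply the same update rule n_steps times."""
    for _ in range(n_steps):
        state = step(state, field, eta)
    return state


# World 1: S is a parameter vector. F is the negative gradient of the toy
# loss L(theta) = 0.5 * ||theta - TARGET||^2, so the rule is gradient descent.
TARGET = [1.0, -2.0]


def negative_gradient(theta: Vector) -> Vector:
    return [t - th for th, t in zip(theta, TARGET)]


theta_final = iterate([0.0, 0.0], negative_gradient, eta=0.1, n_steps=200)


# World 2: S is a hidden state. F is a toy residual branch f(h) = -0.5 * h,
# and eta = 1 gives the residual update h_{l+1} = h_l + f(h_l).
def residual_branch(h: Vector) -> Vector:
    return [-0.5 * h_i for h_i in h]


h_out = iterate([4.0, -2.0], residual_branch, eta=1.0, n_steps=3)

print(theta_final)  # approaches TARGET
print(h_out)        # hidden state after three residual "layers"
```

The only things that change between the two runs are the state, the vector field, and the step size; `step` itself is untouched.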

You don't need to memorize this equation. You already understand it. From the wilderness hiker of Chapter 1, to Lyapunov in Chapter 6, to the cave diver in Chapter 7, to diffusion in Chapter 12—in every story you have seen the same motion: along the slope beneath your feet, step by step, toward the fixed point.


But this book is not only about mathematics.

In the preface, I told the story of Yonglin. A friend who pulled me back to myself through companionship—and I wrote his name into thought through a theorem. Companionship matters more than proof—but proof itself can also be a form of companionship.

You may be reading this book late at night, perhaps in a dorm room, perhaps in a lab, perhaps on a commuter train. You might be a student, struggling between papers and exams. You might be an engineer, exhausted by models whose training won't converge. You might simply be a curious person, wanting to know what "learning" is really about.

I hope this book has walked with you for a stretch of road. I hope that when you close it, what you see is no longer a pile of formulas and code—but a wilderness. The model is the hiker. The terrain is the loss function. The step size is the learning rate. Inertia is momentum. The canyon is ResNet. The river is GPT. The fixed point is DEQ. The reasoning field is the gravitational net that every problem casts across belief space.


Force lets you calculate. Energy lets you understand. Geometry lets you see.

Newton gave force. Hamilton gave energy. Riemann gave curved space. Lyapunov gave the insight that you can know convergence without watching until the end. Banach gave the existence and uniqueness of fixed points. And we—the people of this era—have used this entire language to redescribe learning, reasoning, and generation.

But "seeing" is not only a matter of mathematics. Seeing is lifting your eyes from a formula, looking out the window—and realizing that the wilderness outside, and the energy terrain you just derived on paper, are the same.

The terrain of learning unfolds beneath your feet. It has always been there. You just needed someone to tell you—look up.


Zixi Li (Pallas's Cat Professor)
2026, Sun Yat-sen University


Geometric Concept Quick Reference

| Geometric Concept | Mathematical Object | Machine Learning Counterpart |
| --- | --- | --- |
| Position | Point $x \in \mathbb{R}^N$ | Parameters $\theta$, hidden state $h$, belief distribution $p$ |
| Terrain | Scalar function $L: \mathbb{R}^N \to \mathbb{R}$ | Loss function, negative entropy function |
| Slope direction | Gradient $\nabla L$ | Parameter update direction, score function |
| Step size | $\eta$ | Learning rate, reasoning step size |
| Discrete motion | Euler method $x_{t+1} = x_t + \eta F(x_t)$ | Gradient descent, residual connection, CoT step |
| Continuous motion | Gradient flow $\frac{dx}{dt} = F(x)$ | Continuous-depth models, probability flow ODE |
| Curvature | Hessian $H_{ij} = \frac{\partial^2 L}{\partial x_i \partial x_j}$ | Local curvature of loss terrain |
| Critical point | $\nabla L(x) = 0$ | Minimum, maximum, saddle point |
| Minimum | $H \succ 0$ and $\nabla L = 0$ | Training convergence point |
| Saddle point | $H$ has both positive and negative eigenvalues | Training bottleneck |
| Flatness | Magnitude of Hessian eigenvalues | Generalization ability |
| Non-Euclidean distance | Bregman divergence $D_F(p \,\Vert\, q)$ | KL divergence |
| Belief space metric | Fisher information matrix $G(p)$ | Natural gradient |
| Energy descent | Lyapunov function $V(x)$ | Loss descent, KL convergence |
| Fixed point | $F(x) = 0$ or $T(x) = x$ | DEQ output, belief solidification |
| Stability | Sign of real parts of Jacobian eigenvalues | Basin depth and width |
| Attractor | Asymptotically stable fixed point | Correct answer basin |
| Basin of attraction | Set of initial points converging to the same fixed point | Reasoning robustness |
| Bifurcation | Qualitative behavior change caused by parameter variation | Critical learning rate, emergent ability |
| Vector field | $F: M \to TM$ | Reasoning field $F_x$, score field |
| Contraction mapping | $\mathrm{Lip}(T) < 1$ | Banach convergence guarantee |
| Diffusion | Stochastic differential equation | Forward noise injection |
| Reverse diffusion | Reverse SDE / probability flow ODE | Generation process |
| Data manifold | Low-dimensional submanifold | Structure of natural data distribution |
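
The rows on critical points, minima, and saddle points all reduce to the signs of the Hessian eigenvalues. A minimal sketch for the two-dimensional case follows, using the closed-form eigenvalues of a symmetric 2x2 matrix. The function names and the tolerance `tol` are illustrative assumptions, and the three test terrains are toy examples rather than ones taken from the book.

```python
import math
from typing import Tuple


def symmetric_2x2_eigenvalues(a: float, b: float, d: float) -> Tuple[float, float]:
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def classify_critical_point(a: float, b: float, d: float, tol: float = 1e-12) -> str:
    """Classify a critical point from its Hessian [[a, b], [b, d]]."""
    lo, hi = symmetric_2x2_eigenvalues(a, b, d)
    if lo > tol:
        return "minimum"        # H positive definite
    if hi < -tol:
        return "maximum"        # H negative definite
    if lo < -tol and hi > tol:
        return "saddle point"   # eigenvalues of both signs
    return "degenerate"         # a zero eigenvalue: second order is inconclusive


# Hessians at the origin of three toy terrains.
bowl = classify_critical_point(2.0, 0.0, 2.0)     # L = x^2 + y^2
pass_ = classify_critical_point(2.0, 0.0, -2.0)   # L = x^2 - y^2
twist = classify_critical_point(0.0, 1.0, 0.0)    # L = x * y

print(bowl, pass_, twist)
```

The size of the eigenvalues, not just their sign, is what the "Flatness" row refers to: small positive eigenvalues mean a wide, shallow bowl.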

Formula Family of the Book

The core thesis of the book can be captured in six formulas. They are not six independent formulas—they are six faces of the same formula in different spaces.

Core dynamical system:

$$S_{t+1} = S_t + \eta F_\theta(S_t, x)$$

ResNet = Explicit Euler:

$$h_{l+1} = h_l + f_\theta(h_l)$$

DEQ = Fixed point:

$$h = f_\theta(h, x)$$
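
One way to read this equation is as the limit of repeatedly applying the layer to its own output. The sketch below does exactly that for a scalar toy layer. The layer `f` and the solver `solve_fixed_point` are hypothetical illustrations: the layer is chosen so that its slope in `h` never exceeds 0.5, which makes it a contraction, so the Banach fixed-point theorem guarantees a unique fixed point that plain iteration reaches. Practical DEQ implementations typically use faster root-finding methods than this naive loop.

```python
import math
from typing import Tuple


def f(h: float, x: float) -> float:
    """Toy layer 0.5 * tanh(h) + x. Its slope in h is at most 0.5."""
    return 0.5 * math.tanh(h) + x


def solve_fixed_point(x: float, h0: float = 0.0,
                      tol: float = 1e-12, max_iter: int = 1000) -> Tuple[float, int]:
    """Iterate h <- f(h, x) until successive iterates differ by less than tol."""
    h = h0
    for n in range(1, max_iter + 1):
        h_next = f(h, x)
        if abs(h_next - h) < tol:
            return h_next, n
        h = h_next
    raise RuntimeError("fixed-point iteration did not converge")


h_star, n_iter = solve_fixed_point(x=1.0)
residual = abs(h_star - f(h_star, 1.0))

# A different starting point lands on the same fixed point.
h_other, _ = solve_fixed_point(x=1.0, h0=10.0)

print(h_star, n_iter, residual)
```

The output depends on the input `x` but not on the initial guess `h0`, which is the property that lets a DEQ be described by its fixed point rather than by a stack of layers.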

Bregman divergence (the mother formula of KL):

$$D_F(p \,\Vert\, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$$
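
The phrase "mother formula of KL" can be checked numerically. Choosing $F$ as the negative entropy $F(p) = \sum_i p_i \log p_i$ turns the Bregman divergence into the KL divergence for probability vectors. The sketch below is a minimal check under the assumption that both vectors have strictly positive entries summing to 1. The function names and the example vectors `p` and `q` are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]


def bregman(F: Callable[[Vector], float],
            grad_F: Callable[[Vector], Vector],
            p: Vector, q: Vector) -> float:
    """D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


def negative_entropy(p: Vector) -> float:
    return sum(pi * math.log(pi) for pi in p)


def grad_negative_entropy(p: Vector) -> Vector:
    return [math.log(pi) + 1.0 for pi in p]


def kl_divergence(p: Vector, q: Vector) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


p = [0.7, 0.2, 0.1]
q = [0.25, 0.25, 0.5]

d_bregman = bregman(negative_entropy, grad_negative_entropy, p, q)
d_kl = kl_divergence(p, q)

print(d_bregman, d_kl)  # the two values agree up to rounding
```

Swapping in $F(x) = \tfrac{1}{2}\lVert x \rVert^2$ would give half the squared Euclidean distance instead, which is why the same formula covers both flat and non-Euclidean notions of distance.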

Yonglin Limit (reasoning convergence criterion):

$$D_{\mathrm{KL}}\big(p_{t+1}(y \mid x) \,\Vert\, p_t(y \mid x)\big) < \epsilon$$
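
Read as a stopping rule, the criterion says: keep updating the belief until one more step barely changes it in KL. The sketch below shows only that stopping logic. The update `toy_belief_update` is a hypothetical stand-in (a convex mix toward a fixed target distribution), not the reasoning dynamics developed in the book, and `eta`, `eps`, and the example distributions are arbitrary assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def kl_divergence(p: Vector, q: Vector) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def toy_belief_update(p: Vector, target: Vector, eta: float) -> Vector:
    """Hypothetical update: move a fraction eta of the way toward target."""
    return [(1.0 - eta) * pi + eta * ti for pi, ti in zip(p, target)]


def run_until_limit(p: Vector, target: Vector, eta: float = 0.3,
                    eps: float = 1e-8, max_steps: int = 1000) -> Tuple[Vector, int]:
    """Stop at the first step where KL(p_{t+1} || p_t) < eps."""
    for t in range(1, max_steps + 1):
        p_next = toy_belief_update(p, target, eta)
        if kl_divergence(p_next, p) < eps:
            return p_next, t
        p = p_next
    return p, max_steps


target = [0.7, 0.2, 0.1]
uniform = [1.0 / 3.0] * 3

belief, n_steps = run_until_limit(uniform, target)
gap = max(abs(b - t) for b, t in zip(belief, target))

print(belief, n_steps, gap)
```

The criterion only detects that the belief has stopped moving. Whether it has stopped at the right place depends on the update rule, not on the threshold.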

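Score function $\nabla_x \log p_t(x)$ (the vector field of diffusion), as it appears in the reverse-time SDE:

$$dx = \left[-\tfrac{1}{2}\beta(t)\,x - \beta(t)\,\nabla_x \log p_t(x)\right] dt + \sqrt{\beta(t)}\; d\bar{w}$$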
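
To see the score acting as a vector field, the reverse-time SDE can be simulated with Euler-Maruyama steps in a case where the score is known exactly. The sketch below makes several loud assumptions that are not from the book: the schedule is constant, $\beta(t) = $ `BETA`; the data distribution is a one-dimensional Gaussian $N(\texttt{MU}, \texttt{SIGMA}^2)$, so $p_t$ stays Gaussian and its score has a closed form; and no neural network is involved. In a real diffusion model the function `score` would be a learned approximation.

```python
import math
import random

BETA = 8.0             # constant noise schedule beta(t) = BETA (assumption)
T = 1.0                # final time of the forward process
MU, SIGMA = 2.0, 0.5   # toy data distribution N(MU, SIGMA^2) (assumption)


def score(x: float, t: float) -> float:
    """Closed-form score of p_t for Gaussian data under the forward SDE."""
    decay = math.exp(-BETA * t)                 # exp(-integral of beta)
    mean = MU * math.sqrt(decay)
    var = SIGMA ** 2 * decay + (1.0 - decay)
    return -(x - mean) / var


def sample(n_steps: int, rng: random.Random) -> float:
    """Integrate the reverse SDE from t = T down to t = 0."""
    dt = T / n_steps
    x = rng.gauss(0.0, 1.0)                     # start from the prior N(0, 1)
    for k in range(n_steps):
        t = T - k * dt
        drift = -0.5 * BETA * x - BETA * score(x, t)
        # Time runs backward, so the drift enters with a minus sign.
        x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [sample(200, rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))

print(sample_mean, sample_std)  # should land near MU and SIGMA
```

Starting from pure noise, the samples drift back toward the data distribution because the score term points uphill in log-density at every noise level.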

Reading Roadmap

  • If you want to quickly grasp the core ideas: Preface → Volume I Introduction → ch1 → ch3 → ch6 → ch7 → ch12 → Epilogue
  • If you care about optimization and training: ch3 → ch4 → ch5 → ch6
  • If you care about reasoning: ch5 → ch7 → ch8 → ch9
  • If you care about architecture design: ch6 → ch11 → ch12
  • If you have a math background and want to tackle the hardest material: ch5 → ch6 → ch8 (theorem proof chain)
  • If you're a beginner and need to build geometric intuition first: ch1 → ch2 → ch3 → ch10