Chapter 20: The Formal Contract of Heuristics — The Precise Mathematical Definition of "Approximately Right"
Admissibility is not an engineering compromise. It is a mathematical promise.
The conclusion of Chapter 19 is disheartening: many reasoning problems are computationally fundamentally hard. Not because the algorithms are not good enough, but because the geometric shape of the problem itself dictates there are no shortcuts.
Yet humans and machines have been solving these "hard" problems all along. Imprecisely, not always optimally, but usually well enough, fast enough.
A question lurks here: what does "good enough" mean? If the answer is arbitrary, then any algorithm that doesn't crash can be called a "heuristic." But if "good enough" has a precise mathematical meaning, then a heuristic is not an engineering makeshift, but a contractual promise — promising what, at what cost, under what conditions it will be honored.
What this chapter sets out to do is write this contract clearly.
20.0 The Optimistic Mapmaker Game: Heuristics Must Sign a Contract
Chapter 19 told us that many mazes are too large to expect exhaustive search. So we hire a mapmaker. The mapmaker cannot tell you the complete path, but can only give an estimate at every intersection: roughly how much farther from here to the destination.
This is the heuristic function
But this mapmaker must sign a contract. The first contract is called admissibility: you may be optimistic, but you must not exaggerate the difficulty. That is, the estimated distance must not exceed the true shortest distance:
If the mapmaker underestimates the distance, the search merely takes a few more steps; if the mapmaker overestimates the distance, the search may directly abandon the genuinely optimal path. The second contract is called consistency: estimates between adjacent intersections must satisfy the triangle inequality; one cannot today say "the destination is very close," and after one step suddenly say "the destination is absurdly far."
The Formal Skeleton of This Game
- State space: nodes in the search graph, known cost
, heuristic estimate , priority queue. - Legal moves: expand the node with the smallest
, and update its successors. - Transition rules: take node
from the open list, add or update successor , with cost . - Victory condition: find the optimal path under provable guarantees, or obtain an approximation ratio / PAC-quality bound.
- Failure mode: the heuristic has no contract, is merely "nice-looking," and ultimately delivers the fastest wrong answer.
This game makes "approximately right" precise. A heuristic without a contract is merely luck; a heuristic with a contract is a mathematical object. Admissibility, consistency, approximation ratio, PAC guarantees — all are different forms of contract clauses.
The dignity of a heuristic lies not in always producing the optimal answer, but in honestly stating: what I promise, what I do not promise; under what conditions I am reliable, under what conditions I will fail.
Optimism can be a virtue, or it can be fraud. A* allows the heuristic function to be optimistic, because optimism makes you continue exploring; it does not allow the heuristic function to be arrogant, because arrogance makes you prematurely abandon the true path. The entire ethic of heuristics is hidden inside this inequality.
20.1 Two Failures of Approximation
First, let us look at two approximation schemes, both intuitively reasonable, both wrong under certain conditions.
Scheme One: Greedy algorithm. At each step, choose the locally best-looking option, no backtracking. In the traveling salesman problem, at each step go to the nearest unvisited city; in graph coloring, assign the current vertex the smallest-numbered color that doesn't conflict with neighbors.
Greedy algorithms are typically fast (polynomial time), but the results can be very poor — sometimes off from the optimal solution by an exponential factor. Greed makes no promise; it is merely "nice-looking."
Scheme Two: Random search. Randomly sample the solution space, report the best result found. Random enough, enough times, perhaps a good solution can be found.
Random search also makes no promise — "perhaps" is not a contract. In the worst case, good solutions in the solution space are extremely sparse, and random search may never find them.
A great many practical systems are using exactly "perhaps" — random restarts, multiple sampling, taking the best result. These methods sometimes work in practice, but you don't know under what circumstances they will fail, nor what the probability of failure is. Without a contract, there is no failure condition; you can't even tell whether the algorithm is working. "Sometimes works" is not a guarantee; it is luck.
The common problem of these two schemes: no provable bound on "how far off." What heuristic contracts need is precisely this bound.
20.2 Admissibility: Never Overestimate
The most classic heuristic framework in search problems is the A* algorithm. Its core is a heuristic function
The A* algorithm maintains a priority queue, expanding at each step the node with the smallest estimated total cost
The condition for this algorithm to find the optimal solution rests precisely on one property of
Definition (Admissibility): A heuristic function
where
The meaning of admissibility:
Theorem (Optimality of A)* : If the heuristic function
This is the first form of the heuristic contract: give me an estimate that never overestimates, and I guarantee you the optimal solution. The cost is that you must spend time constructing this estimate, and the algorithm may need to explore a larger search space.
A common mistake is to use a rough upper bound on the true cost as the heuristic function — this will of course overestimate, thereby destroying admissibility. For instance, in map search, if straight-line distance is used as
20.3 Consistency: The Constraint of the Triangle Inequality
Admissibility guarantees the quality of the final result, but the efficiency of A* also depends on another property of
Definition (Consistency): A heuristic function
where
This is a triangle inequality: the estimated cost from
Consistency implies admissibility (it can be verified), but the converse does not hold. Consistency is a stronger condition.
Under consistency, the
These two properties — admissibility and consistency — constitute the precise mathematical content of the heuristic contract. With them, a "heuristic" is no longer a casual guess, but an estimation mechanism operating within provable bounds.
Returning to the Optimistic Mapmaker Game, admissibility stipulates that the mapmaker cannot call a short road a long road; consistency stipulates that he cannot arbitrarily change his story between adjacent intersections. The cleverness of A* lies not in "trusting intuition," but in trusting only intuition that has signed a contract. Without this contract, a heuristic is merely a guide who talks a good game; with this contract, it becomes a legitimate move within the formal system.
20.4 Approximation Ratio: Worst-Case Guarantees
The A* framework works well on problems where optimality is meaningful. But for many NP-complete problems, the optimal solution itself is hard to find, even when approximation is allowed. Here another kind of contract is needed: approximation ratio.
Definition (Approximation algorithm): For a minimization problem, if algorithm
The approximation ratio is the worst-case version of the heuristic contract: regardless of the input, the quality of the output is guaranteed within a factor of
Some famous results:
- Vertex cover problem: there exists a 2-approximation algorithm (find a maximal matching, take both endpoints).
- Traveling salesman problem (metric version): there exists a 1.5-approximation algorithm (Christofides' algorithm, 1976). This result stood for nearly half a century, only slightly improved in 2020.
- Traveling salesman problem (general version, not satisfying the triangle inequality): if P ≠ NP, then for any constant
, no polynomial-time -approximation algorithm exists.
This last result is crucial: for certain problems, approximation is equally NP-hard. Allowing error does not always make the problem easier. The approximation ratio itself has an information-theoretic lower bound, a bound arising from the intrinsic structure of the problem, independent of any specific algorithm.
The proof of the inapproximability of the general version of the traveling salesman problem uses reduction: if a polynomial-time
20.5 PAC Learning: The Probabilistic Version of "Approximately Right"
Approximation ratios concern the quality of solutions; the PAC learning framework applies the same spirit to learning problems.
Definition (PAC Learning): A concept class
In plain language: with high probability (at least
Many people, upon seeing "probably approximately correct," think this is lowering the standard. Quite the opposite —
The core conclusion of the PAC framework: the sample size
In the PAC framework, the required sample size also depends on another quantity: the VC dimension (Vapnik-Chervonenkis dimension) of the hypothesis space
20.6 The Unification of the Three Contracts
Let us lay out the content of this chapter side by side:
| Contract type | Promise | Condition | Cost |
|---|---|---|---|
| Admissibility + A* | Find optimal solution | May explore larger space | |
| Approximation ratio | Solution within | Worst-case guarantee | Sacrifice optimality |
| PAC learning | Probably approximately correct | Sufficiently many samples | Allow failure probability |
Three contracts, three different kinds of "approximately right" — corresponding to three different cost structures and guarantee types. Their commonality is: all turn "approximately" from an intuitive word into a mathematical quantity. Contracts can be violated, but violations are detectable. This is entirely consistent with the spirit of the formal system in Chapter 14: precise definitions enable precise criticism.
Heuristics are not defective products forced upon us because exact reasoning is too hard. They are rational promises about reasoning quality under computational constraints — knowing what they promise, knowing where the boundaries of their promises lie.
Unresolved
Can the boundaries of approximation be broken? For some problems, the approximation ratio has an insurmountable lower bound (assuming P ≠ NP). But approximation algorithm theory still has vast unresolved questions: what is the optimal approximation ratio for the traveling salesman problem? Does the Unique Games Conjecture hold — it determines the approximation lower bounds for a great many problems. Questions in this direction are often easier to state than P ≠ NP itself, and equally deep.
The computational version of PAC learning: The PAC framework guarantees sample complexity, but not computational complexity. Some concept classes are learnable in the sample sense, but the learning algorithm itself may require exponential time. Distinguishing "information-theoretically learnable" from "computationally learnable" is the core tension of learning theory — the former asks how much data is needed, the latter asks how much time is needed.
Can heuristics be automatically discovered? A* requires human-designed
Exercises
★ Warm-up
The A* algorithm commonly uses straight-line distance as the heuristic function
(zero heuristic, degenerates into Dijkstra's algorithm) straight-line distance from to the goal (Euclidean distance) straight-line distance from to the goal the true shortest path length from to the goal (assuming you already know it) Manhattan distance from to the goal (only allowing up/down/left/right movement)
★★ Derivation
The cost of admissibility: Design a simple search problem (draw a 5-node graph, label edge weights), such that: an admissible heuristic function
requires expanding more nodes to find the optimal solution, while a non-admissible finds the same solution with fewer nodes. Write out the specific values of the two 's, and trace the expansion order of A* in both cases. What does this illustrate about what admissibility guarantees, and what it does not guarantee? The relationship between PAC and Bayesian approaches: Bayesian inference from Chapter 17 and PAC learning from this chapter both deal with "generalizing from finite data." PAC guarantees hold for the worst-case distribution; Bayesian guarantees depend on the prior. Which kind of guarantee makes fewer assumptions? In a scenario where the "prior is completely wrong," which framework will collapse first?
★★★ Challenge
The general version of the traveling salesman problem (TSP) (not satisfying the triangle inequality) has been proven: if P ≠ NP, then for any constant
The proof of this "inapproximability" uses reduction: if a
The goal of this exercise is not to write out the complete proof, but to understand: why allowing error does not always make the problem easier — in some cases, approximation is just as hard as exact solution.
