Prove More, Run Less: When a Sound Analysis Becomes an Optimizing Compiler

There is a claim hiding inside abstract interpretation that sounds too strong the first time you say it out loud: a compiler optimization is just runtime work you proved you don't have to do. Not a metaphor — a definition. Walk the textbook list and every entry is a static fact plus the runtime work that fact makes redundant. Constant folding needs a proof that the operands are constants. Dead-branch elimination needs a proof that a guard is always false. Bounds-check elimination needs a proof that the index lies in range. Overflow-check removal needs a proof that the arithmetic doesn't wrap. The optimization is the deletion; the analysis is the license.

Once you see optimization that way, the obvious question is whether you can run it backwards: if every optimization is licensed by a proof, and an abstract interpreter is a machine that manufactures sound proofs about a program, then how much of a compiler is already sitting inside the analysis? The answer turns out to be: most of it.

Two halves of one enterprise

It's tempting to say abstract interpretation is "compiling by proving." That's almost right, and the gap is the interesting part.

An abstract interpreter proves a property: at this program point, the set of reachable states is contained in γ(a). It is a one-sided over-approximation. It constrains the possible execution traces, but it does not, by itself, change a single instruction.

"Compiling by proving" — in the Futamura sense, and in the recent K-framework work of the same name — proves an equivalence: this fast residual program computes exactly what running the interpreter on the source would. It specializes the interpreter to the program. Notably, it needs no facts about inputs at all.

So they are different kinds of proof. The hinge that fuses them is this: a precise-enough over-approximation becomes an equivalence. If the interval domain proves the top of the stack is the singleton {0}, the abstraction has collapsed to a point, and now folding [push k, add] into [push k] isn't merely permitted — it's forced. The invariant got strong enough to specialize the code. Abstract interpretation hands you the fact; partial-evaluation-style rewriting cashes it in; and the rewrite itself must be proven semantics-preserving. The two halves meet exactly there.

This is why the load-bearing caveat is soundness is cheap, precision is expensive. You can always prove a sound but useless invariant — ⊤ everywhere — and it licenses nothing. Optimization power is bounded by the precision of the analysis, which is the undecidable, open-ended part. "The more we prove, the less we run" only bites if "prove more" means "prove more singletons, more unreachable branches, more no-overflow guarantees" — the facts that collapse a runtime choice into a static certainty.

The field already converged on the architecture

If you want aggressive optimization and a guarantee that it preserves meaning, there is a shape that recurs across decades of otherwise-unrelated work: untrusted search, trusted certificate. Let an aggressive, unverified optimizer propose a transformed fragment — a superoptimizer doing a MaxSAT search, an LLVM backend lowering to native code, even a language model. Then prove the proposal correct after the fact, against the certified facts. If the proof checks, ship it; if not, fall back. You never verify the optimizer's internals — only a small, re-checkable certificate. This decouples optimization power from optimization trust, and it's the single most important leverage point in the whole story.

That pattern has three classical disciplines, differing only in what you end up trusting:

Verified construction. Prove the compiler correct once and for all. CompCert (~20 passes, each a forward-simulation proof) and CakeML are the gold standard. The keystone result for the thesis here is Lesbre & Lemerre's Compiling with Abstract Interpretation (PLDI 2024): structure an abstract interpreter as a free algebra over the standard domain signature and it is a compiler — functor domains become passes, and soundness of the analysis translates directly into a forward-simulation proof of the compiler. The analysis and the optimizing compiler are the same artifact viewed twice.
Translation validation. Don't verify the compiler — verify each run. Pnueli's translation validation and Necula's application to GCC compare the program before and after each pass and prove semantics preserved per-run; you only have to trust the validator. Alive2 does this for LLVM via SMT and found dozens of real bugs; a Coq engine does it for jump-free EVM blocks (CAV 2023) and found bugs in solc, GASOL, and SuperStack.
Proof-carrying code. The compiler emits a checkable certificate alongside the code; the consumer re-checks it independently (Necula & Lee). Rinard's credible compilation sharpened this into a relational proof that the target implements the source. The recent Compiling by Proving — co-authored by the original KEVM author — consolidates K's semantic rules into faster compiled rules, each carrying its preservation proof, for a ~2× geomean speedup on concrete EVM execution.

The quiet punchline, for anyone who has built a machine-checked refinement between two interpreters: every optimization is a refinement — opt ⊑ orig, observationally. That is precisely the shape of an observational-equivalence relation you might already have proving "this concrete interpreter refines KEVM." Repoint the same relation at "this optimized fragment refines the original" and you have a certifying optimizer with no new metatheory. The kernel that checks soundness checks the optimization.

One block, four proofs, native-speed output

Make it concrete. Take a single EVM basic block: push an offset, load calldata, add a constant, test the result, conditionally jump to a revert. The bare interpreter, on every execution, dispatches each opcode, meters gas per op, performs a full 256-bit modular reduction on the ADD, and evaluates the branch at runtime.

Now prove four facts:

No-overflow (interval over the 256-bit word): the add never wraps — so lower it to a native machine add and drop the modular reduction.
Guard always false (sign/interval): the test is statically 0 — so the jump is never taken, and both the branch test and the entire dead target block disappear.
Static block gas: the block's gas cost is a constant — so the per-op meter updates collapse into a single upfront debit.
Straight-line control flow: the block has one successor — so it becomes one native basic block with no per-op dispatch.

What's left is a short, branch-free, native sequence that does exactly the irreducible work — read the calldata word, add — and nothing else. Every deletion was licensed by a kernel-checked proof. That is "prove more, run less," made of code rather than slogans.

And it's why the EVM is such a good place to do this. Proving is expensive; it only pays when amortized. EVM bytecode is immutable (deployed once), executed billions of times, and gas-metered per operation — so you pay the proof cost once, at compile time, and collect on every execution forever. The economics that make heavy proving a bad deal for a throwaway script make it an overwhelming win for a deployed contract.

The honest floor

The principle is real but not unbounded, and it's worth naming where it stops. Precision, not quantity, is the currency — a thousand sound-but-vague theorems optimize nothing. There is an input-dependence floor: work that genuinely depends on inputs — a calldata-driven branch, an external call's result — can be bounded by analysis but never eliminated; the residual is the program's intrinsic uncertainty. Proof cost must be amortized, which is why this favors ahead-of-time contract compilation over a hot-path JIT. And the trusted base is a real choice: a native JIT that lowers to LLVM trusts LLVM and a sandbox; the proof-carrying route trusts only a minimal kernel, at the cost of having to produce a kernel-checkable certificate for every transformation — which is exactly why the untrusted-search/verified-certificate split matters, since it keeps the clever, untrusted part outside the proof.

I built this out as a visual, interactive explainer — with a "runtime-work ledger" you can drive: toggle each proven fact and watch the corresponding runtime work get struck off a real EVM block, with a live gauge of how much of the per-execution cost you've compiled away. It also has the full optimization-as-proof catalogue, a filterable map of the prior art by trust discipline, and a buildable sketch of the refinement-as-optimizer-checker.

→ Open the interactive explainer

It's the third in a small series. Where one behavior, many views asks in what sense are all these tools talking about the same program, and one substrate, many instruments turns that into a build recipe, this one follows the thread to its end: once the instruments share a substrate and earn their soundness, the soundness proof is itself the raw material for a certified optimizing compiler. The flashcards below are the TL;DR.

Two halves of one enterprise

The field already converged on the architecture

One block, four proofs, native-speed output

The honest floor

Flashcards Available