An abstract interpreter is a proof machine: every sound invariant it computes constrains the possible execution traces. And every classical compiler optimization is the same move — delete a piece of runtime work because a proof says the outcome is already fixed. Push that to its limit and analysis becomes a certified optimizing compiler. This doc makes the equivalence precise, shows where the literature already lives, and sketches how to point our existing KEVM-refinement machinery at it — without trusting a single line of the optimizer.
Here is one EVM basic block and the runtime work an interpreter does to execute it. Each proven fact below is something an abstract interpreter (or a refinement proof) in our system could establish. Toggle the facts and watch runtime work get struck off — the gas/dispatch you no longer have to pay because a proof already settled the outcome.
"Is abstract interpretation the same as compiling by proving?" Almost — they're the two halves. One produces facts; the other turns facts into faster code while preserving meaning.
Proves a property: "reachable states here ⊆ γ(a)". A one-sided over-approximation. It constrains traces but does not, by itself, change the program. This is run_sound / analyze_sound in our framework.
Proves an equivalence: "this fast residual = running the interpreter on the program". Specializes the interpreter to the program (Futamura's 1st projection). Needs no input facts at all.
If the interval domain proves the top of stack is the singleton {0}, the abstraction has collapsed to a point — and now folding [push k, add] → [push k] isn't merely licensed, it's forced. The invariant got strong enough to specialize the code. That is where the two halves meet: AI hands you the fact, partial-evaluation-style rewriting cashes it in, and the rewrite itself must be proven semantics-preserving.
Futamura (1971) showed specialize(interp, program) = a compiled program; specialize(specialize, interp) = a compiler. GraalVM/Truffle is the production realization. Abstract interpretation supplies the static input knowledge that makes specialization residualize less code — so the two stack: compress the interpreter to the program, then specialize further using proven invariants.
This isn't a metaphor. Walk the textbook list and each entry is literally a fact an analysis proves plus the runtime work that fact lets you remove. Click one.
Decades of work attack the same problem — "how do I know the fast code still means the same thing?" — in three distinct ways. They differ in what you trust. Filter by discipline.
They prove the equivalence outright: an abstract interpreter, structured as a free algebra over the standard abstract-domain signature, is a compiler — functor domains become compiler passes, and soundness of the analysis translates directly to a forward-simulation proof of the compiler (completeness ↔ backward simulation). This is the precise, mechanizable form of the intuition driving this whole doc: the analysis and the compiler are the same artifact viewed twice.
Across every strand — SuperStack's MaxSAT search, Alive2's SMT checker, Compiling-by-Proving's rule consolidation, CompCert's validated register allocator — the field converged on one shape. Let an aggressive, unverified optimizer propose; prove the result correct after the fact. It decouples optimization power from optimization trust.
run_sound yields certified facts per PC.Optimizes orig opt using the facts. This is our refinement relation, repointed.opt ⊑ orig observationally. That is exactly the shape of bN_refines / ObsRel, which today proves "concrete interpreter refines KEVM." Repoint the same relation at "optimized refines original" and you have an optimizing-compiler certifier with no new metatheory — the kernel that checks soundness also checks the optimization.Take the block from §01. The bare interpreter dispatches each opcode, meters gas per op, does a full 256-bit modular reduction on the ADD, and tests the JUMPI branch at runtime. Four certified facts collapse all of that.
; per-op dispatch + gas meter + checks PUSH1 0x00 ; dispatch, +3 gas CALLDATALOAD ; dispatch, +3 gas PUSH1 0x20 ; dispatch, +3 gas ADD ; dispatch, +3 gas, ; 256-bit modular reduce ISZERO ; dispatch, +3 gas PUSH2 .dest ; dispatch, +3 gas JUMPI ; dispatch, +10 gas, ; runtime branch test .dest: ; dead target — never taken ...revert... ; compiled & kept anyway
; one straight-line native block debit 28 gas ; D11: block gas is ; statically known → 1 debit mov r0, [calldata+0] add r0, 0x20 ; interval: no overflow → ; native add, no mod-reduce ; ISZERO+JUMPI: interval proves ; guard always false → branch ; & .dest block DELETED (D12 CFG) fallthrough ; no dispatch, no per-op meter
Word): the add never wraps → lower to a native machine add, drop the modular reduction.ISZERO result is statically 0 → JUMPI never taken → delete the branch test and the dead target block.Proving is expensive and only pays when amortized. EVM bytecode is immutable (deployed once), executed billions of times, and every op is gas-metered — so you pay the proof cost once at deploy/compile and win on every execution forever. For an AOT contract compiler the amortization is overwhelming; this is exactly why the recent EVM-perf work clusters here.
We already have the concrete semantics, certified analyses, and a refinement relation. Three things turn that into a certifying optimizer — and two of them are already on the board.
Config is the prerequisite for producing facts on actual contracts.Optimizes orig opt := ∀ c, ObsRel (run orig c) (run opt c) and a growing library of fact-conditioned rewrite lemmas (fold, dead-branch, no-wrap lowering, LICM).Optimizes. Its internals stay outside the TCB.Optimizes proof — never trusts our search.Sketch of the new layer (the relation already type-checks in spirit — it's ObsRel quantified over inputs):
def Optimizes (orig opt : List Op) : Prop := ∀ c : Config, ObsRel (run orig c) (run opt c) -- a good rewrite: provable, ships theorem add_zero_optimizes : Optimizes [.push 0, .add] [] := -- discharged by Word.add_zero — observably identical by intro c; simp [run, Word.add_zero, ObsRel] -- a bad rewrite: the proof simply fails (kill #4, negative test) theorem swap_not_nil : ¬ Optimizes [.swap 1] [] := -- swaps are observable
The principle is real but not unbounded. Four limits keep it honest.
You can always prove a sound but useless invariant (⊤ everywhere) — it licenses nothing. Optimization power is bounded by the precision of the analysis, which is the undecidable, expensive part. Soundness is the cheap half; the hard, open-ended research is in domains precise enough to yield singletons, unreachability, and no-wrap on real code.
Work that genuinely depends on inputs — a CALLDATALOAD-driven branch, an external call's return — can be bounded by analysis but never eliminated. The residual after specialization is irreducible: it's the program's intrinsic uncertainty. "Less at runtime" asymptotes to the real information the program needs from its inputs.
Proving can be slow or undecidable; it only pays when amortized over many executions. AOT contract compilation is the perfect case (prove once, run billions). A JIT is harder — the proof must be cheaper than the execution it saves. Our domain (immutable, gas-metered, hot bytecode) sits squarely in the favorable regime.
revmc lowers to LLVM and trusts LLVM + a sandbox — fast to build, large TCB. Our framing puts the kernel in the TCB instead: the optimization certificate is checked by the same minimal kernel that checks soundness, so no new trust is added. The cost is that every optimization must produce a kernel-checkable proof — which is exactly why the "untrusted search + verified certificate" split (§05) matters: it keeps the clever part outside the proof.
ObsRel refinement is already the relation that certifies the consumption — so "the more we prove, the less we run" is, for us, less a slogan than a build plan: #112 to produce facts, an Optimize/ layer to consume them, and the existing kernel to keep it all honest.