A wave of recent work attacks EVM efficiency from opposite ends of the trust spectrum — from proofs that compile themselves into fast code to unverified superoptimizers backed by a verified checker to native JIT that trades proof for a sandbox. This explores the four strands, what each actually does, and the surprising punchline: the field has independently arrived at the same "untrusted search, verified certificate" pattern our refinement machinery already implements. We end with a concrete way to turn our KEVM-equivalence proofs into an optimizer checker.
Every approach to making the EVM faster makes a trade between how much you trust the result and how much speed you buy. Plotting them this way is clarifying — and it shows exactly the empty corner our work targets. Click a point.
The proof is the compiler. Construct a correctness proof by symbolic execution, then compile the proof's graph into fast rewrite rules. Top-right: very fast and correct-by-construction — but only for code you've paid to prove. Everett Hildenbrandt is a KEVM original author — this is built on the same K/matching-logic foundation as our reference oracle.
Positions are qualitative. The diagonal from bottom-left (fast, untrusted) to top-right (proven, fast) is the field's frontier; the bottom-right corner — fast native execution with a machine-checked certificate — is largely empty, and is where a refinement-based checker would sit.
The most striking recent paper, and the one closest to us. The traditional pipeline is "write a compiler, then (maybe) verify it." This inverts it: the act of proving a program correct produces the optimized code as a by-product, so optimizer and proof can never disagree — they're the same artifact.
A K semantics is modular: one EVM opcode is many small rules. The paper builds an All-Path Reachability Proof (APRP) — a directed graph whose vertices are execution states and whose edges are steps (sequential), covers (subsumption), and branches (conditionals) — then compiles that graph back into code by step compression: adjacent edges vA →ᴹ vB →ᴺ vC collapse into one vA →ᴹ⁺ᴺ vC, erasing all the intermediate pattern-matching.
Their worked example is ADD. Executed modularly it fires five rules; compiled, those five become one:
The scaling is the real message: opcode-granularity compilation gives consistent modest speedups; compiling whole programs by proof gives orders of magnitude — because the bigger the proven region, the more intermediate steps get compressed away.
It lives on the same K / matching-logic substrate as KEVM — our reference oracle — and is co-authored by a KEVM originator. We already have a K→Lean importer (task K0). Their thesis ("proofs and fast code are the same object") is the performance-side mirror of ours ("proofs and sound analysis are the same object"). The shared foundation means their proof artifacts are, in principle, in our universe.
A different, very pragmatic philosophy from the Albert & Rubio group (Complutense Madrid et al., CAV 2023; extended 2025). Don't verify the optimizer — verify each optimization it produces. This is exactly the "untrusted search, verified certificate" split, and it's the strand most directly reusable by us.
The Coq framework has three pieces: a symbolic execution engine that runs a block to a symbolic state; a set of simplification lemmas that normalize symbolic states; and a checker that compares the two normalized states. If the optimized block's symbolic state matches the original's, the optimization is certified semantically equivalent — over all inputs, not sampled.
Limitation worth noting: it's confined to jump-free blocks — straight-line sequences, no control flow. Whole-program / cross-loop optimization is out of scope, which is exactly where an invariant-aware analysis (ours) could extend it.
These are the "untrusted search" half of the pattern above. They don't prove anything; they search for the cheapest equivalent instruction sequence and lean on a separate checker (or testing) for safety.
Finds optimal replacements for stack-bytecode blocks (EVM and WebAssembly). Three ideas: a greedy pass to bound the optimal length, a MaxSAT encoding of the optimization as weighted soft clauses, and domain-specific dominance/redundancy constraints to prune the search. Evaluated on 500k+ sequences; ~4× faster than prior superoptimizers with larger gas savings.
Albert, García de la Banda, Hernández-Cerezo, Ignatiev, Rubio, Stuckey.
Earlier work in the same lineage, handling memory and storage (not just the stack) via a Max-SMT formulation — important because real gas costs are dominated by SLOAD/SSTORE/MLOAD/MSTORE, which pure stack reasoning ignores.
The optimizer whose output the Coq checker (§03) validates.
Key framing: these tools are where the cleverness lives — and precisely the part you do not want to formally verify, because the search heuristics churn constantly. The pattern says: let them be brilliant and untrusted; verify the artifact, not the engine.
At the opposite extreme from "compiling by proving": compile EVM bytecode straight to native machine code for raw throughput, and get safety from isolation rather than correctness proof.
JIT/AOT-compiles EVM bytecode to native code via the LLVM backend, eliminating interpreter dispatch overhead. ~6.9× on compute-heavy workloads; integrated into reth behind --experimental.compiler; validated by syncing a node and checking the state root.
The native code is generated from untrusted bytecode, so safety comes from a sandbox: the compiled code can't touch host memory and can only call a narrow, validated ABI. There is no proof that the native code matches the bytecode's semantics — correctness rests on the compiler being right and the state-root check at the tip.
The connecting insight: we built bN_refines / kevm_refines to prove our concrete interpreter ≈ KEVM. That is structurally identical to proving optimized block ≈ original block. The same observable-equivalence relation that powers Pillar B is a ready-made verified optimizer checker — and we'd be checking against a KEVM-faithful semantics, not a hand-rolled one.
Config. Repoint it from "ours vs KEVM" to "original vs optimized" and it certifies block rewrites.Our charter is machine-checked soundness of analysis, off the product critical path — not shipping a faster client. So this literature is relevant as a consumer of our certificates and a validation of our architecture, not a pivot. The strongest concrete play is positioning the refinement machinery as a verified checker for unverified EVM optimizers — a natural extension of the B-track that turns "we proved our interpreter = KEVM" into "we can prove anyone's optimization preserves KEVM semantics."
Here's how the B-track relation becomes a block-equivalence checker, with a real peephole rewrite and the negative test that makes it trustworthy. This reuses the exact shape of RefB7.lean — just block-to-block instead of step-to-KEVM.
Run both blocks from the same symbolic Config via our KEVM-faithful step, read off the observable (final stack / memory / storage / gas), and demand they agree. The relation is the one we already wrote for Pillar B.
-- the checker is just our observable-equivalence relation, block-to-block def blockObs (ops : List Op) (c : Config) : Option (Obs Word) := obsOf (EvmAlgebra.run aluConcrete (ops.length) (c.withCode ops)) def Optimizes (orig opt : List Op) : Prop := ∀ c : Config, ObsRel (blockObs orig c) (blockObs opt c) -- equal over ALL configs -- (a gas-aware Obs adds: gas(opt) ≤ gas(orig))
Prove Optimizes orig opt axiom-clean and you have a machine-checked certificate that the rewrite preserves KEVM semantics — the same trust level as every bN_refines.
The classic peephole win: adding zero is a no-op. x PUSH1 0 ADD ≡ x. The optimizer drops two instructions and the Gverylow gas they'd cost.
... ; stack top = x PUSH1 0x00 ; stack: x, 0 ADD ; stack: x+0 = x
... ; stack top = x (removed) (removed)
theorem add_zero_optimizes : Optimizes [.push 0, .add] [] := by -- both leave the stack as `x :: rest`; ADD's denotation is `x + 0 = x` in Word intro c; simp [blockObs, EvmAlgebra.run, aluConcrete, Word.add_zero, ObsRel] #print axioms add_zero_optimizes -- ⊆ {propext, Classical.choice, Quot.sound} ⇒ certified
Word.add_zero is a theorem. The optimizer's rewrite is now machine-checked equivalent over every input — exactly what the Coq checker does, but against our KEVM-faithful semantics.Now a plausible-looking but wrong rewrite — the kind real superoptimizers have shipped. Someone "simplifies" SWAP1 SWAP1 (a true no-op) to nothing... but mistakenly applies the same rule to SWAP1 SWAP2, which is not the identity. The checker must reject it.
SWAP1 ; b, a, c SWAP2 ; c, a, b (NOT a permutation-identity)
(removed — wrongly treated as no-op) (removed) ; claims: a, b, c
-- the kill-#4-style negative test: the bad rewrite must FAIL to certify theorem swap12_not_optimizable_to_nil : ¬ Optimizes [.swap 1, .swap 2] [] := by -- exhibit one config where observables differ: stack [a,b,c] with a≠c intro h have hbad := h (Config.ofStack [1, 2, 3]) simp [blockObs, EvmAlgebra.run, aluConcrete, ObsRel] at hbad ; [3,2,1] ≠ [1,2,3] ⇒ contradiction
lake build goes red — precisely because the rewrite changes the observable on [a,b,c]. This is the same red-on-unsound discipline as every bucket's negative test: a wrong optimization cannot certify itself.run already exist; the work is (1) a block-to-block Obs with a gas component (rides on the D11 gas refinement), (2) an ingest path from an optimizer's (orig, opt) pairs through our decoder, and (3) for non-trivial blocks, a symbolic Config + the simplification lemmas the Coq group built — the genuinely new proof engineering. The result: a verified checker that accepts SuperStack/GASOL/solc output (or rejects it with a counterexample), backed by the Lean kernel and KEVM-faithful semantics.