Literature exploration · 2026-05-26

Proofs meet performance on the EVM

A wave of recent work attacks EVM efficiency from opposite ends of the trust spectrum — from proofs that compile themselves into fast code to unverified superoptimizers backed by a verified checker to native JIT that trades proof for a sandbox. This explores the four strands, what each actually does, and the surprising punchline: the field has independently arrived at the same "untrusted search, verified certificate" pattern our refinement machinery already implements. We end with a concrete way to turn our KEVM-equivalence proofs into an optimizer checker.

4 strands mapped KEVM lineage a buildable design sketch
01 · The landscape

One picture: trust on one axis, speed on the other

Every approach to making the EVM faster makes a trade between how much you trust the result and how much speed you buy. Plotting them this way is clarifying — and it shows exactly the empty corner our work targets. Click a point.

← unverified machine-checked → slower ··· faster
Compiling by Proving
Verified block-opt (Coq)
SuperStack
revmc (JIT/AOT)
◆ us (refinement)
Zhao, Hildenbrandt, Conejero, Zhao · arXiv 2509.21793 · Sept 2025

Compiling by Proving

The proof is the compiler. Construct a correctness proof by symbolic execution, then compile the proof's graph into fast rewrite rules. Top-right: very fast and correct-by-construction — but only for code you've paid to prove. Everett Hildenbrandt is a KEVM original author — this is built on the same K/matching-logic foundation as our reference oracle.

Positions are qualitative. The diagonal from bottom-left (fast, untrusted) to top-right (proven, fast) is the field's frontier; the bottom-right corner — fast native execution with a machine-checked certificate — is largely empty, and is where a refinement-based checker would sit.

02 · The headline result

Compiling by Proving — when the proof is the optimization

The most striking recent paper, and the one closest to us. The traditional pipeline is "write a compiler, then (maybe) verify it." This inverts it: the act of proving a program correct produces the optimized code as a by-product, so optimizer and proof can never disagree — they're the same artifact.

The mechanism, intuitively

A K semantics is modular: one EVM opcode is many small rules. The paper builds an All-Path Reachability Proof (APRP) — a directed graph whose vertices are execution states and whose edges are steps (sequential), covers (subsumption), and branches (conditionals) — then compiles that graph back into code by step compression: adjacent edges vA →ᴹ vB →ᴺ vC collapse into one vA →ᴹ⁺ᴺ vC, erasing all the intermediate pattern-matching.

Their worked example is ADD. Executed modularly it fires five rules; compiled, those five become one:

1. validate stack depth
2. charge gas (Gverylow)
3. pop a, b · push a+b mod 2²⁵⁶
4. stack-pointer bookkeeping
5. advance program counter
one compiled
ADD rule
all 5 effects, atomically, no intermediate matching
Why it's correct by construction: the compiled rule is extracted from a proof that the five-rule sequence has exactly that effect. Compiled rules are installed at higher priority than the original modular rules, which stay as a trusted fallback — so the formal semantics is never abandoned, only short-circuited where a proof exists. This is the "one definition, many tools" K philosophy with the interpreter↔model gap closed: the fast interpreter and the formal model are derived from the same proof.

What it actually buys (their numbers)

2.19×
geo-mean speedup, EVM concrete execution (2,756 tests; median 2.00×)
5.29×
top-decile EVM opcode speedup
1.44×
symbolic execution, 70 proof tasks
522×
whole-program Rust/MIR loop (2438s → 4.7s)

The scaling is the real message: opcode-granularity compilation gives consistent modest speedups; compiling whole programs by proof gives orders of magnitude — because the bigger the proven region, the more intermediate steps get compressed away.

Why this matters to us specifically

It lives on the same K / matching-logic substrate as KEVM — our reference oracle — and is co-authored by a KEVM originator. We already have a K→Lean importer (task K0). Their thesis ("proofs and fast code are the same object") is the performance-side mirror of ours ("proofs and sound analysis are the same object"). The shared foundation means their proof artifacts are, in principle, in our universe.

03 · Verified optimization

Translation validation — let an unverified tool optimize, then check it

A different, very pragmatic philosophy from the Albert & Rubio group (Complutense Madrid et al., CAV 2023; extended 2025). Don't verify the optimizer — verify each optimization it produces. This is exactly the "untrusted search, verified certificate" split, and it's the strand most directly reusable by us.

② Coq checker
symbolically execute both blocks → symbolic states → simplify → compare
③ Verdict
✓ equivalent (accept) · ✗ not equivalent (reject + bug report)

The Coq framework has three pieces: a symbolic execution engine that runs a block to a symbolic state; a set of simplification lemmas that normalize symbolic states; and a checker that compares the two normalized states. If the optimized block's symbolic state matches the original's, the optimization is certified semantically equivalent — over all inputs, not sampled.

The payoff that proves the point: applied as a checker against real optimizers — the standard Solidity compiler's peephole pass, and the superoptimizers GASOL and SuperStack — it found genuine bugs, which were reported and fixed. That's the whole argument for the pattern: the clever optimizer can be wrong, and a cheap verified checker catches it without anyone having to verify the optimizer's internals.

Limitation worth noting: it's confined to jump-free blocks — straight-line sequences, no control flow. Whole-program / cross-loop optimization is out of scope, which is exactly where an invariant-aware analysis (ours) could extend it.

05 · The fast, untrusted corner

revmc & native JIT — speed bought with a sandbox, not a proof

At the opposite extreme from "compiling by proving": compile EVM bytecode straight to native machine code for raw throughput, and get safety from isolation rather than correctness proof.

revmc (Paradigm, 2024)

JIT/AOT-compiles EVM bytecode to native code via the LLVM backend, eliminating interpreter dispatch overhead. ~6.9× on compute-heavy workloads; integrated into reth behind --experimental.compiler; validated by syncing a node and checking the state root.

The trust model

The native code is generated from untrusted bytecode, so safety comes from a sandbox: the compiled code can't touch host memory and can only call a narrow, validated ABI. There is no proof that the native code matches the bytecode's semantics — correctness rests on the compiler being right and the state-root check at the tip.

The gap this leaves: a JIT this fast has no machine-checked equivalence to the EVM semantics — a miscompile would be a consensus bug. That missing certificate is exactly the bottom-right corner of the landscape map, and exactly what a refinement-based checker could supply.

06 · Where we fit

Our refinement machinery is a translation validator

The connecting insight: we built bN_refines / kevm_refines to prove our concrete interpreter ≈ KEVM. That is structurally identical to proving optimized block ≈ original block. The same observable-equivalence relation that powers Pillar B is a ready-made verified optimizer checker — and we'd be checking against a KEVM-faithful semantics, not a hand-rolled one.

field: translation validation (Coq, §03)
ours: ObsRel / obsOf / bN_refines
Our per-bucket observable-equivalence relation already relates two step semantics over a shared Config. Repoint it from "ours vs KEVM" to "original vs optimized" and it certifies block rewrites.
have the machinery
field: untrusted search + verified check
ours: tier-C certificate pattern
The exact split our abstract-interpreter doc names as tier-C. SuperStack/GASOL = the untrusted search; a Lean refinement proof = the certificate. Lean kernel is the only trusted base.
same philosophy
field: jump-free blocks only (§03 limit)
ours: whole-program invariants (Pillar A)
Our analyzer computes reachable-state invariants. A proven invariant licenses optimizations a block-local checker can't justify — e.g. "operands < 2¹²⁸ here ⇒ drop the wrap-check." Invariant-aware verified optimization.
extends the field
field: revmc native JIT, no proof (§05)
ours: proof-carrying compilation
Emit a Lean-checkable certificate that JIT output matches bytecode semantics — filling revmc's empty corner. The compiled artifact ships with its own proof.
research lane
field: Compiling by Proving (K, §02)
ours: K→Lean importer (K0)
Same K/matching-logic substrate as KEVM and our oracle. Their proof-derived rules and our Lean refinements share a source of truth; the importer is the bridge.
shared foundation
field: gas as the optimization target
ours: gas/refund refinement (D11)
Verified gas optimization needs a verified gas model. Our D11 gas refinement is exactly the cost semantics a certified gas-optimizer checker would measure against.
on the roadmap

The honest scope note

Our charter is machine-checked soundness of analysis, off the product critical path — not shipping a faster client. So this literature is relevant as a consumer of our certificates and a validation of our architecture, not a pivot. The strongest concrete play is positioning the refinement machinery as a verified checker for unverified EVM optimizers — a natural extension of the B-track that turns "we proved our interpreter = KEVM" into "we can prove anyone's optimization preserves KEVM semantics."

07 · A concrete design sketch

Refinement-as-optimizer-checker, worked end to end

Here's how the B-track relation becomes a block-equivalence checker, with a real peephole rewrite and the negative test that makes it trustworthy. This reuses the exact shape of RefB7.lean — just block-to-block instead of step-to-KEVM.

Run both blocks from the same symbolic Config via our KEVM-faithful step, read off the observable (final stack / memory / storage / gas), and demand they agree. The relation is the one we already wrote for Pillar B.

-- the checker is just our observable-equivalence relation, block-to-block
def blockObs (ops : List Op) (c : Config) : Option (Obs Word) :=
  obsOf (EvmAlgebra.run aluConcrete (ops.length) (c.withCode ops))

def Optimizes (orig opt : List Op) : Prop :=
  ∀ c : Config, ObsRel (blockObs orig c) (blockObs opt c)   -- equal over ALL configs
                                                       -- (a gas-aware Obs adds: gas(opt) ≤ gas(orig))

Prove Optimizes orig opt axiom-clean and you have a machine-checked certificate that the rewrite preserves KEVM semantics — the same trust level as every bN_refines.

The classic peephole win: adding zero is a no-op. x PUSH1 0 ADDx. The optimizer drops two instructions and the Gverylow gas they'd cost.

original
...        ; stack top = x
PUSH1 0x00 ; stack: x, 0
ADD        ; stack: x+0 = x
optimized
...        ; stack top = x
(removed)
(removed)
theorem add_zero_optimizes : Optimizes [.push 0, .add] [] := by
  -- both leave the stack as `x :: rest`; ADD's denotation is `x + 0 = x` in Word
  intro c; simp [blockObs, EvmAlgebra.run, aluConcrete, Word.add_zero, ObsRel]
#print axioms add_zero_optimizes   -- ⊆ {propext, Classical.choice, Quot.sound} ⇒ certified
Accept. The proof goes through because Word.add_zero is a theorem. The optimizer's rewrite is now machine-checked equivalent over every input — exactly what the Coq checker does, but against our KEVM-faithful semantics.

Now a plausible-looking but wrong rewrite — the kind real superoptimizers have shipped. Someone "simplifies" SWAP1 SWAP1 (a true no-op) to nothing... but mistakenly applies the same rule to SWAP1 SWAP2, which is not the identity. The checker must reject it.

original — stack: a, b, c
SWAP1  ; b, a, c
SWAP2  ; c, a, b   (NOT a permutation-identity)
optimized (BUGGY)
(removed — wrongly treated as no-op)
(removed)          ; claims: a, b, c
-- the kill-#4-style negative test: the bad rewrite must FAIL to certify
theorem swap12_not_optimizable_to_nil :
    ¬ Optimizes [.swap 1, .swap 2] [] := by
  -- exhibit one config where observables differ: stack [a,b,c] with a≠c
  intro h
  have hbad := h (Config.ofStack [1, 2, 3])
  simp [blockObs, EvmAlgebra.run, aluConcrete, ObsRel] at hbad   ; [3,2,1] ≠ [1,2,3] ⇒ contradiction
Reject. The certificate fails to provelake build goes red — precisely because the rewrite changes the observable on [a,b,c]. This is the same red-on-unsound discipline as every bucket's negative test: a wrong optimization cannot certify itself.

What this would take to build: the relation and run already exist; the work is (1) a block-to-block Obs with a gas component (rides on the D11 gas refinement), (2) an ingest path from an optimizer's (orig, opt) pairs through our decoder, and (3) for non-trivial blocks, a symbolic Config + the simplification lemmas the Coq group built — the genuinely new proof engineering. The result: a verified checker that accepts SuperStack/GASOL/solc output (or rejects it with a counterexample), backed by the Lean kernel and KEVM-faithful semantics.