The fidelity work proves the semantics are right. But the analysis itself — the fixpoint, the widening, the transfer functions — is a separate machine. This walks what the interpreter engine actually is in Lean today, how the domains and the KEVM-fidelity streams plug into it, and the deeper question: how do you use Lean proofs to make a domain or a specific program analysis more powerful — a sharper widening, a proven loop bound, a licensed precise transfer — and have the analyzer come back with strictly more refined, still-sound answers?
"The abstract interpreter" sounds like one thing. In this codebase it is three, with sharply different jobs and sharply different trust stories. Getting the boundaries right is the whole point — it's what lets you change one without re-proving the others.
The generic machinery that drives a computation: the fuel-stepper and the CFG fixpoint+widening loop. Written once, parametric over everything. Lives in Framework/. Proven sound once and never touched again.
A lattice + transfer functions + a widening, bundled with one γ-containment lemma per operation. This is data you feed the engine. Sign, Interval, … each an instance under Evm/. This is where precision lives.
KEVM equivalence — bN_refines under Evm/Kevm/. Certifies the concrete interpreter matches the reference. Says nothing about abstraction. Orthogonal to ① and ②; they meet only at the concrete step.
stepThere isn't one abstract interpreter — there are two evaluators in Framework/, for two shapes of program. Both are fully parametric (kill-criterion #1: no if domain == … anywhere) and both already carry their soundness theorem.
The straight-line evaluator. An EvmAlgebra σ is literally just a step function; run is fuel recursion over it. The concrete EVM interpreter and the abstract ALU-recovery interpreter are both instances of this same run.
-- Framework/Machine.lean — the whole sequential engine structure EvmAlgebra (σ : Type) where step : σ → StepResult σ def EvmAlgebra.run (M : EvmAlgebra σ) : Nat → σ → StepResult σ | 0, s => .cont s | n+1, s => match M.step s with | .cont s' => M.run n s' | r => r -- the only obligation a domain proves; run_sound lifts it to all fuel + all γ-related starts theorem run_sound (H : SoundEvm MC MA γ) : -- H.step_ok : one abstract step ⊒ one concrete step ∀ fuel s a, γ.Refines (MC.run fuel s) (MA.run fuel a) ...
Drives real EVM Config today. The abstract side reuses it via AluRecovery — the abstract stack machine is just another σ.
The textbook abstract-interpretation algorithm: joins at control-flow merges, widening at loop heads to force termination. Runs over a structured WHILE language (skip / assign / seq / ite / whileD).
-- Framework/Analyzer.lean — the fixpoint + widening engine (excerpt) def astep (fuel : ℕ) : Stmt C nvars → AState D nvars → AState D nvars | .skip, a => a | .assign v e, a => aAssign V a v e | .seq c₁ c₂, a => astep fuel c₂ (astep fuel c₁ a) | .ite b c₁ c₂, a => aJoin (astep fuel c₁ (assume b a)) -- ⊔ at the merge (astep fuel c₂ (assume (.not b) a)) | .whileD b c, a => whileInv fuel b c a -- the loop fixpoint def whileInv (fuel) (b c a) : AState D nvars := match fuel with | 0 => topState -- fuel out ⇒ ⊤ (always sound) | f+1 => let next := aWiden DW a (aJoin a (astep f c (assume b a))) if (∀ v, next v = a v) then a else whileInv f b c next -- ▽ until stable theorem analyze_sound : collecting g c (γA a) ⊆ γA (astep fuel c a) -- end-to-end, generic
Proven sound generically (whileInv_entry → whileInv_postfix → while_run_in_inv → analyze_sound), for any AbstractDomain + DomainWidening.
Every soundness theorem above quantifies over the domain, the concretisation γ, the transfer functions, and the widening. None inspects a carrier. So a new domain, a sharper widening, or a program-specific lemma is added as data or as a side proof — the engine and its analyze_sound/run_sound proof are recompiled unchanged and still hold. That's what makes the rest of this document possible.
Widening is the algorithm that makes the loop fixpoint terminate over lattices with infinite ascending chains (like intervals). In this codebase, termination is neither the engine's problem nor "magic" — it is factored into one structure with exactly one proof obligation.
-- Framework/Widening.lean — what a widening must supply structure WideningOp (T : Type) where widen : T → T → T -- the ▽ operator measure : T → ℕ -- a well-founded rank on values widen_lt_of_ne : ∀ a b, widen a b ≠ a → measure (widen a b) < measure a -- ↑ the ONLY obligation: a step that changes the value drops the rank -- proved ONCE, reused by every domain — never re-proved: theorem widenIter_stabilises (init step) : ∃ N, ∀ n, N ≤ n → widenIter init step n = widenIter init step N
widen_lt_of_ne alone (the measure along the iteration is an antitone ℕ-sequence, hence eventually constant). So you can replace widen with something far more precise — threshold/landmark widening, widening-with-care, delayed widening — and as long as you can still prove that one strict-decrease lemma, the entire termination guarantee transfers for free and the engine never changes. Precision goes up; the proof burden stays at one lemma.
The analyzer additionally asks for le_widen_right : b ⊑ widen a b (the order-side soundness fact, packaged in DomainWidening) — the single property the loop's post-fixpoint extraction relies on. Any widening you can prove those two facts about is plug-compatible with the existing analyze_sound.
The proof that engine + a real domain + widening + soundness all compose with zero open hypotheses already exists: Evm/Analysis.lean's analyze_sound_itv. But read the carrier carefully — that's where the live gap is.
Config; run_sound lifts straight-line abstract execution over actual state.astep/whileInv) proven sound generically for any domain + widening.analyze_sound_itv: Engine ② instantiated with a real interval domain (intervalZ), real transfer functions, real widening — no remaining assumptions.widenIter_stabilises: termination, domain-agnostic, proved once.analyze_sound_itv's carrier is ℤ, and Engine ② currently runs over the toy WHILE language, not the real EVM Config.astep/whileInv's abstract step + CFG fixpoint over Config, so full abstract interpretation runs on real EVM programs.Intervals are genuinely sound over ℤ; EVM's 256-bit modular wraparound is the domain's problem to model, not the engine's — which is exactly why the EVM interval transfer functions coarsen aggressively on the operations that can wrap. The engine stays clean; the wraparound complexity is quarantined in the domain's per-op soundness lemmas.
"More powerful analysis" can mean a tighter widening, a precise transfer the modular semantics would otherwise force to ⊤, a proven loop bound, or an entirely new kind of claim. These sit at four levels — and at each, a proof licenses a precision gain the engine then propagates for free. Click a level.
run_sound-shaped theorem.The most common power-up and the cheapest. The domain is data; you replace a piece of that data with something more precise and re-prove only its local obligation.
widen_lt_of_ne (measure = # of non-threshold bounds) and le_widen_rightamul/asub that doesn't collapse to ⊤ in the common non-wrapping case.obligation: re-prove that op's *_sound γ-containment lemmaanalyze_sound / run_sound recompile unchanged; the analyzer returns strictly tighter invariants on the same programs. No engine edit.This is precisely why WideningOp exposes one obligation and ValueTransfer one lemma per op: the surface you must re-prove to gain precision is minimal and local.
Sign knows signs; Interval knows magnitudes. A reduced product Sign × Interval can prove things neither can alone (e.g. x ≠ 0 from Sign refines an Interval that straddles 0). Build it as a new AbstractDomain (A.Elt × B.Elt) with a reduction operator ρ that exchanges information between the components.
γ (ρ x) ⊇ γ x (it never drops a concrete value), and the product's lattice laws hold*_sound via the two factors' lemmasGeneralises to any pair; the combinator itself can be proven sound once over arbitrary AbstractDomain A, AbstractDomain B, so future products are nearly free.
The richest idea, and the one the question is really after. Here the proof is about one program fragment, and it licenses the analyzer to do something it otherwise couldn't justify. Three shapes:
assume = id (guard-agnostic — sound but coarse). For a fragment, prove the guard b implies a state narrowing N: every concrete state satisfying b is still in γ (N a).claim: ∀ s ∈ γA a, g s b = true → s ∈ γA (N a) → replace assume b with the proven Nr decreases on the loop body → upgrades the partial-correctness analyze_sound to a total-correctness statement for that loop, and can bound fuel exactlyMUL's operands stay below 2¹²⁸ on every reachable state.claim: operands < 2¹²⁸ → license the precise, non-wrapping transfer for that site instead of the coarsen-to-⊤ wrap-safe one → a tight interval where you'd otherwise get ⊤BigStep / run behaviour, prove it, and use the proof term as the side-condition discharge that unlocks a sharper transfer / filter / fuel bound. The analyzer then produces a strictly more refined result, and end-to-end soundness still holds because the precise step was used only where its proof obligation was met.The deepest power-up: a fundamentally new kind of analysis. Each is a new algebra + a new fundamental lemma analogous to run_sound (plan §5 EP-C). Not stubs — a genuinely different metatheory, but built to the same shape so it reuses the framework discipline.
astep with its own soundness.A crucial freedom: not every improvement needs a soundness proof. Abstract interpretation cleanly separates finding an answer from checking it. The search can be arbitrary — a heuristic, an SMT solver, an ML model, an LLM — as long as its output is re-validated by a Lean-checked condition. This gives four trust tiers, from fully-verified to fully-unverified-but-still-sound.
e.g. threshold widening: prove it terminates (widen_lt_of_ne) and that it's a refinement. Strongest guarantee; most work. Right for things baked into a domain permanently.
Most new domains/transfers. The *_sound lemma is mandatory (the build goes red without it); whether it's actually tighter is judged by running it. The default tier for Level-1/2 work.
The big one. whileInv_postfix + while_run_in_inv say: any post-fixpoint of the abstract body is a sound invariant. So you can throw away the fixpoint search entirely — have an external oracle guess a candidate invariant — and feed it into the same Lean-checked post-fixpoint test. The engine doesn't care how the invariant was found; soundness rides entirely on the checked certificate. Same for ranking functions: guess r, check it decreases.
Widening-point selection, variable ordering, iteration strategy, which sites to attempt a precise transfer on. These change speed and precision but every result they steer toward is still re-checked downstream, so they need no proof at all. Safe to leave entirely unverified.
Four concrete examples of the loop "propose an improvement, (optionally) prove it, get a sharper analysis". Each names the precise obligation against the live signatures above.
WideningOp Itv whose measure counts bounds not yet at a threshold-or-∞; prove widen_lt_of_ne (each widening step that changes a value retires at least one such bound) and le_widen_right.widenIter_stabilises and analyze_sound are untouched — termination and soundness transfer automatically.assume = id
Level 3tier B
›
if x < 100, the analyzer should know x ∈ [−∞, 99]. Today assume is the identity, so it learns nothing.N b a and prove the soundness side-condition ∀ s ∈ γA a, g s b = true → s ∈ γA (N b a) — i.e. narrowing never excludes a concrete state that actually satisfies the guard. Swap assume b for the proven N b.astep_sound's structure is preserved.r : State → ℕ for a loop (e.g. r = 100 − i).r strictly decreases across the loop body on every state in the invariant and is bounded below. That single checked fact is the certificate — how the oracle found r is irrelevant.collecting ⊆ γA … — "if it terminates, the post-state is covered") into a total-correctness claim for that loop, and lets you pick a finite fuel that provably suffices — turning the fuel parameter from a soundness hedge into an exact bound.MUL site, claim the operands are small enough that 256-bit wraparound can't occur.∀ reachable s, operand₁ s · operand₂ s < 2²⁵⁶ (often itself discharged by a prior interval pass). Use the proof to select the precise non-wrapping transfer (ℤ-style) at that site rather than the conservative coarsen-to-⊤ one the modular semantics forces in general.Beyond the immediate Config-wiring gap, the engine/proof separation opens a research surface. Ordered roughly by distance from what exists today.
Config
the gap · #112
›
astep/whileInv's abstract step and CFG fixpoint over the actual EVM Config carrier, not the toy WHILE/ℤ demonstrator.run_sound already does straight-line.x − y ≤ c) for invariants intervals can't express — relevant for storage-slot and offset reasoning.x ≡ 0 mod 32) — a natural fit for EVM's word-aligned memory and calldata copies.analyze_sound.