Companion to the program-status doc · 2026-05-26

The abstract interpreter, the proofs, and how to add power

The fidelity work proves the semantics are right. But the analysis itself — the fixpoint, the widening, the transfer functions — is a separate machine. This walks what the interpreter engine actually is in Lean today, how the domains and the KEVM-fidelity streams plug into it, and the deeper question: how do you use Lean proofs to make a domain or a specific program analysis more powerful — a sharper widening, a proven loop bound, a licensed precise transfer — and have the analyzer come back with strictly more refined, still-sound answers?

Two generic engines Four levels to add power Verify ⇄ guess-and-check
01 · The distinction

Three things that are constantly conflated

"The abstract interpreter" sounds like one thing. In this codebase it is three, with sharply different jobs and sharply different trust stories. Getting the boundaries right is the whole point — it's what lets you change one without re-proving the others.

① The engine

The generic machinery that drives a computation: the fuel-stepper and the CFG fixpoint+widening loop. Written once, parametric over everything. Lives in Framework/. Proven sound once and never touched again.

② The domain (data)

A lattice + transfer functions + a widening, bundled with one γ-containment lemma per operation. This is data you feed the engine. Sign, Interval, … each an instance under Evm/. This is where precision lives.

③ The fidelity proof

KEVM equivalence — bN_refines under Evm/Kevm/. Certifies the concrete interpreter matches the reference. Says nothing about abstraction. Orthogonal to ① and ②; they meet only at the concrete step.

The load-bearing idea: precision is the domain's job, termination is the widening's job, soundness is a per-operation lemma's job, and fidelity is the KEVM stream's job — and the engine knows about none of them. That separation is what makes "add power" a local edit: you change a domain or prove a program-specific fact, and the engine + its soundness theorem carry it end-to-end unchanged.
abstract ⊒ concrete←—— Pillar A: SoundAlu / step_ok (the domain's per-op lemma)
composes ⇣ at the shared concrete step
concrete = KEVM←—— Pillar B: bN_refines (the fidelity stream)
abstract analysis ⊒ KEVM reality— and the engine that ran it never knew which domain or which opcode
02 · The engine, concretely

Two generic drivers, both already proven sound

There isn't one abstract interpreter — there are two evaluators in Framework/, for two shapes of program. Both are fully parametric (kill-criterion #1: no if domain == … anywhere) and both already carry their soundness theorem.

The straight-line evaluator. An EvmAlgebra σ is literally just a step function; run is fuel recursion over it. The concrete EVM interpreter and the abstract ALU-recovery interpreter are both instances of this same run.

-- Framework/Machine.lean — the whole sequential engine
structure EvmAlgebra (σ : Type) where step : σ → StepResult σ

def EvmAlgebra.run (M : EvmAlgebra σ) : Nat → σ → StepResult σ
  | 0,   s => .cont s
  | n+1, s => match M.step s with | .cont s' => M.run n s' | r => r

-- the only obligation a domain proves; run_sound lifts it to all fuel + all γ-related starts
theorem run_sound (H : SoundEvm MC MA γ) :  -- H.step_ok : one abstract step ⊒ one concrete step
    ∀ fuel s a, γ.Refines (MC.run fuel s) (MA.run fuel a) ...

Drives real EVM Config today. The abstract side reuses it via AluRecovery — the abstract stack machine is just another σ.

The textbook abstract-interpretation algorithm: joins at control-flow merges, widening at loop heads to force termination. Runs over a structured WHILE language (skip / assign / seq / ite / whileD).

-- Framework/Analyzer.lean — the fixpoint + widening engine (excerpt)
def astep (fuel : ) : Stmt C nvars → AState D nvars → AState D nvars
  | .skip,        a => a
  | .assign v e,  a => aAssign V a v e
  | .seq c₁ c₂,   a => astep fuel c₂ (astep fuel c₁ a)
  | .ite b c₁ c₂, a => aJoin (astep fuel c₁ (assume b a))         -- ⊔ at the merge
                            (astep fuel c₂ (assume (.not b) a))
  | .whileD b c,  a => whileInv fuel b c a                         -- the loop fixpoint

def whileInv (fuel) (b c a) : AState D nvars :=
  match fuel with
  | 0   => topState                                           -- fuel out ⇒ ⊤ (always sound)
  | f+1 => let next := aWiden DW a (aJoin a (astep f c (assume b a)))
           if (∀ v, next v = a v) then a else whileInv f b c next  -- ▽ until stable

theorem analyze_sound : collecting g c (γA a) ⊆ γA (astep fuel c a)  -- end-to-end, generic

Proven sound generically (whileInv_entrywhileInv_postfixwhile_run_in_invanalyze_sound), for any AbstractDomain + DomainWidening.

Why "the engine knows nothing" is the whole game

Every soundness theorem above quantifies over the domain, the concretisation γ, the transfer functions, and the widening. None inspects a carrier. So a new domain, a sharper widening, or a program-specific lemma is added as data or as a side proof — the engine and its analyze_sound/run_sound proof are recompiled unchanged and still hold. That's what makes the rest of this document possible.

03 · Widening — termination as one lemma

The whole termination cost of a new domain is a single descending-chain lemma

Widening is the algorithm that makes the loop fixpoint terminate over lattices with infinite ascending chains (like intervals). In this codebase, termination is neither the engine's problem nor "magic" — it is factored into one structure with exactly one proof obligation.

-- Framework/Widening.lean — what a widening must supply
structure WideningOp (T : Type) where
  widen          : T → T → T                                -- the ▽ operator
  measure        : T →                               -- a well-founded rank on values
  widen_lt_of_ne : ∀ a b, widen a b ≠ a → measure (widen a b) < measure a
                                                            -- ↑ the ONLY obligation: a step that changes the value drops the rank

-- proved ONCE, reused by every domain — never re-proved:
theorem widenIter_stabilises (init step) :
    ∃ N, ∀ n, N ≤ n → widenIter init step n = widenIter init step N
This is the lever for "a better widening". Termination follows from widen_lt_of_ne alone (the measure along the iteration is an antitone ℕ-sequence, hence eventually constant). So you can replace widen with something far more precise — threshold/landmark widening, widening-with-care, delayed widening — and as long as you can still prove that one strict-decrease lemma, the entire termination guarantee transfers for free and the engine never changes. Precision goes up; the proof burden stays at one lemma.

The analyzer additionally asks for le_widen_right : b ⊑ widen a b (the order-side soundness fact, packaged in DomainWidening) — the single property the loop's post-fixpoint extraction relies on. Any widening you can prove those two facts about is plug-compatible with the existing analyze_sound.

04 · What's wired today

The composition exists end-to-end — but not yet over the real EVM state

The proof that engine + a real domain + widening + soundness all compose with zero open hypotheses already exists: Evm/Analysis.lean's analyze_sound_itv. But read the carrier carefully — that's where the live gap is.

✓ Proven & closed

  • Engine ① drives real EVM Config; run_sound lifts straight-line abstract execution over actual state.
  • Engine ② (astep/whileInv) proven sound generically for any domain + widening.
  • analyze_sound_itv: Engine ② instantiated with a real interval domain (intervalZ), real transfer functions, real widening — no remaining assumptions.
  • widenIter_stabilises: termination, domain-agnostic, proved once.

⛔ The gap — D-uplift-2 (#112)

  • analyze_sound_itv's carrier is , and Engine ② currently runs over the toy WHILE language, not the real EVM Config.
  • The looping analyzer (joins + widening) is not yet wired over actual bytecode CFGs.
  • Closing it = instantiate astep/whileInv's abstract step + CFG fixpoint over Config, so full abstract interpretation runs on real EVM programs.

Why intervals live over ℤ, not Word — and why that's deliberate

Intervals are genuinely sound over ℤ; EVM's 256-bit modular wraparound is the domain's problem to model, not the engine's — which is exactly why the EVM interval transfer functions coarsen aggressively on the operations that can wrap. The engine stays clean; the wraparound complexity is quarantined in the domain's per-op soundness lemmas.

05 · Adding power with proofs

Four levels at which a Lean proof makes the analysis sharper

"More powerful analysis" can mean a tighter widening, a precise transfer the modular semantics would otherwise force to ⊤, a proven loop bound, or an entirely new kind of claim. These sit at four levels — and at each, a proof licenses a precision gain the engine then propagates for free. Click a level.

LEVEL 1 · in a domain
Sharper operators
A better widening, a more precise transfer function, a new lattice.
data + 1 lemma
LEVEL 2 · across domains
Products & reductions
Combine domains; prove the reduction sound once to get precision neither has alone.
combinator + proof
LEVEL 3 · in a program
Proven program facts
A proof about this fragment — termination, no-overflow, a guard's effect — licenses a precise step.
side lemma → license
LEVEL 4 · in the engine
New lift theorems
A new semantics (relational, backward, gas) = a new run_sound-shaped theorem.
new metatheory

Level 1 — sharpening the operators inside a domain

The most common power-up and the cheapest. The domain is data; you replace a piece of that data with something more precise and re-prove only its local obligation.

Better widening. Swap naïve interval widening (any unstable bound → ±∞) for threshold widening: snap to the nearest constant drawn from the program text instead of jumping to ∞.obligation: re-prove widen_lt_of_ne (measure = # of non-threshold bounds) and le_widen_right
Sharper transfer. Give a tighter amul/asub that doesn't collapse to ⊤ in the common non-wrapping case.obligation: re-prove that op's *_sound γ-containment lemma
Payoff. analyze_sound / run_sound recompile unchanged; the analyzer returns strictly tighter invariants on the same programs. No engine edit.

This is precisely why WideningOp exposes one obligation and ValueTransfer one lemma per op: the surface you must re-prove to gain precision is minimal and local.

Level 2 — combining domains (reduced products)

Sign knows signs; Interval knows magnitudes. A reduced product Sign × Interval can prove things neither can alone (e.g. x ≠ 0 from Sign refines an Interval that straddles 0). Build it as a new AbstractDomain (A.Elt × B.Elt) with a reduction operator ρ that exchanges information between the components.

ρ
Define the reduction ρ : A×B → A×B that tightens each component using the other.obligation: ρ is sound — γ (ρ x) ⊇ γ x (it never drops a concrete value), and the product's lattice laws hold
Lift the transfers componentwise, then apply ρ.obligation: each lifted *_sound via the two factors' lemmas
Payoff. A strictly more precise domain that drops into the same engine. The product is itself just another instance — kill-#1 holds.

Generalises to any pair; the combinator itself can be proven sound once over arbitrary AbstractDomain A, AbstractDomain B, so future products are nearly free.

Level 3 — proving a fact about a specific program

The richest idea, and the one the question is really after. Here the proof is about one program fragment, and it licenses the analyzer to do something it otherwise couldn't justify. Three shapes:

G
Proven guard filter. Today assume = id (guard-agnostic — sound but coarse). For a fragment, prove the guard b implies a state narrowing N: every concrete state satisfying b is still in γ (N a).claim: ∀ s ∈ γA a, g s b = true → s ∈ γA (N a) → replace assume b with the proven N
T
Proven termination / ranking function. Exhibit a ranking function that strictly decreases each loop iteration and is bounded below.claim: r decreases on the loop body → upgrades the partial-correctness analyze_sound to a total-correctness statement for that loop, and can bound fuel exactly
Proven no-overflow. Prove this MUL's operands stay below 2¹²⁸ on every reachable state.claim: operands < 2¹²⁸ → license the precise, non-wrapping transfer for that site instead of the coarsen-to-⊤ wrap-safe one → a tight interval where you'd otherwise get ⊤
The pattern: encode the claim as a Lean proposition about the fragment's BigStep / run behaviour, prove it, and use the proof term as the side-condition discharge that unlocks a sharper transfer / filter / fuel bound. The analyzer then produces a strictly more refined result, and end-to-end soundness still holds because the precise step was used only where its proof obligation was met.

Level 4 — a new lift theorem in the metatheory

The deepest power-up: a fundamentally new kind of analysis. Each is a new algebra + a new fundamental lemma analogous to run_sound (plan §5 EP-C). Not stubs — a genuinely different metatheory, but built to the same shape so it reuses the framework discipline.

  • Relational / trace semantics — reason about pairs of executions (non-interference, constant-time). New γ over relations; new lift.
  • Backward analysis — necessary preconditions, weakest-precondition style. A dual astep with its own soundness.
  • Quantitative (gas) semantics — a cost-carrying algebra; the lift bounds worst-case gas. Already on the roadmap as the gas refinement.
06 · Verify, or guess-and-check

You don't have to prove the search — only the certificate

A crucial freedom: not every improvement needs a soundness proof. Abstract interpretation cleanly separates finding an answer from checking it. The search can be arbitrary — a heuristic, an SMT solver, an ML model, an LLM — as long as its output is re-validated by a Lean-checked condition. This gives four trust tiers, from fully-verified to fully-unverified-but-still-sound.

A · proven sound + proven better

Verify both soundness and the improvement

e.g. threshold widening: prove it terminates (widen_lt_of_ne) and that it's a refinement. Strongest guarantee; most work. Right for things baked into a domain permanently.

B · proven sound, empirically better

Prove only soundness; let precision be measured

Most new domains/transfers. The *_sound lemma is mandatory (the build goes red without it); whether it's actually tighter is judged by running it. The default tier for Level-1/2 work.

C · unverified search, verified certificate

An oracle proposes; Lean checks the proof obligation

The big one. whileInv_postfix + while_run_in_inv say: any post-fixpoint of the abstract body is a sound invariant. So you can throw away the fixpoint search entirely — have an external oracle guess a candidate invariant — and feed it into the same Lean-checked post-fixpoint test. The engine doesn't care how the invariant was found; soundness rides entirely on the checked certificate. Same for ranking functions: guess r, check it decreases.

D · fully unverified, can't affect soundness

Heuristics that only guide, never decide

Widening-point selection, variable ordering, iteration strategy, which sites to attempt a precise transfer on. These change speed and precision but every result they steer toward is still re-checked downstream, so they need no proof at all. Safe to leave entirely unverified.

Why tier C is the sweet spot for "adding power". The hard, creative part of program analysis — finding a clever invariant or ranking function — is exactly where heuristics, solvers, and learned models shine and where formal proof is painful. The framework lets that part be untrusted, because the check is a single Lean-verified condition (post-fixpoint / strict-decrease). You get the cleverness of unverified search with the trust of a machine-checked kernel. The proof obligation is small and fixed; the search behind it can be anything.
The one rule: the certificate check must be the only thing soundness depends on. If an unverified oracle's output is ever trusted rather than re-checked, it silently re-enters the trusted base — exactly the runtime-gate failure mode this whole architecture was built to delete. Oracles propose; the Lean-checked condition disposes.
07 · Worked thought-experiments

Claim → Lean obligation → what the analyzer gains

Four concrete examples of the loop "propose an improvement, (optionally) prove it, get a sharper analysis". Each names the precise obligation against the live signatures above.

AThreshold widening — a more optimal ▽ Level 1tier A
  • Propose: instead of widening an unstable bound straight to ±∞, snap it to the nearest threshold — a constant appearing in the program (loop guards, array sizes). The classic fix for over-eager interval widening.
  • Lean obligation: a WideningOp Itv whose measure counts bounds not yet at a threshold-or-∞; prove widen_lt_of_ne (each widening step that changes a value retires at least one such bound) and le_widen_right.
  • Gain: loops with a constant bound get the exact bound as their invariant instead of ⊤/∞. widenIter_stabilises and analyze_sound are untouched — termination and soundness transfer automatically.
BProven guard narrowing — replacing assume = id Level 3tier B
  • Propose: on the true-branch of if x < 100, the analyzer should know x ∈ [−∞, 99]. Today assume is the identity, so it learns nothing.
  • Lean obligation: define a narrowing N b a and prove the soundness side-condition ∀ s ∈ γA a, g s b = true → s ∈ γA (N b a) — i.e. narrowing never excludes a concrete state that actually satisfies the guard. Swap assume b for the proven N b.
  • Gain: dramatically tighter branch- and loop-invariants (this is where most real precision in interval analysis comes from). Still a sound coarsening, so astep_sound's structure is preserved.
CProven loop termination — a ranking function Level 3tier C
  • Propose (unverified search): an oracle guesses a ranking function r : State → ℕ for a loop (e.g. r = 100 − i).
  • Lean check (verified certificate): prove r strictly decreases across the loop body on every state in the invariant and is bounded below. That single checked fact is the certificate — how the oracle found r is irrelevant.
  • Gain: upgrades the analyzer's partial-correctness result (collecting ⊆ γA … — "if it terminates, the post-state is covered") into a total-correctness claim for that loop, and lets you pick a finite fuel that provably suffices — turning the fuel parameter from a soundness hedge into an exact bound.
DProven no-overflow — licensing a precise transfer Level 3tier C
  • Propose: at a specific MUL site, claim the operands are small enough that 256-bit wraparound can't occur.
  • Lean obligation: prove ∀ reachable s, operand₁ s · operand₂ s < 2²⁵⁶ (often itself discharged by a prior interval pass). Use the proof to select the precise non-wrapping transfer (ℤ-style) at that site rather than the conservative coarsen-to-⊤ one the modular semantics forces in general.
  • Gain: a tight interval result exactly where the coarse wrap-safe transfer would have returned ⊤. Soundness holds because the precise transfer is used only under the proof that its side-condition is met — this is the Word-vs-ℤ wraparound gap (from §04) being closed locally by proof instead of conservatively everywhere.
08 · Future uplift & research

Where the interpreter goes next

Beyond the immediate Config-wiring gap, the engine/proof separation opens a research surface. Ordered roughly by distance from what exists today.

1Wire the fixpoint engine over real EVM Config the gap · #112
  • Instantiate astep/whileInv's abstract step and CFG fixpoint over the actual EVM Config carrier, not the toy WHILE/ℤ demonstrator.
  • Requires reconstructing structured control flow (loops/branches) from EVM jump-based bytecode — the JUMP/JUMPI refinement (deferred to D12) feeds this.
  • Outcome: full looping abstract interpretation runs on real bytecode, joining what run_sound already does straight-line.
2A richer domain library — relational & non-interval Level 1–2
  • Octagons / polyhedra (relational: x − y ≤ c) for invariants intervals can't express — relevant for storage-slot and offset reasoning.
  • Congruence / modular domains (x ≡ 0 mod 32) — a natural fit for EVM's word-aligned memory and calldata copies.
  • Reduced products combining several of the above, each proven sound once against the framework.
3Oracle-guided invariants — tier-C at scale verify-the-certificate
  • Build the certificate-checking path as a first-class API: an external engine (SMT, ML, LLM) proposes candidate invariants/ranking functions; Lean checks the post-fixpoint / strict-decrease condition.
  • Decouples the expensive, creative search from the cheap, trusted check — the search can evolve freely without ever touching the trusted base.
  • Directly turns the "guess a clever invariant" problem into "produce a Lean-checkable certificate", which is where the precision ceiling of fixpoint-only analysis is broken.
4New lift theorems — backward, relational, quantitative Level 4
  • Backward / WP: necessary-precondition analysis for "can this revert / this assertion fail" — a dual engine with its own analyze_sound.
  • Relational: 2-execution semantics for non-interference and constant-time properties — security-relevant for EVM contracts.
  • Quantitative gas: a cost-carrying algebra whose lift bounds worst-case gas — folds the deferred gas refinement into the same metatheory shape.
5Proof-carrying analysis results — exportable certificates research
  • Emit, for each analyzed contract, a Lean-checkable certificate that a third party can re-verify without trusting the analyzer — the analysis output becomes itself a proof object.
  • Pairs with tier-C: the invariant + its post-fixpoint check is the certificate; ship both.
  • Pushes the trusted base all the way down to the Lean kernel for downstream consumers, not just for us.