An EVM abstract-interpretation pilot · 2026-05-24

One Behavior, Many Views An interpreter, a compiler, an abstract interpreter, a symbolic executor, and a theorem prover walk into a program. They each say something true. None of them contradict each other. Here is why — visually, with one worked example you can poke at.

The big idea from the dialogue this doc grew out of: every one of these tools takes a description of behavior (a semantics) plus its own extra inputs, and produces a view of one and the same underlying object — the program's behavior. They are coordinate systems on a single space. The catch — and the whole reason this repo exists — is that "they won't contradict each other" is a directed relationship that has to be earned by a theorem, not a freebie. Scroll, click, and watch a single computation refract through six lenses.

00The thesis, in four lines

One object. Many views. Agreement is directed and earned.

  • There is one underlying thing. A program's behavior — extensionally, a partial function Program × Input ⇀ Output, or a relation, or a set of execution traces. Everything below is a lens on this.
  • Every tool is a sound view of it. An interpreter, compiler, abstract interpreter, symbolic executor, verifier, prover — each reads a semantics and reports something true about the one behavior.
  • The views sit at different precision. They don't all say the same thing — they say things at different rungs of a lattice. "Exactly 6" and "positive" are both true and don't conflict.
  • Non-contradiction is a theorem, not a gift. Share the semantics literally and agreement is free. Use different descriptions and a linking theorem (equivalence, adequacy, simulation, soundness, conformance) is what guarantees they agree. Skip the theorem and you get drift.

This is the conceptual companion to the repo's three partition surveys and the while-derivation-study. Where those ask "which languages/formats land where," this one asks the deeper question underneath: in what sense are all these tools talking about the same thing at all?

01The one object — four ways to write down the same behavior

Our worked example, threaded through the whole page. A tiny imperative (WHILE) program that computes a factorial:

n := 3; r := 1; while (n > 0) { r := r * n; n := n - 1; } // final state: r = 6, n = 0

The behavior of this program is a single mathematical object. We can write it down in different notations, but they all pin down the same thing — the same input/output pairs, the same trace. Switch between four equivalent descriptions:

Partial function · ⟦P⟧ : State ⇀ State

The denotation maps a starting store to its final store. For our closed program (no free inputs) it is a single point:

⟦P⟧(∅) = { n ↦ 0, r ↦ 6 }

Partial because a non-terminating program maps to (undefined). This is the extensional view: it says what the program computes and nothing about how.

Relation · ⟨P, σ⟩ ⇓ σ′

Big-step natural semantics relates a configuration to a result. Same content, written as a judgment that a derivation tree witnesses:

⟨P, ∅⟩ ⇓ { n ↦ 0, r ↦ 6 }

A relation is more general than a function — it permits non-determinism (one input, many outputs). For deterministic WHILE the relation is a function, which is exactly what a functionality lemma proves.

Trace · the sequence of intermediate states

Small-step semantics exposes the path, not just the endpoints. The behavior is the whole sequence:

{n↦3,r↦1} → {n↦3,r↦3} → {n↦2,r↦3} → {n↦2,r↦6} → {n↦1,r↦6} → {n↦1,r↦6} → {n↦0,r↦6}

Strictly more information than the function: two programs can share an input/output map yet differ in their traces (timing, intermediate observability). Trace equivalence ⟹ I/O equivalence, not vice-versa.

Interaction tree · behavior with effects

When the program talks to the world (reads, writes, calls), the behavior is a (possibly infinite) tree of internal steps and external events:

Tau · Tau · ... · Ret { n↦0, r↦6 } (no I/O here ⇒ a finite stick of Taus ending in Ret)

Interaction trees (Coq's itree) are the modern "one object" used to unify operational and denotational reasoning about effectful, possibly-divergent programs. Our pure example collapses to a finite chain.

The point of the tab strip: you did not change the program and you did not change what it does — you changed the notation for writing down its behavior. Function, relation, trace, and interaction tree are presentations. Which one you pick determines what you can conveniently say, but the object underneath is fixed.

02Six lenses on one computation — click a tool

Each tool below reads (a description of) that behavior and reports a view. The center chip is the ground truth. Watch the relation symbol change: = exact, refines, α over-approximates, proves a property about. None of these answers is wrong. They live at different precisions.

Interpreterruns it
Compiler / IRlowers it
Abstract interp.approximates it
Symbolic exec.explores paths
Deductive verifierchecks a spec
Theorem proverproves a theorem
The behaviorn ↦ 0, r ↦ 6
(ground truth)
=

Every lens is reading the same semantics of WHILE. The difference is not the program and not the meaning — it is what question each tool is built to answer and how much it deliberately throws away to answer it decidably/cheaply.

03The precision lattice — coarser views are still true

The abstract-interpretation lens isn't a single thing — it's a whole tower. Pick how much detail to keep, and you get a different (still sound) answer for r after the loop. Lower = more precise; higher = cheaper & more decidable. Each rung is connected to the concrete by a Galois connection (α, γ): abstract up with α, concretize down with γ.

α ↑ γ ↓ ⊤ (any) Sign Parity Interval Concrete (=) ⊥ (unreachable)
Read it bottom-up. The concrete level knows r = 6 exactly. Each step up forgets something — magnitude, then everything but sign or parity, then everything. Crucially, every higher answer contains the truth: 6 ∈ [1,∞], 6 is (+), 6 is even, 6 ∈ ⊤. A sound abstract interpreter is never wrong — only, by design, vague.

This tower is exactly what the pilot derives: the EVM Sign domain over 256-bit words, with the per-rule obligation α ∘ F ∘ γ ⊑ F♯ certifying that the abstract transfer never under-approximates the concrete one.

04Semantic styles — the same meaning, written five ways

Before any tool runs, someone has to describe the behavior. There are several house styles for doing so. They are interchangeable presentations of the same meaning — and the equivalences between them (§07) are theorems. Here is the WHILE while-loop construct in each:

Small-step structural operational semantics — "one transition at a time"

⟨while b do c, σ⟩ → ⟨if b then (c; while b do c) else skip, σ⟩

Unrolls the loop by exactly one reduction. Produces a trace. The native style of rewriting and of K / Maude. Best for: interleaving, concurrency, modelling cost.

Big-step natural semantics — "input to final answer"

⟨b,σ⟩⇓true ⟨c,σ⟩⇓σ′ ⟨while b do c,σ′⟩⇓σ″ ───────────────────────────────────────────────── ⟨while b do c, σ⟩ ⇓ σ″ ⟨b,σ⟩⇓false ⟹ ⟨while b do c,σ⟩⇓σ

Relates a configuration directly to its result via a derivation tree. Compact, great for proving properties; says nothing about divergence (no tree ⟹ no judgment). Native to relational specs in Coq/Agda.

Denotational — "meaning is a mathematical function"

⟦while b do c⟧ = fix (λf. λσ. if ⟦b⟧σ then f(⟦c⟧σ) else σ)

The loop's meaning is the least fixed point of a functional on a domain (CPO). Compositional: ⟦whole⟧ is built from ⟦parts⟧. Native to λ-calculus / domain theory. Best for equational reasoning & full abstraction.

Abstract machine — "explicit control + data stacks"

⟨ WHILE b c · K | σ ⟩ ↦ ⟨ b · BRANCH(c·WHILE b c, ε) · K | σ ⟩

A first-order transition system with an explicit code/continuation stack — halfway to an implementation (CEK/SECD machines). Derivable from the others by Danvy's CPS + defunctionalization. Native substrate for VMs & bytecode (the EVM is one).

Definitional interpreter — "the semantics is a program"

eval (While b c) σ = if evalB b σ then eval (While b c) (eval c σ) else σ

A total/monadic function in a host language that is the meaning. Tagless-final & monadic interpreters live here. The repo's Rule_ir + Domain functor is this shape: one interpreter, instantiate the monad/domain to get concrete or abstract.

Why they're the same: run any of them on our program and you get r = 6. The differences are intensional (how the meaning is presented, what's easy to prove) not extensional (what the meaning is). Danvy's functional/syntactic correspondence even gives a mechanical path between several of them — they are literally interconvertible, not just "morally equal."

05Computational substrates & their natural styles

Underneath the semantic styles sit a few models of computation. They're all Turing-equivalent as machines — but each makes a different style cheap, which is why specific tools cluster on specific substrates. The substrate is the medium; the style is what you paint with it.

SubstrateComputation is…Natural styleTools that live hereFit
λ-calculusfunction application / β-reductiondenotational, big-stepCoq, Agda, Lean, Haskell●●●
Rewriting logicmatching + rule applicationsmall-step SOSK, Maude●●●
Resolution / Hornproof search / unificationrelational judgmentsTwelf, λProlog, Prolog●●●
Abstract machinestack/register transitionsabstract machineEVM, JVM, WASM, CompCert IRs●●
SAT / SMTconstraint satisfactionnone — a decision engineZ3, CVC5 (infrastructure)
SMT is the odd one out — and the most misunderstood. It is not a way of giving meaning to a program; it has no notion of "behavior." It is a decision procedure that the other tools call: the symbolic executor and the verifier translate questions-about-behavior into formulas and hand them to Z3. SMT is plumbing under the lenses, not a lens.

06What's actually shared — the three-and-a-half layer factoring

If all these tools view one object, what exactly do they have in common, and what is per-tool? Factoring a "semantics" reveals the shared spine. Toggle a layer to see who owns it:

①aConcrete syntax & parser. The surface — keywords, brackets, an AST shape.PER-TOOL · not shared
①bAbstract signature + relation. The operators/sorts and the rules over them — the meaning generator. THE base.SHARED · the spine
Reading commitments. Mode (which args are in/out), binding representation, abstract domain. Design acts.PER-TOOL · design choice
Query / configuration. The conjecture, the property, the precision budget. Per-use, not semantics at all.PER-USE · the question

①a is genuinely free

Tools need a common meaning, not a common grammar. Give each tool its own parser that targets the same abstract signature and the surface syntax can differ entirely — many front-ends, one IR (GHC Core, Rust MIR, the repo's Rule_ir). Binding representation is the sticky residue that resists free interconversion (the POPLmark tax).

①b is what must agree

If two tools share ①b literally, agreement is automatic — they read the same generator. The instant they don't (KEVM's rules vs. the execution-layer spec; small-step vs. big-step), agreement becomes a theorem you owe: a linking result (§07). That obligation is the price of not sharing the spine.

Named instances of this "base + interpretation" pattern: skeletal semantics (skeleton + semantic functions), tagless-final / monadic interpreters, and this repo's Rule_ir + Domain functor — one rule IR, instantiate the domain to get the concrete interpreter or the Sign analyzer.

07Linking theorems — the receipts that forbid contradiction

This is the load-bearing section. "They won't contradict each other" is true because of these theorems, not in spite of missing them. Each one certifies that two descriptions of behavior denote the same object — and each names the failure mode (drift) it rules out. Expand each:

small-step ≡ big-step  ·  equivalence
⟨P,σ⟩ →* ⟨skip,σ′⟩ ⟺ ⟨P,σ⟩ ⇓ σ′
The trace view and the input/output view land on the same final states. Guards against: a step-semantics that "gets there differently" than the natural semantics claims — e.g. an off-by-one in loop unrolling that only the trace would expose.
operational ≡ denotational  ·  adequacy / full abstraction
⟦P⟧σ = σ′ ⟺ ⟨P,σ⟩ ⇓ σ′ (adequacy) P ≈ Q (same behavior in all contexts) ⟺ ⟦P⟧ = ⟦Q⟧ (full abstraction)
The "run it" meaning and the "math function" meaning coincide. Guards against: a denotation that identifies programs the operational semantics distinguishes (too coarse), or distinguishes ones it can't tell apart (not fully abstract).
concrete ⊒ abstract  ·  Galois soundness  this repo's obligation
α ∘ F ∘ γ ⊑ F♯
The abstract transfer F♯ never under-approximates the concrete F: every concrete behavior is covered by the abstract answer. Guards against: an analyzer that says "definitely positive" when the real value could be negative — an unsound result, the only kind that can actually lie. The pilot emits this obligation per ALU rule and diff-tests it.
source ⊒ target  ·  simulation / refinement
σ ~ σ̂ ∧ ⟨P,σ⟩→σ′ ⟹ ∃σ̂′. ⟨compile(P),σ̂⟩→*σ̂′ ∧ σ′ ~ σ̂′
Compiled code exhibits no behavior the source forbade (it may resolve source nondeterminism). Guards against: a miscompilation — the CompCert theorem. The observable result of our factorial survives lowering to a stack machine.
model ≡ reference  ·  conformance / differential testing
∀ t ∈ battery. KEVM(t) = EELS(t) (empirical, not proved)
Two independent semantics of the same artifact agree on a test battery. Guards against: the two definitions silently drifting apart over time. This is the pragmatic stand-in when a full equivalence proof is too expensive — and it's exactly how the pilot certifies the derived Sign transfers against a hand-written reference Sign domain.
The failure mode all of these exist to prevent is drift. Two views of "the same" behavior that have quietly diverged. The entire value proposition of deriving F♯ from F (this repo) instead of hand-writing it: make the linking theorem hold by construction, so there's no gap to drift into.

08"Theorem prover" is four different things

A recurring confusion from the dialogue: a theorem prover is not a Turing machine of a special kind — it's a program that manipulates proofs, and the family splits along two axes (who drives, and what's trusted). Curry–Howard is why this is even coherent: a proof is a program, a proposition is a type, so "checking a proof" and "type-checking a program" are the same operation.

Proof checker

trusts: tiny kernel

Given a complete proof term, verifies it. Decidable, fast, dumb. The Coq/Lean kernel. Trust bottoms out here.

Proof assistant

drives: human + tactics

Helps you build the proof term interactively; emits something the checker accepts. Coq, Lean, Agda, Isabelle. Power: higher-order logic, full expressiveness.

Automated prover

drives: search

Tries to find the proof itself for a restricted logic (first-order). Vampire, E. First-order validity is only semi-decidable — it may run forever.

SMT solver

drives: decision procedure

Decides satisfiability in decidable theories (arithmetic, arrays, bitvectors). Z3, CVC5. Returns SAT/UNSAT/UNKNOWN. The engine the others call.

Where they sit relative to the one object: a prover/assistant establishes a property true of the behavior (a cut through the space, the lens). A checker certifies that establishment is valid. An automated prover/solver is the search/decision engine that finds or confirms the cut. None of them is the behavior — they're tools for asserting true things about it, related to the ground truth by "this property holds of ⟦P⟧."

09K framework — write the semantics once, get the lenses for free

This is the payoff and the reason the repo exists. K's bet is the synthesis made operational: define ①b once as rewrite rules, and the tools below are derived from that one definition — correct-by-construction, no per-tool re-description, no drift. The meta-theory chain rewriting logic → matching logic → reachability logic is what makes each derivation a theorem.

LensDerived from the K definition?How
Parser (①a)autofrom the syntax declarations
Concrete interpreterautoLLVM backend executes the rewrite rules
Symbolic executorautoHaskell/Booster backend + Z3
Deductive verifierautoreachability-logic prover over the same rules
Sound abstract interpreterNOT autoneeds a Galois connection, best transformer, join + widening, a fixpoint — extra structure K doesn't synthesize
That last row is the entire research question of the pilot. Parser, interpreter, symbolic executor, verifier all fall out of one K definition. The sound abstract interpreter does not — and the pilot asks whether the missing pieces (α/γ, best transformer, widening) can be derived from the K rule cheaply enough to beat hand-writing them. If effort_mechanical / effort_hand_written ≥ 1, the move is killed. Everything in this doc is the conceptual scaffolding for why that question is even well-posed: because all these tools are views of one object, a derivation between them is meaningful rather than a category error.

10Watch / listen — the same ideas from the people who built them

Grigore Roșu — The K Framework

Epicenter #239 · podcast/video

§09 in interview form: one semantics, every tool derived correct-by-construction. The closest match to this whole page; EVM-flavored. Start here.

Patrick Cousot — Abstract Interpretation: From Principles to Application

J.P. Morgan Distinguished Lecture, 2021

The inventor on §03's precision lattice and the α/γ soundness relation that is this repo's obligation.

Xavier Leroy — Mechanized Semantics

Collège de France 2019–20 · slides + Coq

The rigorous version of §04 and §07: all five styles for one language, plus the linking theorems (CompCert simulation) proved.

Type Theory Forall

podcast

The accessible companion to §08 — proofs-as-programs, seL4/Isabelle verification, what a prover actually is.

Patrick Cousot — Abstract Induction

talk

On fixpoints + widening — the piece K does not auto-derive (§09's last row).

The Type Theory Podcast

podcast

Categorical / denotational angle (e.g. Dybjer) — pairs with the λ-calculus substrate in §05 and the denotational style in §04.

If you watch two: Roșu for the framing, Cousot (J.P. Morgan) for the soundness half. Together they cover both halves of "all sound views of one behavior, related by α/γ and linking theorems."