An EVM abstract-interpretation pilot · 2026-05-24
The big idea from the dialogue this doc grew out of: every one of these tools takes a description of behavior (a semantics) plus its own extra inputs, and produces a view of one and the same underlying object — the program's behavior. They are coordinate systems on a single space. The catch — and the whole reason this repo exists — is that "they won't contradict each other" is a directed relationship that has to be earned by a theorem, not a freebie. Scroll, click, and watch a single computation refract through six lenses.
This is the conceptual companion to the repo's three partition surveys and the while-derivation-study. Where those ask "which languages/formats land where," this one asks the deeper question underneath: in what sense are all these tools talking about the same thing at all?
Our worked example, threaded through the whole page. A tiny imperative (WHILE) program that computes a factorial:
The behavior of this program is a single mathematical object. We can write it down in different notations, but they all pin down the same thing — the same input/output pairs, the same trace. Switch between four equivalent descriptions:
The denotation maps a starting store to its final store. For our closed program (no free inputs) it is a single point:
Partial because a non-terminating program maps to ⊥ (undefined). This is the extensional view: it says what the program computes and nothing about how.
Big-step natural semantics relates a configuration to a result. Same content, written as a judgment that a derivation tree witnesses:
A relation is more general than a function — it permits non-determinism (one input, many outputs). For deterministic WHILE the relation is a function, which is exactly what a functionality lemma proves.
Small-step semantics exposes the path, not just the endpoints. The behavior is the whole sequence:
Strictly more information than the function: two programs can share an input/output map yet differ in their traces (timing, intermediate observability). Trace equivalence ⟹ I/O equivalence, not vice-versa.
When the program talks to the world (reads, writes, calls), the behavior is a (possibly infinite) tree of internal steps and external events:
Interaction trees (Coq's itree) are the modern "one object" used to unify operational and denotational reasoning about effectful, possibly-divergent programs. Our pure example collapses to a finite chain.
Each tool below reads (a description of) that behavior and reports a view. The center chip is the ground truth. Watch the relation symbol change: = exact, ⊑ refines, α over-approximates, ⊢ proves a property about. None of these answers is wrong. They live at different precisions.
Every lens is reading the same semantics of WHILE. The difference is not the program and not the meaning — it is what question each tool is built to answer and how much it deliberately throws away to answer it decidably/cheaply.
The abstract-interpretation lens isn't a single thing — it's a whole tower. Pick how much detail to keep, and you get a different (still sound) answer for r after the loop. Lower = more precise; higher = cheaper & more decidable. Each rung is connected to the concrete by a Galois connection (α, γ): abstract up with α, concretize down with γ.
This tower is exactly what the pilot derives: the EVM Sign domain over 256-bit words, with the per-rule obligation α ∘ F ∘ γ ⊑ F♯ certifying that the abstract transfer never under-approximates the concrete one.
Before any tool runs, someone has to describe the behavior. There are several house styles for doing so. They are interchangeable presentations of the same meaning — and the equivalences between them (§07) are theorems. Here is the WHILE while-loop construct in each:
Unrolls the loop by exactly one reduction. Produces a trace. The native style of rewriting and of K / Maude. Best for: interleaving, concurrency, modelling cost.
Relates a configuration directly to its result via a derivation tree. Compact, great for proving properties; says nothing about divergence (no tree ⟹ no judgment). Native to relational specs in Coq/Agda.
The loop's meaning is the least fixed point of a functional on a domain (CPO). Compositional: ⟦whole⟧ is built from ⟦parts⟧. Native to λ-calculus / domain theory. Best for equational reasoning & full abstraction.
A first-order transition system with an explicit code/continuation stack — halfway to an implementation (CEK/SECD machines). Derivable from the others by Danvy's CPS + defunctionalization. Native substrate for VMs & bytecode (the EVM is one).
A total/monadic function in a host language that is the meaning. Tagless-final & monadic interpreters live here. The repo's Rule_ir + Domain functor is this shape: one interpreter, instantiate the monad/domain to get concrete or abstract.
Underneath the semantic styles sit a few models of computation. They're all Turing-equivalent as machines — but each makes a different style cheap, which is why specific tools cluster on specific substrates. The substrate is the medium; the style is what you paint with it.
| Substrate | Computation is… | Natural style | Tools that live here | Fit |
|---|---|---|---|---|
| λ-calculus | function application / β-reduction | denotational, big-step | Coq, Agda, Lean, Haskell | ●●● |
| Rewriting logic | matching + rule application | small-step SOS | K, Maude | ●●● |
| Resolution / Horn | proof search / unification | relational judgments | Twelf, λProlog, Prolog | ●●● |
| Abstract machine | stack/register transitions | abstract machine | EVM, JVM, WASM, CompCert IRs | ●● |
| SAT / SMT | constraint satisfaction | none — a decision engine | Z3, CVC5 (infrastructure) | ○ |
If all these tools view one object, what exactly do they have in common, and what is per-tool? Factoring a "semantics" reveals the shared spine. Toggle a layer to see who owns it:
Tools need a common meaning, not a common grammar. Give each tool its own parser that targets the same abstract signature and the surface syntax can differ entirely — many front-ends, one IR (GHC Core, Rust MIR, the repo's Rule_ir). Binding representation is the sticky residue that resists free interconversion (the POPLmark tax).
If two tools share ①b literally, agreement is automatic — they read the same generator. The instant they don't (KEVM's rules vs. the execution-layer spec; small-step vs. big-step), agreement becomes a theorem you owe: a linking result (§07). That obligation is the price of not sharing the spine.
Named instances of this "base + interpretation" pattern: skeletal semantics (skeleton + semantic functions), tagless-final / monadic interpreters, and this repo's Rule_ir + Domain functor — one rule IR, instantiate the domain to get the concrete interpreter or the Sign analyzer.
This is the load-bearing section. "They won't contradict each other" is true because of these theorems, not in spite of missing them. Each one certifies that two descriptions of behavior denote the same object — and each names the failure mode (drift) it rules out. Expand each:
A recurring confusion from the dialogue: a theorem prover is not a Turing machine of a special kind — it's a program that manipulates proofs, and the family splits along two axes (who drives, and what's trusted). Curry–Howard is why this is even coherent: a proof is a program, a proposition is a type, so "checking a proof" and "type-checking a program" are the same operation.
trusts: tiny kernel
Given a complete proof term, verifies it. Decidable, fast, dumb. The Coq/Lean kernel. Trust bottoms out here.
drives: human + tactics
Helps you build the proof term interactively; emits something the checker accepts. Coq, Lean, Agda, Isabelle. Power: higher-order logic, full expressiveness.
drives: search
Tries to find the proof itself for a restricted logic (first-order). Vampire, E. First-order validity is only semi-decidable — it may run forever.
drives: decision procedure
Decides satisfiability in decidable theories (arithmetic, arrays, bitvectors). Z3, CVC5. Returns SAT/UNSAT/UNKNOWN. The engine the others call.
This is the payoff and the reason the repo exists. K's bet is the synthesis made operational: define ①b once as rewrite rules, and the tools below are derived from that one definition — correct-by-construction, no per-tool re-description, no drift. The meta-theory chain rewriting logic → matching logic → reachability logic is what makes each derivation a theorem.
| Lens | Derived from the K definition? | How |
|---|---|---|
| Parser (①a) | auto | from the syntax declarations |
| Concrete interpreter | auto | LLVM backend executes the rewrite rules |
| Symbolic executor | auto | Haskell/Booster backend + Z3 |
| Deductive verifier | auto | reachability-logic prover over the same rules |
| Sound abstract interpreter | NOT auto | needs a Galois connection, best transformer, join + widening, a fixpoint — extra structure K doesn't synthesize |
Epicenter #239 · podcast/video
§09 in interview form: one semantics, every tool derived correct-by-construction. The closest match to this whole page; EVM-flavored. Start here.
J.P. Morgan Distinguished Lecture, 2021
The inventor on §03's precision lattice and the α/γ soundness relation that is this repo's obligation.
Collège de France 2019–20 · slides + Coq
The rigorous version of §04 and §07: all five styles for one language, plus the linking theorems (CompCert simulation) proved.
podcast
The accessible companion to §08 — proofs-as-programs, seL4/Isabelle verification, what a prover actually is.
talk
On fixpoints + widening — the piece K does not auto-derive (§09's last row).
podcast
Categorical / denotational angle (e.g. Dybjer) — pairs with the λ-calculus substrate in §05 and the denotational style in §04.
If you watch two: Roșu for the framing, Cousot (J.P. Morgan) for the soundness half. Together they cover both halves of "all sound views of one behavior, related by α/γ and linking theorems."