A General Framework for Program Analysis — One Substrate, Many Instruments

00The thesis: a framework is a substrate plus a bench of instruments

Two tools (an interpreter, an abstract interpreter) is not a framework — it's two tools. A framework is the thing that lets you add the third, fourth, nth without re-describing the language each time.

The repo's earlier docs establish the kernel idea: a program is a finite term over a signature Σ; its behaviour is fixed by one Σ-algebra; and every tool is the same fold plus a little tool-specific apparatus.¹³ Three questions follow, and they are exactly the ones this document answers:

Question	Answered in
How do we add any* tool — not just the two we have?*	§02 the six-slot instrument interface; §04 the catalog
How do we guarantee every tool is reading the same underlying behaviour?	§01 one derivable substrate; §03 the monad knob; §05 the linking-theorem zoo
What must a frontend / program* provide to be analyzable by them all?*	§06 the analyzability contract

The one-sentence framework. Fix a single executable semantics as the substrate; express every analysis as an instrument that folds that substrate into its own carrier; and bind each instrument back to the substrate with an explicit linking theorem. Adding a tool = filling six slots and discharging one theorem. Nothing is hand-reimplemented, so nothing can drift.

This sits above the companion explainer one behaviour, many views (the lenses + linking theorems) and generalises it into a build recipe and a contract.

01The substrate — one source of truth, everything else derived

"All tools see the same behaviour" is not a hope; it is an architectural decision about where the semantics lives and how each tool gets at it.

There are only two honest ways to make two tools agree about a program:

Strategy	How agreement is obtained	Failure mode
Share the semantics literally	both tools fold the same definition; agreement is definitional, free	none — but you must have one definition to share
Link two descriptions by theorem	each tool has its own description; a proof (adequacy, simulation, soundness, conformance) certifies they agree	skip the theorem and you get drift — two "semantics" that silently disagree

The disease the substrate cures is drift: a hand-written interpreter, a separate hand-written abstract semantics, a third hand-written verifier model — three reimplementations of "the language" that nobody proved equal, diverging quietly as the language evolves. The cure is to keep one semantics and derive or link the rest.

What the substrate must be

Not every semantics can serve as a substrate. It has to be all four of:

structuralA term algebra over Σ

programs are finite trees of operators; this is what every instrument folds over.

executableA computable transition / fold

a rewrite relation or a recursive evaluator — so the concrete instrument is a real interpreter, not a paper definition.

tool-neutralNo tool baked in

it commits to meaning, not to "interpreter" or "verifier"; every tool is downstream of it.

machine-readableReified, not just code

the rules are data a tool can ingest (K's compiled.json, a skeletal .sk, a Σ-algebra term) — the precondition for deriving rather than rewriting.

compositionalMeaning of the whole from the parts

so a fold/homomorphism exists at all — the structural invariant every instrument relies on.

classifiedOperators carry families

Arithmetic / Comparison / Store / Control / Structural — each instrument treats a family uniformly (this repo's Rule_classifier).

Three concrete substrates the repo already uses or surveys, all satisfying the four:

Substrate	Form	Derivation engine
K rewrite rules¹¹	operational SOS, reified to `compiled.json`	`kompile` → interpreter; `kprove` → verifier (reachability logic); this repo's `RULE_SOURCE` port ingests the JSON
Skeletal semantics¹²	a `.sk` file: typed skeletons with unspecified hooks	`necroml` → a functorised interpreter parametric over a monad + hooks (§03)
Σ-algebra / denotation¹³	one carrier + one operation per constructor; `⟦·⟧ : T_Σ → D`	a fold; swap `D` for a different sound view

The K promise, generalised. "Define a language once and get a parser, interpreter, symbolic execution engine and verifier for free" is the substrate principle in slogan form. The framework below is what that slogan looks like when you take it seriously enough to add arbitrary instruments and demand a theorem for each.

02The instrument interface — six slots and a concession

Generalising the repo's five ports: any analysis method, however exotic, is the substrate folded through a fixed set of choices. Define the choices and you have defined the tool.

An instrument is a filling of these six slots, plus the undecidability concession it pays (there is always one — Rice's theorem):

slot 1Carrier

what the fold lands in — concrete state, an abstract lattice, a symbolic store + path condition, a logical formula, a proof tree.

slot 2Transfer per operator

the interpretation of each Σ-constructor in that carrier — the abstract f♯, the symbolic op, the typing rule, the wp clause.

slot 3Fixpoint strategy

how loops/recursion are handled — run-to-halt, Kleene + widening, bounded unrolling, a supplied invariant, a ranking function.

slot 4Query language

what you may ask — one value, "∀ reachable φ", "∃ input reaching L", a Hoare triple, an LTL formula, a 2-safety relation.

slot 5Linking obligation

the theorem binding this view to the substrate (§05): identity, Galois soundness, under-approx soundness, adequacy, relative completeness, by-construction.

slot 6Evidence / output

what it hands back — a value, an invariant, a counterexample witness, a proof object, a certificate + small trusted base.

Read the slots as a recipe. To add a tool to the framework you do not write a new semantics — you choose a carrier (1), say how each operator acts on it (2), pick how loops converge (3), fix what you can ask (4), discharge the link to the substrate (5), and define what it emits (6). Slots 1–3 make it run; slot 5 makes it trustworthy; slots 4 & 6 make it useful.

Mapped to this repo's ports: slot 1+2+3 = the TRANSFER_DERIVER over a Domain.ABSTRACT_DOMAIN; slot 5 = the VALIDATOR port (the differential oracle); the substrate = the RULE_SOURCE + CONCRETE_INTERPRETER ports. The framework is "make slot-1 (the carrier) a free parameter and watch the instrument change."

03The monad knob — one skeleton, swap the mode, get a different instrument

The deepest answer to "how do they all see the same behaviour": literal sharing. Several instruments can be the same generated code, differing only in one plugged-in module. This is real in this repo, not an analogy.

The skeletal substrate compiles (necroml) to a functor MakeInterpreter (F : UNSPEC) over a MONAD. The generated dispatch — pop the stack, select the opcode, push the result — is shared, opcode-agnostic, proof-free code. Two slots are left open:

module type MONAD = sig
  type 'a t
  val ret    : 'a -> 'a t
  val bind   : 'a t -> ('a -> 'b t) -> 'b t
  val branch : (unit -> 'a t) list -> 'a t   (* ← the fork point: pick one? join all? keep all? *)
  val fail   : unit -> 'a t                   (* stuck / underflow *)
  val apply  : ('a -> 'b t) -> 'a -> 'b t
  val extract: 'a t -> 'a
end
(* UNSPEC also supplies the per-operator word hooks: add_word, div_word, lt_word, … *)

The monad decides the mode of analysis (how branches combine, what a value is); the hooks decide the domain (what + means). Turn the monad knob and the very same skeleton becomes a different instrument — by construction, sharing the dispatch they can never disagree on. Pick a setting:

This is the by-construction route to "same behaviour". When two instruments are the same generated skeleton over different monads, there is no transcription to get wrong and no dispatch theorem to prove — soundness reduces to the per-operator hook lemma alone. It is exactly the result of Abstracting Abstract Machines⁹ and Abstracting Definitional Interpreters¹⁰ (one machine/interpreter, a knob that tunes it from concrete to abstract) and of Skeletal Semantics'¹² consistent concrete and abstract interpretations — and it is what the pilot's --deriver necro path delivers: ratio 0.000, nine domain-lemma obligations, zero hand proofs.

05The linking-theorem zoo — how a view earns agreement

Slot 5 is the load-bearing one. Non-contradiction between instruments is a theorem, never a gift. These are the species of theorem, weakest trusted-assumption first.

Linking theorem	Shape	Buys you	Used by
Literal sharing / by-construction	same generated fold, different module plugged in	agreement with no dispatch proof; only per-op lemmas remain	monad-knob instruments (§03); necro deriver
Identity	the instrument is the reference semantics	it defines the ground truth others are checked against	concrete interpreter / oracle
Galois soundness	`α ∘ F ∘ γ ⊑ F♯`	over-approximation never lies (proves absence)	abstract interp, type systems, dataflow, MC-as-AI
Under-approx soundness	`[P] C [Q]`: every reported state genuinely reachable	every bug found is real, with a witness (proves presence)	symbolic exec, concolic, BMC, incorrectness logic⁴
Adequacy	operational ⇔ denotational agree on observables	two styles of the substrate describe one behaviour	denotational vs SOS; itrees¹⁴
Simulation / refinement	`C ⊑ C'` via a simulation relation	one program/compiler preserves another's behaviour	translation validation, CompCert⁸, equivalence
Relative completeness	complete modulo an assertion-language oracle (Cook)	the only incompleteness is the logic's, not the method's	deductive verifiers³
Kernel soundness (de Bruijn / LCF)	every inference re-checked by a tiny kernel	shrinks the trusted base; the proof producer may be buggy	proof checkers, PCC⁵, seL4⁵
Conformance (differential)	random/exhaustive samples agree with the oracle	cheap empirical binding when a proof is too costly	this repo's `VALIDATOR` port (γ-containment testing)

Conformance is the weakest link — and the repo's pragmatic default. Differential testing against the KEVM oracle checks soundness on sampled inputs; it does not prove it. The by-construction route (§03) is strictly stronger: it removes the dispatch obligation entirely. The framework lets you choose per instrument how much rigour to buy — and to upgrade a link from conformance to by-construction without touching the substrate.

06The analyzability contract — what a frontend must expose

The user's third question: what would a program / frontend need so that all these instruments can analyse it? It is a layered contract. Click a layer to "provide" it and watch which instruments come online.

Instruments unlocked by what you've provided

The contract is cumulative and substrate-shaped. Layers 0–2 (a signature, a derivable reference semantics, a named configuration) are the substrate — provide them and the concrete interpreter and the universal collecting/abstract analyses already exist. Each higher layer (a domain + Galois connection, an assertion language, a property logic, a proof system) unlocks one more family of instruments. A frontend that stops at Layer 0 (syntax only) is analyzable by nothing; one that reaches Layer 6 (discharge mechanisms) is analyzable, with proof, by all of them.

Note the asymmetry: the substrate layers (0–3) are paid once per language; the instrument extras (4–6) are paid once per tool, and several tools share them (every numeric instrument reuses the same domain; every safety instrument reuses the configuration's observable cells).

07How the pilot already realises it — and what's missing

The framework is not hypothetical here. The pilot is a two-instrument instance of it; the gaps are exactly the slots not yet filled.

SUBSTRATEPort ① RULE_SOURCE + Port ② CONCRETE_INTERPRETER

real KEVM rules ingested as IR; the concrete oracle (Evm_concrete / KEVM-LLVM) is the identity-linked reference. Satisfies all four substrate criteria.

INSTRUMENT — abstract interpPort ④ TRANSFER_DERIVER over Port ③ Domain (Sign, Interval)

slot 1–3 filled; calculational and necro (by-construction) derivers both behind one interface.

LINKPort ⑤ VALIDATOR — differential γ-containment vs the oracle

slot 5 = conformance today; necro upgrades the dispatch portion to by-construction.

The gaps — instruments the user named, expressed as new ports

Each is "fill the six slots against the existing substrate." Crucially, §03's monad knob means slots 1–3 are often a new monad instantiation of the existing .sk, not new dispatch code.

Instrument	What to add (slots 1–4)	Link (slot 5)	Reuse
Symbolic executor	a `SYMBOLIC` monad (`branch` = fork + accumulate path condition) over the same skeleton; an SMT discharger for feasibility & witness extraction	under-approx soundness (every reported path feasible)	same generated `exec`; same word hooks lifted to symbolic terms
Deductive verifier	an assertion language over the configuration cells; a `wp`/VC generator read off the rewrite rules; an SMT/proof discharger; loop-invariant input	relative completeness (Cook) — exact modulo the assertion logic	the `Rule_ir` + cell structure; the oracle for VC sanity
Proof checker	reachability-logic / Hoare proof objects; a small kernel re-checking each step + leaf side-conditions	kernel soundness (de Bruijn) — TCB = kernel + spec	the verifier's VCs are the checker's side-conditions
Gas / cost analyzer	a `WRITER` (cost) monad over the skeleton; recurrence/ranking solving for loops	soundness of the cost model vs operational step count	same dispatch; EVM gas table as hooks

The headline. Three of the four gaps reuse the same generated dispatch and word hooks — they are new MONAD instantiations, not new semantics. That is the framework paying off: the cost of the n-th instrument collapses from "re-describe the language" to "plug in a monad and discharge one link."

08The options — three ways to build the framework, and how they compose

These are not mutually exclusive; the strongest system uses all three at different layers — which is, in fact, what this repo is converging on.

AFederation by conformance

Keep each instrument as its own (possibly hand-written) port; bind each to the substrate by a differential oracle (the current Port ⑤). Adding a tool = write the port + a sampling harness.

Cheapest to add an instrument; no proof burden; works for tools you can't easily generate.

Catches real drift on every sampled input; great as a regression gate.

Binding is testing, not proof — soundness holds only on sampled inputs.

Each port can still encode the dispatch wrongly; nothing forces it to share the substrate's control flow.

BMonad-parametric generation (necro / AAM / ADI)

Write the operational core once as a skeleton; generate the functorised dispatch; obtain each instrument by plugging in a monad (mode) + hooks (domain). §03.

Dispatch is shared by construction — zero dispatch obligations; soundness reduces to per-op hook lemmas (ratio 0.000).

Adding an instrument is often just a new monad — the strongest answer to "same behaviour".

Requires the semantics in skeletal form and a monad per mode; precision is bounded by the domain's declared primitives (calc ⊑ necro).

Not every analysis fits a clean monad (relational/hyperproperty reasoning needs product constructions on top).

CMeta-language native (K / Maude / Coq + itrees)

Adopt a framework whose toolchain itself derives many instruments from one definition: K's kompile (interpreter) + kprove (verifier via matching/reachability logic¹¹); Maude's rewriting + LTL model checker; or a denotational substrate in Coq with interaction trees¹⁴ giving interpreter + relational + refinement proofs.

Maximum leverage — parser, interpreter, symbolic engine, prover all from one source, upstream of you.

The substrate is already reified and battle-tested (KEVM is this).

You inherit the meta-language's trusted base and its expressiveness ceiling.

Instruments outside what the toolchain ships (a bespoke cost analysis, a custom domain) still need options A or B around it.

The recommended composition (and the repo's actual trajectory). C is the source — K/KEVM is the substrate the rules come from. B is the trust core — necro generates the instruments that must be sound by construction. A is the federation layer — the differential validator binds everything (including hand-written or third-party instruments) and guards against drift on every commit. Source → generate → federate: one substrate, many instruments, each linked at the strongest rigour you can afford.

09Knowledge Check

Click a card to reveal the answer.

0 / 7 self-marked correct

What turns "two tools" into a framework?

A shared substrate (one derivable semantics) plus a fixed instrument interface (six slots) so the n-th tool is added by filling slots + discharging one linking theorem — never by re-describing the language.

Two honest ways to make tools agree about a program — and the disease that strikes if you do neither?

(1) Share the semantics literally — agreement is free. (2) Link two descriptions by a theorem (adequacy, simulation, soundness, conformance). Skip both and you get drift: multiple hand-written "semantics" that silently disagree.

In the monad-knob picture, what does the branch combinator decide, and how does it differ across interpreter / abstract interp / symbolic executor?

It decides how multiple possible continuations combine. Interpreter: pick one (identity). Abstract interpreter: join all into one lattice element. Symbolic executor: keep all, each with its path condition. Same generated dispatch; different monad.

Why is model checking (or type inference, or dataflow) not a rival to abstract interpretation in this framework?

They are instances of it — same six slots, different carrier/query. Types are an abstract interpretation (Cousot 1997); dataflow is model checking of an abstract interpretation (Schmidt 1998); temporal model checking is AI over trace properties (Cousot 2000).

Which linking theorem is the weakest, and which is strongest — and how does the framework let you move between them?

Weakest: conformance (differential testing — checks, doesn't prove). Strongest: by-construction (shared generated dispatch — no dispatch theorem at all). You can upgrade an instrument from conformance to by-construction without touching the substrate.

What are the substrate layers of the analyzability contract, and why are they cheaper than they look?

L0 signature/grammar, L1 a derivable reference semantics (compositional → a fold exists), L2 a named observable configuration, L3 rule families. They are paid once per language; the per-instrument extras (domain, assertion language, proof system) are paid once per tool and shared across tools of the same family.

How do the three build options (A federation, B generation, C meta-language) compose in the recommended system?

C is the source (K/KEVM provides the substrate), B is the trust core (necro generates instruments sound by construction), A is the federation layer (the differential validator binds everything and catches drift). Source → generate → federate.

1 / 7

10References

The framework is a synthesis; these are its load-bearing sources.

1P. Cousot & R. Cousot, "Abstract Interpretation: A Unified Lattice Model…," POPL 1977 — Galois connections, soundness, widening. P. Cousot, Principles of Abstract Interpretation, MIT Press 2021 — AI as the unifying frame for static analysis.
2P. Cousot & R. Cousot, "Systematic Design of Program Analysis Frameworks," POPL 1979; "Abstract Interpretation Frameworks," J. Logic & Computation 1992 — the collecting semantics and calculational derivation of analyses.
3C. A. R. Hoare, "An Axiomatic Basis…," CACM 1969; E. Dijkstra, "Guarded Commands" (wp), CACM 1975; S. Cook, "Soundness and Completeness of an Axiom System…," SIAM J. Comput. 1978 (relative completeness). Tools: Dafny, Why3, F*, Frama-C.
4J. C. King, "Symbolic Execution and Program Testing," CACM 1976; KLEE (OSDI 2008); DART/CUTE (2005, concolic); P. O'Hearn, "Incorrectness Logic," POPL 2020 — the under-approximate dual of Hoare logic.
5R. Milner, Edinburgh LCF, 1979 (small-kernel discipline); de Bruijn, AUTOMATH; G. Necula, "Proof-Carrying Code," POPL 1997; A. Appel, "Foundational PCC," LICS 2001; G. Klein et al., "seL4," SOSP 2009.
6P. Cousot, "Types as Abstract Interpretations," POPL 1997 — type systems as instances of AI.
7D. Schmidt, "Data Flow Analysis is Model Checking of Abstract Interpretations," POPL 1998; P. Cousot & R. Cousot, "Temporal Abstract Interpretation," POPL 2000 — model checking as AI over trace properties; Clarke–Emerson / Queille–Sifakis (model checking, 1981); Clarke et al., CEGAR (2000); Biere et al., BMC (1999).
8A. Pnueli, M. Siegel, E. Singerman, "Translation Validation," TACAS 1998; X. Leroy, "Formal Verification of a Realistic Compiler" (CompCert), CACM 2009 — simulation / semantic preservation.
9D. Van Horn & M. Might, "Abstracting Abstract Machines," ICFP 2010 — derive a sound abstract interpreter from an abstract machine by a single store-allocation knob.
10D. Darais, N. Labich, P. C. Nguyen, D. Van Horn, "Abstracting Definitional Interpreters," ICFP 2017 — one monadic definitional interpreter, concrete and abstract by swapping the monad. (See also Liang–Hudak–Jones, "Monad Transformers and Modular Interpreters," POPL 1995.)
11G. Roșu & T. Șerbănuță, "An Overview of the K Semantic Framework," JLAP 2010; G. Roșu, "Matching Logic," LMCS 2017 and reachability logic — one definition, derived interpreter + symbolic engine + prover. KEVM: Hildenbrandt et al., CSF 2018.
12M. Bodin, P. Gardner, T. Jensen, A. Schmitt, "Skeletal Semantics and their Interpretations," POPL 2019 — one skeletal definition, consistent concrete and abstract interpretations; the Necro toolchain (this repo's Option 3).
13J. Goguen, J. Thatcher, E. Wagner, J. Wright (ADJ), "Initial Algebra Semantics…," JACM 1977; J. Reynolds, "Definitional Interpreters for Higher-Order Programming Languages," 1972 — program = term, meaning = fold.
14L. Xia et al., "Interaction Trees: Representing Recursive and Impure Programs in Coq," POPL 2020 — one denotation for effectful, possibly-divergent programs; adequacy + refinement.
15M. Clarkson & F. Schneider, "Hyperproperties," CSF 2008 / JCS 2010; N. Benton, "Relational Hoare Logic," POPL 2004; Barthe–D'Argenio–Rezk, self-composition, CSFW 2004; Sousa–Dillig, Cartesian Hoare Logic, PLDI 2016.

One Substrate, Many Instruments