The companion docs map which tool answers what. This one goes up a level: a general framework in which any analysis method — interpreter, abstract interpreter, symbolic executor, model checker, type system, deductive verifier, proof checker — is a single instrument reading one shared semantics. We pin down what makes them all look at the same behaviour, what a program must expose to be analyzable by every one of them, and the architectural options for building it.
Two tools (an interpreter, an abstract interpreter) is not a framework — it's two tools. A framework is the thing that lets you add the third, fourth, nth without re-describing the language each time.
The repo's earlier docs establish the kernel idea: a program is a finite term over a
signature Σ; its behaviour is fixed by one Σ-algebra; and every tool is
the same fold plus a little tool-specific apparatus.13 Three questions follow, and
they are exactly the ones this document answers:
| Question | Answered in |
|---|---|
| How do we add any tool — not just the two we have? | §02 the six-slot instrument interface; §04 the catalog |
| How do we guarantee every tool is reading the same underlying behaviour? | §01 one derivable substrate; §03 the monad knob; §05 the linking-theorem zoo |
| What must a frontend / program provide to be analyzable by them all? | §06 the analyzability contract |
This sits above the companion explainer one behaviour, many views (the lenses + linking theorems) and generalises it into a build recipe and a contract.
"All tools see the same behaviour" is not a hope; it is an architectural decision about where the semantics lives and how each tool gets at it.
There are only two honest ways to make two tools agree about a program:
| Strategy | How agreement is obtained | Failure mode |
|---|---|---|
| Share the semantics literally | both tools fold the same definition; agreement is definitional, free | none — but you must have one definition to share |
| Link two descriptions by theorem | each tool has its own description; a proof (adequacy, simulation, soundness, conformance) certifies they agree | skip the theorem and you get drift — two "semantics" that silently disagree |
The disease the substrate cures is drift: a hand-written interpreter, a separate hand-written abstract semantics, a third hand-written verifier model — three reimplementations of "the language" that nobody proved equal, diverging quietly as the language evolves. The cure is to keep one semantics and derive or link the rest.
Not every semantics can serve as a substrate. It has to be all four of:
programs are finite trees of operators; this is what every instrument folds over.
a rewrite relation or a recursive evaluator — so the concrete instrument is a real interpreter, not a paper definition.
it commits to meaning, not to "interpreter" or "verifier"; every tool is downstream of it.
the rules are data a tool can ingest (K's compiled.json, a skeletal .sk, a Σ-algebra term) — the precondition for deriving rather than rewriting.
so a fold/homomorphism exists at all — the structural invariant every instrument relies on.
Arithmetic / Comparison / Store / Control / Structural — each instrument treats a family uniformly (this repo's Rule_classifier).
Three concrete substrates the repo already uses or surveys, all satisfying the four:
| Substrate | Form | Derivation engine |
|---|---|---|
| K rewrite rules11 | operational SOS, reified to compiled.json | kompile → interpreter; kprove → verifier (reachability logic); this repo's RULE_SOURCE port ingests the JSON |
| Skeletal semantics12 | a .sk file: typed skeletons with unspecified hooks | necroml → a functorised interpreter parametric over a monad + hooks (§03) |
| Σ-algebra / denotation13 | one carrier + one operation per constructor; ⟦·⟧ : T_Σ → D | a fold; swap D for a different sound view |
Generalising the repo's five ports: any analysis method, however exotic, is the substrate folded through a fixed set of choices. Define the choices and you have defined the tool.
An instrument is a filling of these six slots, plus the undecidability concession it pays (there is always one — Rice's theorem):
what the fold lands in — concrete state, an abstract lattice, a symbolic store + path condition, a logical formula, a proof tree.
the interpretation of each Σ-constructor in that carrier — the abstract f♯, the symbolic op, the typing rule, the wp clause.
how loops/recursion are handled — run-to-halt, Kleene + widening, bounded unrolling, a supplied invariant, a ranking function.
what you may ask — one value, "∀ reachable φ", "∃ input reaching L", a Hoare triple, an LTL formula, a 2-safety relation.
the theorem binding this view to the substrate (§05): identity, Galois soundness, under-approx soundness, adequacy, relative completeness, by-construction.
what it hands back — a value, an invariant, a counterexample witness, a proof object, a certificate + small trusted base.
Mapped to this repo's ports:
slot 1+2+3 = the TRANSFER_DERIVER over a Domain.ABSTRACT_DOMAIN;
slot 5 = the VALIDATOR port (the differential oracle);
the substrate = the RULE_SOURCE + CONCRETE_INTERPRETER ports. The framework is
"make slot-1 (the carrier) a free parameter and watch the instrument change."
The deepest answer to "how do they all see the same behaviour": literal sharing. Several instruments can be the same generated code, differing only in one plugged-in module. This is real in this repo, not an analogy.
The skeletal substrate compiles (necroml) to a functor
MakeInterpreter (F : UNSPEC) over a MONAD. The generated dispatch — pop the
stack, select the opcode, push the result — is shared, opcode-agnostic, proof-free code. Two slots are
left open:
module type MONAD = sig type 'a t val ret : 'a -> 'a t val bind : 'a t -> ('a -> 'b t) -> 'b t val branch : (unit -> 'a t) list -> 'a t (* ← the fork point: pick one? join all? keep all? *) val fail : unit -> 'a t (* stuck / underflow *) val apply : ('a -> 'b t) -> 'a -> 'b t val extract: 'a t -> 'a end (* UNSPEC also supplies the per-operator word hooks: add_word, div_word, lt_word, … *)
The monad decides the mode of analysis (how branches combine, what a value is);
the hooks decide the domain (what + means). Turn the monad knob and
the very same skeleton becomes a different instrument — by construction, sharing the dispatch
they can never disagree on. Pick a setting:
--deriver necro path
delivers: ratio 0.000, nine domain-lemma obligations, zero hand proofs.The bench. Pick an instrument; see its six slots filled and the concession it pays. Filter by which corner of the undecidability triangle it concedes.
Slot 5 is the load-bearing one. Non-contradiction between instruments is a theorem, never a gift. These are the species of theorem, weakest trusted-assumption first.
| Linking theorem | Shape | Buys you | Used by |
|---|---|---|---|
| Literal sharing / by-construction | same generated fold, different module plugged in | agreement with no dispatch proof; only per-op lemmas remain | monad-knob instruments (§03); necro deriver |
| Identity | the instrument is the reference semantics | it defines the ground truth others are checked against | concrete interpreter / oracle |
| Galois soundness | α ∘ F ∘ γ ⊑ F♯ | over-approximation never lies (proves absence) | abstract interp, type systems, dataflow, MC-as-AI |
| Under-approx soundness | [P] C [Q]: every reported state genuinely reachable | every bug found is real, with a witness (proves presence) | symbolic exec, concolic, BMC, incorrectness logic4 |
| Adequacy | operational ⇔ denotational agree on observables | two styles of the substrate describe one behaviour | denotational vs SOS; itrees14 |
| Simulation / refinement | C ⊑ C' via a simulation relation | one program/compiler preserves another's behaviour | translation validation, CompCert8, equivalence |
| Relative completeness | complete modulo an assertion-language oracle (Cook) | the only incompleteness is the logic's, not the method's | deductive verifiers3 |
| Kernel soundness (de Bruijn / LCF) | every inference re-checked by a tiny kernel | shrinks the trusted base; the proof producer may be buggy | proof checkers, PCC5, seL45 |
| Conformance (differential) | random/exhaustive samples agree with the oracle | cheap empirical binding when a proof is too costly | this repo's VALIDATOR port (γ-containment testing) |
The user's third question: what would a program / frontend need so that all these instruments can analyse it? It is a layered contract. Click a layer to "provide" it and watch which instruments come online.
Note the asymmetry: the substrate layers (0–3) are paid once per language; the instrument extras (4–6) are paid once per tool, and several tools share them (every numeric instrument reuses the same domain; every safety instrument reuses the configuration's observable cells).
The framework is not hypothetical here. The pilot is a two-instrument instance of it; the gaps are exactly the slots not yet filled.
real KEVM rules ingested as IR; the concrete oracle (Evm_concrete / KEVM-LLVM) is the identity-linked reference. Satisfies all four substrate criteria.
slot 1–3 filled; calculational and necro (by-construction) derivers both behind one interface.
slot 5 = conformance today; necro upgrades the dispatch portion to by-construction.
Each is "fill the six slots against the existing substrate." Crucially, §03's monad knob means slots
1–3 are often a new monad instantiation of the existing .sk, not new dispatch code.
| Instrument | What to add (slots 1–4) | Link (slot 5) | Reuse |
|---|---|---|---|
| Symbolic executor | a SYMBOLIC monad (branch = fork + accumulate path condition) over the same skeleton; an SMT discharger for feasibility & witness extraction | under-approx soundness (every reported path feasible) | same generated exec; same word hooks lifted to symbolic terms |
| Deductive verifier | an assertion language over the configuration cells; a wp/VC generator read off the rewrite rules; an SMT/proof discharger; loop-invariant input | relative completeness (Cook) — exact modulo the assertion logic | the Rule_ir + cell structure; the oracle for VC sanity |
| Proof checker | reachability-logic / Hoare proof objects; a small kernel re-checking each step + leaf side-conditions | kernel soundness (de Bruijn) — TCB = kernel + spec | the verifier's VCs are the checker's side-conditions |
| Gas / cost analyzer | a WRITER (cost) monad over the skeleton; recurrence/ranking solving for loops | soundness of the cost model vs operational step count | same dispatch; EVM gas table as hooks |
MONAD instantiations, not new semantics.
That is the framework paying off: the cost of the n-th instrument collapses from "re-describe the
language" to "plug in a monad and discharge one link."These are not mutually exclusive; the strongest system uses all three at different layers — which is, in fact, what this repo is converging on.
Keep each instrument as its own (possibly hand-written) port; bind each to the substrate by a differential oracle (the current Port ⑤). Adding a tool = write the port + a sampling harness.
Cheapest to add an instrument; no proof burden; works for tools you can't easily generate.
Catches real drift on every sampled input; great as a regression gate.
Binding is testing, not proof — soundness holds only on sampled inputs.
Each port can still encode the dispatch wrongly; nothing forces it to share the substrate's control flow.
Write the operational core once as a skeleton; generate the functorised dispatch; obtain each instrument by plugging in a monad (mode) + hooks (domain). §03.
Dispatch is shared by construction — zero dispatch obligations; soundness reduces to per-op hook lemmas (ratio 0.000).
Adding an instrument is often just a new monad — the strongest answer to "same behaviour".
Requires the semantics in skeletal form and a monad per mode; precision is bounded by the domain's declared primitives (calc ⊑ necro).
Not every analysis fits a clean monad (relational/hyperproperty reasoning needs product constructions on top).
Adopt a framework whose toolchain itself derives many instruments from one definition: K's
kompile (interpreter) + kprove (verifier via matching/reachability
logic11); Maude's rewriting + LTL model checker; or a denotational substrate in Coq with
interaction trees14 giving interpreter + relational + refinement proofs.
Maximum leverage — parser, interpreter, symbolic engine, prover all from one source, upstream of you.
The substrate is already reified and battle-tested (KEVM is this).
You inherit the meta-language's trusted base and its expressiveness ceiling.
Instruments outside what the toolchain ships (a bespoke cost analysis, a custom domain) still need options A or B around it.
Click a card to reveal the answer.
What turns "two tools" into a framework?
A shared substrate (one derivable semantics) plus a fixed instrument interface (six slots) so the n-th tool is added by filling slots + discharging one linking theorem — never by re-describing the language.
Two honest ways to make tools agree about a program — and the disease that strikes if you do neither?
(1) Share the semantics literally — agreement is free. (2) Link two descriptions by a theorem (adequacy, simulation, soundness, conformance). Skip both and you get drift: multiple hand-written "semantics" that silently disagree.
In the monad-knob picture, what does the branch combinator decide, and how does it differ across interpreter / abstract interp / symbolic executor?
It decides how multiple possible continuations combine. Interpreter: pick one (identity). Abstract interpreter: join all into one lattice element. Symbolic executor: keep all, each with its path condition. Same generated dispatch; different monad.
Why is model checking (or type inference, or dataflow) not a rival to abstract interpretation in this framework?
They are instances of it — same six slots, different carrier/query. Types are an abstract interpretation (Cousot 1997); dataflow is model checking of an abstract interpretation (Schmidt 1998); temporal model checking is AI over trace properties (Cousot 2000).
Which linking theorem is the weakest, and which is strongest — and how does the framework let you move between them?
Weakest: conformance (differential testing — checks, doesn't prove). Strongest: by-construction (shared generated dispatch — no dispatch theorem at all). You can upgrade an instrument from conformance to by-construction without touching the substrate.
What are the substrate layers of the analyzability contract, and why are they cheaper than they look?
L0 signature/grammar, L1 a derivable reference semantics (compositional → a fold exists), L2 a named observable configuration, L3 rule families. They are paid once per language; the per-instrument extras (domain, assertion language, proof system) are paid once per tool and shared across tools of the same family.
How do the three build options (A federation, B generation, C meta-language) compose in the recommended system?
C is the source (K/KEVM provides the substrate), B is the trust core (necro generates instruments sound by construction), A is the federation layer (the differential validator binds everything and catches drift). Source → generate → federate.
The framework is a synthesis; these are its load-bearing sources.