This repo doesn't just describe the EVM — it executes real bytecode through a Lean 4 interpreter whose soundness is a machine-checked theorem. The decoder, the state machine, the gas meter, and the conformance harness are all here, and the same evaluator that runs your bytecode is the one the proofs reason about.
The Ethereum Virtual Machine is a stack-based virtual machine. A transaction hands it a contract's bytecode (a flat byte string) plus calldata, and execution is one small-step loop repeated until the program halts.
pc. Most opcodes are one byte. The PUSH1…PUSH32 family carry 1–32 immediate bytes inline that get loaded onto the stack — so they're 2–33 bytes wide.ADD pops two, pushes one; SLOAD pops a key, pushes the stored value; and so on.gas would go negative, the frame halts out-of-gas and reverts. Gas is the EVM's halting guarantee and its DoS price.pc forward by 1 (or 1 + immediate width for PUSH). JUMP/JUMPI write pc directly — but only to a byte tagged JUMPDEST, which is what keeps control flow honest.STOP / RETURN (success, maybe return data), REVERT (undo state, return data), INVALID, or SELFDESTRUCT. Otherwise go back to step 1.Stack — 256-bit words, depth ≤ 1024.
Memory — a byte array, zero-extended, grows on access.
Storage — the SLOAD/SSTORE key→value map per account.
Transient — EIP-1153, cleared each tx.
Every account's balance, nonce, code, storage. CALL/CREATE push a new frame with its own stack, memory and gas budget.
Mental model: the EVM is a pure function from (world state, transaction) to (new world state, return data), computed by iterating a single small-step transition. That "single transition iterated" shape is exactly what makes it amenable to a one-evaluator-proved-once design.
The repo's CLAUDE.md still describes an "ALU-only" demonstrator — that's stale. What's actually here is a real, machine-checked EVM interpreter with a CLI to drive it.
Two execution subcommands live in Main.lean:
run-bytecode <hex> — decode hex into a ByteArray, build an initial Config, run, and print the terminal stack / return-data / gas (or full KEVM-shaped JSON with --json). Optional --calldata <hex> and --gas <N>.run-fixture <file.json> — load an ethereum/tests GeneralStateTest, execute it, and diff the post-state account-by-account.Verified live in this session: feeding 6005600301… to the decoder produced the exact instruction stream push 1 0x05 · push 1 0x07 · add · … with a correct JUMPDEST table, and the execution machine (Evm.Step, Evm.Metered, Evm.Decode, Evm.Config) builds green. (The only red on the current WIP branch is an unrelated half-finished ABI proof in Evm/Abi.lean — not the executor.)
Here's the exact program above — 60 05 60 03 01 00 — executed one transition at a time. Watch pc walk the code and the stack grow and collapse. This is the loop from §1, and it's this loop that EvmAlgebra.run implements in Lean.
pc.Note: gas figures here are illustrative (3 per PUSH/ADD) to show the meter moving — the real schedule lives in Evm/Schedule.lean and the metered machine Evm.Metered.aluEvmMetered.
Execution is a pipeline of four Lean modules. The shape mirrors the EVM yellow paper and KEVM cell-for-cell, but everything is a total Lean function so the kernel can reason about it.
Decodes the entire ≈146-opcode set. Evm/Op.lean carries 72+ distinct opcodes plus the families collapsed into Fin-carrying constructors:
inductive Op | add | mul | sub | div | sdiv | mod | … | keccak256 | address | balance | caller | calldataload | … | basefee | blobhash | mload | mstore | sload | sstore | tload | tstore | mcopy | jump | jumpi | pc | gas | jumpdest | push (n : Fin 33) (imm : Word) -- PUSH0..PUSH32 0x5f–0x7f | dup (n : Fin 16) -- DUP1..DUP16 | swap (n : Fin 16) -- SWAP1..SWAP16 | log (n : Fin 5) -- LOG0..LOG4 | stop | return | revert | invalid | selfdestruct | create | call | callcode | delegatecall | create2 | staticcall
It handles PUSH-immediate consumption, truncated-PUSH-at-end-of-stream (zero-padding), and the JUMPDEST scan that marks valid jump targets.
Trust posture: the decoder is upstream of the soundness TCB. Its correctness is discharged by round-trip theorems (decode ∘ encode = id), all proved axiom-clean — not by trusting the code.
One structure holding every cell step reads or writes, mapped to KEVM's <callState> / <block> / <network>:
structure Machine where stack : List Word -- head = top, soft bound 1024 memory : MemSpec -- total byte fn, default 0 outside touched region pc : ℕ gas : ℕ -- OOG = would-go-negative structure Account where balance, nonce, code storage : Word → Word -- spec rep, default 0 origStorage : Word → Word -- pre-tx snapshot (refunds) transientStorage : Word → Word -- EIP-1153
Big maps (memory, storage, world) use the spec representation: a total function with a default, so there are no partiality side-conditions to carry through proofs. Word = BitVec 256.
One function decodes the opcode at pc once and dispatches to a per-bucket handler. The dispatch list is frozen; opcode buckets Steps/B1..B11 + Call own disjoint branches:
def stepConcrete : Config → StepResult Config -- decode env.code at machine.pc, dispatch decoded Op to Steps.Bn.step inductive StepResult (σ) -- the halting taxonomy | cont (s : σ) -- keep stepping | halt (s : σ) (out : ByteArray) -- STOP / RETURN | revert (s : σ) (out : ByteArray) | stuck -- underflow / INVALID / fuel-out
The decoder runs in exactly one place; buckets never re-decode, never edit the dispatch list, never add a constructor. That discipline is what keeps the proof obligations local.
The whole-program driver is a fuel-indexed iteration of step. Fuel is what makes it a total function even though real EVM programs loop:
structure EvmAlgebra (σ) where step : σ → StepResult σ -- THE ONE EVALUATOR: iterate step until terminal or fuel runs out def EvmAlgebra.run (M : EvmAlgebra σ) : Nat → σ → StepResult σ | 0, s => .stuck | fuel + 1, s => match M.step s with | .cont s' => M.run fuel s' | r => r
The concrete EVM is just aluEvm : EvmAlgebra Config. Run it with aluEvm.run 10000 c0. The very same run is instantiated by the abstract interpreters too — which is the whole trick (§06).
There are deliberately two concrete interpreters — and keeping them apart is a design decision, not an accident.
The soundness reference. No gas accounting at all, so the proofs about what a program computes never get tangled up with what it costs. This is what run-bytecode uses by default.
A separate machine that adds the static opcode-cost frontier and out-of-gas halting. Engaged by --gas N. It refines the reference: metered execution does exactly what the gas-free machine does, or it halts OOG earlier — never something different.
Why split them? Gas is an enforcement layer, not part of the meaning of a program. By proving the gas-free machine correct and then proving the metered machine refines it, you get a clean theorem about computation and a clean theorem about resource bounds, without either contaminating the other.
Plenty of EVMs execute bytecode — geth, revm, evmone. What's rare is an EVM where the interpreter and its correctness proof are the same artifact, checked by a kernel you can't lie to.
The killer move: EvmAlgebra.run is parametric. Plug in Config and you get the concrete EVM. Plug in an abstract domain (Sign, Interval, …) and you get a static analyzer — the exact same run. Soundness is then a single theorem:
-- ★ THE FUNDAMENTAL LEMMA — proved ONCE, by induction on fuel theorem run_sound (h : SoundEvm MC MA γ) : ∀ (fuel : Nat) (sA) (sC), sC ∈ γ sA → StepResult.Refines γ (MC.run fuel sC) (MA.run fuel sA)
Read it plainly: for every program, every input, and every step count, the concrete run is refined by the abstract run — provided each opcode's per-step transfer is sound (the SoundEvm witness). Add a new abstract domain and you only prove the per-opcode arms; whole-program soundness is free, with zero edits to the framework.
SoundEvm witness, and lake build goes red. A half-built or dishonest reading can't certify itself — it fails to compile a proof.
run_sound, per-domain soundness, decoder round-trips). Axiom-clean: only propext, Classical.choice, Quot.sound. No sorry.run. The concrete and abstract interpreters can't disagree about what an opcode means, because there's one definition.Evm/Yul.lean, compileStmt_correct & friends) — compiling then executing agrees with evaluating the source, control flow included, proved.kevm_pilot proof-status --check walks every soundness theorem and collects its axioms at runtime, asserting kernel-cleanliness. The proof of trustworthiness ships with the binary.A verification project earns trust by being explicit about what it does not prove. These are tracked non-executions, never faked passes.
| Gap | What it means |
|---|---|
keccak256 is opaque | Kept an uninterpreted function so soundness proofs never depend on a hash implementation. Consequence: no byte-exact state-root hashing — fixture checks compare accounts field-by-field instead. |
| ECDSA sender recovery | Not modelled; fixtures use their explicit sender field rather than recovering it from the signature. |
| CREATE-as-transaction | A fixture whose to is empty (contract creation) is reported skipped, not executed. |
| d/g/v vector expansion | The EEST data/gas/value index matrix is reduced to the index-0 entry by the fixture decoder. |
The discipline: every gap is a visible SKIP or a documented bound — the harness never prints PASS for something it didn't actually check. That honesty is what makes the PASSes worth anything.
# first time: grab the prebuilt Mathlib cache lake exe cache get # build the framework, EVM frontend + domains, and the CLI lake build Framework Evm kevm_pilot # run some bytecode (PUSH1 5; PUSH1 3; ADD; STOP) lake exe kevm_pilot run-bytecode 600560030100 # run it metered, with calldata, as JSON lake exe kevm_pilot run-bytecode <hex> --gas 21000 --calldata <hex> --json # run an ethereum/tests state-test fixture lake exe kevm_pilot run-fixture path/to/fixture.json # audit the soundness theorems' axiom sets at runtime lake exe kevm_pilot proof-status --check
| Module | Holds |
|---|---|
Framework/Machine.lean | EvmAlgebra, the one fuel-driven run, StepResult, and run_sound — the keystone. |
Evm/Op.lean · Evm/Decode.lean | The ≈146-opcode vocabulary and the round-trip-proved bytecode decoder. |
Evm/Config.lean | The machine state: stack, memory, gas, world, exec env, block context. |
Evm/Step.lean · Evm/Steps/B1..B11 | The small-step dispatcher and per-bucket opcode handlers. |
Evm/Metered.lean · Evm/Schedule.lean | The gas-enforcing machine and the hardfork gas schedule. |
Evm/Fixture.lean · scripts/eest_consume.py | The EEST state-test consumer and pass-rate harness. |
Main.lean | The kevm_pilot CLI: run-bytecode, run-fixture, proof-status. |