Rented Power, Owned Trust: The Commercial Case for Certified Hyper-Optimization

Six weeks ago I wrote Prove More, Run Less — the technical thesis that an abstract interpreter and an optimizing compiler are the same artifact, viewed twice. An analysis produces sound facts; a compiler turns sound facts into faster code while preserving meaning. Push the analysis until its facts are precise enough — singletons, unreachability, no-overflow — and they don't merely license an optimization, they force it. The two halves meet at a single relation: Optimizes orig opt := ∀ c, ObsRel (run orig c) (run opt c). It is exactly the refinement relation that already proves "concrete interpreter refines KEVM" in the Lean repo where I run an EVM inside a proof assistant — repointed at "optimized refines original," with no new metatheory.

That post answered can you build it. This one is the question I left hanging: if you could, what would you do with it?

A category, not a feature

The starting move is to be precise about what "it" is. I'm going to call this certified hyper-optimization, and the architecture is the one the field has been converging on for thirty years without quite naming:

Untrusted search plus a kernel-checked correctness certificate. Let any aggressive, unverified optimizer — a superoptimizer like Souper, an LLVM backend, a frontier LLM — propose fast code. Then prove, in a minimal trusted kernel, that the proposal is observationally equivalent to the original. Checks → ship the fast code with the certificate. Fails → discard, fall back.

This is not a clever way to ship a compiler. It is a structural inversion of where value accumulates. In a verified compiler like CompCert — beautiful, shipping, avionics-qualified — every new optimization is new proof burden. The optimizer is structurally weak because making it stronger means more theorems. The certifying-optimizer split flips that. Capability is rented and improving — you bolt on whatever the field invents next, and the trusted base doesn't grow by a single line. Trust is owned and fixed — a small kernel, one metatheory, the same relation forever. Souper today, an LLVM backend tomorrow, a better LLM next year: each one becomes your improvement the moment you wrap it.

That asymmetry is why this is a category, not a feature. A feature gets out-shipped. A category that turns everyone else's progress into your moat compounds. It is also, not coincidentally, the only architecture that is increasing in value as LLMs get better. An LLM is the perfect untrusted proposer: brilliant at generating candidates, terrible as a trusted authority. A hallucinated rewrite is fatal in any other architecture. Here, the kernel catches it for free — a wrong rewrite simply fails to certify and is discarded. The same property that makes LLMs unreliable as compilers makes them ideal as the search half of this compiler.

Where does this actually pay?

Three conditions have to converge for the architecture to make commercial sense.

Hot / amortized. Proving is expensive. It only pays when the cost is amortized over many executions of the same code. The arithmetic is proof_cost (paid once) < per_execution_saving × number_of_executions.
Correctness is expensive to get wrong. If a miscompile is a free retry, nobody pays for a certificate. If it's a nine-figure exploit, a recall, or a fatal accident, they will.
Optimization is hard and high-stakes. Hand-tuning is slow and error-prone, and the people who can do it are scarce. The market has to actually want speed.

EVM bytecode is the extreme case of all three. Contracts are immutable — you deploy once and prove once. They run billions of times. Every opcode is gas-metered, which means every saved opcode is denominated in money. The amortization inequality is decisively favorable, the cash payoff is on-chain-auditable, and the competitors leave the lane wide open: solc --optimize is fast but has shipped miscompiles (a memory-op bug in 0.8.13–0.8.14); Paradigm's revmc is fast (~19× on Fibonacci, ~6.9× on compute-heavy workloads) but trusts LLVM and a sandbox in its TCB. The product nobody offers is the one this architecture makes natural: superoptimizer-class gains with a kernel-checkable correctness certificate.

That's the wedge. The thesis is broader — the same capability works wherever hot+correct+hard converges — but EVM is where you start, because it's also where I already hold an unfair advantage: an axiom-clean Lean refinement of EVM against the K semantics of the EVM (KEVM), proved against only Lean's core axioms, with the soundness witness machine-checked. Productizing it is a matter of repointing the relation, not standing up new metatheory.

The honest tension nobody talks about

If the case were purely "EVM is great, deploy a certified compiler, get rich," I wouldn't be writing this post — I'd just be building. The hard part of the analysis is the tension I have to name, not paper over:

The dollar pool we'd price against is deflating by protocol design. Per-unit gas fell ~95% from 2024 peaks. Dencun and Fusaka collapsed data-availability cost; rollups got dramatically cheaper. Pricing a product as "share of measured gas saved" on a pool that's shrinking 95% in eighteen months is not a good business.

The thesis survives only on the legs that don't deflate.

Execution-client COGS. Sequencer and RPC CPU is real cash, paid in dollars to AWS regardless of what gas costs. Every cycle the optimizer saves is a cycle the operator doesn't pay for. This is the AbsInt lane — the same shape that makes a verified compiler commercially viable in avionics, applied to L2 infrastructure.
zkVM proving cost. A zkVM's cost is linear in instructions executed. Every optimized instruction is directly monetizable. Unlike user-facing gas, proving cost doesn't deflate — it scales with throughput, and throughput is going up.

Deploy-time end-user-gas pricing is the weakest, first-to-be-killed leg. That's the bit of the pitch I had to demote. The durable bet is the cost-of-goods-sold inside the infrastructure that runs the chain.

Sizing, comparables, what they actually prove

The horizontal TAM is genuinely enormous — hundreds of billions a year across all execution and compute spend where hot + correct + hard converge. AI inference $106B (2025) heading to $255B (2030); blockchain fees $6.9B/yr; automotive software $20.1B; serverless ~$22B. Numbers like that are why this is a category-bet, not a niche-tool bet.

The serviceable market, anchored on bit-exact certificate fits (where observational equivalence is the right correctness contract), comes down to the ~$2.7B/yr Ethereum L1 fee pool plus growing L2 execution — and shrinks further to the durable legs above. The realistic SOM today, given we have the EVM machinery and a legible buyer, is in the tens of millions per year.

The comparables sharpen the picture more than the TAM does.

Certora raised $43.2M, secures 14 of the top-20 DeFi protocols. They are the bull case: in a vertical where correctness equals nine-figure money, formal verification can sell fast.
AbsInt commercializes CompCert into Airbus. A verified optimizing compiler is a real, paid product — that question is closed.
Souper + Alive2 ship our exact architecture (untrusted superoptimizer + SMT-checked validation) into LLVM. They prove the technique works at scale.
Runtime Verification owns the world's best KEVM semantics. They are a ~$1.8M/year services shop.

That last comparable is the one that makes me careful. RV has the deepest EVM-semantics team on earth and the business is small, because semantic depth alone is nearly worthless. Value is captured only where the certificate sits on top of a saving the buyer already pays for in cash, in a domain where being wrong is a nine-figure event. The whole reason EVM (immutable, gas-metered, adversarial) is the right beachhead is that it gives both — a priced saving and a high-stakes correctness contract. Souper and Alive2 do not get those, and they ended up as free infrastructure that LLVM consumed. The dominant failure mode for the whole category is "you built it; LLVM consumed it; you captured nothing."

The one cheap experiment that decides everything

Every load-bearing number in the revenue model collapses to one unproven quantity: how much exploitable precision real EVM contracts have beyond what solc --via-ir already harvests. The free optimizer already eats the cheap wins. We compete on the marginal code it leaves behind. If that margin is fat, the deploy-time-gas wedge is alive; if it's thin, the wedge dies and we pivot the same machinery to zkVM proving cost.

This is cheap to measure. It is an engineering pass on existing public bytecode — no market study, no design partner cycle, no fundraise. And it is currently unmeasured.

So the recommendation is conditional go:

Take the top-100 immutable mainnet contracts by traffic.
Run certified rewrites on them.
Measure the median incremental gas reduction over solc --via-ir, not over unoptimized bytecode.
≥ 3% median → the wedge is live. Land one white-glove DeFi reference, ship a public on-chain before/after delta — a single reference worth more than any deck. < 3% median → the deploy-time-gas product is dead at current economics; pivot the same machinery to execution-client COGS and zkVM, or stop.

No GTM spend, no design-partner outreach, no fundraise predicated on the wedge until that number returns. The gating risk is resolvable cheaply now. Committing before measuring it would be backwards. This is the single highest-leverage decision in the whole plan.

The thing I want to be right about

The interesting part of this thesis is not "build a faster EVM" — revmc already exists, and a 5% gas optimizer is a better tool, not a better company. The interesting part is the architectural shape: a small, fixed kernel that certifies arbitrary, improving search. That shape is the only one I can find that compounds rather than commoditizes as LLMs get better. Every other compiler architecture has to defend against AI-generated code; this one consumes it.

If the precision experiment lands, the product I want to be building is not the EVM gas optimizer. It is the "bring your own untrusted optimizer / LLM" trust layer — the certifier as a horizontal trust substrate that runs underneath whatever search the field invents next. The EVM gas optimizer is the wedge that buys the right to build that. The deflation-resistant zkVM lane is the durable backbone underneath it.

The hard step is the first one. Run the experiment. Then decide.

The full research set this post synthesizes — eight workstreams, including the candid internal strategy doc with the bear case in full and the seven kill-criteria with the signals to watch — lives in the veric-k-evm-pilot research repo. The technical companion to this commercial argument is Prove More, Run Less; the proof-assistant-as-execution-engine companion is Running an EVM Inside a Proof Assistant.