The optimizing-compiler thesis — prove more, run less — doesn't stop at machine code. Push it through the ISA, the gates, the transistors, and off the digital cliff into analog and physics, and "optimization" becomes a search for a physical system whose natural evolution is the program's denotation.
The companion piece to this one argued that a compiler optimization is just runtime work you proved you don't have to do, and that a precise-enough abstract interpretation hands you the proofs that license deleting it. That story ended at native machine code. This one asks the obvious next question: why stop at the instruction set?
A general-purpose CPU is the most expensive interpreter ever built — expensive precisely because it is general. It interprets a stream of opcodes it has never seen before, so on every single instruction it must fetch, decode, check dependencies, rename registers, predict branches, and route operands through a datapath wide enough for any program. None of that work computes your answer. It is the cost of being ready to run any program — a tax you pay, billions of times, for flexibility you used once.
And it is not one tax but a stack of them. Your program is interpreted by the ISA; the ISA is interpreted by the microarchitecture; the microarchitecture is a pattern of gates; the gates are an abstraction over transistors switching; the transistors are an abstraction over charge moving through silicon. Every layer in that tower is one interpreter standing on another, and each adds its own overhead and its own act of forgetting what the layer above actually meant.
The endpoint of "prove more, run less," followed honestly, is not a faster instruction stream. It is no instruction stream at all — a physical artifact whose behaviour, by construction and by proof, is the program.
Here is the full tower from a mathematical specification down to device physics. Each rung interprets the one above it. Click a rung to see the interpretation tax it charges and what you trade by collapsing it into the layer below.
How much does collapsing levels actually buy? Orders of magnitude — and the figure climbs the further down and the more specialized you go. Pick a target to see the measured gain, what it removes, and what it costs.
Every rung down is the same bargain in different proportions: you give up generality and pay a steeper one-time engineering cost, in exchange for efficiency and — past a point — a harder but more rewarding proof. The bars below track how those four quantities move as you descend. Drag the slider (or click a depth).
The natural objection: surely the machine-checked guarantee evaporates once you leave the comfortable world of software semantics? It does not. The same discipline that proves "this optimized block refines the original" — a refinement/simulation relation checked by a kernel — has already been carried clean through high-level synthesis, RTL, and onto an FPGA. The tower has a verified spine.
The shape is invariant under the descent. At every rung the correctness statement is the same: lower(p) ⊑ p — the artifact one level down refines the meaning one level up, observationally. Stack the refinements and you get an end-to-end theorem from the source semantics to the gate-level netlist, each link independently kernel-checked. This is precisely the ObsRel / refinement relation this project already proves between its concrete interpreter and KEVM — repointed one rung lower, and then again, and again.
Everything so far stayed digital — exact, discrete, bit-faithful. But the tower doesn't bottom out at gates. Underneath the digital abstraction is continuous physics, and you can compile directly into it, skipping the entire digital reconstruction.
Sara Achour's Arco (PLDI 2016) and Legno (ASPLOS 2020) take a dynamical system written as first-order ODEs and compile it onto a reconfigurable analog device — wiring up integrators, multipliers, and adders so that the device's physical evolution in real time is a solution to the equations. There is no clock, no instruction, no numerical integration loop. The capacitor charging is the integral. The compiler's job is to find a configuration that is algebraically equivalent to the target system and to manage noise and dynamic range; the answer falls out of physics at a tiny fraction of the energy a digital solver would burn.
Further still: physical neural networks (McMahon, Wright et al., Nature 2022) treat an arbitrary physical system — coupled optics, a vibrating plate, a nonlinear electronic circuit — as a trainable computer. Their physics-aware training runs the forward pass on the real hardware and the backward pass on a differentiable model, so the substrate's own messy dynamics are co-opted into doing the inference, and the training automatically learns to exploit and tolerate the device's noise rather than fight it. The computation isn't mapped onto the physics; the physics is shaped until it is the computation.
If "prove more, run less" can delete interpretation overhead all the way down, is there any floor at all? Yes — and it is thermodynamic, not technological.
Landauer's principle says that erasing one bit of information must dissipate at least kT·ln 2 of energy as heat — about 2.9 × 10⁻²¹ joules (≈ 0.018 eV) at room temperature. Crucially, the floor applies only to irreversible operations — the ones that discard information. A logically reversible computation, in principle, need not dissipate anything. Modern processors run roughly a billion times above the Landauer limit, so there is an enormous gap between what physics demands and what we actually pay.
This closes the loop with the upper tower in a satisfying way. At the top, a proof lets you delete a runtime operation. At the very bottom, the deepest reason an operation costs energy is that it erases information — and a proof that the information was redundant is, thermodynamically, a license not to pay for erasing it. Same idea; the units just changed from CPU cycles to joules.
There is one more move, and it is the strangest. So far the program told the hardware what to compute. But some problems are defined as "find the configuration that minimizes this cost" — and there are physical systems that minimize energy for free, as a matter of natural law.
Ising machines exploit this. The Ising model's ground state is the minimum of an energy function, and any NP problem can be encoded as an Ising problem with only polynomial overhead. So you build a physical system — coupled spins, optical parametric oscillators, superconducting qubits, or an analog electronic network — whose energy landscape is your optimization problem, then let it relax: anneal the temperature down and the system settles into (an approximation of) the answer. The search isn't executed instruction by instruction; it's performed by the physics seeking its own ground state.
The Futamura projections are the classical statement that an interpreter, a compiler, and a compiler-generator are not different kinds of thing — they are one operation, specialization, applied at different times. Specialize an interpreter to a fixed program and you have compiled that program. The insight of this whole exploration is that the projection doesn't have to stop at software.
A CPU is the residual of specializing physics to "be ready for any program." A circuit is the residual of specializing it to your program. An analog device is the residual of specializing it to your program's denotation. At each step you partially-evaluate one more layer of the tower against the fixed program, and the interpreter for that layer vanishes into structure. The logical endpoint of compiling-by-proving is:
The vision is real, the literature is real, and none of it is free. The costs are exactly the inverse of the gains, and they must be named.
The third piece followed the chain from a sound analysis to a certified optimizing compiler. This one follows it one tower further — past the instruction set, through the gates, off the digital cliff, and down to the thermodynamic floor — where "optimize the program" finally becomes "build the physics that is the program, and prove it."
Grounding — TPU / Jouppi et al. ISCA 2017 · SHA-256 ASIC efficiency · ROVER (verified e-graph RTL) · Lutsig verified Verilog · CakeML on Silver (PLDI 2019) · Lakeroad (ASPLOS 2024) · Kôika (MIT) · Calyx · Achour, Arco (PLDI 2016) & Legno (ASPLOS 2020) · McMahon/Wright et al., Nature 2022 (physical neural networks) · Ising machines (Nat. Rev. Phys. 2022) · Landauer (1961).