Compiling Down to Physics

The optimizing-compiler thesis — prove more, run less — doesn't stop at machine code. Push it through the ISA, the gates, the transistors, and off the digital cliff into analog and physics, and "optimization" becomes a search for a physical system whose natural evolution is the program's denotation.

tower of interpreters verified HLS analog & physical compute Landauer limit Futamura → physics

01The interpretation tax, all the way down

The companion piece to this one argued that a compiler optimization is just runtime work you proved you don't have to do, and that a precise-enough abstract interpretation hands you the proofs that license deleting it. That story ended at native machine code. This one asks the obvious next question: why stop at the instruction set?

A general-purpose CPU is the most expensive interpreter ever built — expensive precisely because it is general. It interprets a stream of opcodes it has never seen before, so on every single instruction it must fetch, decode, check dependencies, rename registers, predict branches, and route operands through a datapath wide enough for any program. None of that work computes your answer. It is the cost of being ready to run any program — a tax you pay, billions of times, for flexibility you used once.

And it is not one tax but a stack of them. Your program is interpreted by the ISA; the ISA is interpreted by the microarchitecture; the microarchitecture is a pattern of gates; the gates are an abstraction over transistors switching; the transistors are an abstraction over charge moving through silicon. Every layer in that tower is one interpreter standing on another, and each adds its own overhead and its own act of forgetting what the layer above actually meant.

The whole thesis in one line: specialization is collapsing levels of the tower. Each level you collapse — by proving enough about the program to bake it into the layer below — is interpretation overhead deleted permanently. Compiling to native code collapses the top two rungs. Compiling to a circuit collapses most of them. Compiling to physics collapses the lot.

The endpoint of "prove more, run less," followed honestly, is not a faster instruction stream. It is no instruction stream at all — a physical artifact whose behaviour, by construction and by proof, is the program.

02The tower of interpreters

Here is the full tower from a mathematical specification down to device physics. Each rung interprets the one above it. Click a rung to see the interpretation tax it charges and what you trade by collapsing it into the layer below.

Read it as a gradient, not a wall. There is no sharp line between "software" and "hardware" — only a continuum of interpreters, each more specialized and less general than the one above. A compiler moves you down a rung. A hardware-synthesis tool moves you down several. An analog device drops you off the bottom of the digital part of the tower entirely.

03The descent, in numbers

How much does collapsing levels actually buy? Orders of magnitude — and the figure climbs the further down and the more specialized you go. Pick a target to see the measured gain, what it removes, and what it costs.

The trade structure of the descent

Every rung down is the same bargain in different proportions: you give up generality and pay a steeper one-time engineering cost, in exchange for efficiency and — past a point — a harder but more rewarding proof. The bars below track how those four quantities move as you descend. Drag the slider (or click a depth).

generality
efficiency
build / NRE cost
proof difficulty
This is why the EVM is the perfect target, again. The descent only pays when amortized: huge one-time engineering cost, then a win on every execution forever. Deployed contracts are immutable and run billions of times — exactly the multiplicity that justifies sinking proof and synthesis effort into a specialized artifact. The economics that make custom silicon absurd for a throwaway script make it overwhelming for a hot, frozen, gas-metered contract.

04The proof keeps going down

The natural objection: surely the machine-checked guarantee evaporates once you leave the comfortable world of software semantics? It does not. The same discipline that proves "this optimized block refines the original" — a refinement/simulation relation checked by a kernel — has already been carried clean through high-level synthesis, RTL, and onto an FPGA. The tower has a verified spine.

The shape is invariant under the descent. At every rung the correctness statement is the same: lower(p) ⊑ p — the artifact one level down refines the meaning one level up, observationally. Stack the refinements and you get an end-to-end theorem from the source semantics to the gate-level netlist, each link independently kernel-checked. This is precisely the ObsRel / refinement relation this project already proves between its concrete interpreter and KEVM — repointed one rung lower, and then again, and again.

The leverage point survives too: untrusted search, trusted certificate. Lakeroad lets an unverified program-synthesis engine propose an FPGA technology mapping, then formally verifies the proposal against the primitive semantics. ROVER lets aggressive e-graph rewriting propose a faster datapath, then validates each rewrite. You never trust the optimizer's cleverness — only a small, re-checkable certificate. The clever part stays outside the trusted base, exactly as it did for software optimization.

05Off the digital cliff: analog & physical computing

Everything so far stayed digital — exact, discrete, bit-faithful. But the tower doesn't bottom out at gates. Underneath the digital abstraction is continuous physics, and you can compile directly into it, skipping the entire digital reconstruction.

Compiling a differential equation into a circuit that obeys it

Sara Achour's Arco (PLDI 2016) and Legno (ASPLOS 2020) take a dynamical system written as first-order ODEs and compile it onto a reconfigurable analog device — wiring up integrators, multipliers, and adders so that the device's physical evolution in real time is a solution to the equations. There is no clock, no instruction, no numerical integration loop. The capacitor charging is the integral. The compiler's job is to find a configuration that is algebraically equivalent to the target system and to manage noise and dynamic range; the answer falls out of physics at a tiny fraction of the energy a digital solver would burn.

This is the purest form of the thesis. A digital ODE solver interprets the equation — it represents the state as bits and grinds an integration rule, paying the full interpretation tax. The analog circuit is the equation; the work was compiled away into the choice of components. The program didn't run faster — it stopped being a program and became a law of motion.

Training the substrate itself to compute

Further still: physical neural networks (McMahon, Wright et al., Nature 2022) treat an arbitrary physical system — coupled optics, a vibrating plate, a nonlinear electronic circuit — as a trainable computer. Their physics-aware training runs the forward pass on the real hardware and the backward pass on a differentiable model, so the substrate's own messy dynamics are co-opted into doing the inference, and the training automatically learns to exploit and tolerate the device's noise rather than fight it. The computation isn't mapped onto the physics; the physics is shaped until it is the computation.

The price of leaving the digital cliff: exactness. Above the cliff, refinement is bit-for-bit equality. Below it, "correct" becomes "within a tolerance, under a noise model" — the proof obligation changes from an equation to an approximation bound, and the trusted base now includes a model of the device physics. You trade decidable, exact correctness for orders-of-magnitude energy savings and an analog answer. For the right workload that is a spectacular bargain; for a financial ledger it is a category error. Knowing which is which is the whole engineering judgement.

06The physical floor

If "prove more, run less" can delete interpretation overhead all the way down, is there any floor at all? Yes — and it is thermodynamic, not technological.

Landauer's principle says that erasing one bit of information must dissipate at least kT·ln 2 of energy as heat — about 2.9 × 10⁻²¹ joules (≈ 0.018 eV) at room temperature. Crucially, the floor applies only to irreversible operations — the ones that discard information. A logically reversible computation, in principle, need not dissipate anything. Modern processors run roughly a billion times above the Landauer limit, so there is an enormous gap between what physics demands and what we actually pay.

The thermodynamic restatement of the thesis. Reframe optimization as: dissipation is the cost of forgetting. Every bit your computation throws away is a minimum quantum of heat you are forced to pay. So "prove more, run less" has a deepest form — prove more about which information is genuinely discardable, and dissipate less. Reversible and adiabatic computing chase exactly this: arrange the computation so it forgets as little as possible, and approach the floor where the only heat you pay is for the answers you truly threw away.

This closes the loop with the upper tower in a satisfying way. At the top, a proof lets you delete a runtime operation. At the very bottom, the deepest reason an operation costs energy is that it erases information — and a proof that the information was redundant is, thermodynamically, a license not to pay for erasing it. Same idea; the units just changed from CPU cycles to joules.

07Let physics do the search

There is one more move, and it is the strangest. So far the program told the hardware what to compute. But some problems are defined as "find the configuration that minimizes this cost" — and there are physical systems that minimize energy for free, as a matter of natural law.

Ising machines exploit this. The Ising model's ground state is the minimum of an energy function, and any NP problem can be encoded as an Ising problem with only polynomial overhead. So you build a physical system — coupled spins, optical parametric oscillators, superconducting qubits, or an analog electronic network — whose energy landscape is your optimization problem, then let it relax: anneal the temperature down and the system settles into (an approximation of) the answer. The search isn't executed instruction by instruction; it's performed by the physics seeking its own ground state.

The endpoint of the tower. This is "compile the program into a physical system whose evolution is the answer" taken to its logical extreme: compile the problem into a physical system whose equilibrium is the solution. The compiler's job collapses entirely into the encoding — get the couplings right and you never write a search loop at all. The catch, of course, is precision and embedding overhead: physics gives you an approximate ground state, the encoding can be lossy, and proving the physical optimum matches the logical one is its own hard problem.

08Futamura, generalized to the physical stack

The Futamura projections are the classical statement that an interpreter, a compiler, and a compiler-generator are not different kinds of thing — they are one operation, specialization, applied at different times. Specialize an interpreter to a fixed program and you have compiled that program. The insight of this whole exploration is that the projection doesn't have to stop at software.

A CPU is the residual of specializing physics to "be ready for any program." A circuit is the residual of specializing it to your program. An analog device is the residual of specializing it to your program's denotation. At each step you partially-evaluate one more layer of the tower against the fixed program, and the interpreter for that layer vanishes into structure. The logical endpoint of compiling-by-proving is:

Find a physical system whose natural evolution is the program's denotation — and let the soundness proof be the refinement that certifies the physics implements the spec. The program isn't run. It's built, and proven to be what was meant.

The honest accounting

The vision is real, the literature is real, and none of it is free. The costs are exactly the inverse of the gains, and they must be named.

Where this actually lands today. The fully-verified spine reaches RTL and FPGAs (CakeML→Silver, Lutsig, Kami) — that is shipping, machine-checked technology. Verified-by-construction HLS (Calyx, Kôika, Lakeroad) is research-grade but real. Analog, physical, and Ising compilation are genuine and growing, but their verification story — proving the physics matches the spec under a noise model — is the open frontier. The thesis is sound the whole way down; the proof technology is mature at the top and a research program at the bottom. That gap is the interesting work.

The third piece followed the chain from a sound analysis to a certified optimizing compiler. This one follows it one tower further — past the instruction set, through the gates, off the digital cliff, and down to the thermodynamic floor — where "optimize the program" finally becomes "build the physics that is the program, and prove it."

Grounding — TPU / Jouppi et al. ISCA 2017 · SHA-256 ASIC efficiency · ROVER (verified e-graph RTL) · Lutsig verified Verilog · CakeML on Silver (PLDI 2019) · Lakeroad (ASPLOS 2024) · Kôika (MIT) · Calyx · Achour, Arco (PLDI 2016) & Legno (ASPLOS 2020) · McMahon/Wright et al., Nature 2022 (physical neural networks) · Ising machines (Nat. Rev. Phys. 2022) · Landauer (1961).