An opinionated grounding for what this project is, why it sits in a 50-year research tradition, and why data engineering is the first vertical to apply it to, because that is where it matters most.
Start with two observations most practitioners already half-believe.
Now notice how much of your profession is organized around not having this discipline. You write unit tests because you don't have a proof. You add integration tests because your unit tests didn't cover interactions. You build staging environments because your integration tests don't model real traffic. You pay for observability because staging lies. You run postmortems because observability missed the thing. You write runbooks because postmortems don't prevent recurrence. You hire SREs because runbooks don't execute themselves.
Each layer is an after-the-fact substitute for a before-the-fact guarantee. Each exists because some property of the system could not be stated precisely enough for a machine to check it mechanically, once and for all.
This is the move the AG tower generalizes. The tower is a substrate in which "thou shalt not X" is itself a composable, layer-by-layer specification — the refuse-ifs are specified by grammar equations, and they propagate upward by construction.
Click any domain below to see the pattern: someone turned an ambient, discipline-enforced property into a mechanically-checked one, and a whole class of bugs went away.
Null dereferences, buffer overflows, and use-after-free are not algorithmic bugs — they are ownership bugs. Rust turns ownership into a first-class part of the type. The compiler refuses to build code that could possibly dangle, double-free, or data-race on memory.
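The discipline in one screenful, as a minimal sketch: moving a `String` transfers ownership, and any later use of the moved-from binding is a compile error (shown here as a comment, since the point is precisely that such code does not build).

```rust
// Sketch of the ownership discipline: `consume` takes ownership of `s`,
// so the caller's binding is dead after the call.
fn consume(s: String) -> usize {
    s.len() // ownership of `s` ends here; the buffer is freed on return
}

fn main() {
    let s = String::from("data");
    let n = consume(s); // `s` is moved into `consume`
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
    assert_eq!(n, 4);
    println!("ok");
}
```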
Data races and lost updates are the concurrency versions of memory-safety bugs. Rust's ownership discipline extends to threads; session-typed languages go further, typing entire communication protocols.
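The same rules, extended to threads, in a minimal sketch: sharing a counter across threads requires `Arc` (shared ownership) plus `Mutex` (exclusive access); handing threads a plain `&mut` to one value is refused at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each spawned thread gets its own Arc handle; the Mutex guarantees
// exclusive access, so no increment can be lost.
fn parallel_count(n_threads: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let c = Arc::clone(&counter); // per-thread owner
            thread::spawn(move || *c.lock().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // wait for every increment
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(4), 4);
    println!("ok");
}
```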
Refinement types attach arithmetic predicates to ordinary types: "this Int is non-negative," "this length is less than that buffer's capacity." LiquidHaskell and F* discharge them via SMT — no manual proofs.
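Rust has no refinement types, but the shape of the idea can be sketched as a smart-constructor newtype (the names here are illustrative): the predicate is checked once, at the boundary, and every value of the type carries it. LiquidHaskell and F* go further by discharging the predicate statically via SMT rather than at runtime.

```rust
// Smart-constructor sketch of "this i64 is non-negative": the only way
// to obtain a NonNegative is through a constructor that checks the predicate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonNegative(i64);

impl NonNegative {
    fn new(n: i64) -> Option<Self> {
        if n >= 0 { Some(NonNegative(n)) } else { None }
    }
}

// "this length is less than that buffer's capacity", as a checked predicate
fn fits(len: NonNegative, capacity: usize) -> bool {
    (len.0 as usize) < capacity
}

fn main() {
    assert!(NonNegative::new(-1).is_none()); // refused at the boundary
    let len = NonNegative::new(3).unwrap();
    assert!(fits(len, 8));
    println!("ok");
}
```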
AWS's Zelkova (FMCAD 2018) reframed "did I write this IAM policy correctly?" from "deploy and check" to "let SMT decide." Byron Cook reports a billion SMT queries a day at AWS for exactly this class of question.
The AWS TLA+ paper (Newcombe et al., CACM 2015) is the industry-grounding case: TLA+ specs caught subtle bugs in DynamoDB, S3, and EBS that months of fuzzing and chaos testing had missed. The 2024 follow-up (Brooker & Desai, ACM Queue) shows this is now routine AWS engineering, not research.
Project Everest (Microsoft Research et al.) ships a formally verified TLS stack — HACL*, EverCrypt, EverParse — inside Windows, Linux, and Firefox.
seL4 is a microkernel with a machine-checked proof of functional correctness all the way down to the C source. Klein et al., SOSP 2009. The existence proof for "an entire foundational layer can be machine-checked."
CompCert (Leroy, CACM 2009) is a C compiler whose generated assembly is proven in Coq to preserve source semantics. This is directly relevant to the AG tower — a verified compilation pipeline is what the tower aspires to at much larger scope.
Differential privacy (Dwork & Roth, 2014) converted "is this release safe?" from "trust our process" into a mathematical statement with a tunable epsilon.
Jif (Andrew Myers, POPL 1999) showed that "secret data cannot reach a public sink" can be a type check, not a manual review. This is the right shape for modern PII handling and data residency.
Capability Myths Demolished (Miller, Yee, Shapiro, 2003) is the canonical argument that ambient authority (ACLs) is composable only by luck, while object capabilities are composable by design.
OPA / Rego normalized "policy is executable code over structured data, not a PDF." The direction of travel is obvious: GDPR, HIPAA, SOX are audit regimes today because their rules were not written in a form a machine could verify. Any rule that CAN be written decidably will eventually be.
These numbers are the price of running systems whose properties cannot be proved.
Individually anecdotal. Collectively, a market of practically unbounded size for any tool that can move properties from "we run it and hope" into "the compiler refuses to build it."
If the horizontal story is "types ate application code, proofs are eating infrastructure," the vertical story is: data engineering has not yet had either event. Click each pain below to expand.
Ralph Kimball, writing in 2003, called the grain declaration "a binding contract on the design" and declared grain "the most frequent design error by far." Twenty-three years later, the dbt Labs blog explains grain to analytics engineers as "the combination of columns at which records in a table are unique" — and advises writing uniqueness tests and hoping they run.
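What the convention-level discipline amounts to, in miniature (the table and columns are hypothetical): the grain is a key type, and uniqueness at that grain is a runtime check — the test dbt advises writing, run nightly and hoped at, rather than a property the compiler refuses to violate.

```rust
use std::collections::HashSet;

// Hypothetical grain for a hypothetical fact table: one row per
// (order_id, line_number).
type OrderLineGrain = (u64, u32);

struct Row {
    order_id: u64,
    line_number: u32,
    amount_cents: i64,
}

// The uniqueness test written out: the table honors its grain only if no
// two rows collide at the declared key.
fn check_grain(rows: &[Row]) -> Result<(), OrderLineGrain> {
    let mut seen: HashSet<OrderLineGrain> = HashSet::new();
    for r in rows {
        let key = (r.order_id, r.line_number);
        if !seen.insert(key) {
            return Err(key); // duplicate at the grain: contract broken
        }
    }
    Ok(())
}

fn main() {
    let rows = [
        Row { order_id: 1, line_number: 1, amount_cents: 500 },
        Row { order_id: 1, line_number: 1, amount_cents: 700 }, // violates grain
    ];
    assert_eq!(check_grain(&rows), Err((1, 1)));
    println!("ok");
}
```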
The whole semantic-layer movement — Malloy, MetricFlow, Cube — is an independent re-discovery that grain needs to be language-level, not convention-level. Malloy's README is explicit: "the fan and chasm traps are solved." They are — but at query time, not at compile time.
Every large data org today runs OpenLineage, Marquez, DataHub, or Amundsen. Each is a metadata-overlay architecture: the pipeline runs, emits events, some other system scrapes them into a graph, a UI renders it. Column-level lineage is the current frontier, and it is extremely hard to get right because SQL is a bad language to statically analyze.
Lineage in an AG tower is not a metadata overlay. It is a by-product of compilation. Every attribute equation that references a child attribute or a field is already producing the edge you would otherwise reconstruct by parsing an execution log.
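A toy illustration of the claim (all names are invented): if evaluating an attribute equation records each child-attribute read as it happens, the lineage graph is simply what evaluation leaves behind, not something reconstructed afterward from logs.

```rust
// A node in a miniature derivation tree.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

// Synthesized attribute: total node count. Side effect: every time the
// equation reads a child's attribute, that read IS a lineage edge.
fn rows(node: &Node, edges: &mut Vec<(&'static str, &'static str)>) -> u64 {
    let mut total = 1;
    for child in &node.children {
        edges.push((child.name, node.name)); // child feeds parent
        total += rows(child, edges);
    }
    total
}

fn main() {
    let tree = Node {
        name: "report",
        children: vec![
            Node { name: "orders", children: vec![] },
            Node { name: "customers", children: vec![] },
        ],
    };
    let mut edges = Vec::new();
    let n = rows(&tree, &mut edges);
    assert_eq!(n, 3);
    assert_eq!(edges, vec![("orders", "report"), ("customers", "report")]);
    println!("ok");
}
```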
Financial data pipelines survive on reconciliations — nightly cross-checks between two independently-computed views of the same underlying truth, to detect silent divergence. Every large financial institution runs dozens or hundreds of them.
None of them would need to exist if the two views were proved equivalent by construction rather than hoped to match and alerted when they don't.
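A reconciliation in miniature (the ledger data is hypothetical): the same total computed two independent ways, then cross-checked at runtime, precisely because nothing proves the two computations equivalent.

```rust
use std::collections::HashMap;

fn main() {
    // (account, amount) postings — invented data
    let postings: [(u32, i64); 5] =
        [(1, 500), (2, -200), (1, 300), (3, -100), (2, 150)];

    // View A: straight sum over all postings.
    let total_a: i64 = postings.iter().map(|&(_, amt)| amt).sum();

    // View B: group by account first, then sum per-account balances.
    let mut by_account: HashMap<u32, i64> = HashMap::new();
    for &(acct, amt) in &postings {
        *by_account.entry(acct).or_insert(0) += amt;
    }
    let total_b: i64 = by_account.values().sum();

    // The reconciliation check: alert on silent divergence.
    assert_eq!(total_a, total_b);
    println!("reconciled: {total_a}");
}
```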
Great Expectations, Soda, dbt tests, Elementary, Monte Carlo — the entire "data observability" category is an admission that correctness can only be assessed after a pipeline has already produced (and often published) a wrong answer.
Barr Moses coined "data downtime" in 2019 precisely to name the phenomenon. Monte Carlo's 2022 survey found data teams spending two days a week firefighting bad data. These are runtime-only tools solving what should be a compile-time problem.
The most load-bearing claim in the pitch, and the easiest one to underestimate.
The second loop is faster, cheaper, and stronger. The difference: the compiler is an axiom-checker. In the SQL loop, the axiom-checker is a business user with a spreadsheet a week later.
The AG tower is a concrete synthesis of several long research threads. Seeing the threads makes it look obvious rather than strange. Click each event.
Invented attribute grammars: the "meaning" of a program is computed by attaching attributes to the nodes of a derivation tree, with equations attached to each production rule for propagating those attributes.
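A simplified, synthesized-only version of Knuth's own running example, sketched in Rust: the value of a binary numeral, computed by one equation per production of the grammar.

```rust
// The grammar for binary numerals, as a tree:
enum Bits {
    Bit(u8),            // production B -> b
    Seq(Box<Bits>, u8), // production B -> B' b
}

// Synthesized attributes (val, len), one equation per production:
//   B -> b    : val = b               len = 1
//   B -> B' b : val = 2*val(B') + b   len = len(B') + 1
fn attrs(b: &Bits) -> (u64, u32) {
    match b {
        Bits::Bit(d) => (u64::from(*d), 1),
        Bits::Seq(rest, d) => {
            let (v, l) = attrs(rest);
            (2 * v + u64::from(*d), l + 1)
        }
    }
}

fn main() {
    // the numeral "101"
    let n = Bits::Seq(Box::new(Bits::Seq(Box::new(Bits::Bit(1)), 0)), 1);
    assert_eq!(attrs(&n), (5, 3));
    println!("ok");
}
```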
The closure properties of relational algebra, the typed constraints on what operations are meaningful — all of this is already a compile-time story.
Observed that partially evaluating an interpreter with respect to a source program yields a compiled program. Iterating gives the three projections: specialization, compilation, and compiler generation.
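The projections can be stated as equations over a partial evaluator `mix`, and the first one worked by hand for a toy language (everything below is illustrative):

```rust
// Futamura's projections, as comments over a partial evaluator `mix`:
//   1st: mix(interp, src)  = target    (a compiled program)
//   2nd: mix(mix, interp)  = compiler
//   3rd: mix(mix, mix)     = compiler generator

// A toy language: a sequence of arithmetic ops applied to an input.
enum Op { Add(i64), Mul(i64) }

fn interp(prog: &[Op], input: i64) -> i64 {
    prog.iter().fold(input, |acc, op| match op {
        Op::Add(n) => acc + n,
        Op::Mul(n) => acc * n,
    })
}

// First projection, worked by hand: specializing `interp` to the fixed
// program [Add(3), Mul(2)] collapses the dispatch loop into straight-line
// code — the residual "compiled" program.
fn compiled(input: i64) -> i64 {
    (input + 3) * 2
}

fn main() {
    let prog = [Op::Add(3), Op::Mul(2)];
    for x in [0, 1, 10] {
        assert_eq!(interp(&prog, x), compiled(x));
    }
    println!("ok");
}
```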
Origin of "program testing can be used to show the presence of bugs, but never to show their absence." The foundational statement that correctness is the central problem of programming.
Introduced 3-Lisp and procedural reflection — the infinite tower of metacircular interpreters, each defined in the language one level below.
Turing Award lecture. The canonical self-compiling-compiler parable — a compiler that recognizes its own source can encode behavior invisible in its source code.
A talk delivered entirely in words of one syllable plus longer words defined before first use, enacting its own thesis: a language must be small enough to define, yet contain the seeds of its own growth.
"Grain is the most frequent design error by far" and must be declared before dimensions or facts. "The grain declaration becomes a binding contract on the design."
Locates complexity — especially state and control — as the root of software pain. Proposes Functional Relational Programming: essential state as relations, logic as relational algebra plus pure functions.
Machine-checked proofs of full functional correctness: seL4 for an OS kernel, CompCert for a C compiler. The existence proofs that entire foundational layers can be verified end-to-end.
The paper that ended the "formal methods don't scale" debate: TLA+ specs caught subtle bugs at AWS that fuzzing and chaos testing had missed. The same year, Wadler's "Propositions as Types" (CACM 2015) gave the canonical account of why a type is a proposition and a program is a proof.
A single Haskell expression can be compiled into many different categories — hardware circuits, automatic derivatives, interval analyses, incremental computations — by changing the interpretation functor.
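The shape of the idea in Rust clothing (a sketch, not Elliott's Haskell plugin): one expression written against an abstract interface, with the "category" chosen by swapping the implementing type.

```rust
// One abstract signature for arithmetic expressions.
trait Sem {
    type R;
    fn lit(n: i64) -> Self::R;
    fn add(a: Self::R, b: Self::R) -> Self::R;
    fn mul(a: Self::R, b: Self::R) -> Self::R;
}

// One expression, written once, interpretable in any Sem.
fn expr<S: Sem>() -> S::R {
    S::mul(S::add(S::lit(1), S::lit(2)), S::lit(4))
}

// Interpretation 1: evaluate to a number.
struct Eval;
impl Sem for Eval {
    type R = i64;
    fn lit(n: i64) -> i64 { n }
    fn add(a: i64, b: i64) -> i64 { a + b }
    fn mul(a: i64, b: i64) -> i64 { a * b }
}

// Interpretation 2: compile to a textual form.
struct Show;
impl Sem for Show {
    type R = String;
    fn lit(n: i64) -> String { n.to_string() }
    fn add(a: String, b: String) -> String { format!("({a} + {b})") }
    fn mul(a: String, b: String) -> String { format!("({a} * {b})") }
}

fn main() {
    assert_eq!(expr::<Eval>(), 12);
    assert_eq!(expr::<Show>(), "((1 + 2) * 4)");
    println!("ok");
}
```

Circuits, derivatives, and incremental computations are further `Sem` instances of the same pattern; the expression never changes, only the interpretation.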
A Rust-based attribute grammar compilation tower. Each layer is one AG — one tree — that compiles into the next layer's source code. The seed self-compiles: gen1 == gen2 byte-for-byte at 141,358 bytes today.
The synthesis: Knuth's attribute grammars, Futamura's specialization, Smith's reflective towers, Thompson's fixpoint discipline, Steele's seed-growth aesthetic, Elliott's per-target compilation freedom — all in one artifact. Applied first to data engineering, where the cost of not having it is quantified in days-per-week and trillions-per-year.
Six principles the whole system rests on.
The stage0 binary is the compiler for every subsequent layer. Each layer is a single Node tree, with one construction primitive: Node::new. Ergonomics are a product of the tower, not the foundation. Every other link in this doc expands one of these principles in a specific direction.
Curated primary sources across the three grounding directions.
Test your understanding. Click a card to reveal the answer, then mark whether you got it. Use the arrows or Shuffle to navigate.
In one sentence: what is "the move" that has defined the last 20 years of serious industrial programming?
Pull a property from "things we hope to catch at runtime" into "things the compiler refuses to build without." TypeScript, Rust, TLA+, Zelkova, OPA, differential privacy — all the same move, in different domains.
Why is a reconciliation job a "runtime confession of a missing compile-time guarantee"?
A reconciliation is a nightly cross-check between two independently-computed views of the same truth, run because you cannot prove they agree. If the two views were proved equivalent by construction, the reconciliation would have nothing to do. Its existence admits the absence of that proof.
What is "grain," and why has Kimball's 2003 naming of it not fixed the problem?
Grain is the combination of columns at which records in a table are unique — a binding contract on a data model. Kimball called it "the most frequent design error by far" in 2003, yet the 2026 state of the art is still "think deeply and write a uniqueness test" because no mainstream tool makes grain a language-level property the compiler can refuse to violate.
What does it mean that the AG tower's fixpoint is a concrete instance of Futamura's third projection?
Futamura (1971): partially evaluating an interpreter against a source program yields a compiled program; iterating gives a compiler generator. The AG tower's seed compiles itself into stage0, which compiles itself into a byte-identical stage0 at 141,358 bytes — a concrete, terminating version of Futamura's compiler-generating-a-compiler.
Why does "validation lives in the seed" beat "validation lives in a Rust walker"?
Because the seed is the compiler. Productions carry forward between layers; a Rust walker does not. Writing validation as AG equations means every layer above inherits the checks automatically and the emitter's own lookup machinery is reused. A hand-coded walker re-implements what the emitter already knows and stops propagating the moment you grow the tower.
Why data engineering first, as opposed to starting with a more obviously guarantee-heavy domain like crypto or OS kernels?
Three reasons: (1) the gap between guarantees practitioners want and what their tools provide is widest there; (2) the diagnoses are already written — Kimball on grain, Codd on relational closure, Hellerstein on declarative dataflow, Brandon on SQL; (3) the cost of the status quo is both well-quantified (2 days/week firefighting; $2T/yr software quality) and paid in cash, monthly, by every company that ships dashboards.
What is Brian Cantwell Smith's "reflective tower," and how does the AG tower differ from it?
Smith's 3-Lisp (1984) introduced an infinite tower of metacircular interpreters, each defined in the language one level below. Conceptually elegant but not finitary. The AG tower is the same shape but terminates: every layer is independent; the bottom is a checked-in bootstrap; every byte is reproducible via fixpoint.
What does "per-layer packaging freedom" mean, and why is it the tower's analogue of Conal Elliott's "Compiling to Categories"?
Each layer's Root equation emits arbitrary text — a Rust crate bundle, a Markdown manifest, a WASM+JS pair, a Nix derivation — whatever its host shim understands. Elliott (ICFP 2017) showed one Haskell expression can be compiled into many categories (circuits, derivatives, incremental computations) by swapping the interpretation. The AG tower is that idea in AG clothing: the layer above picks the target; the layer below emits text into it.