An opinionated grounding for what this project is, why it sits in a 50-year research tradition, and why data engineering is the first vertical to apply it to, because that is where it matters most.
Start with two observations most practitioners already half-believe.
Now notice how much of your profession is organized around not having this discipline. You write unit tests because you don't have a proof. You add integration tests because your unit tests didn't cover interactions. You build staging environments because your integration tests don't model real traffic. You pay for observability because staging lies. You run postmortems because observability missed the thing. You write runbooks because postmortems don't prevent recurrence. You hire SREs because runbooks don't execute themselves.
Each layer is an after-the-fact substitute for a before-the-fact guarantee. Each exists because some property of the system could not be stated precisely enough for a machine to check it mechanically, once and for all.
This is the move the AG tower generalizes. The tower is a substrate in which "thou shalt not X" is itself a composable, layer-by-layer specification — the refuse-ifs are specified by grammar equations, and they propagate upward by construction.
Click any domain below to see the pattern: someone turned an ambient, discipline-enforced property into a mechanically-checked one, and a whole class of bugs went away.
Null dereferences, buffer overflows, and use-after-free are not algorithmic bugs — they are ownership bugs. Rust turns ownership into a first-class part of the type. The compiler refuses to build code that could possibly dangle, double-free, or data-race on memory.
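The discipline in one screenful, as a minimal sketch: moving a `String` transfers ownership, and any later use of the moved-from binding is a compile error (shown here as a comment, since the point is precisely that such code does not build).

```rust
// Sketch of the ownership discipline: `consume` takes ownership of `s`,
// so the caller's binding is dead after the call.
fn consume(s: String) -> usize {
    s.len() // ownership of `s` ends here; the buffer is freed on return
}

fn main() {
    let s = String::from("data");
    let n = consume(s); // `s` is moved into `consume`
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
    assert_eq!(n, 4);
    println!("ok");
}
```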
Data races and lost updates are the concurrency versions of memory-safety bugs. Rust's ownership discipline extends to threads; session-typed languages go further, typing entire communication protocols.
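The same rules, extended to threads, in a minimal sketch: sharing a counter across threads requires `Arc` (shared ownership) plus `Mutex` (exclusive access); handing threads a plain `&mut` to one value is refused at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each spawned thread gets its own Arc handle; the Mutex guarantees
// exclusive access, so no increment can be lost.
fn parallel_count(n_threads: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let c = Arc::clone(&counter); // per-thread owner
            thread::spawn(move || *c.lock().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // wait for every increment
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(4), 4);
    println!("ok");
}
```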
Refinement types attach arithmetic predicates to ordinary types: "this Int is non-negative," "this length is less than that buffer's capacity." LiquidHaskell and F* discharge them via SMT — no manual proofs.
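Rust has no refinement types, but the shape of the idea can be sketched as a smart-constructor newtype (the names here are illustrative): the predicate is checked once, at the boundary, and every value of the type carries it. LiquidHaskell and F* go further by discharging the predicate statically via SMT rather than at runtime.

```rust
// Smart-constructor sketch of "this i64 is non-negative": the only way
// to obtain a NonNegative is through a constructor that checks the predicate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonNegative(i64);

impl NonNegative {
    fn new(n: i64) -> Option<Self> {
        if n >= 0 { Some(NonNegative(n)) } else { None }
    }
}

// "this length is less than that buffer's capacity", as a checked predicate
fn fits(len: NonNegative, capacity: usize) -> bool {
    (len.0 as usize) < capacity
}

fn main() {
    assert!(NonNegative::new(-1).is_none()); // refused at the boundary
    let len = NonNegative::new(3).unwrap();
    assert!(fits(len, 8));
    println!("ok");
}
```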
AWS's Zelkova (FMCAD 2018) reframed "did I write this IAM policy correctly?" from "deploy and check" to "let SMT decide." Byron Cook reports a billion SMT queries a day at AWS for exactly this class of question.
The AWS TLA+ paper (Newcombe et al., CACM 2015) is the industry-grounding case: TLA+ specs caught subtle bugs in DynamoDB, S3, and EBS that months of fuzzing and chaos testing had missed. The 2024 follow-up (Brooker & Desai, ACM Queue) shows this is now routine AWS engineering, not research.
Project Everest (Microsoft Research et al.) ships a formally verified TLS stack — HACL*, EverCrypt, EverParse — inside Windows, Linux, and Firefox.
seL4 is a microkernel with a machine-checked proof of functional correctness all the way down to the C source. Klein et al., SOSP 2009. The existence proof for "an entire foundational layer can be machine-checked."
CompCert (Leroy, CACM 2009) is a C compiler whose generated assembly is proven in Coq to preserve source semantics. This is directly relevant to the AG tower — a verified compilation pipeline is what the tower aspires to at much larger scope.
Differential privacy (Dwork & Roth, 2014) converted "is this release safe?" from "trust our process" into a mathematical statement with a tunable epsilon.
Jif (Andrew Myers, POPL 1999) showed that "secret data cannot reach a public sink" can be a type check, not a manual review. This is the right shape for modern PII handling and data residency.
Capability Myths Demolished (Miller, Yee, Shapiro, 2003) is the canonical argument that ambient authority (ACLs) is composable only by luck, while object capabilities are composable by design.
OPA / Rego normalized "policy is executable code over structured data, not a PDF." The direction of travel is obvious: GDPR, HIPAA, SOX are audit regimes today because their rules were not written in a form a machine could verify. Any rule that CAN be written decidably will eventually be.
These numbers are the price of running systems whose properties cannot be proved.
Individually anecdotal. Collectively, a market of practically unbounded size for any tool that can move properties from "we run it and hope" into "the compiler refuses to build it."
If the horizontal story is "types ate application code, proofs are eating infrastructure," the vertical story is: data engineering has not yet had either event. Click each pain below to expand.
Ralph Kimball, writing in 2003, called the grain declaration "a binding contract on the design" and declared grain "the most frequent design error by far." Twenty-three years later, the dbt Labs blog explains grain to analytics engineers as "the combination of columns at which records in a table are unique" — and advises writing uniqueness tests and hoping they run.
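What the convention-level discipline amounts to, in miniature (the table and columns are hypothetical): the grain is a key type, and uniqueness at that grain is a runtime check — the test dbt advises writing, run nightly and hoped at, rather than a property the compiler refuses to violate.

```rust
use std::collections::HashSet;

// Hypothetical grain for a hypothetical fact table: one row per
// (order_id, line_number).
type OrderLineGrain = (u64, u32);

struct Row {
    order_id: u64,
    line_number: u32,
    amount_cents: i64,
}

// The uniqueness test written out: the table honors its grain only if no
// two rows collide at the declared key.
fn check_grain(rows: &[Row]) -> Result<(), OrderLineGrain> {
    let mut seen: HashSet<OrderLineGrain> = HashSet::new();
    for r in rows {
        let key = (r.order_id, r.line_number);
        if !seen.insert(key) {
            return Err(key); // duplicate at the grain: contract broken
        }
    }
    Ok(())
}

fn main() {
    let rows = [
        Row { order_id: 1, line_number: 1, amount_cents: 500 },
        Row { order_id: 1, line_number: 1, amount_cents: 700 }, // violates grain
    ];
    assert_eq!(check_grain(&rows), Err((1, 1)));
    println!("ok");
}
```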
The whole semantic-layer movement — Malloy, MetricFlow, Cube — is an independent re-discovery that grain needs to be language-level, not convention-level. Malloy's README is explicit: "the fan and chasm traps are solved." They are — but at query time, not at compile time.
Every large data org today runs OpenLineage, Marquez, DataHub, or Amundsen. Each is a metadata-overlay architecture: the pipeline runs, emits events, some other system scrapes them into a graph, a UI renders it. Column-level lineage is the current frontier, and it is extremely hard to get right because SQL is a bad language to statically analyze.
Lineage in an AG tower is not a metadata overlay. It is a by-product of compilation. Every attribute equation that references a child attribute or a field is already producing the edge you would otherwise reconstruct by parsing an execution log.
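A toy illustration of the claim (all names are invented): if evaluating an attribute equation records each child-attribute read as it happens, the lineage graph is simply what evaluation leaves behind, not something reconstructed afterward from logs.

```rust
// A node in a miniature derivation tree.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

// Synthesized attribute: total node count. Side effect: every time the
// equation reads a child's attribute, that read IS a lineage edge.
fn rows(node: &Node, edges: &mut Vec<(&'static str, &'static str)>) -> u64 {
    let mut total = 1;
    for child in &node.children {
        edges.push((child.name, node.name)); // child feeds parent
        total += rows(child, edges);
    }
    total
}

fn main() {
    let tree = Node {
        name: "report",
        children: vec![
            Node { name: "orders", children: vec![] },
            Node { name: "customers", children: vec![] },
        ],
    };
    let mut edges = Vec::new();
    let n = rows(&tree, &mut edges);
    assert_eq!(n, 3);
    assert_eq!(edges, vec![("orders", "report"), ("customers", "report")]);
    println!("ok");
}
```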
Financial data pipelines survive on reconciliations — nightly cross-checks between two independently-computed views of the same underlying truth, to detect silent divergence. Every large financial institution runs dozens or hundreds of them.
None of them would need to exist if the two views were proved equivalent by construction rather than hoped to match and alerted when they don't.
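A reconciliation in miniature (the ledger data is hypothetical): the same total computed two independent ways, then cross-checked at runtime, precisely because nothing proves the two computations equivalent.

```rust
use std::collections::HashMap;

fn main() {
    // (account, amount) postings — invented data
    let postings: [(u32, i64); 5] =
        [(1, 500), (2, -200), (1, 300), (3, -100), (2, 150)];

    // View A: straight sum over all postings.
    let total_a: i64 = postings.iter().map(|&(_, amt)| amt).sum();

    // View B: group by account first, then sum per-account balances.
    let mut by_account: HashMap<u32, i64> = HashMap::new();
    for &(acct, amt) in &postings {
        *by_account.entry(acct).or_insert(0) += amt;
    }
    let total_b: i64 = by_account.values().sum();

    // The reconciliation check: alert on silent divergence.
    assert_eq!(total_a, total_b);
    println!("reconciled: {total_a}");
}
```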
Great Expectations, Soda, dbt tests, Elementary, Monte Carlo — the entire "data observability" category is an admission that correctness can only be assessed after a pipeline has already produced (and often published) a wrong answer.
Barr Moses coined "data downtime" in 2019 precisely to name the phenomenon. Monte Carlo's 2022 survey found data teams spending two days a week firefighting bad data. These are runtime-only tools solving what should be a compile-time problem.
The most load-bearing claim in the pitch, and the easiest one to underestimate.
The second loop is faster, cheaper, and stronger. The difference: the compiler is an axiom-checker. In the SQL loop, the axiom-checker is a business user with a spreadsheet a week later.
The AG tower is a concrete synthesis of several long research threads. Seeing the threads makes it look obvious rather than strange. Click each event.
Invented attribute grammars: the "meaning" of a program is computed by attaching attributes to the nodes of a derivation tree, with equations attached to each production rule for propagating those attributes.
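A simplified, synthesized-only version of Knuth's own running example, sketched in Rust: the value of a binary numeral, computed by one equation per production of the grammar.

```rust
// The grammar for binary numerals, as a tree:
enum Bits {
    Bit(u8),            // production B -> b
    Seq(Box<Bits>, u8), // production B -> B' b
}

// Synthesized attributes (val, len), one equation per production:
//   B -> b    : val = b               len = 1
//   B -> B' b : val = 2*val(B') + b   len = len(B') + 1
fn attrs(b: &Bits) -> (u64, u32) {
    match b {
        Bits::Bit(d) => (u64::from(*d), 1),
        Bits::Seq(rest, d) => {
            let (v, l) = attrs(rest);
            (2 * v + u64::from(*d), l + 1)
        }
    }
}

fn main() {
    // the numeral "101"
    let n = Bits::Seq(Box::new(Bits::Seq(Box::new(Bits::Bit(1)), 0)), 1);
    assert_eq!(attrs(&n), (5, 3));
    println!("ok");
}
```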
The closure properties of relational algebra, the typed constraints on what operations are meaningful — all of this is already a compile-time story.
Observed that partially evaluating an interpreter with respect to a source program yields a compiled program. Iterating gives the three projections: specialization, compilation, and compiler generation.
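The projections can be stated as equations over a partial evaluator `mix`, and the first one worked by hand for a toy language (everything below is illustrative):

```rust
// Futamura's projections, as comments over a partial evaluator `mix`:
//   1st: mix(interp, src)  = target    (a compiled program)
//   2nd: mix(mix, interp)  = compiler
//   3rd: mix(mix, mix)     = compiler generator

// A toy language: a sequence of arithmetic ops applied to an input.
enum Op { Add(i64), Mul(i64) }

fn interp(prog: &[Op], input: i64) -> i64 {
    prog.iter().fold(input, |acc, op| match op {
        Op::Add(n) => acc + n,
        Op::Mul(n) => acc * n,
    })
}

// First projection, worked by hand: specializing `interp` to the fixed
// program [Add(3), Mul(2)] collapses the dispatch loop into straight-line
// code — the residual "compiled" program.
fn compiled(input: i64) -> i64 {
    (input + 3) * 2
}

fn main() {
    let prog = [Op::Add(3), Op::Mul(2)];
    for x in [0, 1, 10] {
        assert_eq!(interp(&prog, x), compiled(x));
    }
    println!("ok");
}
```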
Origin of "program testing can be used to show the presence of bugs, but never to show their absence." The foundational statement that correctness is the central problem of programming.
Introduced 3-Lisp and procedural reflection — the infinite tower of metacircular interpreters, each defined in the language one level below.
Turing Award lecture. The canonical self-compiling-compiler parable — a compiler that recognizes its own source can encode behavior invisible in its source code.
A talk delivered entirely in words of one syllable plus longer words defined before first use, enacting its own thesis: a language must be small enough to define, yet contain the seeds of its own growth.
"Grain is the most frequent design error by far" and must be declared before dimensions or facts. "The grain declaration becomes a binding contract on the design."
Locates complexity — especially state and control — as the root of software pain. Proposes Functional Relational Programming: essential state as relations, logic as relational algebra plus pure functions.
Machine-checked proofs of full functional correctness: seL4 for an OS kernel, CompCert for a C compiler. The existence proofs that entire foundational layers can be verified end-to-end.
The paper that ended the "formal methods don't scale" debate: TLA+ specs caught subtle bugs at AWS that fuzzing and chaos testing had missed. The same year, Wadler's "Propositions as Types" (CACM 2015) gave the canonical account of why a type is a proposition and a program is a proof.
A single Haskell expression can be compiled into many different categories — hardware circuits, automatic derivatives, interval analyses, incremental computations — by changing the interpretation functor.
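The shape of the idea in Rust clothing (a sketch, not Elliott's Haskell plugin): one expression written against an abstract interface, with the "category" chosen by swapping the implementing type.

```rust
// One abstract signature for arithmetic expressions.
trait Sem {
    type R;
    fn lit(n: i64) -> Self::R;
    fn add(a: Self::R, b: Self::R) -> Self::R;
    fn mul(a: Self::R, b: Self::R) -> Self::R;
}

// One expression, written once, interpretable in any Sem.
fn expr<S: Sem>() -> S::R {
    S::mul(S::add(S::lit(1), S::lit(2)), S::lit(4))
}

// Interpretation 1: evaluate to a number.
struct Eval;
impl Sem for Eval {
    type R = i64;
    fn lit(n: i64) -> i64 { n }
    fn add(a: i64, b: i64) -> i64 { a + b }
    fn mul(a: i64, b: i64) -> i64 { a * b }
}

// Interpretation 2: compile to a textual form.
struct Show;
impl Sem for Show {
    type R = String;
    fn lit(n: i64) -> String { n.to_string() }
    fn add(a: String, b: String) -> String { format!("({a} + {b})") }
    fn mul(a: String, b: String) -> String { format!("({a} * {b})") }
}

fn main() {
    assert_eq!(expr::<Eval>(), 12);
    assert_eq!(expr::<Show>(), "((1 + 2) * 4)");
    println!("ok");
}
```

Circuits, derivatives, and incremental computations are further `Sem` instances of the same pattern; the expression never changes, only the interpretation.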
A Rust-based attribute grammar compilation tower. Each layer is one AG — one tree — that compiles into the next layer's source code. The seed self-compiles: gen1 == gen2 byte-for-byte at 141,358 bytes today.
The synthesis: Knuth's attribute grammars, Futamura's specialization, Smith's reflective towers, Thompson's fixpoint discipline, Steele's seed-growth aesthetic, Elliott's per-target compilation freedom — all in one artifact. Applied first to data engineering, where the cost of not having it is quantified in days-per-week and trillions-per-year.
Six principles the whole system rests on.
The stage0 binary is the compiler for every subsequent layer. Each layer is a single Node tree, with one construction primitive: Node::new. Ergonomics are a product of the tower, not the foundation. Every other link in this doc expands one of these principles in a specific direction.
Curated primary sources across the three grounding directions.
Test your understanding. Click a card to reveal the answer, then mark whether you got it. Use the arrows or Shuffle to navigate.
In one sentence: what is "the move" that has defined the last 20 years of serious industrial programming?
Pull a property from "things we hope to catch at runtime" into "things the compiler refuses to build without." TypeScript, Rust, TLA+, Zelkova, OPA, differential privacy — all the same move, in different domains.
Why is a reconciliation job a "runtime confession of a missing compile-time guarantee"?
A reconciliation is a nightly cross-check between two independently-computed views of the same truth, run because you cannot prove they agree. If the two views were proved equivalent by construction, the reconciliation would have nothing to do. Its existence admits the absence of that proof.
What is "grain," and why has Kimball's 2003 naming of it not fixed the problem?
Grain is the combination of columns at which records in a table are unique — a binding contract on a data model. Kimball called it "the most frequent design error by far" in 2003, yet the 2026 state of the art is still "think deeply and write a uniqueness test" because no mainstream tool makes grain a language-level property the compiler can refuse to violate.
What does it mean that the AG tower's fixpoint is a concrete instance of Futamura's third projection?
Futamura (1971): partially evaluating an interpreter against a source program yields a compiled program; iterating gives a compiler generator. The AG tower's seed compiles itself into stage0, which compiles itself into a byte-identical stage0 at 141,358 bytes — a concrete, terminating version of Futamura's compiler-generating-a-compiler.
Why does "validation lives in the seed" beat "validation lives in a Rust walker"?
Because the seed is the compiler. Productions carry forward between layers; a Rust walker does not. Writing validation as AG equations means every layer above inherits the checks automatically and the emitter's own lookup machinery is reused. A hand-coded walker re-implements what the emitter already knows and stops propagating the moment you grow the tower.
Why data engineering first, as opposed to starting with a more obviously guarantee-heavy domain like crypto or OS kernels?
Three reasons: (1) the gap between guarantees practitioners want and what their tools provide is widest there; (2) the diagnoses are already written — Kimball on grain, Codd on relational closure, Hellerstein on declarative dataflow, Brandon on SQL; (3) the cost of the status quo is both well-quantified (2 days/week firefighting; $2T/yr software quality) and paid in cash, monthly, by every company that ships dashboards.
What is Brian Cantwell Smith's "reflective tower," and how does the AG tower differ from it?
Smith's 3-Lisp (1984) introduced an infinite tower of metacircular interpreters, each defined in the language one level below. Conceptually elegant but not finitary. The AG tower is the same shape but terminates: every layer is independent; the bottom is a checked-in bootstrap; every byte is reproducible via fixpoint.
What does "per-layer packaging freedom" mean, and why is it the tower's analogue of Conal Elliott's "Compiling to Categories"?
Each layer's Root equation emits arbitrary text — a Rust crate bundle, a Markdown manifest, a WASM+JS pair, a Nix derivation — whatever its host shim understands. Elliott (ICFP 2017) showed one Haskell expression can be compiled into many categories (circuits, derivatives, incremental computations) by swapping the interpretation. The AG tower is that idea in AG clothing: the layer above picks the target; the layer below emits text into it.