Verifying a property is easy. Discovering where to assert it is hard. AI agents just solved the hard part.
Every formal verification system runs into the same fundamental asymmetry, and it maps directly onto the most famous unsolved problem in computer science: P vs NP.

**Verifying** a property is mechanical, deterministic, and fast. A compiler can do it in milliseconds: just follow the rules and check constraints.

**Discovering** where to assert that property requires understanding the codebase: judgment, expertise, knowing what matters and where to look.
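A toy illustration of the asymmetry (not part of any real verifier): for graph 3-coloring, checking a proposed coloring is linear in the number of edges, while discovering one by brute force is exponential in the number of vertices.

```python
from itertools import product

def verify_coloring(edges, coloring):
    """The 'easy' direction: mechanical, deterministic, fast."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def find_coloring(n_vertices, edges, colors=(0, 1, 2)):
    """The 'hard' direction: brute-force search over 3**n candidates."""
    for candidate in product(colors, repeat=n_vertices):
        if verify_coloring(edges, candidate):
            return candidate
    return None

# A 4-cycle: 0-1-2-3-0. Alternating colors satisfy every edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(verify_coloring(edges, (0, 1, 0, 1)))  # True
print(find_coloring(4, edges))               # (0, 1, 0, 1)
```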
For decades, formal verification required rare, expensive expertise applied manually. Here's what that looked like:
1. The expert studies the codebase: architecture, dependencies, data flow.
2. The expert decides: "This module should be marked side-effect-free."
3. The expert writes the annotation, the specification, the formal assertion.
4. The system verifies: confirmed. The property holds.
5. Move on to the next property. The cycle starts again.
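Step 4 is the cheap part. A minimal sketch, assuming a deliberately narrow notion of "side-effect-free" (no `global`/`nonlocal`, no writes through attributes or subscripts); a real verifier would also track I/O, aliasing, and calls into effectful code:

```python
import ast

def verify_side_effect_free(source):
    """Mechanically check a narrow purity claim against Python source text."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            return False  # touches enclosing scopes
        if isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            if any(isinstance(t, (ast.Attribute, ast.Subscript)) for t in targets):
                return False  # mutation through an object reference
    return True

PURE_SRC = """
def pure_add(a, b):
    return a + b
"""

IMPURE_SRC = """
def impure_bump():
    counter["n"] += 1
"""

print(verify_side_effect_free(PURE_SRC))    # True
print(verify_side_effect_free(IMPURE_SRC))  # False
```

The point is the cost profile: the check is a single walk over the syntax tree, while deciding *which* functions deserve the claim took the expert's weeks of study.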
The same workflow, transformed. What took months of human expertise now takes seconds of AI pattern recognition.
1. The agent scans the entire codebase: structure, dependencies, patterns.
2. The agent generates thousands of candidate annotations:
   - "This function looks pure"
   - "This module has no network calls"
   - "This data never reaches an unauthenticated endpoint"
3. The system verifies each annotation against the actual code structure.
4. Most succeed → proven properties are added to the knowledge base. Failures are informative → they tell the agent what to refactor.
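The loop can be sketched with a toy call graph: `propose` stands in for the agent, `verify` for the framework. All names here are illustrative, not a real API.

```python
EFFECTFUL = {"http_get", "write_file"}  # primitives with side effects

def propose(call_graph):
    # The agent, crudely: claim "pure" for every function and let the
    # verifier sort it out.
    return [("pure", name) for name in call_graph]

def verify(call_graph, annotation):
    _, name = annotation
    bad = EFFECTFUL & call_graph[name]
    if bad:
        return False, f"{name} calls {sorted(bad)}"
    return True, None

def generate_and_verify(call_graph):
    proven, feedback = [], []
    for ann in propose(call_graph):
        ok, reason = verify(call_graph, ann)
        if ok:
            proven.append(ann)              # enters the knowledge base
        else:
            feedback.append((ann, reason))  # tells the agent what to refactor
    return proven, feedback

call_graph = {
    "parse": set(),
    "fetch": {"http_get"},
    "score": {"parse"},
}
proven, feedback = generate_and_verify(call_graph)
print(proven)    # [('pure', 'parse'), ('pure', 'score')]
print(feedback)  # [(('pure', 'fetch'), "fetch calls ['http_get']")]
```

Cheap, wrong-some-of-the-time proposals are fine: the verifier filters them, and the rejections carry their own reasons.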
Each proven property makes the system smarter. Like sudoku — the more cells you fill, the easier the remaining ones become.
When verification fails, the failure reason is valuable. It tells the agent exactly what to fix — and each fix makes the entire codebase more provable.
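The cascade can be modelled as a fixpoint over a call graph: a function is proven pure once every callee is, so each new proof can unlock its callers. Function names below are hypothetical.

```python
def infer_pure(call_graph, known_pure):
    """Fixpoint purity inference: keep sweeping until no new function
    can be proven. Each proven property makes others provable."""
    proven = set(known_pure)
    changed = True
    while changed:
        changed = False
        for fn, callees in call_graph.items():
            if fn not in proven and callees <= proven:
                proven.add(fn)  # one more cell filled in
                changed = True
    return proven - set(known_pure)

call_graph = {
    "normalize": {"len_"},          # calls only a known-pure primitive
    "score": {"normalize"},         # provable once normalize is proven
    "report": {"score", "print_"},  # blocked by an effectful primitive
}
print(sorted(infer_pure(call_graph, known_pure={"len_"})))
# ['normalize', 'score']
```

Refactoring `report` to take its output sink as a parameter would remove the blocker, and the fixpoint would then reach it too: exactly the "each fix makes the codebase more provable" dynamic.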
| | Before AI Agents | Now |
|---|---|---|
| NP part (discovery) | Human expert, weeks per module | AI agent, seconds per codebase |
| Verification | Already fast, but waiting on humans | Instant, running continuously |
| Economics | Only viable for life-critical systems | Viable for ALL software |
| Cascade | Slow (human generates each annotation) | Fast (agent generates, system infers) |
| Coverage | Sparse (experts focus on risky areas) | Comprehensive (agent scans everything) |
Without this, teams rely on testing (sampling), code review (trust), and convention (hope). Bugs surface in production. Security gaps are discovered by attackers. Architecture erodes because nothing enforces it.
AI agents annotate → framework verifies → properties accumulate → trust grows mechanically. Every module has a provenance. Every data path is traced. Regressions are caught before merge, not after deploy.
Developers work at the architecture level. Every arrow in your C4 diagram is a proven property. The codebase is a living proof, not a living document. Code that can't be proven correct can't be shipped.