
The Sudoku Insight

Verifying a property is easy. Discovering where to assert it is hard. AI agents just solved the hard part.

The Core Dynamic

The Oldest Pattern in Formal Methods

Every formal verification system runs into the same fundamental asymmetry. It mirrors the structure of P vs NP, the most famous open problem in computer science.

P — Easy
VERIFY: Given the claim "this function is pure", check if it holds.

Mechanical. Deterministic. Fast. A compiler can do this in milliseconds — just follow the rules and check constraints.

NP — Hard
DISCOVER: Which functions SHOULD be annotated as pure?

Requires understanding the codebase. Judgment. Expertise. Knowing what matters and where to look.

This is why formal methods have been expensive. The verification is cheap — it's mechanical, deterministic, automatable. The expensive part is the human expertise needed to know what to annotate, where to assert properties, which invariants matter.
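To see how mechanical the P side is, here is a deliberately naive sketch of a purity check: it scans a function's source text for known side-effecting calls. Everything here (`verifyPure`, the marker list, the sample functions) is invented for illustration; a real checker would walk the AST and track aliasing rather than match strings.

```javascript
// Hypothetical, greatly simplified purity verifier: a mechanical scan over
// source text for calls known to have side effects.
const SIDE_EFFECT_MARKERS = ['console.', 'fetch(', 'Math.random', 'Date.now', 'localStorage'];

// Returns true if the claimed-pure function's source contains none of the
// known side-effecting calls. Deterministic, fast, no judgment required.
function verifyPure(fnSource) {
  return SIDE_EFFECT_MARKERS.every((marker) => !fnSource.includes(marker));
}

const pureSource = 'function double(x) { return x * 2; }';
const impureSource = 'function log(x) { console.log(x); return x; }';

console.log(verifyPure(pureSource));   // true: the claim holds
console.log(verifyPure(impureSource)); // false: the claim is rejected
```

The check itself is trivial. The hard question the check cannot answer is which of the thousands of functions in a codebase are worth claiming pure in the first place.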
The Historical Bottleneck

Humans Did the NP Part

For decades, formal verification required rare, expensive expertise applied manually. Here's what that looked like:

1
Human — Weeks/Months

Expert studies the codebase, understands architecture, dependencies, data flow

2
Human — Days

Expert decides: "This module should be marked side-effect-free"

3
Human — Hours

Expert writes the annotation, the specification, the formal assertion

4
Machine — Milliseconds

System verifies: confirmed. The property holds.

5
Human — Repeat

Move on to the next property. The cycle starts again.

Bottleneck: Steps 1-3 are human, slow, expensive. Step 4 is machine, fast, cheap.
This is why formal methods stayed niche. The ROI only made sense for life-critical systems: avionics, medical devices, nuclear reactors. Everyone else relied on testing and hope.
The Breakthrough

AI Agents Change This

The same workflow, transformed. What took months of human expertise now takes seconds of AI pattern recognition.

1
AI Agent — Seconds

Agent scans the entire codebase, understands structure, dependencies, patterns

2
AI Agent — Seconds

Agent generates thousands of candidate annotations:
"This function looks pure"
"This module has no network calls"
"This data never reaches an unauthenticated endpoint"

3
Machine — Milliseconds each

System verifies each annotation against the actual code structure

4
Continuous

Most succeed → proven properties added to knowledge base. Failures are informative → tell the agent what to refactor.

[Live demo counters: annotations generated, verified, failed (informative), total time]
The NP part — knowing what to assert — is exactly what AI agents are good at. Pattern recognition across large codebases. Generating hypotheses at scale. They don't need to be right every time. They just need to be right enough — and the verification catches the rest.
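The generate-and-verify loop above can be sketched in a few lines. All names here are hypothetical, and the verifier is stood in by a lookup table of facts the framework could check mechanically against real code:

```javascript
// Hypothetical candidate annotations proposed by the agent.
const candidates = [
  { fn: 'double',  claim: 'pure' },
  { fn: 'log',     claim: 'pure' },        // will fail: log has side effects
  { fn: 'getUser', claim: 'no-network' },
];

// Stand-in for the mechanical verifier: in reality this would be a static
// check against the code, not a table.
const facts = {
  'double:pure': true,
  'log:pure': false,
  'getUser:no-network': true,
};
const verify = (c) => facts[`${c.fn}:${c.claim}`] === true;

const proven = candidates.filter(verify);            // added to the knowledge base
const failed = candidates.filter((c) => !verify(c)); // informative: what to refactor

console.log(proven.length, failed.length); // 2 1
```

The agent does not need every candidate to hold; cheap verification separates the proven from the informative failures.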
The Cascade Effect

The Sudoku Cascade

Each proven property makes the system smarter. Like Sudoku: the more cells you fill, the easier the remaining ones become.

[Interactive grid: proven count, coverage %, cascade multiplier; cell states: Unknown, Candidate, Agent-proven, Inferred, Failed]
Early proofs are slow — human-guided or agent-generated. Later proofs are automatic — inferred from the accumulation of knowledge. Each new proven property enables 2–3 more inferences. The cascade accelerates until the codebase converges on full coverage.
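One way to picture the cascade is as a fixpoint computation over a call graph: a function can be inferred pure once everything it calls is proven or inferred pure. The graph and function names below are invented for this sketch:

```javascript
// Hypothetical call graph: fn -> functions it calls.
const calls = {
  add:  [],               // proven pure directly by the agent
  mul:  [],               // proven pure directly by the agent
  dot:  ['add', 'mul'],   // inferable once add and mul are proven
  norm: ['dot', 'mul'],   // inferable once dot is inferred
};

// Starting from agent-proven seeds, repeatedly infer purity for any function
// whose callees are all already known pure, until nothing new is added.
function cascade(seedPure) {
  const pure = new Set(seedPure);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [fn, deps] of Object.entries(calls)) {
      if (!pure.has(fn) && deps.every((d) => pure.has(d))) {
        pure.add(fn); // inferred for free, no agent call needed
        grew = true;
      }
    }
  }
  return pure;
}

console.log([...cascade(['add', 'mul'])]); // ['add', 'mul', 'dot', 'norm']
```

Two seed proofs yield four proven functions; in a large graph this is where the acceleration comes from.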
When Verification Fails

The Refactoring Feedback Loop

When verification fails, the failure reason is valuable. It tells the agent exactly what to fix — and each fix makes the entire codebase more provable.

Example 1: Purity via Dependency Injection
Before — Can't prove pure
// Imports side-effectful module
import { logger } from './logger';

function processData(data) {
  logger.info('Processing...'); // ← side effect
  return transform(data);
}
After — Agent refactors, provably pure
const noop = () => {}; // default: no logging

function processData(data, log = noop) {
  log('Processing...'); // ← caller controls side effects
  return transform(data);
}
Proven pure · Now testable · Semantically convertible to Rust
Example 2: Immutability Cascade
Before — Can't prove immutable
let config = loadConfig();
// ... 200 lines ...
config.timeout = 5000; // ← mutation
After — Agent refactors, provably immutable
const config = Object.freeze(loadConfig());
const updatedConfig = { ...config, timeout: 5000 };
Proven immutable · Loop is deterministic · Function is pure
Example 3: Auth Path Completion
Before — Framework reports gap
// Framework reports: "Endpoint /api/data missing auth middleware"
app.get('/api/data', handler);
After — Agent adds auth middleware
app.get('/api/data', authMiddleware, handler);
All endpoints proven authenticated · Zero unprotected routes
Each fix doesn't just solve one problem. It makes the entire codebase more provable. The refactoring feedback loop converges toward a codebase that is: more modular, more testable, more provable, and (as a side effect) cleaner and better-designed.
The Shift

Why This Wasn't Possible Before

Before AI Agents → Now

NP part: human expert, weeks per module → AI agent, seconds per codebase
Verification: already fast, but waiting on humans → instant, running continuously
Economics: only viable for life-critical systems → viable for ALL software
Cascade: slow (human generates each annotation) → fast (agent generates, system infers)
Coverage: sparse (experts focus on risky areas) → comprehensive (agent scans everything)
The P side (verification) was always ready. The NP side (annotation) was the bottleneck. AI agents removed the bottleneck. This framework is the verification infrastructure that makes it all work.
The Trajectory

The Implication for Software Development

Today

Teams use testing (sampling), code review (trust), and convention (hope). Bugs are found in production. Security gaps are discovered by attackers. Architecture erodes because nothing enforces it.

Near-term

AI agents annotate → framework verifies → properties accumulate → trust grows mechanically. Every module has a provenance. Every data path is traced. Regressions are caught before merge, not after deploy.

Long-term

Developers work at the architecture level. Every arrow in your C4 diagram is a proven property. The codebase is a living proof, not a living document. Code that can't be proven correct can't be shipped.