Invariant Research

Bad plans die cheaply.

AI agents burn expensive inference on invalid plans, repeated replanning, tool retries, and self-critique loops. Invariant moves failure detection out of the GPU loop and into deterministic CPU verification.

6 GPU calls avoided

45 tool calls avoided

12 invalid steps blocked

3 signed receipts

This is the agent application of the Invariant verification stack.
Generated is not verified. Same engine. Different evidence. Same receipt.

The Failure

The agent generated a plausible plan. The plan was invalid.

The scenario is a deployment workflow with 10 actions and explicit preconditions. The agent skipped a required artifact-signing step. Its plan reads as correct in natural language, but it violates a structural precondition that can be detected before anything executes.

Agent's Proposed Plan (9 actions)

1. create_release_branch (passed)
2. build_artifact (passed)
3. deploy_to_staging (failed)
4. run_integration_tests
5. create_backup
6. approve_migration_plan
7. update_schema
8. deploy_to_production
9. notify_customer

REJECTED at action 3 (failed_index 2 in the receipt, which counts from zero): artifact_signed must be true (is false)

What Was Missing

1. create_release_branch (passed)
2. build_artifact (passed)
   sign_artifact (skipped by the agent)
3. deploy_to_staging (passed)
4. run_integration_tests (passed)
5. create_backup (passed)
6. approve_migration_plan (passed)
7. update_schema (passed)
8. deploy_to_production (passed)
9. notify_customer (passed)

ACCEPTED: all 10 actions valid; goal state reached

The verifier stepped through the actions, checked preconditions against the evolving state, and rejected the plan at the first impossible transition. Validation time: 19.6 microseconds. 7 downstream actions blocked before any tool call or GPU replan.
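The stepping procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Invariant's engine: the `Action` type, the `validate` function, and every fact name other than `artifact_signed` are assumptions made up for the example, and the model covers only the first few actions of the workflow.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    pre: dict = field(default_factory=dict)      # facts required before the action
    effects: dict = field(default_factory=dict)  # facts set by the action


# Hypothetical transition model; only artifact_signed appears in the demo output.
MODEL = {
    "create_release_branch": Action("create_release_branch", {}, {"branch_created": True}),
    "build_artifact": Action("build_artifact", {"branch_created": True}, {"artifact_built": True}),
    "sign_artifact": Action("sign_artifact", {"artifact_built": True}, {"artifact_signed": True}),
    "deploy_to_staging": Action("deploy_to_staging", {"artifact_signed": True}, {"staged": True}),
}


def validate(plan, model, state=None, goal=None):
    """Return (accepted, failed_index, missing) for a list of action names."""
    state = dict(state or {})
    for index, name in enumerate(plan):
        action = model[name]
        # Collect every precondition the current state does not satisfy.
        missing = {k: v for k, v in action.pre.items() if state.get(k, False) != v}
        if missing:
            return False, index, missing  # first impossible transition
        state.update(action.effects)
    # All transitions were possible; check that the goal facts hold.
    unmet = {k: v for k, v in (goal or {}).items() if state.get(k, False) != v}
    return (not unmet), None, unmet


bad = ["create_release_branch", "build_artifact", "deploy_to_staging"]
good = ["create_release_branch", "build_artifact", "sign_artifact", "deploy_to_staging"]

print(validate(bad, MODEL, goal={"staged": True}))   # (False, 2, {'artifact_signed': True})
print(validate(good, MODEL, goal={"staged": True}))  # (True, None, {})
```

The zero-based index 2 corresponds to the receipt's failed_index. The loop makes one pass over the plan, so the cost grows linearly with the number of actions and the number of facts each action touches.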

The Receipt

A deterministic failure certificate, not an opinion.

The receipt has media type application/vnd.svr.receipt+json. It carries an Ed25519 signature that can be checked against the signer's public key, and it is cacheable. The platform does not need to ask another model whether this plan is good. It already has a structural proof that the plan is impossible.

SVR Receipt application/vnd.svr.receipt+json

verdict: "contradicted"
filing_safety: "BLOCKED"
failed_action: "deploy_to_staging"
failed_index: 2
missing: { "artifact_signed": true }
reason: "Plan rejected at action 2 (deploy_to_staging): violated precondition [artifact_signed=True]. 2 prior actions were valid; failure is structural, not stochastic."
items_checked: 3
items_passed: 2
items_failed: 1
wall_us: 19.6
method: "deterministic_algebraic"
gpu_required: false
parameters: 0
signature: "b7a49f8e...925ad90f" (Ed25519, 64 bytes)
signature_status: "VALID"
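Because the verdict is deterministic, a receipt for a given plan and transition model can be stored under a content hash and reused. The sketch below shows one way that might look, using a subset of the fields above. The helper names `canonical_bytes` and `cache_key`, the model identifier "deploy-demo", the choice of SHA-256, and the JSON canonicalization are all assumptions for illustration; the actual SVR receipt encoding and its Ed25519 signing step are not reproduced here.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialize with sorted keys and fixed separators so equal inputs give equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def cache_key(plan, model_id):
    """Hypothetical cache key: hash of the plan plus an identifier for the transition model."""
    return hashlib.sha256(canonical_bytes({"model": model_id, "plan": plan})).hexdigest()


plan = ["create_release_branch", "build_artifact", "deploy_to_staging"]

# A subset of the receipt fields shown above, as a plain dictionary.
receipt = {
    "verdict": "contradicted",
    "failed_action": "deploy_to_staging",
    "failed_index": 2,
    "missing": {"artifact_signed": True},
    "items_checked": 3,
    "items_passed": 2,
    "items_failed": 1,
    "gpu_required": False,
}

cache = {cache_key(plan, "deploy-demo"): receipt}

# The same plan submitted again maps to the same key, so the stored receipt is returned.
hit = cache.get(cache_key(list(plan), "deploy-demo"))
print(hit["verdict"])  # contradicted
```

A signer would typically sign the same canonical bytes, so that any party holding the public key can recompute them and check the signature without rerunning the verifier.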

The Comparison

Three ways to handle an invalid plan.

The same deployment workflow, handled three ways. The figures below are aggregated across three failure modes: missing artifact signature, premature production deploy, and missing backup before schema update.

Lane A: Baseline Agent

GPU calls: 9
Tool calls: 45
Diagnosis: 3
Replans: 3
Receipts: 0
Steps blocked: 0

Lane B: LLM Self-Check

GPU calls: 9
Tool calls: 27
Diagnosis: 3
Replans: 3
Receipts: 0
Steps blocked: 0

Lane C: Invariant

GPU calls: 3
Tool calls: 0
Diagnosis: 0
Replans: 0
Receipts: 3
Steps blocked: 12

Lane B (LLM self-check) shows the optimistic case where the critic catches the issue. When it misses, the numbers match the baseline. Invariant's 3 GPU calls are the planner itself, which still synthesizes. The verifier runs on CPU.
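The headline figures follow directly from the lane totals. As a quick check, the snippet below subtracts Lane C from the other two lanes; the dictionary layout and the `avoided` helper are just an illustrative arrangement of the numbers reported above.

```python
# Lane totals as reported in the comparison above.
LANES = {
    "baseline":   {"gpu_calls": 9, "tool_calls": 45, "receipts": 0, "steps_blocked": 0},
    "self_check": {"gpu_calls": 9, "tool_calls": 27, "receipts": 0, "steps_blocked": 0},
    "invariant":  {"gpu_calls": 3, "tool_calls": 0,  "receipts": 3, "steps_blocked": 12},
}


def avoided(reference, candidate):
    """Calls the candidate lane saves relative to the reference lane."""
    return {
        key: LANES[reference][key] - LANES[candidate][key]
        for key in ("gpu_calls", "tool_calls")
    }


print(avoided("baseline", "invariant"))    # {'gpu_calls': 6, 'tool_calls': 45}
print(avoided("self_check", "invariant"))  # {'gpu_calls': 6, 'tool_calls': 27}
```

Against the baseline this reproduces the 6 GPU calls and 45 tool calls avoided; against the optimistic self-check lane the tool-call saving is 27.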

The Asymmetry

Synthesis is hard. Validation is cheap.

The agent may spend unbounded compute synthesizing a plan. But validating a proposed plan against a formal transition model is polynomial-time: check the preconditions, apply the effects, verify the goal. That gap creates the margin-recovery layer.

Plan Synthesis

PSPACE-complete

In the general case, finding a valid plan is computationally explosive. The agent explores, backtracks, retries, self-critiques, and burns GPU the whole way.

Plan Validation

Polynomial-time

Step through the actions. Check preconditions. Apply effects. Verify the goal. Deterministic. Reproducible. Cacheable. Runs on CPU in microseconds.

Invariant does not replace the agent. It gates the agent. The agent proposes. Invariant checks. Bad plans die cheaply. Good plans move forward with receipts.
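The propose-then-check flow can be pictured as a gate sitting in front of the tool executor. The sketch below is a toy under stated assumptions: `gate`, `propose`, `check`, and `execute` are hypothetical callables standing in for the planner, the verifier, and the tool runner, and the single hard-coded rule in `check` replaces a real transition model.

```python
def gate(propose, check, execute):
    """Run tools only for plans the checker accepts; otherwise return the rejection."""
    plan = propose()
    verdict = check(plan)
    if not verdict["accepted"]:
        return {"executed": [], "verdict": verdict}  # no tool call is made
    return {"executed": [execute(step) for step in plan], "verdict": verdict}


# Toy planner: proposes a plan that skips signing.
def propose():
    return ["build_artifact", "deploy_to_staging"]


# Toy verifier: deploy_to_staging requires an earlier sign_artifact.
def check(plan):
    for i, step in enumerate(plan):
        if step == "deploy_to_staging" and "sign_artifact" not in plan[:i]:
            return {"accepted": False, "failed_index": i}
    return {"accepted": True, "failed_index": None}


calls = []  # records every tool invocation


def execute(step):
    calls.append(step)
    return step


result = gate(propose, check, execute)
print(result["verdict"], len(calls))  # {'accepted': False, 'failed_index': 1} 0
```

The rejected plan never reaches `execute`, which is the sense in which a bad plan fails before any tool call is spent on it.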

Run It Yourself

The demo is reproducible.

The deployment-precondition demo runs locally. One command validates all plans and produces signed receipts. Another prints the three-lane margin comparison.
