Use case

Fix flaky tests with Codna

Flaky tests waste engineering hours and erode trust in CI. Codna detects flaky tests through the dependency graph: it deterministically traces the shared-state, ordering, or timing cause, then fixes it and proves the test green.

The problem

Flakiness rarely lives in the test

A test that passes alone but fails in the suite almost never has a bug in the test itself. The cause is usually shared state left behind by a neighbor, async work that resolves in a different order under load, or a global side effect set several files away. Engineers lose hours rerunning the suite, bisecting test order, and staring at logs that change between runs. The easy escape — adding a retry or a sleep — hides the flake instead of fixing it, and the next regression lands on top of it. Codna's blast-radius graph surfaces what the test actually depends on, so an AI debugging tool can fix the cause instead of masking the symptom.

How Codna fixes it

How Codna stabilizes a flaky test

1. Map what the test touches

Codna's deterministic engine resolves the whole repo into a dependency and blast-radius graph in about 60ms for zero LLM tokens — surfacing the fixtures, globals, and ordering the test really depends on.
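To make the idea of a blast radius concrete, the sketch below computes everything that transitively depends on one node in a small dependency graph. It is an assumption-laden illustration, not Codna's engine: the `blast_radius` function and the graph contents are hypothetical.

```python
from collections import deque

def blast_radius(depends_on, changed):
    """Return every node that transitively depends on `changed`."""
    # Invert the edges so each dependency maps to the nodes that use it.
    used_by = {}
    for node, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(node)

    # Breadth-first walk over the inverted edges.
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dependant in used_by.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen

# Hypothetical graph: each key depends on the names in its set.
graph = {
    "test_orders": {"orders_fixture", "orders"},
    "test_billing": {"orders_fixture", "billing"},
    "orders_fixture": {"db_global"},
    "orders": set(),
    "billing": set(),
    "db_global": set(),
}

affected = blast_radius(graph, "db_global")
```

In this toy graph, `affected` contains `orders_fixture`, `test_orders`, and `test_billing`: two tests that never mention `db_global` still share it through a fixture, which is the kind of hidden coupling that makes a test flaky.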

2. Fix from a tiny evidence bundle

The agent works from a ~600-token evidence bundle scoped to the suspect cause — 162x less context than reading the suite — so it targets shared state or timing, not a retry.

3. Prove it green

Codna re-runs the test and its neighbors against your own suite, so the fix lands only after the flake is actually gone.

codna fix . --issue "test_orders is flaky under parallel run"

What you get

Deterministic flaky test detection

Codna maps the repo into a dependency and blast-radius graph in ~60ms using zero LLM tokens (no RAG, no embeddings), so the real cause of the flake surfaces instead of a guess.

Fixes verified by your own tests

Every patch re-runs the failing test and its neighbors against your suite before it lands, so a flake is proven gone rather than masked with a retry.

About $0.04 per verified fix

The agent fixes from a ~600-token evidence bundle instead of reading the whole suite, so a stabilized test costs pennies.

The proof

Fewer tokens. Faster. Verified.

Codna: 16K
Cline: 65K
Cursor: 81K
Total tokens to fix 8 verified bug-fix scenarios — measured head-to-head vs the Codex and Gemini CLIs.
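For a per-scenario view of those totals, the short calculation below divides each figure by the 8 scenarios and compares it with Codna's. It assumes the rounded values shown above and that "K" means 1,000 tokens.

```python
# Totals as shown on this page; "K" is assumed to mean 1,000 tokens.
totals = {"Codna": 16_000, "Cline": 65_000, "Cursor": 81_000}
scenarios = 8

# Average tokens per scenario for each tool.
per_scenario = {tool: total / scenarios for tool, total in totals.items()}

# How many times more tokens each tool used than Codna.
ratio_to_codna = {tool: total / totals["Codna"] for tool, total in totals.items()}
```

That works out to about 2K tokens per scenario for Codna against roughly 8.1K and 10.1K, or about 4x and 5x as many tokens.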

Frequently asked

Codna targets the root cause — shared state, async ordering, or timing — by tracing what the test depends on in the blast-radius graph, then verifies the fix by re-running the test. It does not mask the flake with a retry or a sleep.
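As an example of fixing timing at the root instead of adding a sleep, the sketch below waits on an explicit completion signal from a background job. It uses only Python's standard `threading` module; the worker and the test are hypothetical and do not represent output from Codna.

```python
import threading

def start_worker(results, done):
    """Start a background job that records a result, then signals completion."""
    def job():
        results.append("order-saved")
        done.set()  # explicit completion signal
    thread = threading.Thread(target=job)
    thread.start()
    return thread

def test_order_saved():
    results, done = [], threading.Event()
    thread = start_worker(results, done)
    # Instead of sleeping and hoping the job has finished, block until it
    # signals completion. The timeout only guards against a hung worker.
    assert done.wait(timeout=5), "worker did not finish in time"
    thread.join()
    assert results == ["order-saved"]
```

The test no longer depends on how fast the machine is: it proceeds as soon as the job signals, and fails loudly if the job never does.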

Codna maps the repo deterministically and surfaces the fixtures, globals, and ordering a test depends on for zero tokens. That points straight at the likely cause, so you skip the rerun-and-bisect loop.

The native GitHub App can triage a flaky check and open a test-verified fix PR with the root cause attached, with no log copy-paste required. You can also run Codna from the CLI or via the MCP server in Cursor or Claude.

A verified fix costs about $0.04. The deterministic map costs zero LLM tokens, so the agent works from a ~600-token evidence bundle instead of the whole suite.

Codna builds the dependency graph from the source itself, so it works across languages and whatever runner you already use — pytest, Vitest, Jest, node:test, go test, and others. Your existing runner is what verifies the fix.

Codna supports self-hosting with your own keys (BYOK), fail-closed egress, and no training on your code.

Understand. Fix. Evolve.