Public head-to-head benchmark
Our first measured comparison against the OpenAI Codex and Google Gemini CLIs, with the full dataset published.
- 8 bug-fix scenarios on identical checkouts: single bugs, three-at-once, and repos the agents had never seen.
- 5× fewer tokens and 1.7× faster than Cursor, with every fix verified by a passing test (87/87).
- Per-scenario data downloadable from the benchmarks page, with full methodology.