Every major AI model ranked by SWE-bench — the benchmark that measures whether a model can resolve real GitHub issues, not toy problems. Pricing and context columns are included because a leaderboard you can't act on is just trivia.
Scores from provider publications and public leaderboards · Pricing verified daily
Current leader
Claude Fable 5 · 93.4% · Anthropic · $10/1M input · 1M tokens context
Best value in the top 10: GPT-5.4 (74.9% at $0.20/1M input tokens).
SWE-bench is a benchmark that tests whether an AI model can resolve real GitHub issues from real open-source repositories — write a patch, pass the tests. Unlike puzzle-style benchmarks, it measures the messy, multi-file work software engineers actually do, which is why it has become the standard for comparing coding models.
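As a rough illustration of how a headline percentage comes out of "write a patch, pass the tests", the sketch below counts an issue as resolved only when the tests the fix targets now pass and the tests that passed before still pass, then reports the resolved share. The `InstanceResult` type, its field names, and the instance IDs are hypothetical and chosen for this example; this is not the official evaluation harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceResult:
    """Outcome of applying a model's patch to one benchmark issue (hypothetical type)."""
    instance_id: str
    fail_to_pass_ok: bool  # tests that failed before the fix now pass
    pass_to_pass_ok: bool  # tests that passed before the fix still pass


def is_resolved(result: InstanceResult) -> bool:
    # An issue counts only if the fix works AND nothing else broke.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def resolve_rate(results: list[InstanceResult]) -> float:
    """Return the share of resolved issues, in percent."""
    if not results:
        raise ValueError("need at least one result")
    resolved = sum(is_resolved(r) for r in results)
    return 100.0 * resolved / len(results)


# Made-up outcomes for four hypothetical issues.
SAMPLE = [
    InstanceResult("example-repo-1", True, True),
    InstanceResult("example-repo-2", True, True),
    InstanceResult("example-repo-3", True, False),  # fix worked but broke another test
    InstanceResult("example-repo-4", True, True),
]

if __name__ == "__main__":
    print(f"{resolve_rate(SAMPLE):.1f}% resolved")
```

A patch that fixes the issue but breaks an unrelated test does not count, which is part of why scores on this benchmark are harder to inflate than puzzle-style ones.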
Which AI model has the highest SWE-bench score?
Claude Fable 5 (Anthropic) currently leads at 93.4%. Claude Mythos 5 is listed second, also at 93.4%.
What is a good SWE-bench score?
Anything above 70% is frontier-class in 2026: the model resolves more than two-thirds of the benchmark's real GitHub issues on its own. The current leader, Claude Fable 5, is at 93.4%. Two years ago the best models scored under 20%, which is how fast this benchmark moves.
Does the highest SWE-bench score mean the best coding model for me?
Not always. Score-per-dollar matters for daily work: GPT-5.4 delivers 74.9% at $0.20/1M input tokens, which is the best value in the top 10. Reserve the outright leader for the hardest tasks and route volume work to the value pick.
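One way to make the score-per-dollar idea concrete is sketched below, using only the two models whose score and input price both appear on this page. The ratio used here (score divided by input price) is an assumption about what "best value" means, since the exact formula is not stated, and it ignores output-token pricing. The `Model` type and the `route` helper are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    swe_bench_pct: float     # SWE-bench score, in percent
    usd_per_1m_input: float  # input price, USD per 1M tokens


def score_per_dollar(model: Model) -> float:
    """Assumed value metric: score points per USD of 1M input tokens."""
    return model.swe_bench_pct / model.usd_per_1m_input


def pick(models: list[Model]) -> tuple[Model, Model]:
    """Return (leader, value_pick) for a list of models."""
    leader = max(models, key=lambda m: m.swe_bench_pct)
    value_pick = max(models, key=score_per_dollar)
    return leader, value_pick


def route(task_is_hard: bool, leader: Model, value_pick: Model) -> Model:
    """Send the hardest tasks to the leader and volume work to the value pick."""
    return leader if task_is_hard else value_pick


# Figures taken from this page; input prices only.
MODELS = [
    Model("Claude Fable 5", 93.4, 10.0),
    Model("GPT-5.4", 74.9, 0.20),
]

if __name__ == "__main__":
    leader, value_pick = pick(MODELS)
    print("leader:", leader.name)
    print("value pick:", value_pick.name)
    print("hard task goes to:", route(True, leader, value_pick).name)
```

How a task gets flagged as hard is left to the caller; in practice that could be a manual label or a retry after the cheaper model fails.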
How often is this leaderboard updated?
Scores are updated whenever providers publish new results, and the pricing columns update automatically from our daily-verified pricing data.