UseRightAI
Cut through AI hype. Pick what works.

Independent AI model tracker. Live pricing, real benchmarks, zero vendor bias.



© 2026 UseRightAI. Independent · Free forever · Not affiliated with any AI provider.

Affiliate links are clearly labeled. See disclosures.



SWE-bench Leaderboard

Every major AI model ranked by SWE-bench — the benchmark that measures whether a model can resolve real GitHub issues, not toy problems. Pricing and context columns are included because a leaderboard you can't act on is just trivia.

Scores from provider publications and public leaderboards · Pricing verified daily

Current leader
Claude Fable 5: 93.4% (Anthropic · $10/1M input tokens · 1M-token context)

Best value in the top 10: GPT-5.4, 74.9% at $0.20/1M input tokens.

| # | Model | Provider | SWE-bench | Input $/1M tokens | Context |
|---|---|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | 93.4% | $10 | 1M tokens |
| 2 | Claude Mythos 5 | Anthropic | 93.4% | $10 | 1M tokens |
| 3 | Claude Opus 4.8 | Anthropic | 88.6% | $10 | 1M tokens |
| 4 | Claude Opus 4.7 | Anthropic | 80% | $5 | 1M tokens |
| 5 | Claude Sonnet 4.6 | Anthropic | 79.6% | $3 | 1M tokens |
| 6 | GPT-5.4 | OpenAI | 74.9% | $0.20 | 272k tokens |
| 7 | Claude Opus 4.6 | Anthropic | 72.5% | $15 | 1M tokens |
| 8 | Gemini 3.1 Pro | Google | 63.2% | $2 | 2M tokens |
| 9 | Grok 4 | xAI | 54% | $1.25 | 2M tokens |
| 10 | DeepSeek R1 | DeepSeek | 49.2% | $0.55 | 128k tokens |
| 11 | GPT-4o | OpenAI | 46% | $0.15 | 128k tokens |
| 12 | Claude Haiku 4 | Anthropic | 43% | $0.80 | 200k tokens |
| 13 | DeepSeek V3 | DeepSeek | 42% | $0.27 | 128k tokens |
| 14 | Gemini 3.1 Flash | Google | 35% | $0.25 | 1M tokens |
| 15 | Llama 4 Maverick | Meta | 32% | $0.15 | 256k tokens |
| 16 | Mistral Large 2 | Mistral | 28% | $2 | 128k tokens |
| 17 | GPT-4o Mini | OpenAI | 23.6% | $0.15 | 128k tokens |
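A minimal sketch of how a "best value in the top 10" pick could be computed, assuming value means SWE-bench percentage points per dollar of input price (per 1M tokens). The page does not state its exact formula, and this ratio ignores output-token pricing; the scores and prices are copied from the table above, and the function names are hypothetical.

```python
# Top-10 rows from the leaderboard: (model, SWE-bench %, input $ per 1M tokens).
TOP_10 = [
    ("Claude Fable 5", 93.4, 10.00),
    ("Claude Mythos 5", 93.4, 10.00),
    ("Claude Opus 4.8", 88.6, 10.00),
    ("Claude Opus 4.7", 80.0, 5.00),
    ("Claude Sonnet 4.6", 79.6, 3.00),
    ("GPT-5.4", 74.9, 0.20),
    ("Claude Opus 4.6", 72.5, 15.00),
    ("Gemini 3.1 Pro", 63.2, 2.00),
    ("Grok 4", 54.0, 1.25),
    ("DeepSeek R1", 49.2, 0.55),
]


def score_per_dollar(score: float, price: float) -> float:
    """SWE-bench points per $1 of input price (assumed value metric)."""
    return score / price


def best_value(rows):
    """Return the (model, score, price) row with the highest score-per-dollar ratio."""
    return max(rows, key=lambda row: score_per_dollar(row[1], row[2]))


if __name__ == "__main__":
    # Print every top-10 model from best to worst value.
    ranked = sorted(TOP_10, key=lambda r: score_per_dollar(r[1], r[2]), reverse=True)
    for model, score, price in ranked:
        print(f"{model:<18} {score_per_dollar(score, price):8.1f} pts per $")
```

Under this metric GPT-5.4 comes out far ahead (about 374 points per dollar versus roughly 89 for DeepSeek R1, the runner-up), which matches the page's value pick. A different weighting, such as blending input and output prices, could reorder the lower ranks.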

FAQ

What is SWE-bench?

SWE-bench is a benchmark that tests whether an AI model can resolve real GitHub issues from real open-source repositories — write a patch, pass the tests. Unlike puzzle-style benchmarks, it measures the messy, multi-file work software engineers actually do, which is why it has become the standard for comparing coding models.
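To make "write a patch, pass the tests" concrete, here is a minimal sketch of how a resolve rate can be scored. It assumes the split used by the public SWE-bench harness: tests that failed before the fix must now pass (fail-to-pass), and tests that already passed must keep passing (pass-to-pass). The `TaskResult` record and its field names are hypothetical, and real harnesses also handle building environments and running the tests, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical outcome of one task after the model's patch is applied."""
    instance_id: str
    fail_to_pass_ok: bool  # tests that failed before the fix now pass
    pass_to_pass_ok: bool  # tests that passed before still pass (no regressions)


def is_resolved(result: TaskResult) -> bool:
    # A task counts only if the patch fixes the issue without breaking anything else.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def resolve_rate(results: list[TaskResult]) -> float:
    """Percentage of tasks resolved, comparable to the figures in the SWE-bench column."""
    if not results:
        return 0.0
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / len(results)
```

For example, with four tasks where one patch fixes its issue but breaks an existing test and another fails to fix its issue, `resolve_rate` returns 50.0: partial credit is not given.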

Which AI model has the highest SWE-bench score?

Claude Fable 5 (Anthropic) currently ranks first at 93.4%, tied on score with Claude Mythos 5 (also 93.4%).

What is a good SWE-bench score?

Anything above 70% is frontier-class in 2026: the model resolves most of the benchmark's real GitHub issues on its own. The current leader, Claude Fable 5, scores 93.4%. Two years ago the best models scored under 20%, which shows how fast this benchmark moves.

Does the highest SWE-bench score mean the best coding model for me?

Not always. Score per dollar matters for daily work: GPT-5.4 delivers 74.9% at $0.20/1M input tokens, the best value in the top 10. Reserve the outright leader for the hardest tasks and route volume work to the value pick.
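The routing advice above can be sketched as a two-model router plus a blended-cost estimate. This is an illustration under stated assumptions: the `is_hard` flag is hypothetical (how you triage tasks is up to you), prices are the input prices from the table, and output-token costs and retries are ignored.

```python
# Input prices ($ per 1M tokens) taken from the leaderboard table.
LEADER = ("Claude Fable 5", 10.00)
VALUE_PICK = ("GPT-5.4", 0.20)


def route(is_hard: bool) -> str:
    """Send hard tasks to the leader and everything else to the value pick.

    `is_hard` is a hypothetical flag supplied by your own triage step.
    """
    return LEADER[0] if is_hard else VALUE_PICK[0]


def blended_input_price(hard_share: float) -> float:
    """Average input $ per 1M tokens when `hard_share` of tokens go to the leader."""
    if not 0.0 <= hard_share <= 1.0:
        raise ValueError("hard_share must be between 0 and 1")
    return hard_share * LEADER[1] + (1.0 - hard_share) * VALUE_PICK[1]
```

Routing 10% of input tokens to the leader works out to 0.1 × $10 + 0.9 × $0.20 = $1.18 per 1M input tokens, compared with $10 if everything goes to the leader.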

How often is this leaderboard updated?

Scores are updated whenever providers publish new results, and the pricing columns update automatically from our daily-verified pricing data.
