UseRightAI

Cut through AI hype. Pick what works.

Independent AI model tracker. Live pricing, real benchmarks, zero vendor bias.


AI Capability Matrix

Data verified April 2026

Every major AI model scored across 6 capability dimensions — coding, writing, research, images, budget efficiency, and long-context handling. Filter by provider or tier, sort any column, and find the right model for your specific use case.

Scores (0–100) are derived from public benchmarks, reported provider data, and independent testing · Updated when models change

How to use the capability matrix

The AI capability matrix shows every major model scored from 0 to 100 on coding, writing, research, image generation, budget efficiency, and long-context performance: GPT-5.5, GPT-5.4, Claude Opus 4.7, Claude Sonnet 4.6, Gemini 3.1 Pro, Grok 4, Llama 4, Mistral Large 2, DeepSeek V3, and more. Use the filters to narrow by provider (OpenAI, Anthropic, Google, xAI, Meta, Mistral, DeepSeek) or price tier (Budget, Balanced, Premium). Click any column header to sort. The "Strong at" filter shows only models scoring 70+ in your chosen capability.
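If you prefer the same logic off the page, here is a minimal sketch of what the tier and "Strong at" filters and the column sort do. The rows are a small excerpt of the matrix below; the field names and the implementation are illustrative assumptions, not UseRightAI's actual code.

```python
# Excerpt of the matrix below; scores copied from the table.
MODELS = [
    {"name": "Claude Opus 4.7", "provider": "Anthropic", "tier": "Premium",
     "coding": 100, "budget": 24},
    {"name": "Grok 4", "provider": "xAI", "tier": "Balanced",
     "coding": 92, "budget": 58},
    {"name": "DeepSeek V3", "provider": "DeepSeek", "tier": "Budget",
     "coding": 87, "budget": 96},
    {"name": "Gemini 3.1 Flash", "provider": "Google", "tier": "Budget",
     "coding": 68, "budget": 97},
]

def strong_at(models, capability, threshold=70):
    """Mirror the 'Strong at' filter: keep models scoring 70+ in a capability."""
    return [m for m in models if m[capability] >= threshold]

def sort_by(models, capability):
    """Mirror a column-header click: sort descending by one capability."""
    return sorted(models, key=lambda m: m[capability], reverse=True)

# Budget-tier models that are strong at coding, best first:
budget = [m for m in MODELS if m["tier"] == "Budget"]
print([m["name"] for m in sort_by(strong_at(budget, "coding"), "coding")])
# -> ['DeepSeek V3']
```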

Capability score methodology

Coding scores incorporate SWE-bench Verified (real GitHub issue resolution), HumanEval (Python code generation pass@1), and observed refactoring quality. Writing scores weight tone control, coherence, long-form output, and independent writing evaluations. Research scores reflect document synthesis, GPQA (graduate-level science), and multi-step reasoning. Image scores cover generation quality and vision understanding. Budget scores inversely weight per-token cost against output quality. Long-context scores reflect context window size and coherence maintenance across long inputs.
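The exact weights are not published, so the sketch below only illustrates the shape of the methodology described above: benchmark signals normalized onto 0–100 and blended into a capability score, with budget scoring inverting price against quality. Every weight, range, and scaling constant here is an assumption, not UseRightAI's formula.

```python
def normalize(value, worst, best):
    """Rescale a raw benchmark value onto 0-100, clamped at the ends."""
    return max(0.0, min(100.0, 100.0 * (value - worst) / (best - worst)))

def coding_score(swe_bench_pct, humaneval_pass1_pct, refactor_rating):
    """Blend the three coding signals named above (weights are assumptions)."""
    return round(
        0.5 * normalize(swe_bench_pct, 0, 80)           # SWE-bench Verified, % resolved
        + 0.3 * normalize(humaneval_pass1_pct, 0, 100)  # HumanEval pass@1, %
        + 0.2 * refactor_rating                         # observed refactoring quality, 0-100
    )

def budget_score(input_price_per_mtok, quality):
    """Inverse price weighting: quality per dollar, capped at 100 (assumed form)."""
    return round(min(100.0, 0.5 * quality / input_price_per_mtok))
```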

All 20 tracked models · Score key: 88–100 Best · 72–87 Strong · 55–71 Decent · <55 Limited

| Provider | Model | Code | Write | Research | Images | Budget | Context | Ctx window | Input $/1M | Speed | Tier |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | 100 | 95 | 97 | 84 | 24 | 96 | 1M | $5.00 | Deliberate | Premium |
| Anthropic | Claude Opus 4.6 | 99 | 94 | 96 | 54 | 8 | 95 | 1M | $15.00 | Deliberate | Premium |
| Anthropic | Claude Sonnet 4.6 | 97 | 98 | 93 | 60 | 42 | 93 | 1M | $3.00 | Balanced | Premium |
| OpenAI | GPT-5.5 | 96 | 92 | 94 | 90 | 22 | 95 | 1M | $5.00 | Balanced | Premium |
| xAI | Grok 4 | 92 | 70 | 86 | 68 | 58 | 90 | 2M | $2.00 | Fast | Balanced |
| OpenAI | GPT-5.4 | 90 | 88 | 88 | 87 | 35 | 72 | 272K | $2.50 | Balanced | Premium |
| Mistral | Codestral 25.01 | 88 | 38 | 52 | 18 | 91 | 60 | 256K | $0.90 | Very fast | Budget |
| DeepSeek | DeepSeek V3 | 87 | 74 | 80 | 10 | 96 | 62 | 128K | $0.27 | Fast | Budget |
| OpenAI | GPT-5.2 | 85 | 82 | 84 | 75 | 18 | 72 | 200K | $12.00 | Balanced | Premium |
| DeepSeek | DeepSeek R1 | 84 | 60 | 89 | 5 | 88 | 58 | 128K | $0.55 | Deliberate | Budget |
| Google | Gemini 3.1 Pro | 80 | 82 | 99 | 88 | 52 | 99 | 2M | $2.00 | Balanced | Premium |
| OpenAI | GPT-5.2 Mini | 78 | 72 | 68 | 50 | 74 | 52 | 128K | $1.20 | Fast | Balanced |
| Mistral | Mistral Large 2 | 72 | 72 | 71 | 40 | 55 | 58 | 128K | $3.00 | Balanced | Balanced |
| Google | Gemini 3.1 Flash | 68 | 75 | 76 | 82 | 97 | 82 | 1M | $0.50 | Very fast | Budget |
| OpenAI | GPT-4o Mini | 65 | 76 | 62 | 60 | 93 | 58 | 128K | $0.15 | Very fast | Budget |
| OpenAI | GPT-4o | 58 | 76 | 65 | 80 | 40 | 52 | 128K | $5.00 | Fast | Balanced |
| Meta | Llama 4 Maverick | 58 | 66 | 64 | 55 | 78 | 62 | 256K | $0.60 | Fast | Budget |
| Mistral | Mistral Small 3.1 | 55 | 66 | 52 | 65 | 98 | 50 | 128K | $0.10 | Very fast | Budget |
| Meta | Llama 4 Scout | 54 | 60 | 78 | 35 | 86 | 88 | 512K | $0.50 | Fast | Budget |
| Anthropic | Claude 4 Haiku | 52 | 85 | 62 | 36 | 88 | 58 | 200K | $0.80 | Very fast | Budget |

Frequently asked questions

What do the capability scores mean?

Each score (0–100) reflects how well a model performs for that use case, derived from benchmark results, real-world testing, and community data. 88+ is best-in-class, 72–87 is strong, 55–71 is decent, and below 55 is limited for that task.
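In code terms, the key is a simple threshold ladder. A minimal helper that restates the bands above (illustrative only, not site code):

```python
def band(score: int) -> str:
    """Map a 0-100 capability score onto the matrix's score-key bands."""
    if score >= 88:
        return "Best"
    if score >= 72:
        return "Strong"
    if score >= 55:
        return "Decent"
    return "Limited"

print(band(97))  # "Best"    -- e.g. Claude Sonnet 4.6 on coding
print(band(42))  # "Limited" -- e.g. Claude Sonnet 4.6 on budget
```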

Which AI model is best for coding?

Claude Opus 4.7 leads coding with a perfect 100/100, followed closely by Claude Opus 4.6 (99) and Claude Sonnet 4.6 (97); GPT-5.5 is the strongest OpenAI model at 96. Claude Sonnet 4.6's 79.6% SWE-bench Verified result makes it the standout value pick at $3.00/M.

Which AI model is best for writing?

Claude Sonnet 4.6 leads writing quality with a score of 98/100. Its tone control, coherence, and long-form output polish are consistently rated best-in-class across independent evaluations; Claude Opus 4.7 (95) and Claude Opus 4.6 (94) follow.

Which AI models support image generation?

GPT-5.5 (90) and Gemini 3.1 Pro (88) post the strongest image scores, with GPT-5.4 (87) close behind. Claude models do not support image generation: they can analyze images but cannot create them, so their image scores reflect vision understanding only.

What is the best budget AI model?

Mistral Small 3.1 (98), Gemini 3.1 Flash (97), and DeepSeek V3 (96) lead the budget category at $0.10–$0.50 per 1M input tokens, a fraction of the $3.00–$5.00/M charged for Claude Sonnet 4.6 or GPT-5.5. Of the three, DeepSeek V3 comes closest to premium capability, with coding (87) and research (80) scores near the top tier.
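To see what those per-token prices mean in practice, here is a quick cost sketch using the input prices from the matrix. Real bills also include output tokens, which are priced separately and not listed in this table, so treat these as floor estimates.

```python
# Input prices ($ per 1M tokens) copied from the matrix above.
INPUT_PRICE_PER_MTOK = {
    "Mistral Small 3.1": 0.10,
    "DeepSeek V3": 0.27,
    "Gemini 3.1 Flash": 0.50,
    "Claude Sonnet 4.6": 3.00,
    "GPT-5.5": 5.00,
}

def input_cost(model: str, tokens: int) -> float:
    """Input-side cost in dollars for a given token volume."""
    return INPUT_PRICE_PER_MTOK[model] * tokens / 1_000_000

# Processing 50M input tokens per month:
for model in INPUT_PRICE_PER_MTOK:
    print(f"{model}: ${input_cost(model, 50_000_000):,.2f}/mo")
# Same volume costs $5.00/mo on Mistral Small 3.1 vs $250.00/mo on GPT-5.5.
```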

Which AI has the longest context window?

Grok 4 and Gemini 3.1 Pro lead with 2M-token context windows, enough for entire codebases or book-length documents. Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.5, and Gemini 3.1 Flash all offer 1M tokens; GPT-5.4 supports 272K.
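For a rough sense of whether a document fits, a common rule of thumb is about four characters per token for English text. The sketch below applies that heuristic to the context sizes from the matrix; the ratio is an approximation, and real tokenizers vary by model.

```python
# Context window sizes (tokens) copied from the matrix above.
CONTEXT_TOKENS = {
    "Grok 4": 2_000_000,
    "Gemini 3.1 Pro": 2_000_000,
    "Claude Sonnet 4.6": 1_000_000,
    "GPT-5.4": 272_000,
}

def fits(model: str, text: str) -> bool:
    """Estimate whether text fits in a model's context window.

    Uses the ~4 characters/token heuristic for English; approximate only.
    """
    return len(text) / 4 <= CONTEXT_TOKENS[model]

# A 300,000-word book is roughly 1.8M characters, i.e. ~450K tokens:
book = "x" * 1_800_000
print(fits("GPT-5.4", book))         # False
print(fits("Gemini 3.1 Pro", book))  # True
```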

Explore by use case

Best AI for coding · Best AI for writing · Best AI for research · Best AI for images · Best budget AI · Best long-context AI · Context window comparison · Benchmark scores · API pricing