Every major AI model scored across 6 capability dimensions — coding, writing, research, images, budget efficiency, and long-context handling. Filter by provider or tier, sort any column, and find the right model for your specific use case.
Scores (0–100) are derived from public benchmarks, reported provider data, and independent testing · Updated when models change
How to use the capability matrix
The AI capability matrix shows every major model — GPT-5.4, Claude Sonnet 4.6, Gemini 2.0 Pro, Grok 3, Llama 4, Mistral Medium, DeepSeek V3, and more — scored from 0 to 100 on coding, writing, research, image generation, budget efficiency, and long-context performance. Use the filters to narrow by provider (OpenAI, Anthropic, Google, xAI, Meta, Mistral, DeepSeek) or price tier (Budget, Balanced, Premium). Click any column header to sort. The "Strong at" filter shows only models scoring 70+ in your chosen capability.
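For illustration, the "Strong at" filter and the column sort described above behave like this sketch. The model names and scores below are made up for the example, not the live matrix data; only the 70+ threshold comes from the page itself.

```python
# Hypothetical sketch of the matrix's "Strong at" filter and column sort.
# Sample data is illustrative, not the live matrix.
models = [
    {"name": "Model A", "provider": "OpenAI", "coding": 88, "writing": 90},
    {"name": "Model B", "provider": "Anthropic", "coding": 95, "writing": 96},
    {"name": "Model C", "provider": "DeepSeek", "coding": 68, "writing": 60},
]

def strong_at(models, capability, threshold=70):
    """Keep only models scoring at or above the threshold in one capability."""
    return [m for m in models if m[capability] >= threshold]

def sort_by(models, column):
    """Sort descending by a column, mirroring a column-header click."""
    return sorted(models, key=lambda m: m[column], reverse=True)

ranked = sort_by(strong_at(models, "coding"), "coding")
# Model C (68) is filtered out; Model B (95) sorts ahead of Model A (88).
```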
Capability score methodology
Coding scores incorporate SWE-bench Verified (real GitHub issue resolution), HumanEval (Python code generation pass@1), and observed refactoring quality. Writing scores weight tone control, coherence, long-form output, and independent writing evaluations. Research scores reflect document synthesis, GPQA (graduate-level science), and multi-step reasoning. Image scores cover generation quality and vision understanding. Budget scores weight output quality against per-token cost, so cheaper models with comparable quality score higher. Long-context scores reflect context window size and coherence maintenance across long inputs.
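As a rough mental model, a capability score can be read as a weighted blend of its inputs. The weights below are hypothetical; the page does not publish its exact weighting.

```python
# Hypothetical weighted blend for a coding score from three 0-100 inputs.
# The weights are illustrative assumptions, not the matrix's real methodology.
def coding_score(swe_bench, humaneval, refactor_quality,
                 weights=(0.5, 0.3, 0.2)):
    """Blend three benchmark-style inputs into a single 0-100 score."""
    parts = (swe_bench, humaneval, refactor_quality)
    return round(sum(w * p for w, p in zip(weights, parts)), 1)
```

Because the weights sum to 1.0, a model scoring the same on every input keeps that score unchanged after blending.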
Each score (0–100) reflects how well a model performs for that use case, derived from benchmark results, real-world testing, and community data. 88+ is best-in-class, 72–87 is strong, 55–71 is capable, below 55 is limited for that task.
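The four bands above map directly to score ranges, which can be sketched as a simple lookup (the band names and cutoffs come from the page; the function itself is just an illustration):

```python
def score_band(score):
    """Map a 0-100 capability score to the band used in the matrix."""
    if score >= 88:
        return "best-in-class"
    if score >= 72:
        return "strong"
    if score >= 55:
        return "capable"
    return "limited"
```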
Which AI model is best for coding?
Claude Sonnet 4.6 leads coding with a score of 95/100, driven by its 79.6% SWE-bench Verified score — the highest of any production model. GPT-5.4 is the runner-up at 88.
Which AI model is best for writing?
Claude Sonnet 4.6 leads writing quality with a score of 96/100. Its tone control, coherence, and long-form output polish are consistently rated best-in-class across independent evaluations.
Which AI models support image generation?
GPT-5.4 (via DALL-E 3) and Gemini 2.0 Pro offer strong image generation. Claude models do not support image generation: they can analyze images but cannot create them.
What is the best budget AI model?
DeepSeek V3 and Mistral Medium lead the budget category with scores above 85, offering near-premium quality at $0.07–$0.40/1M tokens, a fraction of the cost of GPT-5.4 or Claude Sonnet 4.6.
Which AI has the longest context window?
Gemini 2.0 Pro and Claude Sonnet 4.6 both offer 1M token context windows — enough for entire codebases or book-length documents. GPT-5.4 supports 272K tokens.