Search, filter, and sort every tracked model by provider, use case, pricing tier, speed, and context window — all in one place.
Rankings refresh daily · Scored on 6 criteria · No paid rankings
Instant answer
If you want the shortest answer: Claude Fable 5 for coding and writing, Mistral Small 3.1 for cost-sensitive work, and Claude 4 Haiku when latency and throughput matter most.
Use the directory to compare by the factor that actually changes the decision: coding benchmark score, writing quality, cost per million tokens, speed, or context window size. That usually narrows the choice to the right model in under a minute.
The current directory includes 23 models across multiple providers, with all entries mapped to the same pricing, speed, and use-case structure.
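As a rough illustration of the filter-then-sort workflow described above, here is a minimal Python sketch. The `Model` record, the `shortlist` helper, and its parameter names are hypothetical and are not part of the directory itself. The figures are copied from the model cards on this page, with "128k" and "1M" rounded to 128,000 and 1,000,000 tokens as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    input_usd_per_1m: float   # price per 1M input tokens
    output_usd_per_1m: float  # price per 1M output tokens
    context_tokens: int       # rounded context window (assumption)
    speed: str                # speed label as shown on the card
    quality: int              # quality score in percent


# A handful of entries taken from the cards on this page (not all 23 models).
MODELS = [
    Model("Mistral Small 3.1", "Mistral", 0.10, 0.30, 128_000, "Very fast", 57),
    Model("DeepSeek V3", "DeepSeek", 0.27, 1.10, 128_000, "Fast", 71),
    Model("DeepSeek R1", "DeepSeek", 0.55, 2.19, 128_000, "Deliberate", 68),
    Model("Opus 4.8", "Anthropic", 5.00, 25.00, 1_000_000, "Deliberate", 97),
    Model("Claude Fable 5", "Anthropic", 10.00, 50.00, 1_000_000, "Deliberate", 98),
]


def shortlist(models, *, max_input_price=None, min_context=None, speeds=None):
    """Keep models that pass every supplied filter, highest quality first."""
    kept = [
        m for m in models
        if (max_input_price is None or m.input_usd_per_1m <= max_input_price)
        and (min_context is None or m.context_tokens >= min_context)
        and (speeds is None or m.speed in speeds)
    ]
    return sorted(kept, key=lambda m: m.quality, reverse=True)


# Budget shortlist: input price at or below $1.00 per 1M tokens.
budget = shortlist(MODELS, max_input_price=1.00)
print([m.name for m in budget])
```

Filters left as `None` are skipped, so the same helper covers a single-criterion search (for example `min_context=1_000_000`) as well as combined ones.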
Claude 4 Haiku is the fastest broad-use option when latency matters more than maximum reasoning depth.
Anthropic · Budget
Best for
Fast budget writing, support automation, and cost-sensitive Anthropic integrations
Price
$0.80/1M
Context
200k tokens
Comparison table
Compare the tradeoffs
This table compares the defaults most people actually need to understand first: best overall, best budget, fastest broad-use option, and the strongest cheap coding specialist.
Strong coding value with 2M context — an underrated pick at this price.
xAI
Best for
Coding and research at competitive pricing with maximum context
Pricing
$2.00/1M in
$6.00/1M out
Context
2M tokens
Speed
Fast
When to use what
Use this as a practical filter before you start browsing the whole directory. It shows which leading option fits each common decision style and where it becomes the wrong pick.
Google's flagship with the largest context window of any frontier model at 2M tokens, Deep Think reasoning, and the best price-to-performance among premium models.
Verdict
Best for research and deep document analysis — 2M context at the best premium price.
Quality score
89%
Pricing
$2.00/1M in
$12.00/1M out
Speed
Balanced
Best for research, deep document analysis, and long-context reasoning at competitive pricing
Context
2M tokens
The 2M context window is a genuine competitive advantage — no other frontier model gets close for document-heavy workflows.
Research leader · 2M context · Best value premium · Deep Think
Best for
Research, deep document analysis, and long-context reasoning at competitive pricing
Mistral's ultra-budget multimodal model — exceptionally cheap with vision support, built for high-volume lightweight tasks where cost is the primary constraint.
Verdict
Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.
Quality score
57%
Pricing
$0.10/1M in
$0.30/1M out
Speed
Very fast
Best for ultra-high-volume classification, summarisation, and lightweight vision tasks
Context
128k tokens
At $0.10/1M input, the cost question disappears. The only question is whether the task complexity exceeds what Mistral Small can handle.
Budget · Multimodal · Ultra cheap · Mistral
Best for
Ultra-high-volume classification, summarisation, and lightweight vision tasks
OpenAI's latest agentic flagship for coding, research, computer-use workflows, and long multi-step knowledge work.
Verdict
Best OpenAI flagship for agentic coding, research, and computer-use work.
Quality score
94%
Pricing
$5.00/1M in
$30.00/1M out
Speed
Balanced
Best for agentic coding, computer-use workflows, and complex research tasks
Context
1M tokens
Ranked from public benchmark and pricing data verified April 26, 2026: SWE-Bench Pro 58.6%, Terminal-Bench 2.0 82.7%, $5/$30 per 1M tokens, 1M API context.
Agentic · Coding · Computer use · Long context · Premium
Best for
Agentic coding, computer-use workflows, and complex research tasks
Open-source frontier model from DeepSeek that matches GPT-4o class performance at a fraction of the cost — the most disruptive budget option for coding and general tasks.
Verdict
GPT-4o-class coding quality at under $0.30/1M input tokens: the best value in the directory.
Quality score
71%
Pricing
$0.27/1M in
$1.10/1M out
Speed
Fast
Best for coding, reasoning, and general tasks at extreme cost efficiency
Context
128k tokens
DeepSeek V3 shocked the market on release. At this price point with this capability level, it forces a reconsideration of when premium models are actually worth it.
Open source · Budget · Coding · DeepSeek
Best for
Coding, reasoning, and general tasks at extreme cost efficiency
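To put the budget-versus-premium question in concrete terms, the per-million rates can be turned into a cost for a given workload. The sketch below is illustrative only: `request_cost_usd` is a hypothetical helper, and the workload of 1,000,000 input tokens and 200,000 output tokens is an assumed example rather than a figure from the directory. The rates are the listed prices for DeepSeek V3 ($0.27 in, $1.10 out) and Claude Fable 5 ($10.00 in, $50.00 out).

```python
def request_cost_usd(input_tokens, output_tokens, input_usd_per_1m, output_usd_per_1m):
    """Cost in USD for a workload, given prices quoted per 1M tokens."""
    return (input_tokens * input_usd_per_1m + output_tokens * output_usd_per_1m) / 1_000_000


# Assumed example workload (not taken from the directory).
WORKLOAD = {"input_tokens": 1_000_000, "output_tokens": 200_000}

deepseek_v3 = request_cost_usd(**WORKLOAD, input_usd_per_1m=0.27, output_usd_per_1m=1.10)
fable_5 = request_cost_usd(**WORKLOAD, input_usd_per_1m=10.00, output_usd_per_1m=50.00)

print(f"DeepSeek V3:    ${deepseek_v3:.2f}")        # $0.49
print(f"Claude Fable 5: ${fable_5:.2f}")            # $20.00
print(f"Ratio: {fable_5 / deepseek_v3:.1f}x")       # 40.8x
```

Because output tokens are priced several times higher than input tokens on both cards, the ratio shifts with the input-to-output mix, so it is worth rerunning the calculation with token counts from a real workload.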
Anthropic's newest Opus flagship — 69.2% SWE-Bench Pro, 88.6% SWE-Bench Verified, 1890 Arena Elo (121 pts ahead of GPT-5.5), and native parallel subagents. Same $5/$25 price as Opus 4.7.
Verdict
Best value premium coder — frontier-grade at half of Fable 5's price.
Quality score
97%
Pricing
$5.00/1M in
$25.00/1M out
Speed
Deliberate
Best for hardest coding tasks, parallel agentic workflows, and high-fidelity vision
Context
1M tokens
Launched May 27, 2026. Available on Claude API, AWS Bedrock, Google Vertex AI, Microsoft Foundry, and GitHub Copilot. Fast mode available at $10/$50 per 1M tokens.
Coding · Parallel subagents · Agentic · Long context · Premium · Best value premium
Best for
Hardest coding tasks, parallel agentic workflows, and high-fidelity vision
Anthropic's new Mythos-class flagship and the most capable coding model anyone can use — 80.3% SWE-Bench Pro, an 11-point jump over Opus 4.8. 1M context, 128K output, native parallel subagents. Released June 9, 2026.
Verdict
New global #1 — 80.3% SWE-Bench Pro, the most capable model generally available.
Quality score
98%
Pricing
$10.00/1M in
$50.00/1M out
Speed
Deliberate
Best for the hardest coding tasks, autonomous multi-step agents, and frontier-grade reasoning
Context
1M tokens
Launched June 9, 2026 as the public, Mythos-class release. Available on the Claude API, Microsoft Foundry, and Google Vertex AI. Free for all users until June 22, 2026. Same underlying model as Claude Mythos 5, with safeguards that block specific high-risk cyber responses.
Coding leader · SWE-Bench Pro #1 · Mythos-class · Parallel subagents · Agentic · Long context · Premium · New
Best for
The hardest coding tasks, autonomous multi-step agents, and frontier-grade reasoning
Anthropic's most powerful frontier model — the same underlying model as Fable 5 with safeguards lifted in some areas, restricted to vetted enterprise and research partners. The capability ceiling of mid-2026.
Verdict
The frontier ceiling — same model as Fable 5, safeguards lifted, partner-only.
Quality score
98%
Pricing
$10.00/1M in
$50.00/1M out
Speed
Deliberate
Best for frontier cybersecurity research, autonomous vulnerability discovery, and the absolute capability ceiling
Context
1M tokens
Launched June 9, 2026 alongside Fable 5, following the April Project Glasswing private preview on Google Cloud. Restricted to vetted enterprise and research partners due to advanced cybersecurity capabilities. Same underlying model and benchmarks as Claude Fable 5.
Frontier · Restricted access · Cybersecurity · SWE-Bench Pro #1 · Mythos-class · Premium · New
Best for
Frontier cybersecurity research, autonomous vulnerability discovery, and the absolute capability ceiling
Balanced enterprise model with consistent reasoning, good speed, and a dependable middle-ground — especially for European teams with data residency requirements.
Verdict
Best balanced generalist for EU teams with data residency needs.
Quality score
67%
Pricing
$3.00/1M in
$9.00/1M out
Speed
Balanced
Best for balanced team usage with EU data residency requirements
Context
128k tokens
The EU hosting angle is the real differentiator here — for teams outside Europe, other models perform better.
EU hosting · Balanced · Team default
Best for
Balanced team usage with EU data residency requirements
Open-source reasoning model that matches o1-class performance on math, science, and complex coding at a fraction of the cost — the best open alternative to proprietary reasoning models.
Verdict
Open-source o1-class reasoning at a fraction of the cost.
Quality score
68%
Pricing
$0.55/1M in
$2.19/1M out
Speed
Deliberate
Best for math, science, complex reasoning, and multi-step problem solving at budget cost
Context
128k tokens
R1 is a genuine milestone for open-source AI. The reasoning quality is real — the tradeoff is latency, not capability.
Reasoning · Open source · Budget · DeepSeek
Best for
Math, science, complex reasoning, and multi-step problem solving at budget cost
Start with Claude Fable 5 for the best daily-driver default. Use Mistral Small 3.1 if cost is the priority. Use Grok 4 if you need the strongest budget coding option.
Which AI model is cheapest?
Mistral Small 3.1 is the best cheap default balancing cost, usefulness, and context window. Grok 4 is the cheapest coding specialist.
Which AI model is best for coding?
Grok 4 is the strongest budget coding option in the directory. Claude Fable 5 is the practical all-around default that also excels at coding tasks.
How should I compare models?
Start with your main use case, then compare price, speed, and context window. The best model changes quickly when one of those priorities matters more than the others.
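One way to see how the top pick shifts with priorities is a simple weighted score. This is a sketch under stated assumptions, not the directory's ranking method: the `rank` function, the `SPEED_SCORE` mapping from speed labels to numbers, and the weight values are all hypothetical. Quality scores, input prices, and speed labels are taken from the cards on this page.

```python
# Hypothetical mapping from the page's speed labels to a 0..1 score.
SPEED_SCORE = {"Deliberate": 0.0, "Balanced": 1 / 3, "Fast": 2 / 3, "Very fast": 1.0}

# Quality (%), input price (USD per 1M tokens), and speed label from the cards.
CANDIDATES = {
    "Mistral Small 3.1": {"quality": 57, "input_price": 0.10, "speed": "Very fast"},
    "DeepSeek V3": {"quality": 71, "input_price": 0.27, "speed": "Fast"},
    "Claude Fable 5": {"quality": 98, "input_price": 10.00, "speed": "Deliberate"},
}


def rank(candidates, weights):
    """Return model names ordered by weighted score, best first."""
    cheapest = min(c["input_price"] for c in candidates.values())

    def score(c):
        return (
            weights["quality"] * c["quality"] / 100           # 0..1, higher is better
            + weights["cost"] * cheapest / c["input_price"]   # 1.0 for the cheapest model
            + weights["speed"] * SPEED_SCORE[c["speed"]]
        )

    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


quality_first = {"quality": 0.8, "cost": 0.1, "speed": 0.1}
cost_first = {"quality": 0.2, "cost": 0.7, "speed": 0.1}

print(rank(CANDIDATES, quality_first)[0])  # Claude Fable 5
print(rank(CANDIDATES, cost_first)[0])     # Mistral Small 3.1
```

The same three candidates produce a different winner once the cost weight dominates, which is the practical reason to fix the main use case before comparing price, speed, and context window.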