UseRightAI
Cut through AI hype. Pick what works.

Independent AI model tracker. Live pricing, real benchmarks, zero vendor bias.



© 2026 UseRightAI. Independent · Free forever · Not affiliated with any AI provider.

Affiliate links are clearly labeled. See disclosures.


AI model pricing comparison

See what you pay, what context you get, and where the best value lives for coding, writing, and high-volume usage.

Rankings refresh daily · Scored on 6 criteria · No paid rankings
Instant answer

If you want the shortest pricing answer, start with Mistral Small 3.1 as the best-value default. Use Mistral Nemo only when the lowest raw API price matters more than output quality.

Cheap does not automatically mean efficient. The real pricing decision is whether lower token cost saves more money than the extra review, rewrites, or mistakes it creates.

This page compares raw cost, context, and practical usefulness so you can avoid false-economy pricing decisions.
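That review-cost tradeoff can be made concrete. A minimal sketch, where the API costs, review times, and $60/hour labor rate are illustrative assumptions rather than measured figures:

```python
def effective_cost_per_task(api_cost, review_minutes, hourly_rate=60.0):
    """True cost of one task: API spend plus the human review it triggers."""
    return api_cost + (review_minutes / 60.0) * hourly_rate

# A 20x-cheaper model that triggers 6 minutes of review per task still
# loses to a pricier model whose output needs only 2 minutes of review.
cheap_model = effective_cost_per_task(0.001, review_minutes=6)   # ~$6.00
strong_model = effective_cost_per_task(0.020, review_minutes=2)  # ~$2.02
```

In this framing, raw token price only dominates once the review cost per task approaches zero.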

Read the pricing guide · Which AI is cheapest?

Clear recommendations

The safest value pick, the raw cheapest API, and the fast default worth considering before you optimize around price alone.

Cheapest overall
Mistral: Mistral Nemo
Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.
$0.02/1M input
Best budget for coding
Meta: Llama 3.1 8B Instruct
High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.
$0.05/1M output
Best budget for writing
Meta: Llama 3.1 8B Instruct
High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.
$0.02/1M input
Comparison table

Compare the tradeoffs

This table focuses on the pricing decisions teams actually make first: best value default, absolute cheapest option, budget coding pick, and a fast low-cost option.

Mistral · Budget

Mistral Small 3.1

Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.

Best for
Ultra-high-volume classification, summarisation, and lightweight vision tasks
Speed
Very fast
Input cost
$0.35/1M
Output cost
$0.56/1M
Context
128k tokens
Mistral · Budget

Mistral: Mistral Nemo

When to use what

Use this section to decide whether you should optimize for raw API cost, value per prompt, cheaper coding throughput, or faster user-facing response time.

Best value default

Mistral Small 3.1

Model page

Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.

When to use

Ultra-high-volume classification, summarisation, and lightweight vision tasks

When not to use

You need reliable multi-step reasoning or coding quality — it won't hold up.

Cheapest raw option

Mistral: Mistral Nemo

Model page

How we evaluate AI models

Pricing recommendations are based on a mix of list price, real-world usefulness, speed, context window, and whether a lower-cost model still holds up under practical workloads.

Performance

Benchmark scores from SWE-bench (coding), ARC-AGI-2 (reasoning), and MMLU (knowledge breadth) — cross-referenced against Chatbot Arena community votes to filter out cherry-picked provider claims.

Pricing

Input and output costs verified directly against each provider's official API pricing page. Updated whenever a price change is detected. Value-per-dollar is weighted separately from raw benchmark rank.

Context window

Advertised context sizes are noted but scored against real-world usability — models that degrade significantly at large contexts are penalised even if the window is technically available.

Real-world usability

Production signals matter more than lab scores. We weight Cursor and Windsurf defaults, HackerNews sentiment, developer surveys, and which models teams actually keep using after the honeymoon period.

Consistency

One-off wins on cherry-picked benchmarks don't move our rankings. We favour models that stay dependable across repeated prompts, diverse task types, and long sessions without degrading.

Speed

Time-to-first-token and output throughput from Artificial Analysis speed benchmarks. Latency is categorised from Very fast to Deliberate — relevant when building interactive or high-throughput products.
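The six criteria above can fold into a single ranking score via a weighted sum. A sketch where the weights and example scores are illustrative assumptions, not UseRightAI's actual formula:

```python
# Illustrative weights over the six criteria (sum to 1.0) — assumptions
# for demonstration, not UseRightAI's published weighting.
WEIGHTS = {
    "performance": 0.25, "pricing": 0.20, "context": 0.15,
    "usability": 0.20, "consistency": 0.10, "speed": 0.10,
}

def overall_score(scores):
    """Weighted sum of per-criterion scores on a 0-10 scale."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical budget model: strong on price and speed, weaker on context
example = {"performance": 7, "pricing": 9, "context": 6,
           "usability": 8, "consistency": 7, "speed": 9}
score = overall_score(example)  # 7.65
```

Separating value-per-dollar from raw benchmark rank, as the Pricing criterion describes, amounts to keeping "pricing" as its own weighted term rather than folding it into "performance".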

Data sources

Coding: SWE-bench · Reasoning: ARC-AGI-2 · Knowledge: MMLU · Community: Chatbot Arena · Speed: Artificial Analysis · Cost: Provider pricing pages

Pricing calculator

See your monthly API cost vs consumer subscription across all models.

Example usage: 50 messages/day (1,500/month), 600 input + 700 output tokens per message

Model | Monthly API cost | Annual API cost | vs Subscription
Mistral: Mistral Small 3.1 | $0.40 | $4.86 | API only
OpenAI: GPT-4o Mini | $0.76 | $9.18 | API cheaper (sub wins at 39,216 msg/mo)
DeepSeek: DeepSeek V3 | $1.40 | $16.78 | API only
Meta: Llama 4 Scout | $1.71 | $20.52 | Free via Meta AI
Meta: Llama 4 Maverick | $2.22 | $26.64 | Free via Meta AI
DeepSeek: DeepSeek R1 | $2.79 | $33.53 | API only

API costs are estimates based on the token counts above and listed per-million-token prices from each provider. Subscription plans include usage caps and may not cover all models — check provider pages for current limits. Prices update from our database when providers change their rates.
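The calculator's arithmetic is easy to reproduce from per-million-token rates. A minimal sketch using GPT-4o Mini's rates as listed on this page; the $20/month subscription in the break-even line is an assumption, since the plan price is not stated here:

```python
import math

def monthly_api_cost(msgs_per_month, in_tokens, out_tokens, in_price, out_price):
    """Monthly API spend in dollars; prices are per 1M tokens."""
    per_msg = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return msgs_per_month * per_msg

def breakeven_messages(sub_price, in_tokens, out_tokens, in_price, out_price):
    """Messages per month at which a flat subscription beats pay-per-token."""
    per_msg = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return math.ceil(sub_price / per_msg)

# GPT-4o Mini: $0.15/1M input, $0.60/1M output, 600 + 700 tokens per message
monthly = monthly_api_cost(1500, 600, 700, 0.15, 0.60)    # ≈ $0.76/month
breakeven = breakeven_messages(20, 600, 700, 0.15, 0.60)  # 39216 msg/mo
```

Against the assumed $20 plan, the break-even lands at 39,216 messages per month, matching the figure shown for GPT-4o Mini in the calculator.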

Pricing filters

Compare 119 models by cost profile, provider, and context.

Mistral · Budget

Mistral: Mistral Nemo

Mistral Nemo is open-weight (Apache 2.0 license), so self-hosting is an option for teams that want to eliminate API costs entirely. Pricing via API is through Mistral's La Plateforme. The model uses a Tekken tokenizer which is more efficient than older Mistral tokenizers, especially for non-English text.

Input cost
$0.02/1M
Output cost
$0.03/1M
Context
131k tokens
Notes
Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.
View model
Meta · Budget

Meta: Llama 3.1 8B Instruct

Being open-weight, this model can be run locally or self-hosted via providers like Together AI, Fireworks, or Groq, often at even lower costs. The 16K context window is a meaningful limitation compared to other models in this price tier.

Input cost
$0.02/1M
Output cost
$0.05/1M
Context
16k tokens
Notes
High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.
View model
Meta · Budget

Meta: Llama 3 8B Instruct

As an open-weight model, Llama 3 8B can be self-hosted via platforms like Ollama, Replicate, or Together AI. The 8,192 token context window is a significant practical limitation. Pricing listed reflects hosted API inference; self-hosted costs vary.

Input cost
$0.03/1M
Output cost
$0.04/1M
Context
8k tokens
Notes
High-volume, cost-sensitive applications where speed and price matter more than peak accuracy.
View model
Google · Budget

Google: Gemma 2 9B

Pricing reflects API access through third-party providers; Google also offers Gemma 2 9B weights for free download and self-hosting. The 8,192 token limit is a hard architectural constraint of this version.

Input cost
$0.03/1M
Output cost
$0.09/1M
Context
8k tokens
Notes
Lightweight text tasks, classification, and summarization where cost matters more than frontier-level quality.
View model
Mistral · Budget

Mistral: Mistral Small 3

Pricing is exceptionally competitive at $0.05/$0.08 per 1M tokens. Available via Mistral's La Plateforme API and various third-party providers. GDPR-friendly EU-based hosting is a notable advantage for European enterprise customers. No image input or output support.

Input cost
$0.05/1M
Output cost
$0.08/1M
Context
33k tokens
Notes
High-volume, cost-sensitive applications like customer support automation, content drafting, and lightweight code assistance.
View model
Mistral · Budget

Mistral: Ministral 3 3B 2512

Priced at a flat $0.10/1M for both input and output, making cost estimation predictable. The '2512' suffix indicates a December 2025 release version. Best suited for batch processing, classification, or extraction pipelines where volume is high and task complexity is low.

Input cost
$0.10/1M
Output cost
$0.10/1M
Context
131k tokens
Notes
High-volume, low-latency tasks where cost and speed matter more than frontier-level reasoning.
View model
Meta · Budget

Meta: Llama 3.2 1B Instruct

Output cost of ~$0.20/1M tokens is notably higher relative to input cost — factor this in for verbose generation tasks. Best suited for inference pipelines where outputs are short and structured. Available via multiple inference providers due to open-weight licensing.

Input cost
$0.03/1M
Output cost
$0.20/1M
Context
60k tokens
Notes
Ultra-low-cost text classification, simple Q&A, and high-volume automation pipelines where cost per token is critical.
View model
Mistral · Budget

Mistral: Mistral Small 3.2 24B

Mistral Small 3.2 is available as an open-weight model, making it deployable on-premises or via self-hosted infrastructure — a key differentiator over GPT-4o Mini and Claude Haiku for privacy-sensitive use cases.

Input cost
$0.07/1M
Output cost
$0.20/1M
Context
128k tokens
Notes
High-volume production workloads where cost matters but quality can't be sacrificed entirely — especially code generation and structured output tasks.
View model
Mistral · Budget

Mistral: Ministral 3 8B 2512

The '8B 2512' in the model name likely refers to a specific versioned release; despite the naming, this is based on Mistral's 3B architecture. Confirm parameter count and capabilities with Mistral's official documentation before production use.

Input cost
$0.15/1M
Output cost
$0.15/1M
Context
262k tokens
Notes
High-volume, latency-sensitive applications where cost per token matters more than top-tier quality.
View model
Mistral · Budget

Mistral: Mistral 7B Instruct v0.1

This is v0.1, the original release — not to be confused with v0.2 or v0.3 which substantially improve context length and quality. The listed context window of ~2,824 tokens is unusually small even among budget models. Marked as superseding Mistral Large 2 in the spec, which appears to be a data error — this model does not supersede Mistral Large 2 in capability or positioning.

Input cost
$0.11/1M
Output cost
$0.19/1M
Context
3k tokens
Notes
Ultra-low-cost simple text tasks like classification, short summarization, or lightweight chatbot responses where context length is not a concern.
View model
Meta · Budget

Meta: Llama Guard 4 12B

Llama Guard 4 supports the MLCommons hazard taxonomy and is designed to be used as a shield model in multi-model architectures. Not suitable as a standalone AI assistant. Available via Meta's open model ecosystem and third-party API providers.

Input cost
$0.18/1M
Output cost
$0.18/1M
Context
164k tokens
Notes
Automated content safety screening and policy enforcement in LLM-powered applications
View model
Google · Budget

Google: Gemini 2.0 Flash Lite

Pricing is among the lowest available in any major provider's lineup as of mid-2025. Context window of 1M tokens is a significant differentiator at this price tier. Check Google AI Studio and Vertex AI for rate limits on high-volume usage.

Input cost
$0.07/1M
Output cost
$0.30/1M
Context
1.0M tokens
Notes
High-throughput, cost-sensitive pipelines where speed and price matter more than top-tier reasoning quality.
View model
OpenAI · Budget

OpenAI: gpt-oss-safeguard-20b

This is an open-weights safety/moderation-specific model, not a general assistant. Pricing reflects its budget-tier positioning. Availability may be limited or subject to change as it appears to be a research/infrastructure model rather than a consumer product. Verify OpenAI's terms around usage and redistribution for the OSS weights.

Input cost
$0.07/1M
Output cost
$0.30/1M
Context
131k tokens
Notes
Automated content moderation pipelines and safety classification at scale.
View model
Meta · Budget

Llama 4 Scout

Worth considering for internal search, analysis, and review workflows where data sovereignty matters.

Input cost
$0.08/1M
Output cost
$0.30/1M
Context
512k tokens
Notes
Affordable self-hosted long-context workflows and analysis pipelines
View model
Mistral · Budget

Mistral: Devstral Small 1.1

Available via Mistral API and can be self-hosted via open weights. Pricing is among the lowest available for a code-specialized model. Designed to work within coding agent frameworks like SWE-agent and OpenHands.

Input cost
$0.10/1M
Output cost
$0.30/1M
Context
131k tokens
Notes
Developers who need a cheap, fast coding assistant for agentic workflows, code review, and multi-file repo tasks without paying flagship prices.
View model
Mistral · Budget

Mistral: Ministral 3 14B 2512

Model name suggests a December 2025 revision ('2512'). Pricing is symmetric at $0.20/1M for both input and output, which simplifies cost modeling. Confirm availability on your target API platform as Mistral model availability varies by provider.

Input cost
$0.20/1M
Output cost
$0.20/1M
Context
262k tokens
Notes
High-volume, cost-sensitive workflows like document triage, classification, summarization, and lightweight coding assistance where budget is the primary constraint.
View model
Mistral · Budget

Mistral: Mistral Small Creative

Context window of 32,768 tokens is notably smaller than competing budget models. Pricing is approximate ($0.10 input / $0.30 output per 1M tokens). Availability is through Mistral's API (La Plateforme) and may also be accessible via third-party providers. Confirm fine-tune scope before deploying for non-creative tasks.

Input cost
$0.10/1M
Output cost
$0.30/1M
Context
33k tokens
Notes
Budget-conscious creative writing tasks like short stories, marketing copy, and brainstorming where cost matters more than peak quality.
View model
Mistral · Budget

Mistral: Voxtral Small 24B 2507

Voxtral Small is audio-in capable but does not support image input. The 32K context window is notably short for a 2025 model. Pricing is via Mistral's API; availability through third-party providers may vary. Check whether your use case requires audio input — the text-only version of Mistral Small 3.1 may be more appropriate for pure text workloads.

Input cost
$0.10/1M
Output cost
$0.30/1M
Context
32k tokens
Notes
Transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.
View model
OpenAI · Budget

OpenAI: GPT-5 Nano

Output cost of ~$0.40/1M tokens means output-heavy workloads (long generations) will accumulate cost faster than input-heavy ones. Best suited for tasks where outputs are short-to-medium length. No image generation capability.

Input cost
$0.05/1M
Output cost
$0.40/1M
Context
400k tokens
Notes
High-volume, latency-sensitive applications like classification, autocomplete, summarization, and lightweight chat where cost-per-token matters most.
View model
Meta · Budget

Meta: Llama 3.2 11B Vision Instruct

Available via multiple inference providers including Together AI, Fireworks, and OpenRouter. As an open-weight model, it can also be self-hosted for even lower marginal costs at scale. Part of Meta's Llama 3.2 family which also includes a 90B vision variant for heavier workloads.

Input cost
$0.24/1M
Output cost
$0.24/1M
Context
131k tokens
Notes
Budget-conscious developers who need basic vision capabilities without paying premium multimodal prices.
View model
Google · Budget

Google: Gemini 2.0 Flash

Pricing listed is for standard (non-cached) input/output. Context caching is available and can reduce costs significantly for repeated long-context calls. Image and audio inputs are priced separately. Free tier available via Google AI Studio.

Input cost
$0.10/1M
Output cost
$0.40/1M
Context
1.0M tokens
Notes
High-throughput pipelines and agentic tasks where speed and cost matter more than peak reasoning quality.
View model
Google · Budget

Google: Gemini 2.5 Flash Lite

Pricing is approximate based on listed rates. As a 'Lite' model, it may not support all multimodal features available in full Flash or Pro variants. Check Google AI Studio for feature availability and rate limits.

Input cost
$0.10/1M
Output cost
$0.40/1M
Context
1.0M tokens
Notes
High-volume, latency-sensitive applications like document triage, chatbot pipelines, and content classification at scale.
View model
Google · Budget

Google: Gemini 2.5 Flash Lite Preview 09-2025

This is a preview model (09-2025 versioned) and may be subject to breaking changes or deprecation. Pricing is approximate based on listed rates. Not recommended for production systems requiring SLA guarantees. Check Google AI Studio or Vertex AI for GA alternatives.

Input cost
$0.10/1M
Output cost
$0.40/1M
Context
1.0M tokens
Notes
High-volume document processing, classification pipelines, and lightweight coding tasks where cost per token matters more than peak quality.
View model
OpenAI · Budget

OpenAI: GPT-4.1 Nano

Pricing is $0.10/1M input and $0.40/1M output tokens. Officially supersedes GPT-4o in OpenAI's lineup for lightweight use cases. Context window of ~1.047M tokens is one of the largest available at this price tier.

Input cost
$0.10/1M
Output cost
$0.40/1M
Context
1.0M tokens
Notes
High-volume production workloads like classification, extraction, summarization, and simple Q&A where cost and speed matter more than frontier reasoning.
View model
Meta · Budget

Llama Guard 3 8B

This model is designed exclusively for content moderation and safety classification tasks. It follows the MLCommons AI Safety benchmark taxonomy. It should be deployed as a guardrail layer alongside generative models, not as a replacement for them. Not suitable for end-user-facing conversational applications.

Input cost
$0.48/1M
Output cost
$0.03/1M
Context
131k tokens
Notes
Automated content safety screening and moderation for AI application pipelines at minimal cost.
View model
Google · Budget

Gemma 4 26B A4B

As an open-weight model, Gemma 4 26B can also be self-hosted, making API pricing largely irrelevant at scale. The 'A4B' suffix denotes the active parameter count in its MoE configuration. Listed as superseding Gemini 3 Flash Preview, though Gemini 2.0 Flash remains a stronger hosted alternative.

Input cost
$0.13/1M
Output cost
$0.40/1M
Context
262k tokens
Notes
Cost-sensitive applications needing long-context processing with reasonable quality, such as document summarization pipelines or lightweight coding assistants.
View model
Google · Budget

Gemma 4 31B

As an open-weight model, Gemma 4 31B can be self-hosted via Ollama or Hugging Face in addition to Google's API. Pricing shown is for hosted inference. No image input capability confirmed at launch.

Input cost
$0.14/1M
Output cost
$0.40/1M
Context
262k tokens
Notes
Cost-conscious developers needing a capable open-weight model for coding assistance, summarization, and document analysis at scale.
View model
OpenAI · Budget

GPT-4o Mini

GPT-4o Mini punches well above its price for classification, summarisation, and simple writing. It struggles when tasks get complex.

Input cost
$0.15/1M
Output cost
$0.60/1M
Context
128k tokens
Notes
High-volume everyday tasks where GPT-4o quality is overkill
View model
Meta · Budget

Llama 4 Maverick

Strong strategic fit for teams thinking about data sovereignty or custom fine-tuning.

Input cost
$0.15/1M
Output cost
$0.60/1M
Context
256k tokens
Notes
Flexible self-hosted deployments and mixed general workloads
View model
Mistral · Budget

Mistral: Mistral Small 4

Pricing at $0.15/$0.60 per million tokens makes this one of the most affordable capable models on the market. Available via Mistral's La Plateforme API and compatible with OpenAI-style endpoints. No image input support confirmed at launch.

Input cost
$0.15/1M
Output cost
$0.60/1M
Context
262k tokens
Notes
Teams needing reliable, fast text generation and coding assistance at near-commodity pricing without sacrificing too much quality.
View model
Meta · Budget

Meta: Llama 3.1 70B Instruct

Pricing shown is via third-party API providers (e.g., OpenRouter, Together AI) — costs may vary. Meta releases Llama 3.1 weights publicly, enabling self-hosting at even lower cost. Not available directly from Meta as a hosted API.

Input cost
$0.40/1M
Output cost
$0.40/1M
Context
131k tokens
Notes
Teams needing capable open-weight LLM performance at budget pricing for coding assistance, summarization, or RAG pipelines.
View model
Mistral · Budget

Mistral: Saba

Pricing reflects Mistral API rates and may vary by reseller. The model's name 'Saba' references Arabic linguistic heritage, signaling its intended multilingual focus. No vision or tool-use capabilities documented at launch.

Input cost
$0.20/1M
Output cost
$0.60/1M
Context
33k tokens
Notes
Low-cost multilingual applications requiring Arabic, Hindi, or Urdu language support
View model
xAI · Budget

xAI: Grok 3 Mini

Pricing is highly competitive at $0.30 input / $0.50 output per million tokens. Context window is 131K tokens. No vision/image input support. xAI's API platform is newer and may have availability or rate-limit considerations compared to established providers.

Input cost
$0.30/1M
Output cost
$0.50/1M
Context
131k tokens
Notes
Developers and researchers who need solid reasoning and logic tasks at near-throwaway pricing without committing to a full flagship model.
View model
xAI · Budget

xAI: Grok 3 Mini Beta

Model is in Beta — API behavior, rate limits, and availability may change without notice. No multimodal support confirmed. Reasoning mode may increase effective latency on complex prompts despite fast base speed.

Input cost
$0.30/1M
Output cost
$0.50/1M
Context
131k tokens
Notes
Budget-conscious users who need light reasoning and logical tasks without paying flagship prices.
View model
Mistral · Budget

Mistral Small 3.1

At $0.35/1M input, the cost question disappears. The only question is whether the task complexity exceeds what Mistral Small can handle.

Input cost
$0.35/1M
Output cost
$0.56/1M
Context
128k tokens
Notes
Ultra-high-volume classification, summarisation, and lightweight vision tasks
View model
Mistral · Balanced

Mistral: Mixtral 8x7B Instruct

Pricing is symmetric at $0.54/1M for both input and output. As an open-weight model, costs can drop significantly if self-hosted. The 32K context window is a hard ceiling — plan accordingly for document-heavy workflows.

Input cost
$0.54/1M
Output cost
$0.54/1M
Context
33k tokens
Notes
Developers and teams needing a capable open-weight model for coding, multilingual tasks, and general instruction-following without flagship model pricing.
View model
Mistral · Budget

Mistral: Codestral 2508

Available via Mistral's La Plateforme API. Also accessible through Continue.dev, Cursor, and other IDE integrations that support the Codestral endpoint. FIM (fill-in-the-middle) mode is specifically supported for autocomplete use cases. Output price rounds to ~$0.90/1M tokens.

Input cost
$0.30/1M
Output cost
$0.90/1M
Context
256k tokens
Notes
High-volume code generation, completion, and refactoring tasks where cost efficiency and long-context handling matter most.
View model
Meta · Balanced

Meta: Llama 3 70B Instruct

This is the original Llama 3 70B, not the 3.1 or 3.3 variants. Llama 3.1 70B offers a 128K context window at comparable pricing and is strongly preferred. Consider this model only if you have a specific reason to pin to the original Llama 3 checkpoint.

Input cost
$0.51/1M
Output cost
$0.74/1M
Context
8k tokens
Notes
Developers and researchers who need a capable open-weight model for coding, analysis, and instruction-following tasks at a mid-range price point.
View model
Google · Balanced

Google: Gemma 2 27B

Symmetric input/output pricing at $0.65/1M tokens is straightforward but positions it oddly — it's pricier than GPT-4o Mini while lacking its multimodal features. Available via multiple inference providers including Google Vertex AI and third-party APIs.

Input cost
$0.65/1M
Output cost
$0.65/1M
Context
8k tokens
Notes
Teams that need strong open-weight model performance for coding and reasoning tasks without paying flagship prices.
View model
DeepSeek · Budget

DeepSeek V3

DeepSeek V3 shocked the market on release. At this price point with this capability level, it forces a reconsideration of when premium models are actually worth it.

Input cost
$0.27/1M
Output cost
$1.10/1M
Context
128k tokens
Notes
Coding, reasoning, and general tasks at extreme cost efficiency
View model
Anthropic · Budget

Anthropic: Claude 3 Haiku

Claude 3 Haiku is part of the original Claude 3 family (March 2024). Anthropic has since released Claude 3.5 Haiku, which is generally recommended over this model for new use cases. Still widely available via Anthropic API and AWS Bedrock.

Input cost
$0.25/1M
Output cost
$1.25/1M
Context
200k tokens
Notes
High-volume production pipelines, customer support bots, and real-time text processing where cost and latency are critical constraints.
View model
xAI · Budget

xAI: Grok Code Fast 1

Pricing is asymmetric: input at ~$0.20/1M is excellent, but $1.50/1M output undermines its budget appeal for generation-heavy use. Availability is through xAI's API; check for rate limits and regional availability as xAI's infrastructure is still scaling.

Input cost
$0.20/1M
Output cost
$1.50/1M
Context
256k tokens
Notes
High-volume, low-latency coding tasks where cost per token matters more than peak quality.
View model
Google · Budget

Gemini 3.1 Flash

The default budget pick for startups watching cost. The 1M context at this price is unmatched.

Input cost
$0.25/1M
Output cost
$1.50/1M
Context
1M tokens
Notes
High-volume everyday AI usage where speed and cost both matter
View model
OpenAI · Budget

OpenAI: GPT-4.1 Mini

Pricing shown is $0.40 input / $1.60 output per 1M tokens. Cached input tokens are significantly cheaper. The 1M token context window is a standout feature at this price tier — few competitors match it. Supersedes GPT-4o as the recommended default for cost-conscious applications.

Input cost
$0.40/1M
Output cost
$1.60/1M
Context
1.0M tokens
Notes
High-volume production workloads that need reliable GPT-4-class instruction following without flagship pricing.
View model
Mistral · Budget

Mistral: Mistral Large 3 2512

Pricing of $0.50 input / $1.50 output per 1M tokens places it firmly in the budget-flagship category. Available via Mistral API (La Plateforme) and major cloud providers. December 2025 update ('2512') improves instruction following over the earlier 2407 release.

Input cost
$0.50/1M
Output cost
$1.50/1M
Context
262k tokens
Notes
Multilingual enterprise tasks, code generation, and long-document analysis where cost efficiency matters more than absolute state-of-the-art performance.
View model
OpenAI · Budget

OpenAI: GPT-3.5 Turbo

GPT-3.5 Turbo is still available via OpenAI API and supports fine-tuning, which keeps it relevant for teams with existing trained models. However, OpenAI has deprioritized its development in favor of the GPT-4o family. Not multimodal — text only.

Input cost
$0.50/1M
Output cost
$1.50/1M
Context
16k tokens
Notes
High-volume, low-complexity tasks like chatbots, classification, summarization, and simple Q&A where cost matters more than cutting-edge quality.
View model
OpenAI · Budget

OpenAI: GPT-5 Mini

Output cost of $2/1M tokens is higher than some competing budget models (Gemini Flash at ~$0.60/1M output). At scale, output-heavy tasks may erode cost advantages — monitor token ratios carefully. Supersedes GPT-4o, which may be deprecated on a rolling basis.

Input cost
$0.25/1M
Output cost
$2.00/1M
Context
400k tokens
Notes
High-volume production workloads — chatbots, summarization pipelines, and document Q&A — where cost efficiency matters more than peak reasoning.
View model
OpenAI · Budget

OpenAI: GPT-5.1-Codex-Mini

At $2/1M output tokens, costs can accumulate in verbose code-generation tasks — monitor output token usage carefully in agentic loops. Not a general-purpose flagship replacement; best deployed alongside a stronger model for planning/reasoning layers.

Input cost
$0.25/1M
Output cost
$2.00/1M
Context
400k tokens
Notes
High-volume code generation, autocomplete pipelines, and developer tooling where cost efficiency matters more than peak reasoning depth.
View model
Mistral · Budget

Mistral: Devstral 2 2512

The December 2025 (2512) release date suggests this is a recent iteration. Pricing at $0.40 input / $2.00 output is notably competitive for a code-specialist model with 256K context. Verify availability and rate limits via Mistral API or partner providers.

Input cost
$0.40/1M
Output cost
$2.00/1M
Context
262k tokens
Notes
Budget-conscious developers who need a capable coding model for agentic workflows, code generation, and repository-scale context at a fraction of flagship pricing.
View model
Mistral · Budget

Mistral: Devstral Medium

Pricing is notably aggressive at ~$0.40 input / $2.00 output per 1M tokens. Available via Mistral's La Plateforme API. Part of the Devstral family, which is distinct from Mistral's general-purpose Mistral Medium line.

Input cost
$0.40/1M
Output cost
$2.00/1M
Context
131k tokens
Notes
Developers seeking capable code generation, debugging, and code review at a fraction of the cost of GPT-4-class models.
View model
Mistral · Budget

Mistral: Mistral Medium 3

Priced at $0.40 input / $2.00 output per 1M tokens. Officially supersedes Mistral Large 2, making it an easy drop-in upgrade for existing Mistral users. Available via Mistral's API and La Plateforme.

Input cost
$0.40/1M
Output cost
$2.00/1M
Context
131k tokens
Notes
Cost-conscious teams running high-volume coding, summarization, or multilingual tasks at enterprise scale.
View model
Mistral · Budget

Mistral: Mistral Medium 3.1

Officially supersedes Mistral Large 2, representing a generational shift in Mistral's lineup toward multimodal capability at lower cost tiers. Available via Mistral API and select cloud providers. No function calling limitations noted at this tier.

Input cost
$0.40/1M
Output cost
$2.00/1M
Context
131k tokens
Notes
Cost-sensitive teams needing solid coding, instruction-following, and basic vision tasks without paying flagship prices.
View model
DeepSeek · Budget

DeepSeek R1

R1 is a genuine milestone for open-source AI. The reasoning quality is real — the tradeoff is latency, not capability.

Input cost
$0.55/1M
Output cost
$2.19/1M
Context
128k tokens
Notes
Math, science, complex reasoning, and multi-step problem solving at budget cost
View model
Google · Budget

Google: Gemini 2.5 Flash

Output cost ($2.50/1M) is disproportionately higher than input cost ($0.30/1M), so generation-heavy use cases may see costs add up faster than expected. Thinking/reasoning mode may be available but incurs additional cost.

Input cost
$0.30/1M
Output cost
$2.50/1M
Context
1.0M tokens
Notes
High-volume document processing, summarization, and coding assistance where cost and speed matter more than peak accuracy.
View model
Google · Budget

Google: Nano Banana (Gemini 2.5 Flash Image)

The 32,768 token context window is unusually small even for a budget model — verify this limit hasn't changed before deploying in production. The 'Nano Banana' name appears to be an internal or experimental identifier; confirm model availability and stability via Google AI Studio or Vertex AI before relying on it in critical workflows.

Input cost
$0.30/1M
Output cost
$2.50/1M
Context
33k tokens
Notes
Budget-conscious teams needing fast image analysis and visual question answering without flagship pricing.
View model
OpenAIBalanced

OpenAI: GPT Audio Mini

Audio tokens are priced differently from text tokens in OpenAI's API — audio input/output carries a significant premium over text tokens, so real-world costs for voice-heavy workloads will be substantially higher than the listed text token price suggests. Check OpenAI's audio token pricing separately.

Input cost
$0.60/1M
Output cost
$2.40/1M
Context
128k tokens
Notes
Building voice assistants, audio bots, and speech-enabled applications that need real-time audio processing at scale without breaking the budget.
View model
OpenAIBalanced

OpenAI: GPT-3.5 Turbo (older v0613)

This is a pinned legacy snapshot (v0613) and may eventually be deprecated by OpenAI. The 4,095-token context window is its most significant practical limitation. OpenAI's own GPT-4o mini offers drastically more context and better quality at a comparable price — strongly consider migrating.

Input cost
$1.00/1M
Output cost
$2.00/1M
Context
4k tokens
Notes
High-volume, cost-sensitive text tasks like classification, summarization, and simple Q&A where bleeding-edge quality is not required.
View model
GoogleBudget

Google: Gemini 3 Flash Preview

This is a preview model and may have limited availability, unstable rate limits, and pricing that changes before general availability. Output cost at $3/1M is notably higher than input cost, so applications generating long outputs should budget accordingly.

Input cost
$0.50/1M
Output cost
$3.00/1M
Context
1.0M tokens
Notes
High-volume document processing, summarization pipelines, and long-context tasks where cost efficiency matters more than frontier-level reasoning.
View model
OpenAIBalanced

OpenAI: GPT-3.5 Turbo Instruct

Uses the legacy /v1/completions endpoint, not /v1/chat/completions. The 4,095-token context window is a hard constraint that makes it unsuitable for most modern tasks. OpenAI has not deprecated it, but it receives no capability updates.
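The endpoint difference matters for integration code: Instruct takes a flat `prompt` string on /v1/completions, not a `messages` array. A minimal payload sketch (field names follow OpenAI's documented completions API; the token cap shown is illustrative):

```python
def instruct_request(prompt: str, max_tokens: int = 256) -> dict:
    """Request body for the legacy /v1/completions endpoint.

    Unlike /v1/chat/completions, there is no messages array and no
    roles: the model sees a single flat prompt string.
    """
    return {
        "model": "gpt-3.5-turbo-instruct",
        "prompt": prompt,
        # Completion cap; prompt + output must fit the ~4k context.
        "max_tokens": max_tokens,
    }

body = instruct_request("Translate to French: Hello, world.")
```

Porting such code to a modern model means restructuring the payload around a messages array, not just swapping the model name.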

Input cost
$1.50/1M
Output cost
$2.00/1M
Context
4k tokens
Notes
Legacy completion API workflows, structured text generation, and simple instruction-following tasks where the chat format is not required.
View model
MistralBudget

Codestral 25.01

Ideal for teams running thousands of daily coding prompts where premium model costs add up quickly.

Input cost
$0.90/1M
Output cost
$2.70/1M
Context
256k tokens
Notes
Affordable high-volume coding support
View model
OpenAIBalanced

OpenAI: GPT-5 Image Mini

Output cost of $2/1M tokens is unusual — lower than input cost, which favors use cases with long inputs but short outputs like image captioning or document summarization. Verify image generation token pricing separately, as image outputs are often billed differently by OpenAI.

Input cost
$2.50/1M
Output cost
$2.00/1M
Context
400k tokens
Notes
Teams needing strong image analysis and generation integrated with text workflows at a reasonable cost.
View model
AnthropicBalanced

Anthropic: Claude 3.5 Haiku

Output cost of $4/1M is notably higher than competing fast/mini models. Input cost at ~$0.80/1M is competitive. Best value emerges in input-heavy pipelines like document classification or RAG retrieval where output tokens are minimal.

Input cost
$0.80/1M
Output cost
$4.00/1M
Context
200k tokens
Notes
High-volume, latency-sensitive applications like chatbots, classification, data extraction, and agentic tool use where speed and cost matter more than peak reasoning depth.
View model
AnthropicBudget

Claude 4 Haiku

Great for drafts, rewrites, and quick-turn internal workflows where Anthropic's tone quality matters.

Input cost
$0.80/1M
Output cost
$4.00/1M
Context
200k tokens
Notes
Fast budget writing, support automation, and cost-sensitive Anthropic integrations
View model
OpenAIBalanced

OpenAI: o3 Mini

Supports three reasoning effort settings via the API (low, medium, high), which significantly affect latency and token usage. No vision/image input support. Available via OpenAI API and ChatGPT Plus.
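The effort setting is just a request parameter. A minimal sketch of the request body, assuming the `reasoning_effort` field OpenAI documents for o-series chat completions (the prompt and token cap are illustrative):

```python
def o3_mini_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat completions request body for o3 Mini.

    reasoning_effort accepts "low", "medium", or "high"; higher effort
    spends more hidden reasoning tokens, which bill as output tokens.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
        # Cap total generated tokens (reasoning included) to bound cost.
        "max_completion_tokens": 4096,
    }

body = o3_mini_request("Prove that sqrt(2) is irrational.", effort="high")
```

Because reasoning tokens count against `max_completion_tokens`, a cap that is too tight can cut off the visible answer on high-effort runs.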

Input cost
$1.10/1M
Output cost
$4.40/1M
Context
200k tokens
Notes
Cost-effective deep reasoning on math, code, and structured logic problems where o3's full price isn't justified.
View model
OpenAIBalanced

OpenAI: o3 Mini High

The 'High' suffix refers to the reasoning_effort parameter set to 'high', which increases token usage and latency significantly versus o3 Mini at medium or low effort. Priced at $1.1/$4.4 per million tokens, it is far cheaper than o1 ($15/$60) and roughly half the price of full o3 ($2/$8), making it attractive for batch workloads.

Input cost
$1.10/1M
Output cost
$4.40/1M
Context
200k tokens
Notes
Solving hard math, competitive programming, and multi-step logical reasoning problems where accuracy matters more than speed.
View model
OpenAIBalanced

OpenAI: o4 Mini

Priced at $1.1/$4.4 per 1M tokens (input/output), o4 Mini is roughly half the price of o3 ($2/$8 in this directory) and far cheaper than full o4. Output tokens are 4x the input price, so verbose reasoning traces can add up — use max_completion_tokens limits in production pipelines.

Input cost
$1.10/1M
Output cost
$4.40/1M
Context
200k tokens
Notes
Developers and analysts who need serious reasoning power for STEM tasks without paying full o4 or o3 prices.
View model
OpenAIBalanced

OpenAI: o4 Mini High

The 'High' suffix denotes maximum reasoning effort, distinct from o4 Mini (balanced) and o4 Mini Low. Higher effort means higher token consumption in internal reasoning traces, which can push effective cost above the stated $1.1/$4.4 per million for very complex queries. No image generation capability.

Input cost
$1.10/1M
Output cost
$4.40/1M
Context
200k tokens
Notes
Developers and researchers who need strong reasoning accuracy on hard STEM, math, or logic problems without paying full o3 pricing.
View model
AnthropicBalanced

Anthropic: Claude Haiku 4.5

Priced at $1/1M input and $5/1M output tokens, placing it above true budget models like Gemini Flash but below mid-tier flagships. Confirm availability of extended thinking or tool-use features via Anthropic's API documentation, as Haiku-tier models sometimes receive these capabilities later than Sonnet/Opus.

Input cost
$1.00/1M
Output cost
$5.00/1M
Context
200k tokens
Notes
High-volume production pipelines and real-time applications that need Claude-quality output without flagship-model costs.
View model
OpenAIBalanced

GPT-5.2 Mini

Best when you specifically need an OpenAI model in your stack.

Input cost
$1.20/1M
Output cost
$4.80/1M
Context
128k tokens
Notes
Budget technical workflows and high-volume product integrations
View model
OpenAIBalanced

OpenAI: GPT-3.5 Turbo 16k

OpenAI has been gradually deprecating older GPT-3.5 variants. Availability may be limited or sunset in the future. At $3/$4 per million tokens, this is dramatically overpriced relative to its capability in 2024-2025.

Input cost
$3.00/1M
Output cost
$4.00/1M
Context
16k tokens
Notes
Legacy integrations or applications that need slightly longer documents processed without upgrading to a modern model.
View model
xAIBalanced

Grok 4

Best when you want near-flagship coding quality with a massive context window at a mid-tier price.

Input cost
$2.00/1M
Output cost
$6.00/1M
Context
2M tokens
Notes
Coding and research at competitive pricing with maximum context
View model
MistralBalanced

Mistral Large 2

The EU hosting angle is the real differentiator here — for teams outside Europe, other models perform better.

Input cost
$2.00/1M
Output cost
$6.00/1M
Context
128k tokens
Notes
Balanced team usage with EU data residency requirements
View model
MistralBalanced

Mistral: Mixtral 8x22B Instruct

Available via Mistral API and as open weights (Apache 2.0 license) for self-hosting. The open-weight option is a key differentiator for privacy-sensitive or on-premise deployments. API pricing at $2/$6 per million tokens is mid-range but faces pressure from newer, cheaper alternatives.

Input cost
$2.00/1M
Output cost
$6.00/1M
Context
66k tokens
Notes
Teams needing strong multilingual capabilities and solid coding performance at a mid-tier price point without relying on OpenAI or Anthropic infrastructure.
View model
MistralBalanced

Mistral: Pixtral Large 2411

Available via Mistral API (la Plateforme) and supports self-hosted deployment. The '2411' suffix indicates a November 2024 release. Supersedes Mistral Large 2 as the primary flagship. Image input pricing follows the same $2/1M token rate.

Input cost
$2.00/1M
Output cost
$6.00/1M
Context
131k tokens
Notes
Teams needing a capable European-hosted multimodal model for document analysis, visual QA, and code generation with image context.
View model
OpenAIBalanced

OpenAI: GPT-4.1

Priced at $2/1M input and $8/1M output tokens — cheaper than GPT-4o at launch. The 1M context window is real but performance near the ceiling is less tested than Gemini's equivalent. No built-in image generation or voice modality.

Input cost
$2.00/1M
Output cost
$8.00/1M
Context
1.0M tokens
Notes
Developers and researchers needing accurate instruction-following and long-document analysis at a cost-efficient rate.
View model
OpenAIBalanced

OpenAI: o3

Pricing at $2/$8 per 1M input/output tokens is moderate for a reasoning model, but long internal reasoning traces can significantly inflate output token counts. Not available via all API tiers — check OpenAI access levels.

Input cost
$2.00/1M
Output cost
$8.00/1M
Context
200k tokens
Notes
Tackling hard technical problems — from competition-level math to multi-step code debugging — where accuracy matters more than speed.
View model
OpenAIBalanced

OpenAI: o4 Mini Deep Research

Deep Research mode requires agentic tool access (web browsing); pricing reflects token usage but research tasks can consume significant tokens across multi-step retrieval loops. Availability may depend on API tier or organizational access level. Not a drop-in replacement for the standard o4 Mini in general-purpose workflows.

Input cost
$2.00/1M
Output cost
$8.00/1M
Context
200k tokens
Notes
Automated research pipelines that require web browsing, source synthesis, and structured report generation at scale without flagship-model costs.
View model
GoogleBalanced

Google: Gemini 2.5 Pro

Pricing shown is for prompts under 200K tokens; inputs over 200K tokens are billed at $2.50/1M input and $15/1M output. Gemini 2.5 Pro includes built-in 'thinking' (reasoning) mode which can increase latency and cost further.
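The 200K tier break makes cost estimation slightly non-linear: crossing it re-rates the whole request. A rough calculator using the rates quoted above (rates hard-coded from this page; verify against Google's current price list):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD for Gemini 2.5 Pro.

    Prompts up to 200K tokens bill at $1.25/1M in and $10/1M out;
    longer prompts switch the request to $2.50/1M in and $15/1M out.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 150K-token prompt vs a 250K-token prompt, same 2K-token answer:
print(round(gemini_25_pro_cost(150_000, 2_000), 4))  # 0.2075
print(round(gemini_25_pro_cost(250_000, 2_000), 4))  # 0.655
```

Note the jump: 100K extra input tokens roughly triples the request cost, because both rates double past the threshold.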

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
1.0M tokens
Notes
Deep reasoning over very long documents, complex codebases, or multimodal inputs where context size is a constraint with other models.
View model
GoogleBalanced

Google: Gemini 2.5 Pro Preview 05-06

This is a preview model (05-06 date suffix indicates a versioned snapshot); Google may deprecate or change it without long notice. Confirm production readiness before building critical pipelines on this endpoint. The 1M context window applies to text and multimodal inputs combined.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
1.0M tokens
Notes
Complex multi-document analysis, long-context reasoning, and advanced coding tasks where a massive context window is essential.
View model
GoogleBalanced

Google: Gemini 2.5 Pro Preview 06-05

This is a preview model (06-05 date suffix indicates a versioned snapshot); Google may deprecate or modify it before a stable GA release. Pricing tiers differ based on prompt length — prompts over 200K tokens are charged at $2.50/1M input and $15/1M output, significantly increasing cost for very long-context use cases.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
1.0M tokens
Notes
Complex multi-step reasoning, large codebase analysis, and tasks requiring deep synthesis across very long documents.
View model
OpenAIBalanced

OpenAI: GPT-5

Pricing is asymmetric: cheap on input ($1.25/1M) but expensive on output ($10/1M), so it favors read-heavy or summarization tasks over verbose generation. The 400K context window is one of the largest available at this price tier. Supersedes GPT-4o, which remains available at lower cost for lighter workloads.
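With asymmetric pricing, the input-to-output token ratio of your workload dominates the bill. A quick comparison at GPT-5's listed $1.25/$10 rates (workload sizes are illustrative):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float = 1.25, out_rate: float = 10.00) -> float:
    """Per-request cost at asymmetric per-1M-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Read-heavy: summarize a 100K-token report into a 1K-token brief.
summarize = cost_usd(100_000, 1_000)   # $0.135
# Write-heavy: expand a 1K-token brief into a 20K-token draft.
draft = cost_usd(1_000, 20_000)        # $0.20125
```

The write-heavy request moves a fifth of the tokens yet costs half again as much, which is why asymmetric models favor summarization and retrieval over long-form generation.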

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
High-stakes professional tasks requiring deep reasoning, precise instruction-following, and reliable multimodal understanding.
View model
OpenAIBalanced

OpenAI: GPT-5 Chat

Pricing is asymmetric — input is relatively affordable at $1.25/1M but output at $10/1M can accumulate quickly in agentic or verbose-output workflows. Cached input pricing may apply through the OpenAI API. Not to be confused with GPT-5 reasoning variants (o-series) which use chain-of-thought and have separate pricing.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
128k tokens
Notes
Complex professional tasks requiring nuanced reasoning, strong writing quality, and reliable instruction-following across long conversations.
View model
OpenAIBalanced

OpenAI: GPT-5 Codex

The $10/1M output cost means heavy code generation workloads can get expensive fast — budget carefully for bulk generation use cases. Context window of 400K is among the largest in its price tier. Supersedes GPT-4o, so existing GPT-4o coding workflows should consider migrating for improved performance.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
Professional developers who need to reason across large codebases, generate production-ready code, and debug complex multi-file projects.
View model
OpenAIBalanced

OpenAI: GPT-5.1

Pricing structure heavily favors input-heavy use cases like RAG and retrieval. The $10/1M output cost makes it expensive for long-form generation at scale. Context window of 400K is competitive but not best-in-class against Gemini 3.1 Pro's 2M window.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
Teams needing reliable, high-quality outputs across coding, writing, and analysis without paying premium GPT-5 prices.
View model
OpenAIBalanced

OpenAI: GPT-5.1 Chat

Output cost of $10/1M tokens is asymmetric compared to the $1.25 input price — high-volume generation tasks will become expensive quickly. No vision or image generation confirmed based on available specs. Supersedes GPT-4o in the OpenAI lineup but does not replace o-series reasoning models.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
128k tokens
Notes
Teams and developers who need GPT-4o-level quality with incremental improvements in accuracy and instruction adherence without paying flagship model prices.
View model
OpenAIBalanced

OpenAI: GPT-5.1-Codex

Asymmetric pricing ($1.25 input / $10 output) rewards read-heavy workflows like code review and repo analysis over generation-heavy tasks. The 400K context window is among the largest in the balanced price tier. No image input/output support confirmed at launch.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
Professional software engineers who need a high-capacity model for large codebase analysis, complex refactoring, and multi-file code generation.
View model
OpenAIBalanced

OpenAI: GPT-5.1-Codex-Max

Output cost of $10/1M tokens is the key budget consideration — input is competitively priced but output matches flagship GPT-5-tier pricing rather than mini-tier rates. Best paired with a cheaper model for lightweight or repetitive coding subtasks. Context window of 400K is well-suited to monorepo analysis but verify token limits on your deployment tier.

Input cost
$1.25/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
Professional developers and engineering teams working with complex, multi-file codebases who need accurate code generation, debugging, and architectural reasoning.
View model
OpenAIBalanced

GPT-4o

Strong when your work lives between visuals, messaging, and product context.

Input cost
$2.50/1M
Output cost
$10.00/1M
Context
128k tokens
Notes
Multimodal tasks and image-adjacent workflows
View model
OpenAIBalanced

OpenAI: GPT Audio

Audio tokens are counted differently from text tokens — a few seconds of audio can consume hundreds of tokens, so monitor usage carefully. Real-time audio streaming requires WebSocket or Realtime API endpoints, not the standard Chat Completions API. Availability may be limited by tier or region.

Input cost
$2.50/1M
Output cost
$10.00/1M
Context
128k tokens
Notes
Building voice assistants, real-time spoken dialogue systems, and applications that need to process or generate natural speech end-to-end.
View model
GooglePremium

Gemini 3.1 Pro

The 2M context window is a genuine competitive advantage for document-heavy workflows; among the frontier models in this directory, only Grok 4 matches it.

Input cost
$2.00/1M
Output cost
$12.00/1M
Context
2M tokens
Notes
Research, deep document analysis, and long-context reasoning at competitive pricing
View model
GoogleBalanced

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

This is a preview model — API behavior, pricing, and availability may change before general release. The 65K context window is unusually constrained for a Gemini Pro-tier model; double-check if your use case requires longer contexts before committing.

Input cost
$2.00/1M
Output cost
$12.00/1M
Context
66k tokens
Notes
Teams needing robust image analysis, visual question answering, and multimodal workflows at a mid-range price point.
View model
OpenAIBalanced

OpenAI: GPT-5.3 Chat

Output cost of $14/1M tokens is the primary budget consideration — workloads with high output-to-input ratios will accumulate costs quickly. No image generation capability. Supersedes GPT-5.2; existing GPT-5.2 integrations should plan to migrate.

Input cost
$1.75/1M
Output cost
$14.00/1M
Context
128k tokens
Notes
Professionals and developers who need reliable, high-quality text generation and reasoning at a cost that scales reasonably with usage.
View model
OpenAIBalanced

OpenAI: GPT-5.3-Codex

Priced asymmetrically with low input cost ($1.75/1M) and high output cost ($14/1M), which rewards concise prompting but penalizes verbose code generation. The 400K context window is one of the largest available at this price tier. Supersedes GPT-5.2 with improved multi-file coherence; users on GPT-5.2 should migrate. No multimodal input support confirmed at launch.

Input cost
$1.75/1M
Output cost
$14.00/1M
Context
400k tokens
Notes
Professional developers tackling large-scale coding tasks, refactoring legacy codebases, or working across multi-file projects where deep context retention is critical.
View model
AnthropicBalanced

Anthropic: Claude 3.7 Sonnet (thinking)

Thinking tokens (the internal reasoning trace) count toward output token billing, which can significantly increase costs on complex queries. The thinking budget can often be configured via the API. Best used selectively for tasks that genuinely benefit from deliberation rather than as a default model.
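Because the reasoning trace bills as output, the effective cost per visible output token can be several times the list rate. A sketch at the $15/1M output rate quoted above (token counts are illustrative; the configurable thinking budget is Anthropic's documented feature, but confirm the exact API shape):

```python
def effective_output_cost(visible_tokens: int, thinking_tokens: int,
                          out_rate: float = 15.00) -> float:
    """USD billed for output when thinking tokens count toward output.

    A 1K-token answer preceded by 9K thinking tokens bills like a
    10K-token answer at the same per-1M output rate.
    """
    return (visible_tokens + thinking_tokens) * out_rate / 1_000_000

answer_only = effective_output_cost(1_000, 0)        # $0.015
with_thinking = effective_output_cost(1_000, 9_000)  # $0.15, 10x the visible cost
```

Setting a thinking budget caps the multiplier, at the price of shallower deliberation on the hardest queries.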

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
200k tokens
Notes
Tackling complex coding challenges, mathematical proofs, and multi-step logical problems where visible reasoning and higher accuracy matter more than speed.
View model
AnthropicBalanced

Anthropic: Claude Sonnet 4

Pricing at $3 input / $15 output positions this as a 'balanced' tier model, but output costs are notably higher than comparable models like GPT-4o ($10 output). Extended context (200K) is available by default. Check Anthropic's API for rate limits and availability by tier.

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
200k tokens
Notes
Complex coding tasks, nuanced writing, and multi-step research where you need near-flagship quality without paying flagship prices.
View model
AnthropicBalanced

Anthropic: Claude Sonnet 4.5

The listed 'supersedes Claude 4 Haiku' label is likely a data anomaly; Sonnet 4.5 more naturally succeeds Claude Sonnet 4. The 1M token context window is the headline feature. Output cost of $15/1M tokens is on the higher end for this tier — compare to Gemini 3.1 Pro at $12/1M output before committing to high-volume use.

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
1M tokens
Notes
Production applications that need Claude's nuanced writing and reasoning without the latency or cost of Opus-class models.
View model
AnthropicPremium

Claude Sonnet 4.6

Powers Cursor and Windsurf by default. If your team already uses either, you're already using this model.

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
1M tokens
Notes
Daily coding, writing, and long-document work at a strong price-to-quality ratio
View model
xAIBalanced

xAI: Grok 3

Available via xAI API and integrated into X Premium subscriptions. Real-time X data access is a differentiating feature not available on competing models. Pricing is competitive but output costs are on the higher end for balanced-tier models.

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
131k tokens
Notes
Users who need strong reasoning and coding capabilities with access to real-time X/Twitter data for current events and social context.
View model
xAIBalanced

xAI: Grok 3 Beta

Model is currently in beta, meaning capabilities and pricing may change. Real-time X data integration depends on xAI's API access policies, which may be subject to change. No image generation support confirmed.

Input cost
$3.00/1M
Output cost
$15.00/1M
Context
131k tokens
Notes
Users who want a frontier-capable model with real-time social context from X and strong STEM reasoning at a mid-range price point.
View model
OpenAIPremium

OpenAI: GPT-5 Image

Flat $10/1M input and output pricing is unusual — most flagship models charge more for output tokens. Verify whether image token costs (typically higher per effective token) are included under this pricing or billed separately, as OpenAI historically charges additional fees for image inputs.

Input cost
$10.00/1M
Output cost
$10.00/1M
Context
400k tokens
Notes
Complex workflows combining visual analysis, image generation, and long-document understanding in a single model call.
View model
OpenAIPremium

GPT-5.4

Unique value is the computer-use capability. If you're building agents that operate software, nothing else compares right now.

Input cost
$8.00/1M
Output cost
$15.00/1M
Context
272k tokens
Notes
Agentic workflows, desktop automation, and complex multi-step reasoning
View model
AnthropicBalanced

Anthropic: Claude Opus 4.5

Pricing is $5 input / $25 output per 1M tokens, a $10/1M output premium over GPT-5.4. Note the 'Supersedes Claude 4 Haiku' label appears to be a data anomaly; Opus 4.5 is the top-tier model, not a Haiku replacement. Confirm model availability on the Anthropic API dashboard as Opus-tier models sometimes have access restrictions.

Input cost
$5.00/1M
Output cost
$25.00/1M
Context
200k tokens
Notes
Complex multi-step reasoning, long-document analysis, and high-stakes writing tasks where output quality is non-negotiable.
View model
AnthropicPremium

Claude Opus 4.6

Best reserved for complex multi-file refactors, architecture decisions, and agentic coding pipelines where mistakes are expensive.

Input cost
$5.00/1M
Output cost
$25.00/1M
Context
1M tokens
Notes
Agentic coding, complex multi-step reasoning, and deep research
View model
AnthropicPremium

Claude Opus 4.7

Ranked from public benchmark and pricing data verified April 26, 2026: SWE-Bench Pro 64.3%, 1M context, $5/$25 per 1M tokens.

Input cost
$5.00/1M
Output cost
$25.00/1M
Context
1M tokens
Notes
Highest-ceiling coding, agentic workflows, and deep research
View model
AnthropicPremium

Anthropic: Claude 3.5 Sonnet

Pricing at $6 input / $30 output per million tokens is significantly higher than GPT-4o ($2.50/$10). Best accessed via Anthropic API or Amazon Bedrock. Claude 3.5 Sonnet (October 2024 version) supersedes the June 2024 release with improved performance.

Input cost
$6.00/1M
Output cost
$30.00/1M
Context
200k tokens
Notes
Complex coding tasks, multi-step reasoning, and long-document analysis where GPT-4o-class quality is needed without paying for the absolute top tier.
View model
OpenAIPremium

OpenAI: GPT-4 Turbo

GPT-4 Turbo is available via the OpenAI API. It has largely been succeeded by GPT-4o, which is faster, supports vision natively, and is cheaper. Organizations should evaluate whether migrating to GPT-4o or o3 makes more sense before building new workflows on this model.

Input cost
$10.00/1M
Output cost
$30.00/1M
Context
128k tokens
Notes
Complex multi-step tasks requiring deep reasoning, long document analysis, or sophisticated code generation where cost is secondary to quality.
View model
OpenAIPremium

OpenAI: GPT-4 Turbo (older v1106)

This is a pinned model snapshot (v1106) and will not receive capability updates. OpenAI may deprecate older snapshots over time. Knowledge cutoff is April 2023. Not recommended for new deployments given the superior cost-performance of GPT-4o and GPT-4.1.

Input cost
$10.00/1M
Output cost
$30.00/1M
Context
128k tokens
Notes
Teams requiring a pinned, stable version of GPT-4 Turbo for reproducible outputs in long-document analysis or complex instruction pipelines.
View model
OpenAIPremium

OpenAI: GPT-4 Turbo Preview

This is a 'preview' variant that OpenAI has largely deprecated in favor of gpt-4-turbo and gpt-4o. The endpoint may be retired or redirected by OpenAI without notice. Check the OpenAI model deprecation schedule before building production applications on this model.

Input cost
$10.00/1M
Output cost
$30.00/1M
Context
128k tokens
Notes
Complex multi-step reasoning, long-document analysis, and professional writing tasks requiring strong instruction-following.
View model
OpenAIPremium

OpenAI: o3 Deep Research

Deep Research mode involves agentic tool calls and web browsing, which can multiply effective token costs significantly. Pricing is per token but real-world research sessions often consume large amounts of both. Available via ChatGPT Plus/Pro and API; API access may require higher usage tiers.

Input cost
$10.00/1M
Output cost
$40.00/1M
Context
200k tokens
Notes
Conducting exhaustive, multi-source research that would take a human analyst hours to compile manually.
View model
OpenAIPremium

OpenAI: o1

At $15 input / $60 output per 1M tokens, a single complex back-and-forth session can cost dollars. o1-mini is available at a fraction of the price for lighter reasoning tasks. OpenAI has since released o3 and o3-mini, which largely supersede o1 for most reasoning use cases.
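To make "can cost dollars" concrete, here is the arithmetic at o1's $15/$60 rates for a hypothetical three-turn session where the growing context is re-sent each turn and outputs include reasoning tokens (all token counts are illustrative):

```python
def session_cost(turns, in_rate: float = 15.00, out_rate: float = 60.00) -> float:
    """Sum per-turn (input_tokens, output_tokens) pairs at per-1M rates."""
    return sum(i * in_rate + o * out_rate for i, o in turns) / 1_000_000

# Turn 1: short question; turns 2-3 re-send the accumulated context.
turns = [(2_000, 6_000), (9_000, 8_000), (18_000, 10_000)]
print(round(session_cost(turns), 3))  # 1.875
```

Under $2 for one session, but at hundreds of sessions per day the output rate dominates everything else on this page.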

Input cost
$15.00/1M
Output cost
$60.00/1M
Context
200k tokens
Notes
Solving complex reasoning tasks where accuracy matters more than response time, such as competitive programming, advanced mathematics, and rigorous scientific analysis.
View model
AnthropicPremium

Anthropic: Claude Opus 4.1

Output pricing at $75/1M tokens is among the highest in the market — more than 9x GPT-4.1's $8/1M output cost. Batch API discounts may be available through Anthropic. Context window is 200K but very long prompts at Opus pricing can become extremely expensive quickly. Note: the supersedes field lists Claude 4 Haiku, which is likely a data error — Opus 4.1 more logically succeeds Claude Opus 4.

Input cost
$15.00/1M
Output cost
$75.00/1M
Context
200k tokens
Notes
High-stakes professional work where output quality justifies premium pricing — legal analysis, advanced research synthesis, and complex agentic workflows.
View model
OpenAIPremium

OpenAI: GPT-4

At $30/$60 per million tokens, this is one of the most expensive text-only models available. The 8,191-token context window is a hard ceiling that makes it unsuitable for most document-processing tasks. OpenAI continues to offer it for API backward compatibility but actively recommends migrating to GPT-4o or GPT-4 Turbo. New projects should not default to this model.

Input cost
$30.00/1M
Output cost
$60.00/1M
Context
8k tokens
Notes
Teams or workflows locked into the original GPT-4 API that require reliable, high-quality text reasoning without needing long context or multimodal input.
View model
OpenAIPremium

OpenAI: GPT-4 (older v0314)

This is a frozen March 2023 snapshot of GPT-4, not a current model. OpenAI may deprecate legacy snapshots with limited notice. The 8,191-token context window is a hard constraint. Cost is identical to much more capable current models, making this a poor choice for new projects.

Input cost
$30.00/1M
Output cost
$60.00/1M
Context
8k tokens
Notes
Reproducible research or legacy workflows that require consistent, version-locked GPT-4 outputs.
View model
OpenAIPremium

OpenAI: o3 Pro

o3 Pro is only available via the OpenAI API and ChatGPT Pro subscription tier. Response times can range from tens of seconds to several minutes depending on problem complexity. Output pricing at $80/1M tokens is 10x the $8/1M of standard o3.

Input cost
$20.00/1M
Output cost
$80.00/1M
Context
200k tokens
Notes
Elite-level reasoning tasks where accuracy is paramount and cost is not a constraint — graduate-level math, competitive programming, and rigorous scientific analysis.
View model
OpenAIPremium

OpenAI: GPT-5 Pro

Output cost of $120/1M tokens is exceptionally high and will compound quickly in agentic or multi-turn workflows. Budget carefully. Context window of 400K is generous but falls short of Gemini 3.1 Pro's 2M offering for ultra-long document tasks.

Input cost
$15.00/1M
Output cost
$120.00/1M
Context
400k tokens
Notes
Demanding professional workflows requiring deep reasoning, nuanced writing, and sophisticated multi-step problem solving where cost is secondary to quality.
View model
AnthropicPremium

Anthropic: Claude Opus 4

Listed here at $30 input / $150 output per 1M tokens, double Anthropic's published $15/$75 rate for Opus 4; verify current pricing before budgeting. Anthropic recommends using Claude Sonnet 4 for most production use cases and reserving Opus 4 for tasks explicitly requiring maximum capability.

Input cost
$30.00/1M
Output cost
$150.00/1M
Context
200k tokens
Notes
Demanding professional tasks requiring deep reasoning, nuanced judgment, and high-quality long-form output.
View model
OpenAIPremium

GPT-5.2

Worth considering only if you have existing integrations built around this model.

Input cost
$21.00/1M
Output cost
$168.00/1M
Context
200k tokens
Notes
Serious coding and complex product work
View model
OpenAIPremium

GPT-5.5

Ranked from public benchmark and pricing data verified April 26, 2026: SWE-Bench Pro 58.6%, Terminal-Bench 2.0 82.7%, 1M API context. Note: the $5/$30 per-1M rate in the source data conflicts with the $30/$180 listed for this model; verify current pricing before budgeting.

Input cost
$30.00/1M
Output cost
$180.00/1M
Context
1M tokens
Notes
Agentic coding, computer-use workflows, and complex research tasks
View model
OpenAIPremium

OpenAI: o1-pro

o1-pro is available only via the OpenAI API and ChatGPT Pro subscription ($200/month). It does not support streaming and has longer latency than any other OpenAI model. Not suitable for high-volume workloads.

Input cost
$150.00/1M
Output cost
$600.00/1M
Context
200k tokens
Notes
Solving the hardest math, science, and engineering problems where accuracy is non-negotiable and cost is secondary.
View model

Tools teams often pair with pricing analysis

This section is reserved for future partner tools covering monitoring, optimization, procurement, and evaluation.

AI code editor

Cursor

The AI-native editor most developers switch to when they want GPT-4 and Claude working inside their actual codebase — not a chat window next to it.

Most popular for coding
Free tier available. Used by 100k+ developers. Try it
AI research

Perplexity

The fastest way to get a sourced, current answer to any question. Pairs well with longer-form AI tools — use it to verify, then use Claude or GPT to synthesize.

Best for research & fact-checking

Next comparisons worth reading

AI model pricing comparisonWhich AI is cheapest?Best cheap AIBrowse all models

Newsletter

Track pricing changes without checking every provider page

Get concise updates when input costs, output costs, or value rankings change.

No spam. Useful updates only. Affiliate disclosures always clearly labeled.

FAQ

Which AI model is cheapest?

Mistral: Mistral Nemo is the cheapest raw API option in the current directory, but Mistral Small 3.1 is the better cheap default for most teams.

What is the best cheap AI API?

Mistral Small 3.1 is the best cheap AI API here because it balances low cost, high speed, and broad usefulness better than the absolute cheapest options.

When should I pay for a premium model?

Pay for a premium model when quality failures create expensive rework, missed edge cases, or costly downstream mistakes. Premium models rarely make sense for low-stakes high-volume prompts.
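The rework trade-off described above can be made concrete with a back-of-envelope calculation. This is only a sketch: the per-task API costs, review minutes, and hourly rate below are illustrative assumptions, not figures from this site.

```python
# Back-of-envelope: does a cheaper model's extra review time erase its savings?
# All input numbers are illustrative assumptions.

def effective_cost_per_task(api_cost, review_minutes, hourly_rate):
    """API cost plus the human review time each output needs, in dollars."""
    return api_cost + (review_minutes / 60) * hourly_rate

# Hypothetical: a budget model costs $0.001/task but needs 3 min of review;
# a premium model costs $0.05/task but needs 30 seconds of review.
budget = effective_cost_per_task(0.001, 3.0, 80)   # $80/hr reviewer
premium = effective_cost_per_task(0.05, 0.5, 80)

print(f"budget:  ${budget:.3f} per task")
print(f"premium: ${premium:.3f} per task")
```

Under these assumptions the premium model wins despite a 50x higher API price, because reviewer time dominates. Rerun with your own review estimates before deciding either way.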

Which AI API is best for budget coding?

Meta: Llama 3.1 8B Instruct is the strongest budget coding specialist in the directory, while Mistral Small 3.1 is the better low-cost generalist if the work extends beyond pure coding.

Best overall value

Mistral Small 3.1

View
Why this recommendation

Mistral Small 3.1 is the best price-to-usefulness default for most teams.

MistralBudget
Best for
Ultra-high-volume classification, summarisation, and lightweight vision tasks
Price
$0.35/1M
Context
128k tokens
Cheapest raw API

Mistral: Mistral Nemo

View
Why this recommendation

Mistral: Mistral Nemo is the lowest-cost option by list price, but it is not automatically the best low-cost decision.

MistralBudget
Best for
Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.
Price
$0.02/1M
Context
131k tokens
Best for speed

Anthropic: Claude 3 Haiku

View
Why this recommendation

Anthropic: Claude 3 Haiku is the better pick when low latency matters almost as much as low spend.

AnthropicBudget
Best for
High-volume production pipelines, customer support bots, and real-time text processing where cost and latency are critical constraints.
Price
$0.25/1M
Context
200k tokens

MistralBudget

Mistral: Mistral Nemo

A dirt-cheap multilingual model perfect for bulk text tasks, but don't expect frontier-level reasoning.

Best for
Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.
Speed
Fast
Input cost
$0.02/1M
Output cost
$0.03/1M
Context
131k tokens
MetaBudget

Meta: Llama 3.1 8B Instruct

The right tool for cheap, fast, high-volume tasks — not for anything that requires serious thinking.

Best for
High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.
Speed
Very fast
Input cost
$0.02/1M
Output cost
$0.05/1M
Context
16k tokens
AnthropicBudget

Anthropic: Claude 3 Haiku

A capable budget workhorse, but Claude 3.5 Haiku has made it mostly obsolete for new deployments.

Best for
High-volume production pipelines, customer support bots, and real-time text processing where cost and latency are critical constraints.
Speed
Very fast
Input cost
$0.25/1M
Output cost
$1.25/1M
Context
200k tokens
Mistral Small 3.1 (Mistral)
Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.
Best for: Ultra-high-volume classification, summarisation, and lightweight vision tasks
Input $0.35/1M · Output $0.56/1M · Context 128k tokens · Very fast

Mistral: Mistral Nemo (Mistral)
A dirt-cheap multilingual model perfect for bulk text tasks, but don't expect frontier-level reasoning.
Best for: Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.
Input $0.02/1M · Output $0.03/1M · Context 131k tokens · Fast

Meta: Llama 3.1 8B Instruct (Meta)
The right tool for cheap, fast, high-volume tasks — not for anything that requires serious thinking.
Best for: High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.
Input $0.02/1M · Output $0.05/1M · Context 16k tokens · Very fast

Anthropic: Claude 3 Haiku (Anthropic)
A capable budget workhorse, but Claude 3.5 Haiku has made it mostly obsolete for new deployments.
Best for: High-volume production pipelines, customer support bots, and real-time text processing where cost and latency are critical constraints.
Input $0.25/1M · Output $1.25/1M · Context 200k tokens · Very fast
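The per-1M-token list prices above translate into monthly spend as follows. The prices are copied from the table; the monthly token volumes are illustrative assumptions, and real bills also depend on caching, batch discounts, and rate tiers not shown here.

```python
# Estimate monthly API spend from per-1M-token list prices.
# Prices copied from the pricing table above; volumes are illustrative.

PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "Mistral Small 3.1": (0.35, 0.56),
    "Mistral: Mistral Nemo": (0.02, 0.03),
    "Meta: Llama 3.1 8B Instruct": (0.02, 0.05),
    "Anthropic: Claude 3 Haiku": (0.25, 1.25),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Dollar cost for a month's token volume at list price."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Example volume: 500M input + 100M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500e6, 100e6):,.2f}/month")
```

At this volume the gap between the cheapest and priciest budget model is roughly an order of magnitude, which is why output price (not just input price) matters for generation-heavy workloads.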

Cheapest raw API

Mistral: Mistral Nemo

Model page

A dirt-cheap multilingual model perfect for bulk text tasks, but don't expect frontier-level reasoning.

When to use

Teams needing a cheap, fast, multilingual workhorse for classification, summarization, or light coding tasks at scale.

When not to use

You need reliable multi-step reasoning, advanced code generation, or any image/multimodal processing.

Best budget coding pick

Meta: Llama 3.1 8B Instruct

Model page

The right tool for cheap, fast, high-volume tasks — not for anything that requires serious thinking.

When to use

High-throughput applications where cost and speed matter more than frontier-level quality, such as chatbots, content classification, and text summarization.

When not to use

You need deep reasoning, long document analysis, complex code generation, or outputs where quality directly impacts user trust.

Best for speed

Anthropic: Claude 3 Haiku

Model page

A capable budget workhorse, but Claude 3.5 Haiku has made it mostly obsolete for new deployments.

When to use

High-volume production pipelines, customer support bots, and real-time text processing where cost and latency are critical constraints.

When not to use

You need deep reasoning, complex coding tasks, or high-quality creative writing — Claude 3 Sonnet, GPT-4o Mini, or even Claude 3.5 Haiku will serve you better.

Free to use. Pro plan unlocks GPT-4o and Claude.
Try it
Unified model API

OpenRouter

One API key to access GPT-5, Claude 4, Gemini, Llama, and 100+ other models. Ideal for developers who want to switch models without rewriting integration code.

Best for developers & API users
Pay per token. No minimum spend. Try it

These tools are independently recommended based on real-world fit with the models on this site. Links may include affiliate or referral tracking — see our disclosures.

Sponsor this spot

Pricing page sponsor slot

A clean, clearly labeled placement for a future sponsor relevant to model selection, monitoring, or optimization.

Audience: Developers & AI power users
Intent: Actively choosing an AI model
Placement: Non-intrusive, clearly labeled
Get featured here · Ask a question

Sponsored placements are clearly labeled and kept separate from editorial recommendations.