A purpose-built budget audio model that excels at voice tasks but stumbles on context length and general-purpose depth.
Coding: 52
Writing: 63
Research: 58
Images: 0
Value: 88
Long Context: 28
Use this when
Transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.
Skip this if
You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.
Pricing
$0.10/1M in
$0.30/1M out
→0% since May 2026
Context
32k tokens
Speed
Fast
Voxtral Small is audio-in capable but does not support image input. The 32K context window is notably short for a 2025 model. Pricing is via Mistral's API; availability through third-party providers may vary. Check whether your use case requires audio input — the text-only version of Mistral Small 3.1 may be more appropriate for pure text workloads.
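To make the listed rates concrete, the sketch below estimates spend from token counts. The helper name and the example volumes are hypothetical. Only the $0.10 and $0.30 per 1M token rates come from this page. How audio duration maps to input tokens is not stated here, so the token counts would need to come from your own usage data.

```python
from decimal import Decimal

# Rates listed on this page for Voxtral Small (USD per 1M tokens).
INPUT_RATE_PER_M = Decimal("0.10")
OUTPUT_RATE_PER_M = Decimal("0.30")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for one workload (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost_in = Decimal(input_tokens) / TOKENS_PER_M * INPUT_RATE_PER_M
    cost_out = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_RATE_PER_M
    return cost_in + cost_out


if __name__ == "__main__":
    # Example volumes are made up: 5M input tokens and 1M output tokens.
    print(estimate_cost(5_000_000, 1_000_000))
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift.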
Strengths
Native audio input support: understands spoken language directly without requiring external STT preprocessing
Extremely low cost at $0.10/$0.30 per 1M tokens, undercutting GPT-4o Audio and Gemini 1.5 Flash significantly
Solid multilingual audio handling, reflecting Mistral's strong European language coverage
Compact 24B size keeps inference fast despite multimodal capabilities
Weaknesses
32K context window is restrictive compared to competitors — Gemini 3.1 Pro offers 1M+ tokens, limiting long audio or document tasks
No image understanding despite the multimodal framing, narrowing its real-world versatility
Reasoning and complex coding benchmarks lag behind GPT-4o and Claude Sonnet 4.6 at their respective tiers
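One common workaround for a short window is to split long material into pieces that each fit and process them separately. The sketch below greedily packs pre-measured segments (for example, transcript paragraphs) into chunks under a token budget. The function name, the 4,000-token reserve for instructions and the reply, and the segment sizes are assumptions for illustration. It also assumes "32k" means 32,000 tokens, and that segment token counts were measured with the tokenizer your provider actually uses.

```python
from typing import List

CONTEXT_WINDOW = 32_000   # "32k tokens" per this page; the exact limit is an assumption
RESERVED_TOKENS = 4_000   # hypothetical headroom for instructions and the reply


def pack_segments(segment_tokens: List[int],
                  budget: int = CONTEXT_WINDOW - RESERVED_TOKENS) -> List[List[int]]:
    """Greedily group segment indices so each group's token total fits the budget."""
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, tokens in enumerate(segment_tokens):
        if tokens > budget:
            raise ValueError(f"segment {index} alone exceeds the budget")
        # Close the current chunk when adding this segment would overflow it.
        if used + tokens > budget and current:
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Made-up token counts for five transcript segments.
    print(pack_segments([12_000, 10_000, 9_000, 20_000, 5_000]))
```

Each chunk can then be sent as its own request and the partial results merged in a final pass. This loses context that spans chunk boundaries, so it is a workaround rather than a substitute for a larger window.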
Real-world use cases
What people actually use Mistral: Voxtral Small 24B 2507 for.
Transcribing and summarizing customer support call recordings in multiple languages
Building a budget voice assistant that processes spoken queries and returns structured text responses
Extracting action items from meeting audio files without a separate STT service
Price History
Mistral: Voxtral Small 24B 2507 pricing over time
→0% since May 9
48 data points · tracked daily since May 9, 2026
Recommendations are made independently based on real-world use and public benchmarks. See our disclosures for details.
Compare alternatives
Similar models worth checking before you commit.
Mistral · Budget
Mistral: Mistral Medium 3.1
Mistral Medium 3.1 is a multimodal mid-tier model from Mistral that supersedes Mistral Large 2, offering vision capabilities alongside strong text performance at a significantly reduced price point. It targets the sweet spot between budget models and expensive flagships, with a 128K context window and competitive multilingual support.
Verdict
The best Mistral model for budget-conscious builders who still need multimodal capability and solid multilingual output.
Quality score
70%
Pricing
$0.40/1M in
$2.00/1M out
Speed
Fast
Best for cost-sensitive teams needing solid coding, instruction-following, and basic vision tasks without paying flagship prices.
Context
131k tokens
Officially supersedes Mistral Large 2, representing a generational shift in Mistral's lineup toward multimodal capability at lower cost tiers. Available via Mistral API and select cloud providers. No function calling limitations noted at this tier.
Budget · Multimodal · Multilingual · Mid-tier · Vision
Llama 3.2 11B Vision Instruct is Meta's open-weight multimodal model capable of understanding both text and images at an extremely low price point. It handles image captioning, visual question answering, and document analysis alongside standard text tasks.
Verdict
The go-to vision model when budget is the top constraint and good-enough accuracy is acceptable.
Quality score
57%
Pricing
$0.34/1M in
$0.34/1M out
Speed
Fast
Best for budget-conscious developers who need basic vision capabilities without paying premium multimodal prices.
Context
131k tokens
Available via multiple inference providers including Together AI, Fireworks, and OpenRouter. As an open-weight model, it can also be self-hosted for even lower marginal costs at scale. Part of Meta's Llama 3.2 family which also includes a 90B vision variant for heavier workloads.
Open-weight · Vision · Budget · Multimodal · Meta
Mistral's ultra-budget multimodal model — exceptionally cheap with vision support, built for high-volume lightweight tasks where cost is the primary constraint.
Verdict
Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.
Quality score
57%
Pricing
$0.35/1M in
$0.55/1M out
Speed
Very fast
Best for ultra-high-volume classification, summarisation, and lightweight vision tasks
Context
128k tokens
At $0.10/1M input, the cost question disappears. The only question is whether the task complexity exceeds what Mistral Small can handle.
Budget · Multimodal · Ultra cheap · Mistral
Pricing moves, ranking shifts, and capability updates.
New Model · Mar 27, 2026
Mistral: Voxtral Small 24B 2507 — added to UseRightAI
Mistral: Voxtral Small 24B 2507 (Mistral) is now indexed. It supersedes Mistral Small 3.1. A purpose-built budget audio model that excels at voice tasks but stumbles on context length and general-purpose depth.
Mistral: Voxtral Small 24B 2507 is best for transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline. It is a strong fit when that workflow matters more than the tradeoffs that come with a budget-priced, fast model.
When should I avoid Mistral: Voxtral Small 24B 2507?
You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.
What is a cheaper alternative to Mistral: Voxtral Small 24B 2507?
Meta: Llama 3.1 8B Instruct is the lower-cost option to compare first when you want a similar workflow fit with less token spend.
What is a faster alternative to Mistral: Voxtral Small 24B 2507?
Mistral Small 3.1 is the better pick when response time matters more than maximum depth or premium quality.