Home ModelsMistral: Voxtral Small 24B 2507

MistralBudget

Mistral: Voxtral Small 24B 2507

A purpose-built budget audio model that excels at voice tasks but stumbles on context length and general-purpose depth.

Coding

Writing

Research

Images

Value

Long Context

Use this when

Transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.

Skip this if

You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.

Pricing

$0.10/1M in

$0.30/1M out

→0%since May 2026

Context

32k tokens

Speed

Fast

Voxtral Small is audio-in capable but does not support image input. The 32K context window is notably short for a 2025 model. Pricing is via Mistral's API; availability through third-party providers may vary. Check whether your use case requires audio input — the text-only version of Mistral Small 3.1 may be more appropriate for pure text workloads.

How to access

API

$0.09999999999999999/1M input tokens

Subscription = chat interface. API = build with it. Compare all subscription plans

Switch to instead if...

Best overall

Claude Fable 5

Cheaper option

Meta: Llama 3.1 8B Instruct

Faster option

Mistral Small 3.1

Strengths

Native audio input support — understands spoken language directly without requiring external STT preprocessing

Extremely low cost at $0.10/$0.30 per 1M tokens, undercutting GPT-4o Audio and Gemini 1.5 Flash significantly

Solid multilingual audio handling, reflecting Mistral's strong European language coverage

Compact 24B size keeps inference fast despite multimodal capabilities

Weaknesses

32K context window is restrictive compared to competitors — Gemini 3.1 Pro offers 1M+ tokens, limiting long audio or document tasks

No image understanding despite the multimodal framing, narrowing its real-world versatility

Reasoning and complex coding benchmarks lag behind GPT-4o and Claude Sonnet 4.6 at their respective tiers

Real-world use cases

What people actually use Mistral: Voxtral Small 24B 2507 for.

Transcribing and summarizing customer support call recordings in multiple languages

Building a budget voice assistant that processes spoken queries and returns structured text responses

Extracting action items from meeting audio files without a separate STT service

Price History

Mistral: Voxtral Small 24B 2507 pricing over time

→0% since May 9

48 data points · tracked daily since May 9, 2026

Ready to try it?

Start using Mistral: Voxtral Small 24B 2507

Transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.. Start free — no card required.

Try Mistral: Voxtral Small 24B 2507 free Compare alternatives

Recommendations are made independently based on real-world use and public benchmarks. See our disclosures for details.

Compare alternatives

Similar models worth checking before you commit.

MistralBudget

Mistral: Mistral Medium 3.1

Mistral Medium 3.1 is a multimodal mid-tier model from Mistral that supersedes Mistral Large 2, offering vision capabilities alongside strong text performance at a significantly reduced price point. It targets the sweet spot between budget models and expensive flagships, with a 128K context window and competitive multilingual support.

Verdict

The best Mistral model for budget-conscious builders who still need multimodal capability and solid multilingual output.

Quality score

70%

Pricing

$0.40/1M in

$2.00/1M out

Speed

Fast

Best for cost-sensitive teams needing solid coding, instruction-following, and basic vision tasks without paying flagship prices.

Context

131k tokens

Officially supersedes Mistral Large 2, representing a generational shift in Mistral's lineup toward multimodal capability at lower cost tiers. Available via Mistral API and select cloud providers. No function calling limitations noted at this tier.

BudgetMultimodalMultilingualMid-tierVision

Best for

Cost-sensitive teams needing solid coding, instruction-following, and basic vision tasks without paying flagship prices.

View model

MetaBudget

Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision Instruct is Meta's open-weight multimodal model capable of understanding both text and images at an extremely low price point. It handles image captioning, visual question answering, and document analysis alongside standard text tasks.

Verdict

The go-to vision model when budget is the top constraint and good-enough accuracy is acceptable.

Quality score

57%

Pricing

$0.34/1M in

$0.34/1M out

Speed

Fast

Best for budget-conscious developers who need basic vision capabilities without paying premium multimodal prices.

Context

131k tokens

Available via multiple inference providers including Together AI, Fireworks, and OpenRouter. As an open-weight model, it can also be self-hosted for even lower marginal costs at scale. Part of Meta's Llama 3.2 family which also includes a 90B vision variant for heavier workloads.

Open-weightVisionBudgetMultimodalMeta

Best for

Budget-conscious developers who need basic vision capabilities without paying premium multimodal prices.

View model

MistralBudget

Mistral Small 3.1

Mistral's ultra-budget multimodal model — exceptionally cheap with vision support, built for high-volume lightweight tasks where cost is the primary constraint.

Verdict

Ultra-cheap multimodal model for massive-volume, low-complexity pipelines.

Quality score

57%

Pricing

$0.35/1M in

$0.55/1M out

Speed

Very fast

Best for ultra-high-volume classification, summarisation, and lightweight vision tasks

Context

128k tokens

At $0.10/1M input, the cost question disappears. The only question is whether the task complexity exceeds what Mistral Small can handle.

BudgetMultimodalUltra cheapMistral

Best for

Ultra-high-volume classification, summarisation, and lightweight vision tasks

View model

Change history

Pricing moves, ranking shifts, and capability updates.

New ModelMar 27, 2026

Mistral: Voxtral Small 24B 2507 — added to UseRightAI

Mistral: Voxtral Small 24B 2507 (Mistral) is now indexed. It supersedes Mistral Small 3.1. A purpose-built budget audio model that excels at voice tasks but stumbles on context length and general-purpose depth.

View model

FAQ

What is Mistral: Voxtral Small 24B 2507 best for?

Mistral: Voxtral Small 24B 2507 is best for transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.. It is a strong fit when that workflow matters more than the tradeoffs around budget pricing and fast speed.

When should I avoid Mistral: Voxtral Small 24B 2507?

You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.

What is a cheaper alternative to Mistral: Voxtral Small 24B 2507?

Meta: Llama 3.1 8B Instruct is the lower-cost option to compare first when you want a similar workflow fit with less token spend.

What is a faster alternative to Mistral: Voxtral Small 24B 2507?

Mistral Small 3.1 is the better pick when response time matters more than maximum depth or premium quality.

Newsletter

Get notified when Mistral: Voxtral Small 24B 2507 pricing changes

We track pricing daily. When this model drops or spikes, you'll know first.

No spam. Useful updates only. Affiliate disclosures always clearly labeled.

Mistral: Voxtral Small 24B 2507

A purpose-built budget audio model that excels at voice tasks but stumbles on context length and general-purpose depth.

Coding

Writing

Research

Images

Value

Long Context

Use this when

Transcribing, analyzing, and responding to audio input cost-effectively without needing a separate speech-to-text pipeline.

Skip this if

You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.

Pricing

$0.10/1M in

$0.30/1M out

→0%since May 2026

Context

32k tokens

Speed

Fast

How to access

API

$0.09999999999999999/1M input tokens

Subscription = chat interface. API = build with it. Compare all subscription plans

Strengths

Native audio input support — understands spoken language directly without requiring external STT preprocessing

Extremely low cost at $0.10/$0.30 per 1M tokens, undercutting GPT-4o Audio and Gemini 1.5 Flash significantly

Solid multilingual audio handling, reflecting Mistral's strong European language coverage

Compact 24B size keeps inference fast despite multimodal capabilities

Weaknesses

32K context window is restrictive compared to competitors — Gemini 3.1 Pro offers 1M+ tokens, limiting long audio or document tasks

No image understanding despite the multimodal framing, narrowing its real-world versatility

Reasoning and complex coding benchmarks lag behind GPT-4o and Claude Sonnet 4.6 at their respective tiers

Real-world use cases

What people actually use Mistral: Voxtral Small 24B 2507 for.

Transcribing and summarizing customer support call recordings in multiple languages

Building a budget voice assistant that processes spoken queries and returns structured text responses

Extracting action items from meeting audio files without a separate STT service

FAQ

What is Mistral: Voxtral Small 24B 2507 best for?

When should I avoid Mistral: Voxtral Small 24B 2507?

You need long-context document analysis, image understanding, or top-tier reasoning performance, as the 32K window and 24B scale will bottleneck complex tasks.

What is a cheaper alternative to Mistral: Voxtral Small 24B 2507?

Meta: Llama 3.1 8B Instruct is the lower-cost option to compare first when you want a similar workflow fit with less token spend.

What is a faster alternative to Mistral: Voxtral Small 24B 2507?

Mistral Small 3.1 is the better pick when response time matters more than maximum depth or premium quality.

Mistral: Voxtral Small 24B 2507

Strengths

Weaknesses

Real-world use cases

Mistral: Voxtral Small 24B 2507 pricing over time

Start using Mistral: Voxtral Small 24B 2507

Compare alternatives

Mistral: Mistral Medium 3.1

Meta: Llama 3.2 11B Vision Instruct

Mistral Small 3.1

Change history

Mistral: Voxtral Small 24B 2507 — added to UseRightAI

FAQ

What is Mistral: Voxtral Small 24B 2507 best for?

When should I avoid Mistral: Voxtral Small 24B 2507?

What is a cheaper alternative to Mistral: Voxtral Small 24B 2507?

What is a faster alternative to Mistral: Voxtral Small 24B 2507?

Get notified when Mistral: Voxtral Small 24B 2507 pricing changes

Mistral: Voxtral Small 24B 2507

Strengths

Weaknesses

Real-world use cases

Mistral: Voxtral Small 24B 2507 pricing over time

Start using Mistral: Voxtral Small 24B 2507

Compare alternatives

Mistral: Mistral Medium 3.1

Meta: Llama 3.2 11B Vision Instruct

Mistral Small 3.1

Change history

Mistral: Voxtral Small 24B 2507 — added to UseRightAI

FAQ

What is Mistral: Voxtral Small 24B 2507 best for?

When should I avoid Mistral: Voxtral Small 24B 2507?

What is a cheaper alternative to Mistral: Voxtral Small 24B 2507?

What is a faster alternative to Mistral: Voxtral Small 24B 2507?

User reviews

Get notified when Mistral: Voxtral Small 24B 2507 pricing changes

User reviews