The best cheap model for long-document pipelines, but don't expect flagship-level reasoning.
58
Coding
55
Writing
52
Research
20
Images
93
Value
88
Long Context
Use this when
High-volume, latency-sensitive applications like document triage, chatbot pipelines, and content classification at scale.
Skip this if
You need high-quality code generation, complex reasoning chains, or nuanced long-form writing — upgrade to Gemini 2.5 Flash or Pro instead.
Pricing
$0.10/1M in
$0.40/1M out
→0%since May 2026
Context
1.0M tokens
Speed
Very fast
Pricing is approximate based on listed rates. As a 'Lite' model, it may not support all multimodal features available in full Flash or Pro variants. Check Google AI Studio for feature availability and rate limits.
Extremely low cost at $0.10/$0.40 per 1M tokens — cheaper than GPT-4.1 Mini and competitive with Claude Haiku 3.5
Massive 1M token context window is rare at this price tier, enabling long-document processing on a budget
Fast inference makes it practical for real-time applications and user-facing chatbots
Solid baseline reasoning inherited from the Gemini 2.5 architecture, outperforming older budget models like Gemini 1.5 Flash
Weaknesses
Noticeably weaker on complex multi-step reasoning and nuanced instruction-following compared to Gemini 2.5 Flash or Pro
Code generation quality lags behind GPT-4.1 Mini and Claude Haiku 3.5 on harder algorithmic problems
Not suitable for tasks requiring deep analysis, synthesis, or creative depth
Real-world use cases
What people actually use Google: Gemini 2.5 Flash Lite for.
Classifying and routing thousands of customer support tickets using the full conversation history as context
Summarizing 500-page legal or financial documents within a single API call using the 1M token window
Powering a low-cost FAQ chatbot that needs fast responses and handles moderate complexity queries
Price History
Google: Gemini 2.5 Flash Lite pricing over time
→0% since May 9
48 data points · tracked daily since May 9, 2026
Ready to try it?
Start using Google: Gemini 2.5 Flash Lite
High-volume, latency-sensitive applications like document triage, chatbot pipelines, and content classification at scale.. Start free — no card required.
Recommendations are made independently based on real-world use and public benchmarks. See our disclosures for details.
Compare alternatives
Similar models worth checking before you commit.
GoogleBudget
Gemma 4 26B A4B
Gemma 4 26B A4B is a sparse mixture-of-experts open model from Google, activating only ~4B parameters per forward pass despite having 26B total parameters. It offers a 262K context window at budget pricing, making it one of the more capable open-weight models for its cost tier.
Verdict
A lean, fast, and surprisingly capable budget model best suited for high-volume text tasks where cost efficiency trumps peak quality.
Quality score
59%
Pricing
$0.13/1M in
$0.40/1M out
Speed
Fast
Best for cost-sensitive applications needing long-context processing with reasonable quality, such as document summarization pipelines or lightweight coding assistants.
Context
262k tokens
As an open-weight model, Gemma 4 26B can also be self-hosted, making API pricing largely irrelevant at scale. The 'A4B' suffix denotes the active parameter count in its MoE configuration. Listed as superseding Gemini 3 Flash Preview, though Gemini 2.0 Flash remains a stronger hosted alternative.
Open-weightBudgetMoELong ContextGoogle
Best for
Cost-sensitive applications needing long-context processing with reasonable quality, such as document summarization pipelines or lightweight coding assistants.
Gemma 4 31B is Google's open-weight instruction-tuned model offering a strong balance of capability and cost efficiency at just $0.14/$0.40 per million tokens. It features a 262K context window and is designed for developers who need capable on-premise or API-hosted inference without flagship pricing.
Verdict
A well-priced, long-context open-weight model that's ideal for high-volume developer workloads but won't match frontier models on complex reasoning.
Quality score
66%
Pricing
$0.14/1M in
$0.40/1M out
Speed
Fast
Best for cost-conscious developers needing a capable open-weight model for coding assistance, summarization, and document analysis at scale.
Context
262k tokens
As an open-weight model, Gemma 4 31B can be self-hosted via Ollama or Hugging Face in addition to Google's API. Pricing shown is for hosted inference. No image input capability confirmed at launch.
Open WeightBudgetLong ContextCodingSelf-Hostable
Best for
Cost-conscious developers needing a capable open-weight model for coding assistance, summarization, and document analysis at scale.
Gemini 2.0 Flash Lite is Google's ultra-budget, high-speed model designed for high-volume, cost-sensitive applications. It sits below Gemini 2.0 Flash in capability but offers the lowest price point in the Gemini 2.0 family with a massive 1M token context window.
Verdict
The go-to model when cost and throughput are everything and task complexity is low.
Quality score
57%
Pricing
$0.07/1M in
$0.30/1M out
Speed
Very fast
Best for high-throughput, cost-sensitive pipelines where speed and price matter more than top-tier reasoning quality.
Context
1.0M tokens
Pricing is among the lowest available in any major provider's lineup as of mid-2025. Context window of 1M tokens is a significant differentiator at this price tier. Check Google AI Studio and Vertex AI for rate limits on high-volume usage.
Google: Gemini 2.5 Flash Lite is best for high-volume, latency-sensitive applications like document triage, chatbot pipelines, and content classification at scale.. It is a strong fit when that workflow matters more than the tradeoffs around budget pricing and very fast speed.
When should I avoid Google: Gemini 2.5 Flash Lite?
You need high-quality code generation, complex reasoning chains, or nuanced long-form writing — upgrade to Gemini 2.5 Flash or Pro instead.
What is a cheaper alternative to Google: Gemini 2.5 Flash Lite?
Meta: Llama 3.1 8B Instruct is the lower-cost option to compare first when you want a similar workflow fit with less token spend.
What is a faster alternative to Google: Gemini 2.5 Flash Lite?
Gemma 4 26B A4B is the better pick when response time matters more than maximum depth or premium quality.
Newsletter
Get notified when Google: Gemini 2.5 Flash Lite pricing changes
We track pricing daily. When this model drops or spikes, you'll know first.
No spam. Useful updates only. Affiliate disclosures always clearly labeled.