LLM Price Wars: Find the Cheapest AI for Your Use Case
Compare API pricing across 31 models from Anthropic, OpenAI, Google, DeepSeek, Meta, and Mistral. Adjust your usage, pick your use case, and find the best model for your budget.
Configure Your Usage
Top 5 by Cost
All Models (31)
Llama 4 Scout
GPT-5 Nano
GPT-4.1 Nano
Gemini 2.5 Flash-Lite
Gemini 2.0 Flash
DeepSeek V3.2
Grok 4.1 Fast
GPT-4o Mini
Llama 4 Maverick
Gemini 3.1 Flash-Lite
GPT-4.1 Mini
GPT-5 Mini
DeepSeek R1
Gemini 2.5 Flash
Gemini 3 Flash
o4-mini
Claude Haiku 4.5
Mistral Large
o3
GPT-4.1
GPT-5
Gemini 2.5 Pro
GPT-4o
Gemini 3.1 Pro
GPT-5.3 Codex
GPT-5.2
GPT-5.4
Claude Sonnet 4.6
Grok 4
Claude Opus 4.6
o3-pro
Pro Tips to Cut LLM Costs
Prompt Caching
Anthropic, OpenAI, and Google all offer prompt caching that cuts costs 50-90% on repeated system prompts and context. If you have a long system prompt, caching pays for itself instantly.
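To see why caching pays off so quickly, here is a back-of-the-envelope sketch. The 25% cache-write surcharge and 90% read discount are illustrative assumptions (actual rates vary by provider); the function and its parameters are ours, not any provider's API.

```python
def caching_savings(prefix_tokens: float, requests: int,
                    price_per_mtok: float,
                    write_premium: float = 1.25,   # assumed surcharge to write the cache
                    read_discount: float = 0.10) -> float:  # assumed price of a cached read
    """Fraction saved on a shared prompt prefix across `requests` calls."""
    uncached = requests * prefix_tokens * price_per_mtok / 1e6
    cached = (prefix_tokens * price_per_mtok * write_premium / 1e6        # first call writes the cache
              + (requests - 1) * prefix_tokens * price_per_mtok * read_discount / 1e6)
    return 1 - cached / uncached

# A 10k-token system prompt reused across 1,000 calls at $3/MTok:
print(f"{caching_savings(10_000, 1_000, 3.0):.1%} saved on the prefix")
```

With those assumed rates, the savings on the shared prefix approach the read discount (~90%) as request volume grows, since the one-time write cost is amortized away.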
Prompt Optimization
Shorter prompts are cheaper prompts. Strip unnecessary examples, use concise instructions, and prefer structured outputs (JSON) to reduce output tokens. A 30% token reduction = 30% cost savings.
Model Routing
Route simple tasks (classification, extraction) to cheap models like GPT-4.1-nano and only use premium models for complex reasoning. A smart router can cut costs 70% with no quality loss on the easy tasks.
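A minimal sketch of the idea. The model names, prices, and task taxonomy here are illustrative assumptions, not any provider's API:

```python
# Tasks simple enough for a cheap model (illustrative taxonomy).
CHEAP_TASKS = {"classification", "extraction", "formatting"}

def route(task_type: str) -> str:
    """Send simple, well-scoped tasks to a cheap model; everything else goes premium."""
    return "gpt-4.1-nano" if task_type in CHEAP_TASKS else "gpt-5"

def blended_price(easy_share: float, cheap: float, premium: float) -> float:
    """Average per-MTok price when `easy_share` of traffic routes to the cheap model."""
    return easy_share * cheap + (1 - easy_share) * premium

# 70% easy traffic at $0.10/MTok vs. a $10/MTok premium-only baseline:
savings = 1 - blended_price(0.70, 0.10, 10.0) / 10.0
print(f"{savings:.0%} cheaper than premium-only")
```

Real routers typically classify the incoming request with a small model or heuristics first; the hard part is keeping that classifier cheap and accurate enough that misroutes don't eat the savings.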
Response Caching
Cache identical or semantically similar requests with a vector similarity lookup. Common for RAG pipelines, FAQ bots, and search. Hit rates of 30-60% are typical in production.
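A toy sketch of a semantic cache. The bag-of-words "embedding" below is a stand-in so the example runs self-contained; a production system would use a real embedding model and a vector database, and the 0.9 threshold is an assumption you'd tune:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call entirely
        return None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("what is your refund policy", "Refunds within 30 days.")
print(cache.get("what is your refund policy ?"))  # near-duplicate -> hit
```

On a hit you return the stored response for free; on a miss you call the model and `put` the result. The threshold trades hit rate against the risk of serving a stale or mismatched answer.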
Frequently Asked Questions
How are these prices calculated?
Prices are calculated from each provider's published per-token API pricing. Monthly cost = (daily requests × 30) × (average input tokens × input price + average output tokens × output price) / 1,000,000, where prices are per million tokens. These are raw API costs and don't include platform fees or volume discounts.
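The same formula in code, with illustrative per-million-token rates:

```python
def monthly_cost(daily_requests: int, avg_input_tokens: float,
                 avg_output_tokens: float,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Monthly API cost in USD; prices are per million tokens."""
    per_request = (avg_input_tokens * input_price_per_mtok
                   + avg_output_tokens * output_price_per_mtok) / 1_000_000
    return daily_requests * 30 * per_request

# 10,000 requests/day, 1,500 input + 400 output tokens per request,
# at an assumed $3/MTok input and $15/MTok output:
print(f"${monthly_cost(10_000, 1_500, 400, 3.0, 15.0):,.2f}/month")  # $3,150.00/month
```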
Which model is best for a production chatbot?
For most production chatbots, Claude Sonnet 4.6 or GPT-4o offer the best balance of quality, speed, and cost. If you need the absolute cheapest option with acceptable quality, GPT-4o Mini or Gemini 2.5 Flash are excellent. For high-stakes conversations, Claude Opus 4.6 provides the best reasoning quality.
Is self-hosting an open-weights model cheaper than the API?
Llama 3.3 70B appears cheap at API pricing, but self-hosting requires GPU infrastructure (A100/H100 instances). At low volumes, API services are almost always cheaper. Self-hosting becomes cost-effective above ~50M tokens/day, when you can keep GPUs at high utilization.
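A back-of-the-envelope break-even check, assuming full GPU utilization. The $8/hr node price and $4/MTok blended API rate are illustrative assumptions, not quotes:

```python
def breakeven_tokens_per_day(gpu_cost_per_hour: float,
                             api_price_per_mtok: float) -> float:
    """Daily token volume at which a dedicated GPU node matches API pricing,
    assuming the node runs (and is utilized) 24 hours a day."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_price_per_mtok * 1_000_000

# An ~$8/hr GPU node vs. an assumed $4/MTok blended API rate:
tokens = breakeven_tokens_per_day(8.0, 4.0)
print(f"{tokens / 1e6:.0f}M tokens/day to break even")  # 48M tokens/day
```

Below that volume, or with spiky traffic that leaves GPUs idle, the API wins; engineering time for serving infrastructure is a further cost not captured here.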
How do I cut my LLM costs?
The biggest levers: 1) Use prompt caching (saves 50-90% on repeated prefixes). 2) Route simple tasks to cheaper models with a model router. 3) Optimize prompts to use fewer tokens. 4) Batch non-urgent requests. 5) Cache frequent responses. Most teams can cut costs 60-80% with these techniques.
Are LLM prices going down?
Yes, LLM pricing has been dropping steadily, with prices falling roughly 90% over the past two years. We update this calculator regularly, but always verify current pricing on each provider's website before making budget commitments.
What about rate limits?
Most providers impose rate limits (requests/minute) and daily quotas, especially on cheaper tiers. High-volume users need to factor in enterprise plans or negotiate custom limits. This calculator focuses on per-token costs, not rate-limit constraints.
Learn to Build AI Apps That Keep Costs Low
Our playbook teaches you prompt caching, model routing, and the production patterns that cut LLM costs by 80%.
Get the Playbook