Free Tool - Updated March 2026

LLM Price Wars: Find the Cheapest AI for Your Use Case

Compare API pricing across 31 models from Anthropic, OpenAI, Google, DeepSeek, Meta, Mistral, and xAI. Adjust your usage, pick your use case, and find the best model for your budget.

Configure Your Usage

Recommended pick: GPT-5 Mini at $63.75/mo
Switching from o3-pro to GPT-5 Mini saves you $2,636/mo.

Top 5 by Cost

Llama 4 Scout: $10.50/mo
GPT-5 Nano: $12.75/mo
GPT-4.1 Nano: $13.50/mo
Gemini 2.5 Flash-Lite: $13.50/mo
Gemini 2.0 Flash: $13.50/mo

All Models (31)

Rank | Model | Provider | Tier | Speed | SWE-bench | Monthly cost | Cost per request | Use cases
#1 | Llama 4 Scout | Meta | Budget | Fastest | - | $10.50/mo | $0.0003 | Chatbot, Classification, RAG/Search
#2 | GPT-5 Nano | OpenAI | Budget | Fastest | - | $12.75/mo | $0.0004 | Classification, RAG/Search
#3 | GPT-4.1 Nano | OpenAI | Budget | Fastest | - | $13.50/mo | $0.0004 | Classification, RAG/Search
#4 | Gemini 2.5 Flash-Lite | Google | Budget | Fastest | - | $13.50/mo | $0.0004 | Classification, RAG/Search
#5 | Gemini 2.0 Flash | Google | Budget | Fastest | - | $13.50/mo | $0.0004 | Classification, RAG/Search, Chatbot
#6 | DeepSeek V3.2 | DeepSeek | High Value | Fast | 68.4% | $16.80/mo | $0.0006 | Code Generation, Summarization
#7 | Grok 4.1 Fast | xAI | Mid Range | Fastest | 50.6% | $18.00/mo | $0.0006 | Chatbot, Content Writing, Summarization
#8 | GPT-4o Mini | OpenAI | Budget | Fastest | - | $20.25/mo | $0.0007 | Chatbot, Classification, RAG/Search
#9 | Llama 4 Maverick | Meta | Budget | Fast | 18.4% | $21.00/mo | $0.0007 | Chatbot, Summarization
#10 | Gemini 3.1 Flash-Lite | Google | Mid Range | Fastest | 60.4% | $48.75/mo | $0.0016 | Classification, RAG/Search, Chatbot
#11 | GPT-4.1 Mini | OpenAI | Budget | Fastest | 34.8% | $54.00/mo | $0.0018 | Chatbot, Summarization, RAG/Search
#12 | GPT-5 Mini | OpenAI | Mid Range | Fastest | - | $63.75/mo | $0.0021 | Chatbot, Summarization, RAG/Search
#13 | DeepSeek R1 | DeepSeek | High Value | Medium | - | $73.95/mo | $0.0025 | Data Analysis, Code Generation
#14 | Gemini 2.5 Flash | Google | Mid Range | Fastest | 55.6% | $79.50/mo | $0.0027 | RAG/Search, Summarization, Chatbot
#15 | Gemini 3 Flash | Google | Top Tier | Fastest | 76.2% | $97.50/mo | $0.0032 | Code Generation, Chatbot, RAG/Search
#16 | o4-mini | OpenAI | Mid Range | Fast | 33.4% | $148.50/mo | $0.0050 | Data Analysis, Classification
#17 | Claude Haiku 4.5 | Anthropic | High Value | Fastest | 68.8% | $165.00/mo | $0.0055 | RAG/Search, Classification, Chatbot
#18 | Mistral Large | Mistral | Mid Range | Fast | 36.2% | $210.00/mo | $0.0070 | Content Writing, Summarization
#19 | o3 | OpenAI | Mid Range | Medium | 49.8% | $270.00/mo | $0.0090 | Data Analysis, Code Generation
#20 | GPT-4.1 | OpenAI | Mid Range | Fast | 47.4% | $270.00/mo | $0.0090 | Code Generation, Content Writing
#21 | GPT-5 | OpenAI | High Value | Fast | 68.8% | $318.75/mo | $0.011 | Content Writing, Chatbot, Summarization
#22 | Gemini 2.5 Pro | Google | Mid Range | Fast | 46.8% | $318.75/mo | $0.011 | Data Analysis, Code Generation, Content Writing
#23 | GPT-4o | OpenAI | Budget | Fast | 27.2% | $337.50/mo | $0.011 | Content Writing, Chatbot, Summarization
#24 | Gemini 3.1 Pro | Google | High Value | Fast | 70.2% | $390.00/mo | $0.013 | Code Generation, Data Analysis, Content Writing
#25 | GPT-5.3 Codex | OpenAI | High Value | Fast | 75.2% | $446.25/mo | $0.015 | Code Generation, Data Analysis
#26 | GPT-5.2 | OpenAI | High Value | Fast | 75.4% | $446.25/mo | $0.015 | Code Generation, Data Analysis
#27 | GPT-5.4 | OpenAI | Top Tier | Fast | 77.2% | $487.50/mo | $0.016 | Code Generation, Content Writing, Data Analysis
#28 | Claude Sonnet 4.6 | Anthropic | Top Tier | Fast | 76.2% | $495.00/mo | $0.017 | Code Generation, Content Writing, Chatbot
#29 | Grok 4 | xAI | Mid Range | Medium | 58.6% | $495.00/mo | $0.017 | Data Analysis, Code Generation
#30 | Claude Opus 4.6 | Anthropic | Top Tier | Medium | 79.2% | $825.00/mo | $0.028 | Code Generation, Data Analysis
#31 | o3-pro | OpenAI | High Value | Medium | - | $2,700/mo | $0.090 | Data Analysis, Code Generation

Best Value: DeepSeek V3.2 (#6)
Recommended for Chatbot: GPT-5 Mini (#12)

Pro Tips to Cut LLM Costs

Prompt Caching

Anthropic, OpenAI, and Google all offer prompt caching, which cuts the cost of repeated system prompts and context by 50-90%. If you send the same long system prompt on every request, caching pays for itself almost immediately.
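As a rough sketch of where that saving comes from, the snippet below compares input cost with and without a cached prefix. The token counts, the $3.00 per million input price, and the 0.1 cached-read multiplier (a 90% discount) are illustrative assumptions, not any provider's actual rates, and the sketch ignores cache-write surcharges and cache expiry.

```python
def input_cost(prefix_tokens, dynamic_tokens, input_price_per_m,
               cached_read_multiplier=1.0):
    """Input cost in dollars for one request.

    cached_read_multiplier is the fraction of the normal input price
    charged for the cached prefix (1.0 means no caching). Assumed value.
    """
    billed_tokens = prefix_tokens * cached_read_multiplier + dynamic_tokens
    return billed_tokens * input_price_per_m / 1_000_000

# Hypothetical workload: 4,000-token system prompt, 200-token user message.
uncached = input_cost(4000, 200, 3.00)
cached = input_cost(4000, 200, 3.00, cached_read_multiplier=0.1)
savings = 1 - cached / uncached
print(f"Input cost saving with caching: {savings:.0%}")  # 86%
```

The saving shrinks as the uncached, per-request part of the prompt grows, which is why caching matters most for long, stable system prompts.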

Prompt Optimization

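Shorter prompts are cheaper prompts. Strip unnecessary examples, use concise instructions, and prefer structured outputs (JSON) to reduce output tokens. A 30% reduction in billed tokens means a 30% cost saving when it applies evenly to input and output; because output tokens usually cost more than input tokens, trimming output has the larger effect.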
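The 30% figure holds when the reduction applies to input and output alike. The sketch below shows how the saving depends on which side you trim; the token counts and the per-million prices ($0.25 input, $2.00 output) are hypothetical values chosen for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars of one request; prices are per 1,000,000 tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

def saving(before, after):
    """Fractional cost reduction going from `before` to `after`."""
    return 1 - after / before

IN_PRICE, OUT_PRICE = 0.25, 2.00  # assumed prices, not from any price list

baseline = request_cost(500, 1000, IN_PRICE, OUT_PRICE)
input_only = request_cost(350, 1000, IN_PRICE, OUT_PRICE)  # prompt cut by 30%
both_sides = request_cost(350, 700, IN_PRICE, OUT_PRICE)   # prompt and output cut by 30%

print(f"Trim input only: {saving(baseline, input_only):.1%}")  # 1.8%
print(f"Trim both sides: {saving(baseline, both_sides):.1%}")  # 30.0%
```

With output-heavy workloads like this one, shortening the prompt alone barely moves the bill; constraining the response length is what delivers the saving.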

Model Routing

Route simple tasks (classification, extraction) to cheap models like GPT-4.1 Nano and reserve premium models for complex reasoning. A smart router can cut costs by about 70% with no quality loss on the easy tasks.
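A minimal sketch of the routing idea, assuming tasks arrive already labeled by type. The `route` and `blended_cost` helpers are hypothetical, the per-request costs are the figures shown in the model list above, and the 70/30 split between simple and complex tasks is an assumption; a production router would typically classify requests itself.

```python
CHEAP_MODEL = "GPT-4.1 Nano"
PREMIUM_MODEL = "Claude Opus 4.6"

# Per-request costs as shown in the model list on this page.
COST_PER_REQUEST = {CHEAP_MODEL: 0.0004, PREMIUM_MODEL: 0.028}

# Task types treated as "simple" (assumed labels).
SIMPLE_TASKS = {"classification", "extraction"}

def route(task_type: str) -> str:
    """Send simple tasks to the cheap model, everything else to the premium one."""
    return CHEAP_MODEL if task_type.lower() in SIMPLE_TASKS else PREMIUM_MODEL

def blended_cost(task_types) -> float:
    """Average cost per request for a workload after routing."""
    return sum(COST_PER_REQUEST[route(t)] for t in task_types) / len(task_types)

# Hypothetical workload: 70% simple tasks, 30% complex reasoning.
workload = ["classification"] * 7 + ["reasoning"] * 3
saving = 1 - blended_cost(workload) / COST_PER_REQUEST[PREMIUM_MODEL]
print(f"Saving vs. all-premium: {saving:.0%}")  # 69%
```

The saving tracks the share of traffic that can safely go to the cheap model, so it is worth measuring that share before committing to a router.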

Response Caching

Cache identical or semantically similar requests with a vector similarity lookup. This is common in RAG pipelines, FAQ bots, and search. Hit rates of 30-60% are typical in production.
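The lookup can be sketched as follows. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and `SemanticCache` with its 0.8 threshold is a hypothetical illustration rather than a specific library's API.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, response) pairs

    def get(self, query):
        """Return the cached response for the most similar query, or None."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("how do i reset my password", "Use the reset link on the login page.")
hit = cache.get("how do i reset my password please")  # similar enough: cache hit
miss = cache.get("what is your refund policy")        # unrelated: falls through to the LLM
```

The linear scan is fine for a sketch; at scale the same lookup would usually go through a vector index, and the threshold needs tuning so that near-matches with different intent are not served stale answers.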

Frequently Asked Questions

Prices are calculated from each provider's published per-token API pricing, quoted in dollars per million tokens. Monthly cost = (daily requests × 30) × (average input tokens × input price + average output tokens × output price) / 1,000,000. These are raw API costs and don't include platform fees or volume discounts.
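The formula translates directly into code. In the sketch below, the usage settings (1,000 requests per day, 500 input and 1,000 output tokens per request) and the prices ($0.25 and $2.00 per million tokens) are assumptions, not values stated on this page; they were chosen because they happen to reproduce the $63.75/mo figure shown for GPT-5 Mini.

```python
def monthly_cost(daily_requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m, days=30):
    """Raw monthly API cost in dollars; prices are per 1,000,000 tokens."""
    requests_per_month = daily_requests * days
    tokens_cost = (avg_input_tokens * input_price_per_m
                   + avg_output_tokens * output_price_per_m)
    return requests_per_month * tokens_cost / 1_000_000

# Assumed inputs for illustration only.
cost = monthly_cost(
    daily_requests=1_000,
    avg_input_tokens=500,
    avg_output_tokens=1_000,
    input_price_per_m=0.25,
    output_price_per_m=2.00,
)
print(f"${cost:.2f}/mo")  # $63.75/mo
```

Dividing the monthly figure by the number of monthly requests gives the per-request cost shown next to each model.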

For most production chatbots, Claude Sonnet 4 and GPT-4o offer the best balance of quality, speed, and cost. If you need the absolute cheapest option with acceptable quality, GPT-4o Mini and Gemini 2.5 Flash are excellent choices. For high-stakes conversations, Claude Opus 4 provides the best reasoning quality.

Llama 3.3 70B appears cheap at API pricing, but self-hosting requires GPU infrastructure (A100/H100 instances). At low volumes, API services are almost always cheaper. Self-hosting becomes cost-effective above ~50M tokens/day when you can keep GPUs at high utilization.
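A back-of-the-envelope break-even can be sketched like this. The GPU hourly rate, GPU count, and blended API price are hypothetical placeholders, and the calculation assumes the GPUs can actually serve that volume; it also ignores engineering and operations time.

```python
def break_even_tokens_per_day(gpu_cost_per_hour, gpus, api_price_per_m):
    """Daily token volume at which fixed GPU cost equals API spend.

    api_price_per_m is a blended API price in dollars per 1,000,000 tokens.
    """
    daily_gpu_cost = gpu_cost_per_hour * gpus * 24
    return daily_gpu_cost / api_price_per_m * 1_000_000

# Hypothetical inputs: one GPU at $2.00/hour vs. an API at $0.90 per million tokens.
tokens = break_even_tokens_per_day(gpu_cost_per_hour=2.00, gpus=1, api_price_per_m=0.90)
print(f"Break-even at about {tokens / 1e6:.0f}M tokens/day")  # about 53M
```

Below that volume the GPU sits partly idle and the API is cheaper; above it, self-hosting wins only if utilization stays high.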

The biggest levers: 1) Use prompt caching (saves 50-90% on repeated prefixes). 2) Route simple tasks to cheaper models with a model router. 3) Optimize prompts to use fewer tokens. 4) Batch non-urgent requests. 5) Cache frequent responses. Most teams can cut costs 60-80% with these techniques.

LLM pricing has been dropping steadily: prices have fallen roughly 90% over the past two years. We update this calculator regularly, but always verify current pricing on each provider's website before making budget commitments.

Most providers impose rate limits (requests/minute) and daily quotas, especially on cheaper tiers. High-volume users need to factor in enterprise plans or negotiate custom limits. This calculator focuses on per-token costs, not rate limit constraints.
