LLM Price Wars: Find the Cheapest AI for Your Use Case
Compare API pricing across 31 models from Anthropic, OpenAI, Google, DeepSeek, Meta, and Mistral. Adjust your usage, pick your use case, and find the best model for your budget.
Configure Your Usage
Top 5 by Cost
All Models (31)
Llama 4 Scout
GPT-5 Nano
GPT-4.1 Nano
Gemini 2.5 Flash-Lite
Gemini 2.0 Flash
DeepSeek V3.2
Grok 4.1 Fast
GPT-4o Mini
Llama 4 Maverick
Gemini 3.1 Flash-Lite
GPT-4.1 Mini
GPT-5 Mini
DeepSeek R1
Gemini 2.5 Flash
Gemini 3 Flash
o4-mini
Claude Haiku 4.5
Mistral Large
o3
GPT-4.1
GPT-5
Gemini 2.5 Pro
GPT-4o
Gemini 3.1 Pro
GPT-5.3 Codex
GPT-5.2
GPT-5.4
Claude Sonnet 4.6
Grok 4
Claude Opus 4.6
o3-pro
Pro Tips to Cut LLM Costs
Prompt Caching
Anthropic, OpenAI, and Google all offer prompt caching that cuts costs 50-90% on repeated system prompts and context. If you have a long system prompt, caching pays for itself instantly.
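To see why caching pays off so quickly, here is a back-of-the-envelope sketch. The 25% cache-write surcharge and 90% read discount are illustrative assumptions (actual rates vary by provider); the function and its parameters are ours, not any provider's API.

```python
def caching_savings(prefix_tokens: float, requests: int,
                    price_per_mtok: float,
                    write_premium: float = 1.25,   # assumed surcharge to write the cache
                    read_discount: float = 0.10) -> float:  # assumed price of a cached read
    """Fraction saved on a shared prompt prefix across `requests` calls."""
    uncached = requests * prefix_tokens * price_per_mtok / 1e6
    cached = (prefix_tokens * price_per_mtok * write_premium / 1e6        # first call writes the cache
              + (requests - 1) * prefix_tokens * price_per_mtok * read_discount / 1e6)
    return 1 - cached / uncached

# A 10k-token system prompt reused across 1,000 calls at $3/MTok:
print(f"{caching_savings(10_000, 1_000, 3.0):.1%} saved on the prefix")
```

With those assumed rates, the savings on the shared prefix approach the read discount (~90%) as request volume grows, since the one-time write cost is amortized away.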
Prompt Optimization
Shorter prompts are cheaper prompts. Strip unnecessary examples, use concise instructions, and prefer structured outputs (JSON) to reduce output tokens. A 30% token reduction = 30% cost savings.
Model Routing
Route simple tasks (classification, extraction) to cheap models like GPT-4.1-nano and only use premium models for complex reasoning. A smart router can cut costs 70% with no quality loss on the easy tasks.
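A minimal sketch of the idea. The model names, prices, and task taxonomy here are illustrative assumptions, not any provider's API:

```python
# Tasks simple enough for a cheap model (illustrative taxonomy).
CHEAP_TASKS = {"classification", "extraction", "formatting"}

def route(task_type: str) -> str:
    """Send simple, well-scoped tasks to a cheap model; everything else goes premium."""
    return "gpt-4.1-nano" if task_type in CHEAP_TASKS else "gpt-5"

def blended_price(easy_share: float, cheap: float, premium: float) -> float:
    """Average per-MTok price when `easy_share` of traffic routes to the cheap model."""
    return easy_share * cheap + (1 - easy_share) * premium

# 70% easy traffic at $0.10/MTok vs. a $10/MTok premium-only baseline:
savings = 1 - blended_price(0.70, 0.10, 10.0) / 10.0
print(f"{savings:.0%} cheaper than premium-only")
```

Real routers typically classify the incoming request with a small model or heuristics first; the hard part is keeping that classifier cheap and accurate enough that misroutes don't eat the savings.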
Response Caching
Cache identical or semantically similar requests with a vector similarity lookup. Common for RAG pipelines, FAQ bots, and search. Hit rates of 30-60% are typical in production.
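A toy sketch of a semantic cache. The bag-of-words "embedding" below is a stand-in so the example runs self-contained; a production system would use a real embedding model and a vector database, and the 0.9 threshold is an assumption you'd tune:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call entirely
        return None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("what is your refund policy", "Refunds within 30 days.")
print(cache.get("what is your refund policy ?"))  # near-duplicate -> hit
```

On a hit you return the stored response for free; on a miss you call the model and `put` the result. The threshold trades hit rate against the risk of serving a stale or mismatched answer.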
Frequently Asked Questions
How are these prices calculated?
Prices are calculated from each provider's published per-token API pricing. Monthly cost = (daily requests × 30) × (average input tokens × input price + average output tokens × output price) / 1,000,000, where prices are per million tokens. These are raw API costs and don't include platform fees or volume discounts.
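The same formula in code, with illustrative per-million-token rates:

```python
def monthly_cost(daily_requests: int, avg_input_tokens: float,
                 avg_output_tokens: float,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Monthly API cost in USD; prices are per million tokens."""
    per_request = (avg_input_tokens * input_price_per_mtok
                   + avg_output_tokens * output_price_per_mtok) / 1_000_000
    return daily_requests * 30 * per_request

# 10,000 requests/day, 1,500 input + 400 output tokens per request,
# at an assumed $3/MTok input and $15/MTok output:
print(f"${monthly_cost(10_000, 1_500, 400, 3.0, 15.0):,.2f}/month")  # $3,150.00/month
```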
Which model is best for a production chatbot?
For most production chatbots, Claude Sonnet 4.6 or GPT-4o offer the best balance of quality, speed, and cost. If you need the absolute cheapest option with acceptable quality, GPT-4o Mini or Gemini 2.5 Flash are excellent. For high-stakes conversations, Claude Opus 4.6 provides the best reasoning quality.
Is self-hosting an open-weights model cheaper than the API?
Llama 3.3 70B appears cheap at API pricing, but self-hosting requires GPU infrastructure (A100/H100 instances). At low volumes, API services are almost always cheaper. Self-hosting becomes cost-effective above ~50M tokens/day, when you can keep GPUs at high utilization.
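A back-of-the-envelope break-even check, assuming full GPU utilization. The $8/hr node price and $4/MTok blended API rate are illustrative assumptions, not quotes:

```python
def breakeven_tokens_per_day(gpu_cost_per_hour: float,
                             api_price_per_mtok: float) -> float:
    """Daily token volume at which a dedicated GPU node matches API pricing,
    assuming the node runs (and is utilized) 24 hours a day."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_price_per_mtok * 1_000_000

# An ~$8/hr GPU node vs. an assumed $4/MTok blended API rate:
tokens = breakeven_tokens_per_day(8.0, 4.0)
print(f"{tokens / 1e6:.0f}M tokens/day to break even")  # 48M tokens/day
```

Below that volume, or with spiky traffic that leaves GPUs idle, the API wins; engineering time for serving infrastructure is a further cost not captured here.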
How do I cut my LLM costs?
The biggest levers: 1) Use prompt caching (saves 50-90% on repeated prefixes). 2) Route simple tasks to cheaper models with a model router. 3) Optimize prompts to use fewer tokens. 4) Batch non-urgent requests. 5) Cache frequent responses. Most teams can cut costs 60-80% with these techniques.
Are LLM prices going down?
Yes, LLM pricing has been dropping steadily, with prices falling roughly 90% over the past two years. We update this calculator regularly, but always verify current pricing on each provider's website before making budget commitments.
What about rate limits?
Most providers impose rate limits (requests/minute) and daily quotas, especially on cheaper tiers. High-volume users need to factor in enterprise plans or negotiate custom limits. This calculator focuses on per-token costs, not rate-limit constraints.
Learn to Build AI Apps That Keep Costs Low
Our playbook teaches you prompt caching, model routing, and the production patterns that cut LLM costs by 80%.
Get the Playbook