Budget routing

Cheapest LLM API for your workload

The cheapest model depends on cache ratio, output length, and traffic — not headline list prices. Start with a preset, then scan the ranked table for the lowest monthly estimate.


Calculator

Step 1

Describe your workload

Start with a preset or dial in your own numbers.

Synced Jun 29, 2026

1K messages / mo

1K tokens avg

Step 2

Estimated monthly spend

$67.50

Rank #63 of 63 · save $67.44/mo vs #1

Input $22.50 · Cached 0.34M tok · Output $45.00
Total tokens: 1.00M
Messages: 1K
Avg tok/msg: 1K

High-quality reasoning agents and premium customer support flows.

Step 3

Compare all models

63 models priced for your workload

Best value

Rank · Model · Provider · Price status · Monthly estimate
01 · Amazon Nova Micro · AWS Bedrock · Verified · $0.06
02 · Gemini 2.0 Flash-Lite (batch) · Google Gemini · Estimated · $0.07
03 · Command R · Cohere · Verified · $0.07
04 · Gemini 2.5 Flash-Lite (batch) · Google Gemini · Verified · $0.07
05 · Gemini 2.0 Flash (batch) · Google Gemini · Estimated · $0.08
06 · Ministral 3 3B · Mistral · Verified · $0.10
07 · Amazon Nova Lite · AWS Bedrock · Verified · $0.11
08 · Mistral Small 3.2 · Mistral · Verified · $0.11
09 · Gemini 2.0 Flash (standard) · Google Gemini · Estimated · $0.11
10 · DeepSeek V4 Flash · DeepSeek · Verified · $0.11
11 · Gemini 2.0 Flash-Lite (standard) · Google Gemini · Verified · $0.13
12 · Gemini 2.5 Flash-Lite (standard) · Google Gemini · Verified · $0.14
13 · Gemini 2.5 Flash Lite (standard) · Vertex AI · Verified · $0.14
14 · Mistral Small Creative · Mistral · Estimated · $0.15
15 · Ministral 3 14B · Mistral · Verified · $0.20
16 · Gemini 2.5 Flash Lite (priority) · Vertex AI · Verified · $0.26
17 · Grok 4.1 Fast Reasoning · xAI · Verified · $0.28
18 · Grok 4.1 Fast Non-Reasoning · xAI · Verified · $0.28
19 · deepseek-chat · DeepSeek · Verified · $0.30
20 · GPT-5.4 nano · OpenAI · Verified · $0.40
21 · Gemini 2.5 Flash (flex batch) · Vertex AI · Verified · $0.43
22 · DeepSeek V4 Pro · DeepSeek · Verified · $0.54
23 · Mistral Large 3 · Mistral · Verified · $0.60
24 · GPT-5-mini Global · Azure OpenAI · Needs review · $0.61
25 · Sonar · Perplexity · Verified · $0.68
26 · Gemini 2.5 Flash (standard) · Google Gemini · Verified · $0.76
27 · Gemini 2.5 Flash (standard) · Vertex AI · Verified · $0.76
28 · Mistral Medium 3.1 · Mistral · Verified · $0.80
29 · deepseek-reasoner · DeepSeek · Verified · $0.82
30 · Claude Haiku 3.5 · Anthropic · Verified · $1.36
31 · Gemini 2.5 Flash (priority) · Vertex AI · Verified · $1.36
32 · Amazon Nova Pro · AWS Bedrock · Verified · $1.40
33 · GPT-5.4 mini · OpenAI · Verified · $1.46
34 · Gemini 2.5 Pro (batch) · Google Gemini · Verified · $1.55
35 · Grok 4.3 · xAI · Verified · $1.56
36 · Grok 4.20 Reasoning · xAI · Verified · $1.56
37 · Grok 4.20 Non-Reasoning · xAI · Verified · $1.56
38 · Claude Haiku 4.5 · Anthropic · Verified · $1.70
39 · Gemini 2.5 Pro (flex batch) · Vertex AI · Verified · $1.72
40 · Amazon Nova Pro (Latency Optimized) · AWS Bedrock · Verified · $1.75
41 · Magistral Medium 1.2 · Mistral · Verified · $2.75
42 · Mistral Large 2.1 · Mistral · Verified · $3.00
43 · Gemini 2.5 Pro (standard) · Google Gemini · Verified · $3.06
44 · GPT-5 Codex Global · Azure OpenAI · Verified · $3.06
45 · Gemini 2.5 Pro (standard) · Vertex AI · Verified · $3.06
46 · Sonar Reasoning Pro · Perplexity · Verified · $3.50
47 · Sonar Deep Research · Perplexity · Verified · $3.50
48 · Command A · Cohere · Verified · $4.38
49 · Command R+ · Cohere · Verified · $4.38
50 · GPT-5.4 · OpenAI · Verified · $4.87
51 · Claude Sonnet 4.6 · Anthropic · Verified · $5.09
52 · Claude Sonnet 4.5 · Anthropic · Verified · $5.09
53 · Claude Sonnet 4 · Anthropic · Verified · $5.09
54 · Claude Sonnet 3.7 · Anthropic · Needs review · $5.09
55 · Gemini 2.5 Pro (priority) · Vertex AI · Verified · $5.51
56 · Sonar Pro · Perplexity · Verified · $6.00
57 · Claude Opus 4.6 · Anthropic · Verified · $8.48
58 · Claude Opus 4.5 · Anthropic · Verified · $8.48
59 · Claude Opus 4.7 · Anthropic · Verified · $10.00
60 · GPT-5.5 · OpenAI · Verified · $11.25
61 · Claude Opus 4.1 · Anthropic · Verified · $25.44
62 · Claude Opus 4 · Anthropic · Verified · $25.44
63 · GPT-5.5 Pro · OpenAI · Verified · $67.50
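
The rank and "save vs #1" figures shown in Step 2 can be reproduced from the monthly estimates in the table. The following is a minimal sketch, not the site's actual code: the function names are hypothetical, only four rows are copied from the table for brevity, and ties are left in input order because the page's own tie-breaking rule is not stated.

```python
from decimal import Decimal

# A few (model, provider, monthly estimate in USD) rows copied from the table.
# Decimal avoids floating-point noise in currency arithmetic.
rows = [
    ("GPT-5.5 Pro", "OpenAI", Decimal("67.50")),
    ("Amazon Nova Micro", "AWS Bedrock", Decimal("0.06")),
    ("Command R", "Cohere", Decimal("0.07")),
    ("Claude Opus 4.7", "Anthropic", Decimal("10.00")),
]


def rank_by_cost(rows):
    """Return rows sorted from cheapest to most expensive (stable for ties)."""
    return sorted(rows, key=lambda row: row[2])


def savings_vs_cheapest(ranked, model):
    """Return (rank, monthly saving versus rank #1) for the named model."""
    cheapest_cost = ranked[0][2]
    for position, (name, _provider, cost) in enumerate(ranked, start=1):
        if name == model:
            return position, cost - cheapest_cost
    raise KeyError(model)


ranked = rank_by_cost(rows)
position, saving = savings_vs_cheapest(ranked, "GPT-5.5 Pro")
print(f"Rank #{position} of {len(ranked)} · save ${saving}/mo vs #1")
# prints "Rank #4 of 4 · save $67.44/mo vs #1"
```

With all 63 rows loaded, the same call would report rank #63 and the same $67.44 saving, since the cheapest and most expensive rows are both included here.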

How costs are calculated

Prices come from ai-provider-pricing-validated.json, validated Jun 2026. Confirm them on the official provider pages before making billing decisions.

Embed

Want this calculator on your site?

Copy the iframe snippet below and paste it into any page, doc, or WordPress Custom HTML block.

<iframe src="https://modelcostcomparison.com/embed/ai-api-pricing-calculator?ref=topic-cheapest-llm-api" width="100%" height="980" style="border:0;border-radius:12px;overflow:hidden" loading="lazy" referrerpolicy="strict-origin-when-cross-origin" title="AI API Pricing Calculator by Model Cost Comparison"></iframe>

Model Cost Comparison · Built by Lazige · Methodology

How we calculate cost

Monthly estimate = (input tokens × input $/MTok) + (cached tokens × cached $/MTok when published) + (output tokens × output $/MTok), scaled to your message volume. See the methodology for validation sources and update cadence.
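
As a rough illustration, the formula above could be written as follows. This is a sketch under stated assumptions, not the calculator's implementation: the `Rates` class, the `monthly_estimate` function, and the example prices are hypothetical. It also assumes that cached tokens are a subset of input tokens, and that they are billed at the normal input rate when a provider publishes no cached rate.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MTOK = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-model prices in USD per million tokens (MTok)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_per_mtok: Optional[float] = None  # None when no cached rate is published


def monthly_estimate(rates, messages, input_tok_per_msg, output_tok_per_msg,
                     cache_ratio=0.0):
    """Estimate monthly spend in USD for a workload.

    cache_ratio is the share of input tokens served from cache (0.0 to 1.0).
    Assumption: without a published cached rate, all input is billed as fresh.
    """
    if not 0.0 <= cache_ratio <= 1.0:
        raise ValueError("cache_ratio must be between 0 and 1")

    if rates.cached_per_mtok is None:
        cached_tok, cached_rate = 0.0, 0.0
    else:
        cached_tok = input_tok_per_msg * cache_ratio
        cached_rate = rates.cached_per_mtok
    fresh_tok = input_tok_per_msg - cached_tok

    per_message = (
        fresh_tok * rates.input_per_mtok
        + cached_tok * cached_rate
        + output_tok_per_msg * rates.output_per_mtok
    )
    # Scale to the monthly message volume, then convert from per-token to per-MTok.
    return per_message * messages / TOKENS_PER_MTOK


# Illustrative prices only, not taken from any provider.
example = Rates(input_per_mtok=2.0, output_per_mtok=8.0, cached_per_mtok=0.5)
cost = monthly_estimate(example, messages=1_000, input_tok_per_msg=800,
                        output_tok_per_msg=200, cache_ratio=0.5)
print(f"${cost:.2f}")  # prints "$2.60"
```

Keeping the cached rate optional mirrors the "when published" wording in the formula: a model without a cached price simply falls back to the two-term input and output calculation.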

Use cases

Common workload patterns teams model here

Illustrative scenarios — not customer testimonials. Each card shows how a typical team shape (support bot, RAG, code assistant, or agent) maps to the calculator presets.

A support-bot preset with 50k messages/month surfaced three budget models in one pass, faster than copying rates from five pricing pages.
B2B SaaS support bot: High volume · short replies · 55% cache

Raising the cached-input slider made our RAG estimate realistic. We moved retrieval-heavy traffic to a cheaper model without changing reply quality.
Document Q&A / RAG: Retrieval-heavy · 65% cache

PMs use the embed on internal docs to sanity-check model spend before vendor requests, so everyone shares the same workload baseline.
Platform / internal tooling: Mixed presets · stakeholder decks

Before scaling an agent workflow, comparing monthly cost across every provider for the same token mix avoided over-provisioning on day one.
Tool-calling agent: Agent preset · multi-step I/O

Finance teams grasp token mix faster with one screenshot from the ranking table, which is useful when justifying a move off a default premium model.
Cost review / FinOps: Board prep · usage doubles scenario

Quarterly Bedrock vs Vertex vs direct API reviews start here: normalize the math before opening vendor spreadsheets.
Cloud architecture review: Multi-cloud comparison

The code-assistant preset was a realistic starting point for a copilot MVP; we adjusted tokens after a pilot week and stayed within 10% of the estimate.
IDE / code copilot: Code preset · long context

Gemini Flash placed top three for our exact cache ratio on a high-volume FAQ bot, which is easy to miss in a static pricing table.
High-volume FAQ: Support preset · high cache
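
Several of these scenarios hinge on the cache ratio, so it may help to see how the cheapest option can flip as that ratio rises. The sketch below uses two made-up models with invented prices (`model-a` and `model-b` are hypothetical, as are the function names) and assumes cached tokens are a subset of input tokens.

```python
# Hypothetical prices in USD per million tokens: (input, cached input, output).
HYPOTHETICAL_RATES = {
    "model-a": (0.50, 0.50, 1.50),  # cheaper input, no cache discount
    "model-b": (0.60, 0.06, 1.50),  # pricier input, deep cache discount
}


def monthly_cost(rates, input_tokens, output_tokens, cache_ratio):
    """Monthly cost in USD for the given token totals and cache ratio."""
    input_rate, cached_rate, output_rate = rates
    cached = input_tokens * cache_ratio
    fresh = input_tokens - cached
    total = fresh * input_rate + cached * cached_rate + output_tokens * output_rate
    return total / 1_000_000


def cheapest(cache_ratio, input_tokens=800_000, output_tokens=200_000):
    """Name of the lowest-cost hypothetical model at this cache ratio."""
    costs = {
        name: monthly_cost(rates, input_tokens, output_tokens, cache_ratio)
        for name, rates in HYPOTHETICAL_RATES.items()
    }
    return min(costs, key=costs.get)


for ratio in (0.0, 0.65):
    print(f"cache ratio {ratio:.2f}: cheapest is {cheapest(ratio)}")
# prints "cache ratio 0.00: cheapest is model-a"
# then   "cache ratio 0.65: cheapest is model-b"
```

With no caching, the lower input price wins; at a 65% cache ratio the discounted cached rate outweighs it, which is why a static list-price table can point to the wrong model.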