Instructions
How to use the AI API cost calculator
Compare 63 models across 11 providers in four steps. Set one workload, read the monthly estimate, rank every model, and tune cache-aware token math. Pricing reference updated Jun 2026.
Quick start (under 2 minutes)
Workload presets
Presets fill messages, tokens, and token-mix defaults. Click one on the calculator, then adjust if your traffic differs.
- Support bot: high volume, short replies. 50,000 messages / mo · 800 avg tokens per message. Customer support, FAQ bots, ticket triage.
- Code assistant: medium volume, long context. 5,000 messages / mo · 4,000 avg tokens per message. IDE copilots, PR review, refactors.
- RAG / search: retrieval-heavy prompts. 10,000 messages / mo · 2,500 avg tokens per message. Document Q&A, knowledge bases, search-augmented apps.
- Agent / tools: multi-step, mixed I/O. 2,000 messages / mo · 8,000 avg tokens per message. Tool-calling agents, workflows, multi-turn reasoning.
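To get a feel for the scale of each preset, the figures above can be multiplied out into monthly token totals. This is a minimal sketch: the message and token numbers come from the presets, while the `PRESETS` layout and the `monthly_tokens` helper are illustrative names, not part of the calculator.

```python
# Preset figures as listed above: (messages per month, average tokens per message).
# The dict layout and function name are illustrative, not the calculator's internals.
PRESETS = {
    "Support bot": (50_000, 800),
    "Code assistant": (5_000, 4_000),
    "RAG / search": (10_000, 2_500),
    "Agent / tools": (2_000, 8_000),
}


def monthly_tokens(messages_per_month: int, avg_tokens_per_message: int) -> int:
    """Total tokens per month for one workload."""
    return messages_per_month * avg_tokens_per_message


totals = {name: monthly_tokens(*shape) for name, shape in PRESETS.items()}

for name, total in totals.items():
    print(f"{name}: {total:,} tokens / mo")
```

The highest-volume preset is not the one with the largest messages: the support bot totals 40 million tokens a month, while the agent preset totals 16 million despite its 8,000-token messages.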
01
Set your workload
Start with a preset that matches your app, or enter your own message volume and average tokens per message.
- Messages per month = API calls or chat turns (one user request + model reply).
- Tokens per message = prompt + completion size. Use your analytics average, or start with a preset.
- Presets pre-fill input/output and cache ratios for common product shapes.
02
Review the live estimate
The calculator shows estimated monthly spend for your selected model and where it ranks against every other priced model.
- Monthly cost updates instantly when you change workload inputs.
- Token breakdown splits input, cached input, and output spend.
- Rank shows the selected model's position among all priced models; #1 is the cheapest for the same workload.
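The token breakdown described above can be sketched as three token buckets, each multiplied by its own rate. The calculator's exact formula is not documented here, so this assumes the common per-million-token pricing model; the `Rates` values and the token counts are made-up placeholders, not real list prices.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical USD prices per 1M tokens (placeholders, not real list prices)."""
    input: float
    cached_input: float
    output: float


def monthly_breakdown(uncached_in: int, cached_in: int, out: int, rates: Rates) -> dict:
    """Split monthly spend into input, cached input, and output components."""
    per_million = 1_000_000
    parts = {
        "input": uncached_in / per_million * rates.input,
        "cached_input": cached_in / per_million * rates.cached_input,
        "output": out / per_million * rates.output,
    }
    parts["total"] = sum(parts.values())
    return parts


# Assumed workload: 40M tokens / mo, split 20M uncached input, 8M cached input, 12M output.
rates = Rates(input=1.00, cached_input=0.10, output=4.00)
breakdown = monthly_breakdown(20_000_000, 8_000_000, 12_000_000, rates)

for label, usd in breakdown.items():
    print(f"{label:>12}: ${usd:,.2f}")
```

With these placeholder rates, output tokens dominate the bill even though they are the smallest share of input plus output volume, which is why the breakdown is worth reading before switching models.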
03
Compare all models
Use the ranking table and top picks to find the lowest-cost model that still fits your quality bar.
- Filter by provider (OpenAI, Anthropic, Google, Mistral, etc.).
- Search by model name. Click any row to inspect it in the estimate panel.
- Compare headline list prices fairly: every provider is priced against the same workload baseline.
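The comparison in this step amounts to pricing one workload against every model and sorting by cost. The sketch below assumes that approach; the model names, providers, and prices are placeholders, and keeping the global rank after filtering by provider is a design assumption rather than documented calculator behavior.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    provider: str
    input_per_m: float   # hypothetical USD per 1M input tokens
    output_per_m: float  # hypothetical USD per 1M output tokens


# Placeholder catalogue, not real models or prices.
CATALOGUE = [
    Model("model-a", "provider-x", 0.50, 2.00),
    Model("model-b", "provider-y", 3.00, 15.00),
    Model("model-c", "provider-x", 1.00, 4.00),
]


def monthly_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    """Monthly spend for one model on a fixed workload."""
    return (input_tokens * model.input_per_m + output_tokens * model.output_per_m) / 1_000_000


def rank_models(models, input_tokens: int, output_tokens: int):
    """Return (rank, model, cost) rows, cheapest first, using one shared workload."""
    priced = sorted(models, key=lambda m: monthly_cost(m, input_tokens, output_tokens))
    return [
        (rank, m, monthly_cost(m, input_tokens, output_tokens))
        for rank, m in enumerate(priced, start=1)
    ]


def filter_by_provider(rows, provider: str):
    """Keep rows for one provider while preserving each row's global rank."""
    return [row for row in rows if row[1].provider == provider]


# Assumed workload: 28M input and 12M output tokens per month.
ranking = rank_models(CATALOGUE, 28_000_000, 12_000_000)

for rank, model, cost in ranking:
    print(f"#{rank} {model.name} ({model.provider}): ${cost:,.2f}")
```

Ranking first and filtering second keeps each model's position relative to the full catalogue, so a provider filter does not make an expensive model look like a top pick.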
04
Fine-tune token mix (optional)
Open Advanced token mix when RAG, agents, or long-context apps need a more accurate split.
- Input vs output ratio: the share of each message's tokens sent as prompt or context versus returned as the model's reply.
- Cached input ratio: the share of prompt tokens served from the provider's cache, billed at a lower rate when the provider offers cache pricing.
- Per-message token summary updates live so you can sanity-check the math.
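The two ratios above can be sanity-checked by splitting one message's tokens by hand. This is a minimal sketch, assuming the cached ratio applies to input tokens only; the `token_mix` function and the 80% / 50% ratios are illustrative, and the 2,500-token message size matches the RAG / search preset.

```python
def token_mix(tokens_per_message: int, input_ratio: float, cached_input_ratio: float) -> dict:
    """Split one message's tokens into uncached input, cached input, and output."""
    if not (0 <= input_ratio <= 1 and 0 <= cached_input_ratio <= 1):
        raise ValueError("ratios must be between 0 and 1")
    input_tokens = round(tokens_per_message * input_ratio)
    cached = round(input_tokens * cached_input_ratio)
    return {
        "input_uncached": input_tokens - cached,
        "input_cached": cached,
        # Output is the remainder, so the three buckets always sum to the message size.
        "output": tokens_per_message - input_tokens,
    }


# Assumed mix: 80% of tokens are input, and half of the input is served from cache.
summary = token_mix(2_500, input_ratio=0.8, cached_input_ratio=0.5)
print(summary)
```

Computing output as the remainder avoids rounding drift, so the per-message summary always adds back up to the tokens-per-message figure entered in step 01.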
Embed on your site
Add the calculator to docs, internal tools, or landing pages with the embed widget. It loads the same live pricing engine as the main site.
Ready to compare?
Run your workload through 63 models now
Same baseline, every provider — updated Jun 2026.