Instructions

How to use the AI API cost calculator

Q: What counts as one message in the calculator?

One message is one API call or chat turn: the user prompt plus the model response combined. If your app sends follow-ups in the same thread, count each turn separately unless you batch them into a single API call.

Q: How do I estimate average tokens per message?

Check your provider usage dashboard for average input + output tokens per request. If you are planning a new product, start with a preset (Support bot, Code assistant, RAG, or Agent) and adjust after a pilot week of real traffic.

Q: When should I change the cached input ratio?

Raise it when you reuse long system prompts, RAG chunks, or conversation history — common in support bots and agents. Lower it for mostly fresh prompts. If a model has no cached rate, the calculator falls back to normal input pricing.
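The fallback described above can be sketched as a blended input rate. This is a minimal illustration, not the calculator's actual code: the function name, parameter names, and the per-million-token units are assumptions made for the example.

```python
from typing import Optional


def blended_input_rate(
    input_rate: float,
    cached_ratio: float,
    cached_rate: Optional[float] = None,
) -> float:
    """Effective price per million input tokens for a given cached share.

    All names here are hypothetical; rates are assumed to be USD per
    million tokens and cached_ratio a fraction between 0 and 1.
    """
    if not 0.0 <= cached_ratio <= 1.0:
        raise ValueError("cached_ratio must be between 0 and 1")
    # No cached rate for this model: fall back to normal input pricing,
    # so the cached share has no effect on the result.
    if cached_rate is None:
        cached_rate = input_rate
    return cached_ratio * cached_rate + (1.0 - cached_ratio) * input_rate
```

With an assumed input rate of 2.00 and cached rate of 1.00, a cached ratio of 0.5 gives a blended rate of 1.50; without a cached rate the result stays at 2.00 whatever the ratio.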

Q: Can I compare OpenAI and Anthropic on equal terms?

Yes. Every model is priced against the same messages/month, tokens/message, and token mix. The ranking shows who is cheapest for your workload, not who has the lowest headline per-million rate in isolation.
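To see why the ranking depends on the workload rather than on headline rates, here is a small sketch that prices two models against one shared baseline. The model names and rates are invented placeholders, not real provider prices, and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

PER_MILLION = 1_000_000

# Placeholder (input_rate, output_rate) pairs in USD per million tokens.
# These values are made up for the example and are not real prices.
EXAMPLE_RATES: Dict[str, Tuple[float, float]] = {
    "model-a": (1.00, 8.00),
    "model-b": (2.00, 4.00),
}


def monthly_cost(
    messages: int,
    tokens_per_message: int,
    input_share: float,
    input_rate: float,
    output_rate: float,
) -> float:
    """Monthly spend for one model, ignoring caching for simplicity."""
    total_tokens = messages * tokens_per_message
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / PER_MILLION


def rank_models(
    rates: Dict[str, Tuple[float, float]],
    messages: int,
    tokens_per_message: int,
    input_share: float,
) -> List[Tuple[str, float]]:
    """Return (model, monthly cost) pairs sorted cheapest first."""
    costs = [
        (name, monthly_cost(messages, tokens_per_message, input_share, i, o))
        for name, (i, o) in rates.items()
    ]
    return sorted(costs, key=lambda item: item[1])
```

With 10,000 messages of 1,000 tokens and an even input/output split, model-b ranks first even though its input rate is higher. At a 90% input share the order flips and model-a is cheapest.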

Q: How often should I re-run the calculator?

Pricing is validated weekly (last update Jun 2026). Re-run when you change models, traffic grows, or vendors publish new tiers. Always confirm on official provider pages before making contractual commitments.

Compare 63 models across 11 providers in four steps. Set one workload, read the monthly estimate, rank every model, and tune cache-aware token math. Pricing reference updated Jun 2026.


Workload presets

Presets fill messages, tokens, and token-mix defaults. Click one on the calculator, then adjust if your traffic differs.

Support bot: High volume, short replies
50,000 messages / mo · 800 avg tokens per message
Customer support, FAQ bots, ticket triage

Code assistant: Medium volume, long context
5,000 messages / mo · 4,000 avg tokens per message
IDE copilots, PR review, refactors

RAG / search: Retrieval-heavy prompts
10,000 messages / mo · 2,500 avg tokens per message
Document Q&A, knowledge bases, search-augmented apps

Agent / tools: Multi-step, mixed I/O
2,000 messages / mo · 8,000 avg tokens per message
Tool-calling agents, workflows, multi-turn reasoning

Set your workload

Start with a preset that matches your app, or enter your own message volume and average tokens per message.

Messages per month = API calls or chat turns (one user request + model reply).
Tokens per message = prompt + completion size. Use your analytics average, or start with a preset.
Presets pre-fill input/output and cache ratios for common product shapes.
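As a quick sanity check on workload size, monthly token volume is just messages per month times average tokens per message. The sketch below uses the preset figures listed on this page and assumes the "avg" figure is average tokens per message; the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# (messages per month, average tokens per message), taken from the
# preset cards on this page. The "avg" figure is assumed to mean
# average tokens per message.
PRESETS: Dict[str, Tuple[int, int]] = {
    "Support bot": (50_000, 800),
    "Code assistant": (5_000, 4_000),
    "RAG / search": (10_000, 2_500),
    "Agent / tools": (2_000, 8_000),
}


def monthly_tokens(messages_per_month: int, avg_tokens_per_message: int) -> int:
    """Total tokens (prompt + completion) processed in one month."""
    return messages_per_month * avg_tokens_per_message


for name, (messages, avg_tokens) in PRESETS.items():
    print(f"{name}: {monthly_tokens(messages, avg_tokens):,} tokens / mo")
```

Under that reading, the Support bot preset works out to 40,000,000 tokens per month and the Agent preset to 16,000,000, so the highest message count is also the largest workload here, but by a smaller margin than the message counts alone suggest.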

Review the live estimate

The calculator shows estimated monthly spend for your selected model and where it ranks against every other priced model.

Monthly cost updates instantly when you change workload inputs.
Token breakdown splits input, cached input, and output spend.
Rank shows position vs all models — #1 is cheapest for the same workload.
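The token breakdown can be reproduced by hand. The following is a rough sketch under the assumption that each bucket is billed at its own per-million-token rate; the function name, argument names, and example rates are illustrative and not taken from the calculator.

```python
from typing import Dict

PER_MILLION = 1_000_000


def spend_breakdown(
    fresh_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    input_rate: float,
    cached_rate: float,
    output_rate: float,
) -> Dict[str, float]:
    """Split monthly spend into input, cached input, and output.

    Rates are assumed to be USD per million tokens.
    """
    parts = {
        "input": fresh_input_tokens * input_rate / PER_MILLION,
        "cached_input": cached_input_tokens * cached_rate / PER_MILLION,
        "output": output_tokens * output_rate / PER_MILLION,
    }
    parts["total"] = parts["input"] + parts["cached_input"] + parts["output"]
    return parts
```

For example, 20M fresh input, 10M cached input, and 10M output tokens at assumed rates of 2.00, 0.50, and 8.00 give 40.00 + 5.00 + 80.00 = 125.00 per month, which shows how output can dominate spend even when it is the smaller share of tokens.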

Compare all models

Use the ranking table and top picks to find the lowest-cost model that still fits your quality bar.

Filter by provider (OpenAI, Anthropic, Google, Mistral, etc.).
Search by model name. Click any row to inspect it in the estimate panel.
Compare list prices fairly: every provider is priced against the same workload baseline.

Fine-tune token mix (optional)

Open Advanced token mix when RAG, agents, or long-context apps need a more accurate split.

Input vs output ratio: share of each message's tokens sent as prompt/context vs returned as the model reply.
Cached input ratio: share of prompt tokens served from the provider's cache, billed at a lower rate when the provider offers one.
Per-message token summary updates live so you can sanity-check the math.
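The per-message summary can be checked with a few lines of arithmetic. This sketch assumes both ratios are fractions between 0 and 1 and that the cached ratio applies to input tokens only; the function and key names are hypothetical.

```python
from typing import Dict


def split_tokens(
    tokens_per_message: float,
    input_ratio: float,
    cached_ratio: float,
) -> Dict[str, float]:
    """Split one message's tokens into fresh input, cached input, and output."""
    for label, value in (("input_ratio", input_ratio), ("cached_ratio", cached_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0 and 1")
    input_tokens = tokens_per_message * input_ratio
    # The cached share is taken out of the input side only.
    cached_input = input_tokens * cached_ratio
    return {
        "fresh_input": input_tokens - cached_input,
        "cached_input": cached_input,
        "output": tokens_per_message - input_tokens,
    }
```

An 8,000-token message with a 0.75 input ratio and a 0.5 cached ratio splits into 3,000 fresh input, 3,000 cached input, and 2,000 output tokens. The three parts should always add back up to the tokens-per-message figure.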

Embed on your site

Add the calculator to docs, internal tools, or landing pages with the embed widget. It loads the same live pricing engine as the main site.

