Definition
The ongoing bill for every request your AI answers — a per-use “inference” charge — plus fixed costs for hosting, data, monitoring, and staff.
At a glance
- Cost scales with usage, not user count: every question reruns the model and costs fresh compute[2].
- Margins are thinner — roughly 50-65% gross vs 70-90% for mature software[1].
- The real bill is usually 2-3x the headline model price once you add hosting, data, monitoring, and staff[5].
- Spend is spiky: a viral moment can multiply your bill in one month.
How the bill works
Most products mix a fixed monthly fee with a variable per-use charge. Chatbot platforms run about $50-$200/month for light use and $300-$1,000/month for growing use, plus $1-$6 per resolved conversation[4]. The underlying inference for a single conversation typically costs a few cents to tens of cents[1].
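As a rough sketch of that arithmetic, the monthly bill can be modeled as a fixed platform fee plus a per-resolution charge. The function name and the volume of 500 resolved conversations are assumptions for illustration; the fee ranges are the light-use figures quoted above.

```python
def monthly_bill(platform_fee: float, resolved_conversations: int,
                 fee_per_resolution: float) -> float:
    """Fixed monthly fee plus a variable charge per resolved conversation."""
    return platform_fee + resolved_conversations * fee_per_resolution


# Assumed volume, for illustration only: 500 resolved conversations a month.
VOLUME = 500

# Low and high ends of the light-use range ($50-$200/month, $1-$6 per resolution).
low_estimate = monthly_bill(50.0, VOLUME, 1.0)
high_estimate = monthly_bill(200.0, VOLUME, 6.0)

print(f"Light use at {VOLUME} resolutions: ${low_estimate:,.0f} to ${high_estimate:,.0f}")
```

At this assumed volume the variable charge already outweighs the fixed fee at both ends of the range, which is why traffic, not the subscription tier, tends to drive the total.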
Why it costs more than the sticker
Mid-tier models run roughly $2.50-$3 per million input tokens and $15 per million output tokens in 2026[3]. The bigger risk is demand spikes: in one example, the bill jumped from ~$1,980 to ~$9,900 in a single month, a fivefold increase[4]. Budget for the spike, not the average.
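To make the token prices and the spike concrete, here is a minimal sketch. The token counts per conversation (4,000 in, 1,000 out) are assumptions chosen for illustration, as are the function names; the prices and the spike figures come from the paragraph above.

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Dollar cost of one conversation from token counts and per-million prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def spike_multiplier(baseline_bill: float, peak_bill: float) -> float:
    """How many times larger the peak month was than the baseline month."""
    return peak_bill / baseline_bill


# Assumed conversation size, for illustration only.
cost = conversation_cost(4_000, 1_000,
                         input_price_per_million=3.0,
                         output_price_per_million=15.0)

# Spike example from the text: ~$1,980 rising to ~$9,900.
multiplier = spike_multiplier(1_980, 9_900)

print(f"Cost per conversation: about ${cost:.3f}")
print(f"Spike multiplier: {multiplier:.1f}x")
```

Under these assumptions a conversation costs under three cents, in line with the "few cents" range quoted earlier, while the spike example works out to a 5x multiplier. Sizing the budget ceiling from that multiplier rather than from the average month is the point of the advice above.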
What you can do
Prices have fallen sharply (about 80% across 2025-2026)[3]. Caching, batching, and using smaller models for simple tasks cut the per-use bill substantially[5].
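The effect of caching and routing to smaller models can be sketched with a simple blended-cost formula. Everything numeric here is hypothetical: the 20% cache hit rate, the 50% share of traffic sent to a smaller model, and the $0.003 small-model cost are assumptions, not figures from the sources. The sketch also assumes a cache hit costs nothing and leaves batching out.

```python
def blended_cost_per_request(large_model_cost: float, small_model_cost: float,
                             cache_hit_rate: float,
                             small_model_share: float) -> float:
    """Average cost per request after caching and model routing.

    cache_hit_rate: fraction of requests answered from cache (assumed free).
    small_model_share: fraction of uncached requests sent to the smaller model.
    """
    for name, value in (("cache_hit_rate", cache_hit_rate),
                        ("small_model_share", small_model_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    uncached_fraction = 1.0 - cache_hit_rate
    routed_cost = (small_model_share * small_model_cost
                   + (1.0 - small_model_share) * large_model_cost)
    return uncached_fraction * routed_cost


# Hypothetical inputs, for illustration only.
baseline = 0.027  # assumed cost per request on the larger model alone
blended = blended_cost_per_request(large_model_cost=baseline,
                                   small_model_cost=0.003,
                                   cache_hit_rate=0.20,
                                   small_model_share=0.50)
savings = 1.0 - blended / baseline

print(f"Blended cost per request: about ${blended:.4f}")
print(f"Reduction versus baseline: about {savings:.0%}")
```

With these made-up inputs the per-request cost falls by a bit more than half. Real savings depend on how repetitive the traffic is and how many requests a smaller model can handle acceptably.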
Bottom line
A normal app is a car you buy once; an AI product is a taxi with the meter running — plan for a fixed base plus a variable bill that climbs with traffic.
References
[1] Unit economics for AI SaaS companies: A CFO guide for managing token-based costs and margins. Drivetrain, www.drivetrain.ai
[2] Inference Cost Explained: How to Reduce LLM & AI Inference Spend. CloudZero, www.cloudzero.com
[3] LLM API Pricing 2026: OpenAI vs Anthropic vs Gemini Live Comparison. CloudIDR, www.cloudidr.com
[4] How Much Do AI Chatbots Cost? Estimates for 2026. Crescendo.ai, www.crescendo.ai
[5] AI Infrastructure Costs: A Practical Guide. Cake AI, www.cake.ai