Definition
The ongoing bill for every request your AI answers — a per-use “inference” charge — plus fixed costs for hosting, data, monitoring, and staff.
At a glance
- Cost scales with usage, not user count: every question reruns the model and costs fresh compute[2].
- Margins are thinner — roughly 50-65% gross vs 70-90% for mature software[1].
- The real bill is usually 2-3x the headline model price once you add hosting, data, monitoring, and staff[5].
- Spend is spiky: a viral moment can multiply your bill in one month.
How the bill works
Most products mix a fixed monthly fee with a variable per-use charge. Chatbot platforms run about $50-$200/month for light use and $300-$1,000/month for growing use, plus $1-$6 per resolved conversation[4]. Each conversation typically costs a few cents to tens of cents[1].
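As a rough sketch of the fixed-plus-variable bill described above, the arithmetic looks like this. The function name and the conversation volumes are made up for illustration; the fee and per-resolution price are simply picked from inside the quoted ranges.

```python
def monthly_bill(fixed_fee, price_per_resolution, resolved_conversations):
    """Fixed platform fee plus a per-resolution charge, in dollars."""
    if resolved_conversations < 0:
        raise ValueError("resolved_conversations must be non-negative")
    return fixed_fee + price_per_resolution * resolved_conversations


# Hypothetical volumes: the fixed fee stays put, the variable part follows traffic.
quiet_month = monthly_bill(300, 2, 500)   # 300 + 2 * 500
busy_month = monthly_bill(300, 2, 2500)   # 300 + 2 * 2500
print(quiet_month, busy_month)  # 1300 5300
```

With the same plan, five times the traffic roughly quadruples the bill here, because the variable part dominates once volume grows.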
Why it costs more than the sticker
Mid-tier models run roughly $2.50-$3 per million input tokens and $15 per million output tokens in 2026[3]. But demand spikes are the real risk — one example jumped from ~$1,980 to ~$9,900 in a single month[4]. Budget for the spike, not the average.
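To make the token prices concrete, here is a minimal sketch that turns them into a per-request cost and checks the size of the quoted spike. It assumes the upper input price ($3 per million tokens); the token counts per request and the function names are illustrative assumptions, not figures from the sources.

```python
INPUT_PRICE_PER_M = 3.00    # dollars per million input tokens (upper end of the quoted range)
OUTPUT_PRICE_PER_M = 15.00  # dollars per million output tokens


def request_cost(input_tokens, output_tokens):
    """Model charge for one request, in dollars."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def spike_multiplier(normal_month, spike_month):
    """How many times larger the spike month's bill is."""
    return spike_month / normal_month


# Hypothetical request: 2,000 tokens in, 500 tokens out.
print(f"${request_cost(2_000, 500):.4f} per request")  # $0.0135 per request

# The spike example from the text: ~$1,980 to ~$9,900.
print(spike_multiplier(1_980, 9_900))  # 5.0
```

A budget sized to the average month would have been off by a factor of five in that example, which is why the spike is the number to plan around.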
What you can do
Prices have fallen sharply (about 80% across 2025-2026)[3]. Caching, batching, and using smaller models for simple tasks cut the per-use bill substantially[5].
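One way to reason about these savings is a blended cost per request. The sketch below is a simplification under stated assumptions: cached answers are treated as free, traffic splits between just two models, and every number in the example is hypothetical rather than taken from the sources.

```python
def blended_cost_per_request(full_cost, small_cost, small_share, cache_hit_rate):
    """Average cost per request after caching and routing.

    Assumes a cache hit costs nothing and the remaining requests are
    split between a small model and a full-size model.
    """
    if not 0 <= small_share <= 1:
        raise ValueError("small_share must be between 0 and 1")
    if not 0 <= cache_hit_rate <= 1:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    routed = small_share * small_cost + (1 - small_share) * full_cost
    return (1 - cache_hit_rate) * routed


# Hypothetical: $0.10 on the full model, $0.01 on the small one,
# half of requests simple enough for the small model, 20% served from cache.
before = blended_cost_per_request(0.10, 0.01, small_share=0.0, cache_hit_rate=0.0)
after = blended_cost_per_request(0.10, 0.01, small_share=0.5, cache_hit_rate=0.2)
print(f"{before:.3f} -> {after:.3f}")  # 0.100 -> 0.044
```

In practice cache hits and small-model answers are not free of other costs, so treat the result as an upper bound on the saving.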
Bottom line
A normal app is a car you buy once; an AI product is a taxi with the meter running — plan for a fixed base plus a variable bill that climbs with traffic.