Definition

The cost of running an AI product is the ongoing bill for every request your AI answers (a per-use “inference” charge) plus fixed costs for hosting, data, monitoring, and staff.

At a glance

Cost scales with usage, not user count: every question reruns the model and costs fresh compute^[2].
Gross margins are thinner: roughly 50-65% for AI products vs 70-90% for mature software^[1].
The real bill is usually 2-3x the headline model price once you add hosting, data, monitoring, and staff^[5].
Spend is spiky: a viral moment can multiply your bill in one month.

How the bill works

Most products mix a fixed monthly fee with a variable per-use charge. Chatbot platforms run about $50-$200/month for light use and $300-$1,000/month for growing use, plus $1-$6 per resolved conversation^[4]. A single conversation typically costs a few cents to tens of cents^[1].
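The fixed-plus-variable structure can be sketched in a few lines. This is a minimal illustration, not any vendor's billing formula: the function name `monthly_platform_bill` is hypothetical, and the inputs (a $300 base fee, 500 resolved conversations, $2 per resolution) are assumed values picked from inside the ranges quoted above.

```python
def monthly_platform_bill(base_fee: float,
                          resolved_conversations: int,
                          price_per_resolution: float) -> float:
    """Fixed monthly fee plus a charge for each resolved conversation."""
    if base_fee < 0 or resolved_conversations < 0 or price_per_resolution < 0:
        raise ValueError("inputs must be non-negative")
    return base_fee + resolved_conversations * price_per_resolution


# Assumed example: $300 base fee, 500 resolved conversations at $2 each.
bill = monthly_platform_bill(300.0, 500, 2.0)
print(f"${bill:,.2f}")  # $1,300.00
```

In this example the variable part ($1,000) is already larger than the fixed part ($300), and it is the part that grows with traffic.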

Why it costs more than the sticker

Mid-tier models run roughly $2.50-$3 per million input tokens and $15 per million output tokens in 2026^[3]. But demand spikes are the real risk: in one example, the bill jumped from ~$1,980 to ~$9,900 in a single month, a 5x increase^[4]. Budget for the spike, not the average.
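To make the token prices concrete, here is a rough sketch of the arithmetic. The default prices use the upper end of the quoted input range ($3) and the quoted output price ($15). The token counts per request (2,000 in, 500 out) and both function names are assumptions for illustration only.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 3.00,
                 output_price_per_m: float = 15.00) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def spike_multiplier(normal_month: float, spike_month: float) -> float:
    """How many times larger the spike month's bill is than a normal month's."""
    if normal_month <= 0:
        raise ValueError("normal_month must be positive")
    return spike_month / normal_month


# Assumed request size: 2,000 input tokens and 500 output tokens.
print(f"${request_cost(2_000, 500):.4f} per request")  # $0.0135 per request

# The spike quoted in the text: ~$1,980 to ~$9,900.
print(f"{spike_multiplier(1_980, 9_900):.1f}x")  # 5.0x
```

Note that the 500 output tokens cost more than the 2,000 input tokens ($0.0075 vs $0.0060), because output is priced about five times higher per token.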

What you can do

Prices have fallen sharply (about 80% across 2025-2026)^[3]. Caching, batching, and using smaller models for simple tasks cut the per-use bill substantially^[5].
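The combined effect of caching and routing simple tasks to a smaller model can be estimated with a back-of-envelope model. Everything here is an assumption made for illustration: the function name, the simplification that a cache hit costs nothing, and the example inputs (a 30% cache hit rate, half of the remaining traffic routed to a model that costs 10% as much). Batching is left out of this sketch.

```python
def remaining_cost_fraction(cache_hit_rate: float,
                            small_model_share: float,
                            small_model_cost_ratio: float) -> float:
    """Fraction of the original per-use bill left after caching and routing.

    Simplifying assumptions: cache hits cost nothing, and the requests that
    miss the cache are split between the original model and a cheaper one.
    """
    for value in (cache_hit_rate, small_model_share, small_model_cost_ratio):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be between 0 and 1")
    uncached = 1.0 - cache_hit_rate
    blended = (1.0 - small_model_share) + small_model_share * small_model_cost_ratio
    return uncached * blended


# Assumed inputs: 30% cache hits, 50% of the rest on a model at 10% of the price.
remaining = remaining_cost_fraction(0.30, 0.50, 0.10)
print(f"bill reduced by {1 - remaining:.1%}")  # bill reduced by 61.5%
```

Real savings depend on how repetitive your traffic is and how many tasks a smaller model can handle well, so treat the output as a planning estimate rather than a forecast.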

Bottom line

A normal app is a car you buy once; an AI product is a taxi with the meter running. Plan for a fixed base plus a variable bill that climbs with traffic.

What does it cost to run an AI product?

References