Definition
An AI training cluster is a single, tightly connected facility holding tens or hundreds of thousands of specialized chips (GPUs) that train large AI models together.
At a glance
- Ranked two ways: chip count (GPUs) and electrical power. One gigawatt powers roughly 750,000 homes.
- xAI’s Colossus (Memphis) jumped to 200,000+ chips in under a year, targeting 1 million.[2]
- Meta’s Prometheus (Ohio) is billed as the first 1-gigawatt AI data center, due in 2026.
- Each campus costs tens of billions and often builds its own power plant.
How it works
A cluster is a warehouse-sized building, not a single computer, packed with rows of GPUs wired together so they train one model at once. More chips plus more power means bigger, faster models. Power is the real bottleneck, and top sites plan for several gigawatts each.[5]
The leaders
- xAI Colossus (Memphis) — 200,000+ GPUs today, ~2 GW planned.
- Meta Prometheus (Ohio) — first 1-gigawatt AI data center, ~500,000+ GPUs, online 2026.[4]
- OpenAI Stargate (Abilene, TX) — 450,000+ Nvidia GB200 GPUs, ~1.2 GW, first buildings live 2025.[3]
- Meta Hyperion (Louisiana) — city-sized campus scaling to 5 GW over several years.
How to read it
Treat the numbers as moving targets. Firms announce capacity years before hardware ships, so a “5-gigawatt” site may run only a fraction today.[1] The reliable signal is the direction: relentlessly up.
Bottom line
The race for the largest cluster is a race for chips and power at once, and a small city’s worth of electricity is now the price of competing at the frontier.