Definition
A large language model (LLM) is software trained on huge amounts of text to predict the next word (more precisely, the next token, which is often a word fragment). That one skill lets it generate human-like writing, answers, and code.
At a glance
- It does one thing: guess the next word, over and over. Everything it “knows” is a side effect of doing that well across hundreds of billions to trillions of words[4].
- It is a prediction engine, not a fact database. Confident, fluent, wrong answers (hallucinations) are inherent to how it generates text, so plan to manage them rather than wait for a patch.
- Scale made it useful: billions of parameters trained on internet-scale text[3]. But bigger is not always better for your job.
- You typically rent a hosted model and pay per “token” (roughly 3/4 of an English word) for both the text you send in and the text you get back. You almost never train one yourself.
How it works
Given “The capital of France is”, the model assigns a probability to each candidate next word, emits a likely one (“Paris”), appends it to the text, and repeats[4]. To do this well across internet-scale text, it has to absorb grammar, facts, styles, and code[1]. The fluency of ChatGPT or Claude is that single trick done extremely well[2].
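The predict-append-repeat loop can be sketched in a few lines of Python. This is a toy illustration only: the score table is invented, and a real model computes its scores with a neural network over tens of thousands of tokens instead of looking them up. The names `TOY_SCORES`, `softmax`, and `generate` are hypothetical.

```python
import math

# Hypothetical raw scores a model might assign to candidate next tokens.
# The contexts and numbers are invented for illustration.
TOY_SCORES = {
    "The capital of France is": {"Paris": 9.0, "Lyon": 5.0, "a": 3.0},
    "The capital of France is Paris": {".": 8.0, ",": 5.5, "and": 4.0},
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(prompt, max_steps=5):
    """Greedy decoding: append the most probable token, then repeat."""
    text = prompt
    for _ in range(max_steps):
        scores = TOY_SCORES.get(text)
        if scores is None:  # the toy table has no entry for this context
            break
        probs = softmax(scores)
        best = max(probs, key=probs.get)
        # Punctuation attaches directly; words are preceded by a space.
        text += best if best in {".", ","} else " " + best
    return text


print(generate("The capital of France is"))
```

This sketch always takes the single most probable token. Hosted models usually sample from the probabilities instead, which is one reason the same prompt can produce different answers.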
Why it sounds certain when wrong
It picks plausible-sounding words and has no built-in check for true or false, so it states fabrications in the same confident tone as facts. There is no complete fix, only mitigation through how you use it: feed it your trusted documents at question time (retrieval) and keep a human reviewing anything high-stakes.
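Retrieval at question time can be sketched as follows. This is a minimal illustration, assuming a tiny in-memory document list and simple word-overlap ranking; production systems typically rank with embeddings and a search index. The documents, the prompt wording, and the names `DOCS`, `retrieve`, and `build_prompt` are all hypothetical.

```python
import re

# Hypothetical trusted documents; in practice these would come from your own
# knowledge base. The contents are invented for illustration.
DOCS = [
    "Refunds are accepted within 30 days of purchase with a receipt.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping to Canada takes 5 to 7 business days.",
]


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, docs):
    """Put the retrieved text in front of the question before calling a model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


print(build_prompt("How many days do I have for refunds?", DOCS))
```

The resulting prompt is what you would send to the hosted model. Grounding narrows what the model has to guess, but it does not remove the need for human review of high-stakes output.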
What it means for buying
You are renting a general prediction engine billed per token. At high volume, the choice of model size and whether repeated prompt text is cached can change the bill many times over. Training your own from scratch can cost tens of millions of dollars and needs a research team[4]; nearly every business should instead use a hosted model and compete through its data and safeguards[3].
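A rough bill estimate shows how model size and caching move the total. This is a back-of-the-envelope sketch: every price, discount, and traffic figure below is a made-up placeholder, not any vendor's actual pricing, and the function names are hypothetical. The only figure taken from the text is the rule of thumb that a token is about 3/4 of a word.

```python
# All prices and discounts here are hypothetical placeholders, not real
# vendor pricing. Check your provider's current price list.
WORDS_PER_TOKEN = 0.75  # rule of thumb: one token is about 3/4 of a word


def tokens_from_words(word_count):
    """Convert a word count into an approximate token count."""
    return round(word_count / WORDS_PER_TOKEN)


def monthly_cost(requests, input_words, output_words,
                 price_in_per_million, price_out_per_million,
                 cached_fraction=0.0, cache_discount=0.0):
    """Estimate a monthly bill in dollars.

    cached_fraction: share of input tokens served from a prompt cache.
    cache_discount: price reduction on those cached tokens (0.9 = 90% off).
    """
    tokens_in = tokens_from_words(input_words) * requests
    tokens_out = tokens_from_words(output_words) * requests
    cached = tokens_in * cached_fraction
    fresh = tokens_in - cached
    billable_in = fresh + cached * (1 - cache_discount)
    cost_in = billable_in * price_in_per_million / 1e6
    cost_out = tokens_out * price_out_per_million / 1e6
    return cost_in + cost_out


# One million requests a month, 750 words in and 150 words out per request.
traffic = dict(requests=1_000_000, input_words=750, output_words=150)

large = monthly_cost(**traffic, price_in_per_million=10.0,
                     price_out_per_million=30.0)
small = monthly_cost(**traffic, price_in_per_million=0.5,
                     price_out_per_million=1.5)
large_cached = monthly_cost(**traffic, price_in_per_million=10.0,
                            price_out_per_million=30.0,
                            cached_fraction=0.8, cache_discount=0.9)

for label, cost in [("large model", large), ("small model", small),
                    ("large model with caching", large_cached)]:
    print(f"{label}: ${cost:,.0f} per month")
```

With these invented numbers, the same traffic costs about $16,000 on the large model, about $800 on the small one, and about $8,800 on the large model with most of the input cached. The absolute figures mean nothing; the spread between them is the point.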
Bottom line
An LLM is a next-word predictor that scaled into a brilliant, fast, confidently fallible assistant. Rent one, ground it in your data, and put guardrails around it.
References
- What Are Large Language Models (LLMs)? IBM. www.ibm.com
- Transformers, the tech behind LLMs (Deep Learning Chapter 5). Grant Sanderson. 3Blue1Brown. www.3blue1brown.com
- Reflections on Foundation Models. Stanford Center for Research on Foundation Models (CRFM). crfm.stanford.edu
- Language Models are Few-Shot Learners (GPT-3). Tom B. Brown, Benjamin Mann, Nick Ryder, et al. arXiv. arxiv.org
- King - Man + Woman = Queen: The Marvelous Mathematics of Computational Linguistics. MIT Technology Review. www.technologyreview.com