How AI Actually Works
What are LLMs? How do they generate text? What’s the difference between training and inference? A no-BS explanation of what’s happening under the hood.
It’s Pattern Matching — Not Thinking
AI doesn’t “understand” anything. A Large Language Model (LLM) is a massive mathematical function that predicts the next word (technically, the next token) based on all the words before it. It was trained on billions of documents — books, websites, code, conversations — and learned statistical patterns about how language works.
When ChatGPT gives you a coherent answer, it’s not reasoning from first principles. It’s generating the most statistically likely continuation of your prompt, one token at a time. The result looks like understanding because the patterns it learned are incredibly nuanced.
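That token-by-token loop can be sketched in a few lines of Python. The probability table below is invented for illustration — a real model computes these distributions on the fly with billions of parameters — but the generation loop itself works the same way:

```python
import random

# Hypothetical next-token probabilities a model might assign after each
# context word. A real LLM computes these with billions of parameters;
# here they are hard-coded toy values.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"ran": 0.7, "sat": 0.3},
    "idea": {"ran": 0.9, "sat": 0.1},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt, max_tokens=5):
    """Generate text one token at a time, the way an LLM does."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation
            break
        # Sample the next token, weighted by its probability.
        words, probs = zip(*dist.items())
        tokens.append(random.choices(words, weights=probs)[0])
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat down"
```

Notice there's no plan and no "thought" — each token is picked by sampling from a probability distribution conditioned on everything before it.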
The Transformer Architecture
Almost every modern AI model is built on the transformer architecture, introduced by Google researchers in the 2017 paper “Attention Is All You Need.” The key innovation is the attention mechanism — the model can look at every word in the input simultaneously and figure out which words are most relevant to each other.
Before transformers, models processed text sequentially (word by word), which was slow and lost context over long passages. Attention lets the model “attend to” distant parts of the text, which is why modern LLMs can handle long conversations and complex instructions.
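The core of attention is simple enough to sketch in plain Python. This is a minimal single-head, scaled dot-product version with no learned weight matrices (a real transformer projects each token through learned query, key, and value matrices first):

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each token's query is scored against every key; the resulting
    weights say how strongly that token 'attends to' every other token.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

The crucial property: every token scores against every other token in one pass, so nothing is "far away" the way it is in sequential models.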
- Encoder — reads and understands the input (used in BERT-style models for classification)
- Decoder — generates output text (used in GPT, Claude, Gemini for text generation)
- Encoder-Decoder — translates input to output (used in T5, translation models)
Training vs Inference
Training is the expensive part. Companies like OpenAI, Anthropic, and Google spend tens of millions of dollars running massive GPU clusters for months to train a model. The model reads billions of documents and adjusts billions of internal parameters (weights) to minimize prediction errors.
Inference is what happens when you use the model. Your prompt goes in, the model runs its math, and text comes out. This is much cheaper per query but still requires serious hardware — which is why API access costs money.
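A counting-based toy makes the split concrete. Real training adjusts weights by gradient descent over months of GPU time; here "training" is just tallying bigram counts from a three-sentence corpus, and "inference" is a cheap lookup against those frozen counts:

```python
from collections import Counter, defaultdict

def train(corpus):
    """'Training': read the corpus and learn parameters.

    Bigram counts stand in for the billions of weights a real LLM
    fits with gradient descent. This is the expensive, one-time phase.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts  # the learned "weights"

def predict_next(weights, word):
    """'Inference': a cheap lookup against the frozen parameters."""
    if word not in weights:
        return None
    return weights[word].most_common(1)[0][0]

corpus = ["the cat sat", "the cat ran", "the dog sat"]
weights = train(corpus)              # slow part: done once
print(predict_next(weights, "the"))  # fast part: done per query -> "cat"
```

The asymmetry is the point: `train` touches every document, `predict_next` touches only the parameters — which is why one costs tens of millions and the other costs fractions of a cent.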
Pre-training → Fine-tuning → RLHF
Modern LLMs go through multiple training stages:
- Pre-training: Learn general language patterns from massive datasets (internet text, books, code)
- Supervised Fine-Tuning (SFT): Learn to follow instructions by training on human-written example conversations
- RLHF (Reinforcement Learning from Human Feedback): Humans rank model outputs, and the model learns to prefer responses humans rated higher
This is why raw base models (before fine-tuning) are weird — they’ll complete text but won’t follow instructions naturally. The fine-tuning and RLHF stages turn a text-completion engine into a helpful assistant.
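One detail of the SFT stage worth sketching: the loss is typically computed only on the assistant's tokens, not the user's prompt, so the model learns to produce responses rather than to predict user messages. In the sketch below, `token_loss` is a hypothetical stand-in for the model's real per-token cross-entropy:

```python
def sft_loss(tokens, is_response, token_loss):
    """Average next-token loss over assistant (response) tokens only.

    `tokens` is the full conversation; `is_response[i]` marks whether
    token i was written by the assistant; `token_loss(context, target)`
    stands in for the model's real cross-entropy on one prediction.
    """
    losses = [token_loss(tokens[:i], tokens[i])
              for i in range(1, len(tokens))
              if is_response[i]]  # mask out prompt tokens
    return sum(losses) / len(losses)
```

The mask is what reorients a raw text-completion engine: gradients flow only through "what should the assistant say next," never through "what will the user say next."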
What LLMs Can’t Do
- Access the internet (unless given tools/plugins)
- Remember past conversations (each session starts fresh, unless the product layers a memory feature on top)
- Do reliable math (they predict tokens, not calculate — though tool use helps)
- Know if they’re wrong (they generate confident-sounding text regardless of accuracy)
- Reason about novel problems (they interpolate from training data patterns)