There’s a common story about AI hallucination: it’s mostly a prompting problem. Give the model better context, the made-up facts go away. It’s a comforting story — tweak the prompt, fix the output, stay in control. The reality is messier. Hallucination has layers, and context is just the top one.
Context matters, but it’s not the whole picture
Bad context absolutely causes hallucination. When your prompt is vague, too long, full of noise, or carries wrong assumptions from the start, the model latches onto the wrong signals. It fills gaps with plausible-sounding nonsense because that’s what it’s optimized to do — produce coherent text, not verified truth.
Ask “compare it to the older version” without saying what “it” is, and the model will guess. Sometimes it guesses right. Sometimes it confidently describes a comparison between two things that never existed. Stuff too many documents into a limited context window and the model drops important bits or stitches together fragments that don’t belong together — the output sounds coherent, the facts are wrong.
Context hygiene matters. But even with a perfectly crafted prompt, hallucination still happens, because the problem goes deeper than prompts.
The three layers of hallucination
Layer 1: Prompt and context. This is the one everyone talks about. Vague instructions, conflicting information, missing constraints. Fixable: write clearer prompts, remove noise, be specific about what you want and what you don’t want.
Layer 2: The model itself. This is the uncomfortable one. LLMs don’t look up facts in a database. They predict the next most probable token given the current context. The whole training objective is “produce text that looks right” — not “produce text that is right.” A fluent, confident answer and a factually correct answer aren’t the same thing, and the model is optimized for the first.
On top of that, LLMs are stateless. On every request, the model sees only the tokens you send with that request. Want it to remember the earlier conversation? You resend the whole history. Anything that’s missing, it guesses, and guesses compound. Context windows are finite too — once the problem gets complex enough, something has to drop, and the model doesn’t tell you what it forgot. It just keeps going, stitching reasoning together from whatever fragments remain. Research backs this up: longer context often means worse reasoning.
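A minimal sketch of what that statelessness looks like from the caller’s side, assuming a generic chat-completion style endpoint (the URL, model name, and payload shape below are illustrative stand-ins, not any particular vendor’s API):

```typescript
// Illustrative sketch only: endpoint, model name, and response shape are
// hypothetical stand-ins for a generic chat-completion style API.
type Message = { role: "system" | "user" | "assistant"; content: string };

// The model keeps no memory between calls, so the caller owns the history.
const history: Message[] = [];

async function ask(question: string): Promise<string> {
  history.push({ role: "user", content: question });

  const res = await fetch("https://api.example.com/v1/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Every call resends the *entire* history. Whatever gets trimmed to fit
    // the context window is simply gone from the model's point of view.
    body: JSON.stringify({ model: "some-model", messages: history }),
  });

  const { reply } = (await res.json()) as { reply: string };
  history.push({ role: "assistant", content: reply });
  return reply;
}
```

The “memory” lives entirely in that `history` array on the client; drop an element and the model never knows it existed.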
Layer 3: Training data and external knowledge. This is the foundation. Training data that’s incomplete, biased, or flat-out wrong means the model learned incorrect correlations from day one. Outdated knowledge means it confidently answers questions about events that happened after its training cutoff. Overfitting means it memorized patterns that don’t generalize — give it a slightly different input and reasoning drifts. Knowledge in an LLM is compressed into weights, not anchored to specific sources: the model “knows” something but has no idea where it came from. And it’s wired to always answer, never say “I don’t know” — so when there’s a gap, it fills it with something that sounds reasonable.
What this means in practice
I’ve stopped asking “is this hallucination?” and started asking “which layer did this come from?”
When Claude Code confidently invents a function that doesn’t exist in the codebase, that’s usually Layer 1 — not enough context about what’s actually in the project. When it gives a perfectly reasonable-sounding explanation for a bug that’s completely wrong, that’s often Layer 2 — the model constructed a coherent narrative from probabilistic token selection, not from understanding the actual runtime state. When it cites a library feature that doesn’t exist or an API deprecated two years ago, that’s Layer 3 — a training-data issue or outdated knowledge.
Knowing which layer is at play changes the response. Layer 1 problems are fixed by improving the prompt. Layer 2 problems are caught by reading every line of generated code and asking “why” instead of just “how.” Layer 3 problems are solved by feeding the model relevant documentation in the context — RAG, effectively — instead of trusting its built-in knowledge.
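As a rough sketch of that Layer 3 fix, here is one way to pin the model to supplied documents instead of its built-in knowledge. The helper name and document shape are assumptions for illustration, not a specific library’s API:

```typescript
// Hypothetical helper: put retrieved documentation into the prompt and
// instruct the model to answer only from it.
type Doc = { title: string; text: string };

function buildGroundedPrompt(question: string, docs: Doc[]): string {
  const sources = docs
    .map((d, i) => `[${i + 1}] ${d.title}\n${d.text}`)
    .join("\n\n");

  return [
    "Answer using ONLY the sources below.",
    "If the sources do not contain the answer, say \"I don't know\".",
    "Cite the source number for every claim.",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The explicit permission to say “I don’t know” matters as much as the documents themselves: it gives the model a sanctioned exit instead of a gap to fill.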
What actually helps
Nothing eliminates hallucination entirely, but a few practices meaningfully reduce it:
- Be specific in prompts. Name the entities, timeframe, scope, criteria. “Write a function” is a hallucination invitation. “Write a TypeScript function that validates email format using RFC 5322, returning { valid: boolean, reason?: string }” is much safer (see the sketch after this list).
- Keep context lean but sufficient. Remove noise, contradictions, approaches you already decided against. Every extra piece of information is another chance for the model to get confused about what matters.
- For anything requiring factual accuracy or recent data, feed the sources directly into the context. Don’t ask the model what it “knows” — it doesn’t know, it predicts. Give it the documents and tell it to answer from those documents only.
- Ask for sources and reasoning chains — not because the model can verify truth (it can’t), but because seeing its reasoning makes it easier to spot where it went wrong. A hallucination with a visible reasoning chain is a bug you can debug. A hallucination with no reasoning is just a claim you might believe.
- Read the output. All of it. The easiest step to skip, and the most important.
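For reference, a minimal sketch of the deliverable that specific prompt pins down. The function name is my own, and the regex is a deliberately simplified approximation rather than a full RFC 5322 implementation:

```typescript
type ValidationResult = { valid: boolean; reason?: string };

// Simplified approximation of an email format check. A real RFC 5322
// validator is far more involved; this only covers the common shape.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateEmail(input: string): ValidationResult {
  if (input.trim() === "") {
    return { valid: false, reason: "empty input" };
  }
  if (!EMAIL_PATTERN.test(input)) {
    return { valid: false, reason: "does not match expected email format" };
  }
  return { valid: true };
}
```

The point isn’t this particular implementation; it’s that the prompt already nailed down the signature, the standard, and the failure mode, so there is far less room for the model to invent them.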
The bottom line
“Hallucination is just bad context” is a comforting story — it implies complete control if we just write better prompts. The less comfortable truth is that hallucination is built into how LLMs work. Next-token prediction produces text that looks right, not text that is right. Training data has gaps and biases. Knowledge goes stale. The model can’t verify facts, and won’t say “I don’t know” on its own. Better prompts help, better architecture helps, RAG helps. But the real skill isn’t preventing hallucination — it’s the habit of verifying everything, even when the answer sounds perfect. The hallucinations that bite hardest aren’t the ones that look wrong — they’re the ones that look right.