Ian Fisher

A mental model of LLMs

Part of the series "One month of LLMs"

This is my personal mental model of LLMs, as of July 2025. Like all mental models, it is a deliberate simplification. I'm a moderately sophisticated user of LLMs, but I'm no expert, so if I got anything wrong, please let me know.


The core of an LLM is a massive neural network, trained on much of the text of the Internet, that predicts the continuation of a sequence of text.1

How do AI labs turn an internet text completer into a helpful digital assistant?

By fine-tuning the raw model into something that produces helpful responses, and by further defining the model's identity and personality with a system prompt – textual instructions prepended to the model's input. (Read Claude's here.)

You can imagine that the text that is fed into the neural network2 looks something like this:

System: You are a helpful digital assistant, etc. etc.
User: What is the capital of Portugal?
Assistant: The capital of Portugal is Lisbon.
User: What about Spain?
Assistant:
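Concretely, you can picture the transcript above being flattened into one long string before it reaches the model. This is only a sketch: the role labels and formatting here are illustrative, and real models use special tokens rather than literal "User:" prefixes.

```python
def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Prepend the system prompt, append each (role, text) turn, and end
    with an open 'Assistant:' line for the model to complete."""
    lines = [f"System: {system}"]
    for role, text in turns:
        lines.append(f"{role}: {text}")
    lines.append("Assistant:")  # the model predicts what comes next
    return "\n".join(lines)

prompt = build_prompt(
    "You are a helpful digital assistant.",
    [
        ("User", "What is the capital of Portugal?"),
        ("Assistant", "The capital of Portugal is Lisbon."),
        ("User", "What about Spain?"),
    ],
)
print(prompt)
```

The point of the open-ended "Assistant:" line at the end is that, to the model, answering the user is just continuing the text.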

Two key points:

  1. The role labels are just a convention: the model doesn't distinguish between its own previous responses and the rest of the text. It's all one input.

  2. The model is stateless: it has no memory of past exchanges, so the entire conversation is re-submitted with every new message.

The second point is why you can seamlessly switch from one model to another in the middle of a conversation, or even edit the model's responses. (The ChatGPT UI doesn't let you do this, but you can do it from the API.)

It's also why models have context windows – hard limits on how much text can go into a single request – and why long conversations become more and more expensive: the whole exchange is re-submitted every time, even if the new messages are short. Model performance also tends to degrade on longer inputs.
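The cost growth from statelessness is easy to see in a toy loop. Here `fake_model` is a stand-in for a real API call, and "tokens" are just word counts, but the shape is the same: every request carries all prior turns.

```python
def fake_model(prompt: str) -> str:
    return "ok"  # pretend completion; a real call would hit an LLM API

history: list[str] = []
tokens_sent: list[int] = []
for user_msg in ["hi", "tell me more", "thanks"]:
    history.append(f"User: {user_msg}")
    # The ENTIRE history is re-sent on every turn.
    prompt = "\n".join(history) + "\nAssistant:"
    tokens_sent.append(len(prompt.split()))  # grows each turn
    history.append(f"Assistant: {fake_model(prompt)}")

print(tokens_sent)  # strictly increasing, even though each new message is short
```

Even with tiny messages, the input size ratchets upward every turn, which is exactly the property that context windows cap.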

The latest generations of LLMs can use external tools, like searching the Web or even directly editing files on your computer. Tool use actually works through quite a simple mechanism. When the model "decides" it wants to use a tool, it outputs something like {"tool": "web_search", "params": {"query": "capital of portugal"}}. Whatever program is driving the LLM receives this message, invokes the tool, and passes back to the LLM a message like {"result": {"web_results": [...]}}. It's all just text.
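That driving loop can be sketched in a few lines. Everything here is an assumption for illustration: `fake_model` stands in for a real LLM call, the JSON shapes mirror the examples above, and real APIs use structured tool-call fields rather than raw JSON in the text. The principle is the same: the driver parses the model's output, runs the tool, and feeds the result back as more text.

```python
import json

def fake_model(transcript: str) -> str:
    # Stand-in for an LLM: first "decide" to search, then answer
    # once a search result appears in the transcript.
    if "web_results" not in transcript:
        return json.dumps({"tool": "web_search",
                           "params": {"query": "capital of portugal"}})
    return "The capital of Portugal is Lisbon."

def web_search(query: str) -> dict:
    return {"web_results": [{"title": "Lisbon - Wikipedia"}]}  # canned result

TOOLS = {"web_search": web_search}

transcript = "User: What is the capital of Portugal?\nAssistant:"
while True:
    output = fake_model(transcript)
    try:
        call = json.loads(output)  # did the model emit a tool call?
    except json.JSONDecodeError:
        break  # plain text: this is the final answer
    result = TOOLS[call["tool"]](**call["params"])
    # Append both the tool call and its result, then loop back to the model.
    transcript += output + "\n" + json.dumps({"result": result}) + "\n"

print(output)
```

The model never executes anything itself; the driver program does, and the model only ever sees the results as more input text.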


  1. Ignoring multi-modal models, which I don't know anything about. 

  2. Technically, the text is broken up into tokens that are then converted into word embeddings before the neural net actually consumes them.