Issue: #1
Reading time: ~10 min
Level: No prior AI knowledge required
AI Contribution: This post was written with the assistance of Claude. The ideas and direction are my own; the AI helped compile and draft the content.


I use “model,” “LLM,” and “AI” interchangeably. So does almost everyone I work with — including people who spend a lot of time thinking about this. In the flow of a meeting or a Slack message, it usually doesn’t matter. But occasionally I catch myself saying something like “the AI doesn’t know about that” or “the model gets confused when…” and I notice I’m gesturing at something fuzzy. There’s a concept I’m circling but not quite landing on.

This issue is an attempt to land on it. Not because terminology precision is a virtue in itself — it isn’t — but because having a clearer picture of what an LLM actually is changes the questions you think to ask. And for a finance professional evaluating AI tools or making decisions about where AI fits in their firm’s workflows, asking the right questions is most of the job.

So: what is a large language model? What is “AI,” in the way people use the term today? Are these the same thing? The short answer is no. The longer answer is the rest of this issue.


“AI” in 2026 Mostly Means One Thing

The term “artificial intelligence” has been around for decades and has meant different things in different eras — expert systems in the 1980s, machine learning classifiers through the 2000s, image recognition in the 2010s. When people say “AI” today, though, they almost always mean something specific: generative AI. AI that produces outputs — text, images, audio, code, video — in response to inputs.

This is worth stating plainly because it clarifies what most conversations are actually about. When your compliance team worries about “AI risk,” when your technology colleagues propose “an AI solution,” when a vendor pitches “AI-powered analytics” — they are almost certainly talking about generative AI. The technology that can write a memo, summarise a document, answer a question, or produce a first-cut analysis.

Generative AI differs from earlier AI systems in what it produces: open-ended outputs, most visibly in human language. Earlier AI systems were generally better described as classifiers or optimisers: they could tell you whether an email was spam, which route was fastest, or whether a credit application met a threshold. Generative AI produces something new in response to each input. That shift is why the technology has broken so visibly into public life.


The Model Is the Engine

Inside every generative AI application — ChatGPT, Claude, Copilot, Gemini, the internal tool your firm may have built — there is a model. The model is doing the essential work.

The analogy I keep coming back to is a car engine. A car is a complex system: bodywork, transmission, electronics, fuel delivery, user interface. But the engine is its defining component. Every important characteristic of how the car performs — its power output, its efficiency, its response to the accelerator — is fundamentally determined by the engine. The rest of the car matters, and a great engine in a badly engineered car is still a bad car. But you cannot meaningfully evaluate the car without understanding the engine.

An AI application works the same way. The application is a suite of software: a chat interface, network connectivity, document handling, memory management, safety filtering, and a hundred other components. All of this is real engineering and it matters. But the model is the engine. The quality of the outputs, the range of what the system can do, the failure modes and limitations — these are mostly a function of the model.

This distinction matters practically. When a company announces it has “upgraded its AI,” the improvement often comes from a change to the underlying model — though not always; application-layer changes can shift the experience meaningfully too. When two products claim to run the same model, their outputs will tend to be broadly comparable regardless of how different the interfaces look — though the application built around the model shapes what you actually encounter. Either way, getting into the habit of asking which model a tool runs tends to be a more useful starting point than asking whether it has “good AI.”

One extension of the analogy is worth making here. A Ferrari hypercar and your father’s 1996 Camry are both cars with engines, but driving the Camry doesn’t tell you much about what the Ferrari can do beyond the basic principles of how a car works. AI models have the same dynamic. The models most people encounter first tend to be free-tier versions: lighter, less capable, built for accessibility rather than peak performance. Frontier models often sit behind paid tiers and are markedly more capable at complex analytical and reasoning tasks, the kind a finance professional might actually want to test. Writing off a category of capability based on a free-tier experience is a bit like concluding that cars aren’t fast because your only test drive was a Camry.


What a Model Actually Is

So what is the model, technically? At its core, it is a very large mathematical function — a complex set of equations that takes an input and produces an output.

If you work in quantitative research or risk, you will recognise the basic idea immediately. A multi-factor model takes a set of inputs — macroeconomic variables, sector exposures, style factors — and produces an estimate of expected return or risk. The model is, at bottom, a set of equations with coefficients: the weights that determine how much each input contributes to the output.
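
For readers who think more easily in code than in prose, here is a minimal sketch of that idea in Python. The factor names are illustrative and every number is made up; the point is only that the model is a weighted sum of inputs.

```python
# Hypothetical factor exposures for one stock (illustrative numbers only).
exposures = {"size": -0.5, "value": 1.2, "momentum": 0.3, "quality": 0.8}

# Hypothetical coefficients (weights): how much each factor contributes.
coefficients = {"size": 0.01, "value": 0.02, "momentum": 0.015, "quality": 0.01}


def factor_model_estimate(exposures, coefficients, intercept=0.0):
    """Return the intercept plus the weighted sum of the factor exposures."""
    return intercept + sum(coefficients[name] * exposures[name] for name in exposures)


estimate = factor_model_estimate(exposures, coefficients)
print(f"Estimated return: {estimate:.2%}")  # about 3.15% with these made-up numbers
```

Four inputs and four coefficients are enough to write the whole model on one line. Hold on to that picture; the scale is what changes later.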

A large language model is a cousin of this. It takes an input (your prompt, a question, a document) and produces an output: a continuation of the text, generated one small piece at a time, that the model estimates to be a plausible response.

The core intuition is probabilistic. When you say something to another person, there is a range of reasonable responses they might give. You cannot predict exactly what they will say, but you would be surprised if their response had nothing to do with what you said. The space of plausible replies is constrained by the input. An LLM has learned to model that relationship — given this input, what outputs are plausible? — and generates accordingly.
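
A toy sketch of that probabilistic step, in Python. The probability table below is invented for illustration; a real LLM computes a distribution like this itself, over a vocabulary of tens of thousands of word pieces or more, and repeats the step for every piece of text it generates.

```python
import random

# Hypothetical probabilities for the word that follows
# "The central bank raised interest ...". These numbers are made up.
next_word_probs = {"rates": 0.90, "payments": 0.06, "income": 0.03, "bananas": 0.01}


def sample_next_word(probs, rng):
    """Draw one word at random, with likelier words chosen more often."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_next_word(next_word_probs, rng) for _ in range(1000)]
share_rates = draws.count("rates") / len(draws)
print(f"'rates' was chosen in {share_rates:.0%} of draws")
```

Most draws come back as "rates", but not all of them. That is the sense in which the output is constrained by the input without being fully determined by it.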

Here is where the comparison to a finance model becomes interesting, and where language models depart from what you might expect.

In a conventional multi-factor model, the factors are chosen by an analyst. You decide that size, value, momentum, and quality are relevant, define them precisely, and estimate their coefficients from historical data. The model structure is human-defined; the coefficients are data-fitted.

Language is complex enough that earlier attempts to encode it as explicit rules — grammar parsers, manually defined semantic structures — ran into significant limitations. A large language model does not work this way. The internal representations — the equivalent of the “factors” — are not given to it. They are discovered during training. The model is exposed to an enormous corpus of text and, through a process of optimisation, develops its own internal structure: the grammatical patterns, the semantic relationships, the contextual dependencies that let it predict what should come next. No human specified these representations. The model found them.
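
To make "discovered during training" concrete, here is a deliberately tiny sketch in Python. It is not how an LLM works internally: an LLM is a neural network, and this is only a word-pair counter run on a made-up four-sentence corpus. What it shares with the real thing is that nobody writes the rules down. The pattern comes out of the data.

```python
from collections import Counter, defaultdict

# A tiny hypothetical "training corpus"; real models train on vastly more text.
corpus = (
    "the fund raised capital . the fund raised fees . "
    "the bank raised rates . the bank cut rates ."
).split()


def train_bigram_counts(tokens):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for current_word, next_word in zip(tokens, tokens[1:]):
        counts[current_word][next_word] += 1
    return counts


def most_likely_next(counts, word):
    """Return the follower of `word` seen most often in training."""
    return counts[word].most_common(1)[0][0]


counts = train_bigram_counts(corpus)
print(most_likely_next(counts, "fund"))  # raised
print(most_likely_next(counts, "cut"))   # rates
```

No one told the program that "raised" tends to follow "fund". It counted. Scale the same principle up by many orders of magnitude, and replace the counting with a far more flexible mathematical function, and the learned patterns become both far richer and far harder to read off.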

This is both what makes LLMs so capable and what makes them opaque. The model has developed internal structure that works remarkably well, but which we cannot fully inspect or describe in the way we can inspect a regression’s coefficients. We do not know exactly why the model produces a given output in the way we know why a factor model produces a given estimate. That opacity has real implications for reliability, auditability, and the kinds of governance questions that matter acutely in a financial institution.

It is also part of why we say AI can make mistakes that are difficult to anticipate. When a model produces an incorrect output, we can observe what it said but cannot fully trace the process that produced it. We are left reverse-engineering from the output, which is an imperfect basis for either catching errors or explaining them. Hallucination, where a model generates text that is plausible in form but factually wrong, follows from this design: the model is built to produce plausible text, not to look up or verify facts. It is a subject worth its own issue.

Parameters and weights
The internal representations an LLM develops during training are encoded in its parameters, also called weights: a vast set of numerical values that determine how the model processes any given input. Modern LLMs range from a few billion to hundreds of billions of parameters, and some are larger still. When you hear “a 70 billion parameter model,” the figure is a count of these values. More parameters generally means more capacity to learn complex patterns, though size alone does not determine capability; training data and training methods matter as well.
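
A quick back-of-the-envelope calculation gives a feel for the scale. The sketch below assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but not the only one; the function name is my own.

```python
def weights_size_gb(num_parameters, bytes_per_parameter=2):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes).

    Assumes 2 bytes (16 bits) per parameter unless told otherwise.
    """
    return num_parameters * bytes_per_parameter / 1e9


size_70b = weights_size_gb(70e9)
print(f"A 70 billion parameter model: roughly {size_70b:.0f} GB of weights")
```

Under that assumption the weights alone come to roughly 140 GB. The four-coefficient factor model from earlier would fit on a sticky note.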

Untangling the Terminology

With that background, the terminology becomes easier to keep straight.

Generative AI is the broad category: AI systems that generate content. Text, images, audio, and video generation are all generative AI. The term describes what kind of output is produced, not how.

A model is the mathematical engine doing the generating. Every generative AI application runs on one. When someone says “the model,” they mean this component specifically — the trained set of parameters that processes inputs and produces outputs.

A large language model (LLM) is a specific type of model: one trained on text, at scale, to understand and generate language. “Large” refers to the scale of training — vast amounts of text, vast numbers of parameters. “Language” signals that the primary medium is text, though modern LLMs increasingly handle images, audio, and other inputs alongside it. “Model” is the same concept as above.

The relationship, then: LLMs are a category of model. Models are the engine inside generative AI applications. Generative AI is what most people mean when they say “AI” in 2026.

The terms get conflated because, in most practical conversations, they point at the same system from different angles. “I asked the AI to summarise this” and “I ran this through the model” are usually interchangeable because both refer to the same underlying thing. But the distinction matters when you start asking which part of the system is responsible for what — and that is precisely the question that comes up in technology evaluation, vendor assessment, and risk governance conversations.


What the Model Is and Is Not Responsible For

The engine analogy has one more use: it draws attention to what belongs to which layer when something goes wrong. The answer is rarely simple.

When a chatbot gives you a confident but incorrect answer about your firm’s internal policy, the failure might lie with the model, which generated plausible-sounding content that happens to be wrong. Or it might lie with the application: the model was never given access to your policy documents in the first place. Asking for yesterday’s NASDAQ close is a similar case. The model’s knowledge stops where its training data stops, so real-time market data has to come from a separate data connectivity layer that many general-purpose AI applications do not have by default. Or it might lie with the input: the user assumed the system knew the relevant context and did not provide it. All three failures look similar from the outside, but they have different causes and different solutions.
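
A small Python sketch of the layering, for readers who find it easier to see in code. Everything here is hypothetical: `toy_model` is a stand-in that can only use what appears in its prompt, and `application` is a stand-in for the software that decides what goes into that prompt. No real product works this simply.

```python
def toy_model(prompt):
    """Stand-in for a model: it can only use what appears in the prompt."""
    if "Policy:" in prompt:
        policy_text = prompt.split("Policy:", 1)[1].split("\n", 1)[0].strip()
        return f"According to the policy provided: {policy_text}"
    return "A plausible-sounding guess (no policy text was supplied)."


def application(question, policy_documents=None):
    """Stand-in for the application layer: decides what the model gets to see."""
    prompt = f"Question: {question}\n"
    if policy_documents:
        prompt += f"Policy: {policy_documents}\n"
    return toy_model(prompt)


without_docs = application("What is our gift limit?")
with_docs = application("What is our gift limit?", "Gifts over $100 must be reported.")
print(without_docs)
print(with_docs)
```

The model function is identical in both calls. The only difference is whether the application handed it the policy text, which is why the same engine can look reliable in one product and unreliable in another.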

The model is necessary but not sufficient. Understanding what it does and does not determine is the beginning of being able to evaluate these systems clearly — and to ask the right questions when they fail.


Takeaway

A large language model is a very large mathematical function — trained on enormous amounts of text, at a scale that allows it to discover its own internal representations of language — that produces plausible text in response to inputs. It is not a database, not a search engine, and not a mind. It is a powerful statistical model that has learned the patterns of language well enough to generate coherent, contextually appropriate text.

“AI,” as the term is used today, mostly means applications built on top of models like this. The model is the engine; the application is the car. Both matter, but they are different things, and conflating them leads to the wrong questions.

The more useful framing for any AI tool is not “is the AI good?” but: what model is it running? What has the application built on top? And what does that imply for what the system can and cannot do reliably?

Those questions have tractable answers. Starting there is more productive than the alternative.


Further Reading

  • Introduction to Large Language Models — Google (Machine Learning Crash Course). Google’s own explanation of LLMs, built around clear examples. Slightly technical in places but should be accessible to a non-specialist. A reliable starting point for readers who want to go further on the mechanics.

  • How Large Language Models Work — Andreas Stöffelbauer (Microsoft, via Medium). A more substantive explanation built from the ground up — covering machine learning and deep learning before arriving at LLMs. A longer read, but one of the clearer treatments available. Recommended for readers who want to understand the architecture, not just the concept.

  • AI Can’t Explain How AI Works — CGP Grey (YouTube). An accessible and genuinely enjoyable introduction to how AI systems learn, produced several years before the current wave of public interest. The core insight — that these systems learn without being explicitly programmed with rules — is explained with characteristic clarity. Good first viewing for anyone who hasn’t encountered the idea before.


The AI Weekender publishes weekly. If you found this useful, share it with a colleague who would benefit.