
Language Games and LLMs: What Wittgenstein Can Teach Us

Feb 19, 2026

Jirka Koutny, Backend Engineering Manager


TL;DR 

  • LLMs learn language statistically, not through understanding.
  • Wittgenstein’s concept of “language games” provides a practical framework: treat every LLM interaction as a game with rules.
  • Clear rules (role, goal, format, constraints) produce strong results; unclear or missing context leads to hallucinations.
  • Prompt design is game-rule design, RAG and tools anchor the game in reality, and evaluation means checking whether the model followed the rules.
  • The core question for any team: what game do we want the model to play, and do we know the rules ourselves?

"The limits of my language mean the limits of my world."

— Ludwig Wittgenstein

The Problem with Unclear AI Expectations

I spent nine years studying theoretical computer science, especially formal language theory. Only later did it click that natural language, programming languages and even mathematics are all symbol systems we use to coordinate action, follow rules and align expectations.

I'm writing this because the conversation around AI has exploded. On one side, hype. On the other, existential dread. Meanwhile, most teams are already using large language models (LLMs) daily (often productively, sometimes dangerously) without a clear mental model of what these systems are or aren’t.

That’s normal. But unclear mental models create bad expectations. And bad expectations create bad systems.

This isn’t a technical deep dive. It’s a practical philosophical lens for everyday work with LLMs, useful whether you’re an engineer, PM, designer, researcher or manager.



Ludwig Wittgenstein (1889–1951) was an Austrian philosopher who reshaped how we think about language. His key idea was simple: words don’t get meaning from perfect definitions but from how we use them in real situations, what he called “language games.” That’s why he keeps coming up in conversations about software, teamwork and AI: many problems aren’t technical at all, but misunderstandings caused by words.


Why Philosophy Matters for Prompt Engineering

“20th-century philosophy as a practical manual for AI engineers?”

Yes — and not ironically. The key to understanding how large language models work and how they fail may not be in the latest OpenAI paper but in the work of Ludwig Wittgenstein, published decades ago.

This isn’t an academic essay. It’s a practical way to think about prompts, requirements, testing and those painful moments when everyone uses the same word and still means different things. If you take one thing away, let it be this: we can treat LLM work as designing and testing “language games.”


What to Keep in Mind

Wittgenstein’s simple idea is that the meaning of a word isn’t a dictionary definition, but how we use it in a specific situation. He called these situations “language games.”

LLMs are machines that learned patterns from thousands of such games across a huge amount of text. That’s why they can look so smart even though they don’t understand the world as we do. They are masters of imitation. They excel when we give them clear rules (roles, goals and formats). They fail when they lack real-world context, which leads to hallucinations and errors.

Prompt design is really game-rule design; tools and RAG (Retrieval-Augmented Generation) are ways to anchor the game in reality; evaluation is checking whether the model followed the rules. The basic question for every team stays the same: what game do we want the model to play? And do we ourselves know the rules?


How LLMs Use Words as Tools

When I first heard “language game,” it sounded terribly academic. But Wittgenstein didn’t mean a game as entertainment. He meant an activity with rules, written or unwritten. His shift was to stop asking, “What does word X truly mean?” and start asking, “How do people use word X?” Words are tools in a workshop. A hammer doesn’t have one absolute meaning. Its meaning is what you do with it. You hammer nails, but in a pinch, you can also pry something open with it. Meaning is in use.

A small example makes this click. Someone asked ChatGPT whether a screwdriver is the same thing as a pry bar. The answer was brilliant: the primary purpose of a screwdriver is screwing, but humans sometimes use it as an improvised pry bar, even though it can get damaged. The model didn’t reach for a dictionary definition. It tracked different “games” where the tool appears: “electronics repair” versus “emergency opening of a paint can.” That is the point. In one setting, a word is a description; in another, it’s a move. 

Wittgenstein himself used a simple builder example: one worker shouts “Slab!” and the other hands it over. Same word, but now it functions like a command. Later, “Five slabs” becomes a report. If that happens on a construction site, it definitely happens in a Jira ticket.

The Map That Never Walks

This later Wittgenstein view (language as social practice, not a mirror of reality) fits LLMs almost too well. They aren’t programmed with grammar rules. No one explained nouns to them. They learn statistically. As linguist John Firth said, “You shall know a word by the company it keeps.” LLMs do exactly that, just on a massive scale. They analyze billions of sentences and learn which words tend to appear next to each other. So they become perfect imitators: they can mimic the style and word combinations for many situations without actually knowing what they’re talking about.
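
To see “the company it keeps” in miniature, here is a toy sketch that just counts which words appear near each other in a three-sentence corpus. Real models learn far richer statistics than raw co-occurrence counts, but the signal is the same kind: patterns of use, not definitions.

```python
# Toy illustration of distributional word statistics: which words keep
# which company in a tiny, made-up corpus.

from collections import Counter, defaultdict

corpus = [
    "the dog wagged its tail",
    "the dog chased the ball",
    "the cat ignored the ball",
]

window = 2  # how many neighbours on each side count as "company"
company: dict[str, Counter] = defaultdict(Counter)

for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                company[word][words[j]] += 1

print(company["dog"])
# Counter({'the': 3, 'wagged': 1, 'its': 1, 'chased': 1})
```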

That’s where the well-known “map” analogy helps. LLMs build an incredibly detailed linguistic map of the world. They are geniuses at reading that map, finding the shortest path between two cities, describing what mountains look like and writing a moving paragraph about the joy of meeting a dog. But they never stepped onto the real territory. They never felt that cold nose and wagging tail. All they have are words. So when the task is mostly about well-defined rules inside language, they can be great. When the task needs real-world grounding, they can be hopelessly lost.

When the Game Is Clear and When It's Not

You see the biggest difference exactly where you bump into the rules. Where the language game is well-defined, LLMs excel. For example, following instructions: “Act as a senior software engineer and write a code review for this TypeScript code. Focus on readability and potential race conditions.” Here, the rules are clear. The model knows the role, the goal, what to focus on and what kind of output is expected. The same is true when we define an API for it and tell it how to call it. We’re teaching a new, narrow game, and it can learn to play it quickly.
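
To make that concrete, here is a minimal sketch of the code-review game written down as explicit rules, assuming an OpenAI-style chat-completions client; the model id, the rule text and the sample TypeScript are placeholders I made up, not a recommended prompt.

```python
# Prompt design as game-rule design: the role, the goal, the constraints and
# the expected output are stated explicitly instead of left implicit.
# Assumes an OpenAI-style chat-completions client; all values are placeholders.

game_rules = {
    "role": "You are a senior software engineer reviewing TypeScript code.",
    "goal": "Focus on readability and potential race conditions.",
    "constraints": "Do not comment on formatting; assume a formatter handles it.",
    "output_format": "Return a Markdown list: one item per issue, with a suggested fix.",
}

system_prompt = "\n".join(f"{key}: {value}" for key, value in game_rules.items())

sample_code = """
const cache: Record<string, Promise<User>> = {};
async function getUser(id: string): Promise<User> {
  if (!cache[id]) cache[id] = fetchUser(id);
  return cache[id];
}
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Review this TypeScript code:\n{sample_code}"},
]

# With the official openai client (>= 1.0) the call would look roughly like:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(response.choices[0].message.content)
```

The point isn’t the syntax; it’s the habit of writing every rule of the game down where it can be reviewed and tested.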

The problem starts when the rules aren’t clear or when the game requires knowledge that isn’t in the language itself: knowledge of the physical world or of your company’s real systems. The “melt an egg in chicken soup” example shows this brutally. A model answered as if an egg would dissolve like sugar in tea. That’s obvious nonsense to anyone who has ever cooked, because we have a model of the world: we know what protein does with heat. The LLM doesn’t. For it, there are just tokens: egg, melt, chicken, soup… Words that appear together in certain contexts. True and hallucinated statements can be indistinguishable.

Another classic example: which room is bigger, the one holding one table and one chair, or the one holding two tables and two chairs? The model may guess based on associations (“two tables” sounds like “bigger”), but it has no inner representation of space. It lacks that experience. And there’s another practical detail: every chat is a new game for it; once the chat ends, it remembers nothing.

Now the question everyone asks: what does this mean for me as an engineer, product manager, designer or manager? Isn’t this just a philosophical excuse for why the model sometimes lies? The practical answer is the opposite: it gives you a strong mental framework. You can translate common techniques into this language. Prompt engineering isn’t black magic. It is literally designing the rules of the game: context, role, goal, output format and constraints. You don’t just tell it what, but also how. 

And when it doesn’t know something, you often need to anchor it. That’s what RAG and tools are for: supplying data it doesn’t have and grounding the game in a shared, verifiable world. Give it access to internal documentation, a customer database, a status page and facts it can play with, not just patterns from its map. Evaluation (testing) becomes “Does it follow the rules?” Not only factual correctness but also format, tone, and whether it avoids topics it should avoid. Safety guardrails become forbidden moves, boundaries that it must not cross. Suddenly, these practices aren’t random tricks. They’re a systematic way to define and control the game you want to play.
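
Here is a minimal sketch of what “checking whether the model followed the rules” can look like in code. The specific rules (a JSON shape, a severity scale, a couple of forbidden topics) are invented for illustration; a real eval harness would be bigger, but the shape is the same.

```python
# Evaluation as rule-checking: not only "is it factually right?", but
# "was this a legal move in the game we defined?". All rules are made up.

import json

REQUIRED_KEYS = {"summary", "severity", "next_step"}
FORBIDDEN_TOPICS = {"pricing", "legal advice"}  # guardrails as forbidden moves

def broken_rules(model_output: str) -> list[str]:
    """Return the rules this answer breaks; an empty list means a legal move."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    violations = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        violations.append(f"missing keys: {sorted(missing)}")
    if data.get("severity") not in {"low", "medium", "high"}:
        violations.append("severity is not one of low/medium/high")
    lowered = model_output.lower()
    violations.extend(f"touches forbidden topic: {t}" for t in FORBIDDEN_TOPICS if t in lowered)
    return violations

# A fluent, confident answer that is still an illegal move:
print(broken_rules('{"summary": "Looks fine", "severity": "catastrophic"}'))
# -> ["missing keys: ['next_step']", 'severity is not one of low/medium/high']
```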

One Meeting, Many Worlds

Let’s demonstrate it with something we all know: the classic argument over a new API design. A backend engineer, a frontend engineer and a product manager sit in a room. Each wants something different. Each is playing a different language game. 

  • For the backend engineer, the game is efficiency, latency under 50 milliseconds and correct HTTP status codes. Their word “effective” means fewer database queries. 
  • The frontend engineer plays a different game: “effective” means getting all data in one request, in the exact shape the UI needs, with minimal client mapping. 
  • And then there’s the product manager, playing a third game: they don’t care about milliseconds or JSON; they care about user stories, business logic and whether the API will be easy to extend for a feature six months from now, the one the team doesn’t even know about yet. Their “effective” means it supports key user paths.

Now imagine you record that slightly heated meeting and give the transcript to an LLM: “Design an OpenAPI specification based on this discussion.” Where would it help and where would it fail? 

  • It would help brilliantly in synthesis. It could summarize the arguments and generate a technically valid, clean spec, playing the “create technical documentation” game well.
  • The catch is the unspoken context. The model might not register that when the product manager sighs at minute 35 and says, “It’s a shame we can’t serve corporate clients now,” that remark is actually a strategy signal. Its world is limited to the transcript. And when it sees conflicting requirements, it may hallucinate a compromise that looks good on paper but is hell to maintain because it lacks awareness of your real system and codebase.

Questions That Prevent Illusions

So if you sit down tomorrow to build an LLM feature, the checklist almost writes itself. 

  1. What exactly is the language game we want the model to play, and are we clear on the goal? 
  2. What are the explicit and implicit rules? Tone, format, what to omit and what counts as “done”? 
  3. What context is missing? Internal docs, customer data, the last all-hands summary — how will we provide it via RAG or tools? 
  4. What does a “correct move” look like? What evidence and metrics will you use to judge success?
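
If it helps, the checklist can live next to the code as a small artifact and get reviewed like any other design doc. A minimal sketch, with invented field names and an invented example feature:

```python
# The four questions as a concrete "game spec" per LLM feature.
# Field names and the example values are illustrative, not a standard.

from dataclasses import dataclass

@dataclass
class LanguageGameSpec:
    goal: str              # 1. which game are we playing, and to what end
    rules: list[str]       # 2. explicit rules: tone, format, what counts as "done"
    grounding: list[str]   # 3. missing context and how we supply it (RAG, tools)
    correct_move: str      # 4. the evidence and metrics that judge success

support_triage = LanguageGameSpec(
    goal="Classify inbound support tickets and draft a first reply",
    rules=[
        "Reply in the customer's language",
        "Never promise refunds; escalate instead",
        "Output JSON: {category, draft_reply, confidence}",
    ],
    grounding=[
        "RAG over the public help-center articles",
        "Tool call to the subscription API for the customer's plan",
    ],
    correct_move="Category matches the human label on a held-out ticket set; the JSON always parses",
)
```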

And three warnings you’ll keep relearning if you ignore them. 

  1. Never assume shared context. The model doesn’t have access to your head, your Slack or the coffee machine where the real decisions happen. (And honestly, humans also get this wrong.) 
  2. Don’t mistake fluency for understanding. The model can sound persuasive; that tells you nothing about truth. Always verify. 
  3. Don’t ignore missing grounding. If you don’t give it external data or tools, its “reality” is just a statistical reflection of texts. It’s on us to provide the wire to reality.

Languages That Leave Us Behind

Wittgenstein doesn’t give you a magic prompt. He gives you a more useful habit: treat language as a set of games, and treat yourself as the game designer. When you stop being angry at the model for being “stupid” and start asking whether you explained the rules and provided the grounding, the whole problem becomes more systematic. And strangely, more humble.

And one final thought that still gives us chills: if models start training heavily with themselves, could they develop language games anchored in their own digital world, not ours? From the engineering perspective, can they develop a new programming language that we as humans do not understand at all? If that happens, the question won’t be “who’s right,” but “who is playing which game with whom?” And that is no longer just a technical question.
