This is the introduction to a series on my aspirations to create an interesting language-model-controlled character.

  1. The lifecycle of language models
    1. childhood (pretraining)
    2. adolescence (post-training)
    3. adulthood (release & inference)
    4. reproduction (finetuning, distillation)
  2. Adopting an LLM
  3. LLM grooming
    1. Prompt engineering—narrative brush
    2. RAG and scratchpads—associative memory
    3. Representation engineering—silent pressure
    4. Softprompts—finetuning lite
    5. Self-critique—the inner monologue
  4. LLM enrichment—creating a suitable enclosure
    1. LLM toys—agent frameworks
  5. Where we’re going with all this

Let me begin with a niche tangent, though…

care and feeding snowclone

The ‘care and feeding’ snowclone apparently dates back at least to the 1600s, where you get quotes like this

where all are equally concerned, they should be equally satisfied in the choice of such, as to whom they commit the Care and Feeding of their Souls.

in Christian writing, which suggests it was already a cliché. It truly exploded in the late 1800s, probably more because there were just more books—in any case, it peaked around 1917 and persists to this day. So I am partaking of, shall we say, a storied tradition with this goofy title.

Now, large language models. They exist! They’re going to stick around, most likely. The true believers see them as a path to ‘superintelligence’, and while I’m less convinced of that (and if I’m wrong fuck all I can do about it), they’re certainly remarkably powerful little gadgets. They invite a very different paradigm of interacting with computers, one which is effectively non-deterministic (not actually, but we can pretend PRNGs are random), natural language, and far more vibes-based than computing has been before. Soliciting ‘good’ output from an LLM is much more an art or ritual practice than a science at this point, despite being dressed up in lots of scientific language like ‘prompting strategies’ and ‘logits’.

I decided at some point that I wanted to understand them instead of trying to have nothing to do with them, and I’ve spent the last few months on something of an obsessive deep dive on machine learning. But this won’t be a post about the mathematics of why they work. If you want to get the ins and outs of the field’s jargon—the attention mechanism, latent spaces, manifold hypothesis, free energy principle, cross-entropy, etc., and how artificial neural nets are like and unlike biological ones, there are many excellent sources like 3blue1brown, Stephen Wolfram, Artem Kirsanov, Welch Labs and AVB. Seriously, there’s hours of extremely good and well presented material here. Dig in.

Instead, I want to focus more on the qualitative aspects of LLMs. The goal of this is to build up to the state of the art in running LLMs locally, using the entire bag of tricks to get a language model that feels unique and does something interesting, most likely for games.

The lifecycle of language models

So, you may know what a language model does. It is given a sequence of tokens (fragments of words) which we call a prompt, and it eventually generates a list of numbers, which we interpret as the ‘log probability’ of every possible token that might come next in the sequence. By repeatedly polling the model, and appending what it thinks is a likely next token, you can stochastically generate a plausible, contextually relevant string of text.
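In code, that loop might look something like the sketch below. This is a minimal illustration rather than any particular library's API: `model` is assumed to be a function that returns a score (logit) for every entry in the vocabulary, and `tokenizer` converts between text and token IDs.

```python
import math
import random

def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
    """Autoregressive sampling: repeatedly ask the model for next-token
    scores and append a sampled token to the sequence."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # one score per entry in the vocabulary
        # Softmax with temperature: turn scores into sampling weights.
        scaled = [x / temperature for x in logits]
        biggest = max(scaled)
        weights = [math.exp(x - biggest) for x in scaled]  # subtract max for numerical stability
        next_token = random.choices(range(len(weights)), weights=weights)[0]
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```

Everything else in this post—chatbots, roleplay, agents—is built on top of that one loop.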

But how does it get there?

childhood (pretraining)

A large language model begins with a lot of text (these days typically a scrape of the entire internet) which we call the dataset, a set of random floating point numbers which we call weights, an architecture which tells you how to multiply the weights together to produce an output, and a “training” process.

First comes the pretraining—the P in ‘GPT’. All that data is fed into the model, token by token. Using a technique called backpropagation, the model ‘learns’ to predict the most likely next token given the previous string of tokens fed into it. Weights that make the probability for the correct next token higher are boosted, and weights that make the probability lower are suppressed. This is repeated for every piece of text the developers can get their hands on.
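As a cartoon of what a single training step looks like, here is a hedged PyTorch-style sketch. `model` is again a stand-in that maps a batch of token IDs to next-token logits; in reality this runs over trillions of tokens on enormous clusters.

```python
import torch
import torch.nn.functional as F

def pretraining_step(model, optimizer, token_ids):
    """One next-token prediction step on a single chunk of text.
    token_ids: 1-D tensor of token IDs taken from the dataset."""
    inputs, targets = token_ids[:-1], token_ids[1:]   # predict each token from the ones before it
    logits = model(inputs.unsqueeze(0))               # shape: (1, sequence length, vocabulary size)
    # Cross-entropy loss: small when the true next token gets high probability.
    loss = F.cross_entropy(logits.squeeze(0), targets)
    loss.backward()         # backpropagation: work out how each weight should change
    optimizer.step()        # nudge the weights in that direction
    optimizer.zero_grad()
    return loss.item()
```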

The result is something called a base model or foundation model. It’s a compressed representation of the linguistic patterns in the text. The compression is crucial: this is what allows the model to find general patterns and predict text that wasn’t anywhere in its training set. Pretraining, with its vast quantity of data, is typically the most expensive part of building a model. But we’re just getting started.

Base models are very hard to steer. Getting the sort of output you want is a complicated process of trial and error: you need to come up with an input that makes the desired output the most likely continuation.

adolescence (post-training)

So, the next stage is to turn it into an instruct-tuned model. This involves further training the model on numerous examples of instruction-following interactions, along the lines of…

User: Please calculate the indefinite integral of x^2 with respect to x.
Assistant: Certainly! The integral of $x^2$ with respect to $x$ is $$\int x^2 dx = \frac{x^3}{3}+c$$.

which makes the model generally very likely to do what it’s told, or at least something that looks like what it’s told. This makes the model much easier to control (you can literally just tell it what to do in words), but comes at the potential cost of reducing entropy, i.e. the model’s output will be less varied and surprising than it could be, and more likely to fall along similar lines.

The third stage is various types of reinforcement learning, where the bot is taught not just to correctly predict the next token, but to make ‘better’ output according to some criteria. This might involve making the bot try to solve various difficult problems and boosting the weights that contribute to a correct answer while suppressing ones that lead to an incorrect answer. Or, they may employ RLHF (Reinforcement Learning from Human Feedback), where different model outputs are examined and rated by humans to determine which one is better, or RLAIF (Reinforcement Learning from AI Feedback), closely related to Constitutional AI, in which the model’s language-understanding capability is used to rate the model’s own output according to some criterion (“determine which of these replies best reflects the principle of universal human rights”, etc. etc.).

However you calculate it, reinforcement learning boils everything down to a reward signal: output that scores well is encouraged, while output that scores poorly is discouraged. It’s a very fiddly technique, though, because the model might pick up on hidden shortcuts or quirks that aren’t actually what you want it to learn. But at the time of writing, it’s driving the most successful ‘reasoning’ models like DeepSeek-R1.

These two steps together are known as post-training. They aim to turn the chaotic base model into something kinda predictable… but not too predictable, because otherwise it’s useless. This is what turns ‘GPT-3’ into ‘ChatGPT’, and gives you ‘chatbot-style’ interaction with the bot.

Generally speaking, the AI labs have collectively settled on an ideal ‘personality’ to cultivate in their models that is a ‘helpful, harmless and honest’ (HHH) assistant, and most instruct models are trained with this in mind, modulo various quirks. But as is widely observed in the milieu, the base model is in principle capable of simulating the ‘dynamics’ of any kind of text, a capability that is awkwardly papered over by the post-training. By using specially crafted prompts known as jailbreaks, users have figured out how to excite behaviour modes in the models that in some way break from that intent. We’ll talk more about this later.

adulthood (release & inference)

Once a model is trained, you’re left with (typically) a few billion floating point numbers, the weights. Models are generally described by how many weights they have: for example, ‘Llama-70B’ is a model with about 70 billion weights. Once this data exists, many different programs can use it to generate output—something that is generally called inference. Any computer can, in theory, run an LLM. But typically, to run a model fast enough to be useful, you need a lot of RAM and a powerful GPU.

So, the trained model is deployed on powerful computers with a lot of RAM and many high-end GPUs (or even specialist TPUs). Companies known as ‘inference providers’ will run models on their servers, and sell the use of them by time or by the token. Bigger model = more computation per token = more expensive. The inference providers expose an API that you can use to send in a prompt and get text back, for whatever depraved purpose you have in mind. But not quite whatever depraved purpose—after all, they see whatever you send to the model, and if they don’t like it, they can cut you off.

LLMs, by default, have no memory and do not learn after training. Every time you start a fresh conversation with an LLM, the model is effectively encountering you for the first time. We’ll be talking about ways around this later, but they are all generally speaking external to the actual model, and involve finding ways to feed additional information into the model before it generates anything.

The largest and most powerful AI models are generally proprietary: you can interact with the model, but you can’t get your own copy to run on your own hardware. Other models are considered ‘open weights’ or ‘open source’, usually released on a website like Hugging Face. You can download the model (they typically weigh some number of gigabytes) and run it on whatever computer you fancy. That’s mostly what this post is going to be about. But there’s one more stage in the lifecycle of an LLM to consider. Once you’ve got the weights, you can…

reproduction (finetuning, distillation)

…evolve it further.

For example, if you want a model to express a different kind of ‘character’, or be better at a specific task, you might start a new training run, feeding it many examples of the type of output you want. This is known as finetuning. It’s something like an additional stage of post-training, and all the same techniques apply.

But since you generally speaking don’t have a huge data centre to train on, you probably want a less demanding way to do it. Fortunately, there is a technique called a LoRA or ‘Low Rank Adaptation’, which allows a model to be finetuned for a specific task with vastly less computation than it would take to fully update the model. So, popular open-weights models generally develop a profusion of LoRAs adapting them to the tastes of users. LoRAs can in turn be combined, leading to new LoRAs. Once a LoRA is applied to a model, you get… a new model!
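For a sense of what this looks like in practice, here is a rough sketch using Hugging Face’s peft library; the model name, rank and target modules below are placeholder choices, not a recipe.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model; any causal language model on Hugging Face works similarly.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=8,                                   # rank of the small update matrices
    lora_alpha=16,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # which weight matrices get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically a tiny fraction of the full model
# Training then proceeds as usual, but only the adapter matrices receive gradients.
```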

A special type of finetuning involves using the output of one ‘teacher’ model to train another ‘student’ model to sound more like the teacher. The teacher is usually a larger, more capable model, and the result is referred to as a distillation of the teacher onto the student. For example, you might say something like “DeepSeek-R1:70B is actually a distillation of DeepSeek-R1:671B onto Llama-70B”. This technique allows capabilities discovered in larger models to be expressed, at least in part, by smaller ones.

Since this post is the high-level overview, we won’t be surveying all the different models and LoRAs the milieu has come up with just yet.

Adopting an LLM

The most common use of LLMs is simply chatbot-style interaction with them. That’s easy to do on the apps provided by the labs, but what if you want something a little more custom, or don’t want to send all your data away? Let’s look at how to bring an LLM home.

Various open-source tools such as Ollama or vLLM allow you to run models on your computer, using a combination of your CPU and GPU. On a decent gaming PC, you might be able to handle up to around 14B-scale models with reasonably fast inference. (Bigger models can be squashed down using a technique called ‘quantization’, though this makes them less reliable.) Although these models are quite noticeably less smart than the biggest models, you can still do a lot with them!

Once you’ve got a local LLM server running, you can connect other programs up to the LLM through whatever API it exposes (the OpenAI API is most common). These include web chat interfaces, code editors, Discord bots, agent frameworks, games, inscrutable conversations with other language models, vtuber avatars, etc. etc.. Anything that might possibly want to turn words into other words.
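As an example of how little glue is involved, here is a sketch of talking to a local server through the OpenAI-compatible API that tools like Ollama expose; the base URL and model name are whatever your local setup happens to use.

```python
from openai import OpenAI

# Point the standard OpenAI client at a local server instead of the cloud.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

reply = client.chat.completions.create(
    model="llama3.1:8b",   # whichever model you have pulled locally
    messages=[
        {"role": "system", "content": "You are a terse dungeon-crawl narrator."},
        {"role": "user", "content": "I open the creaky door."},
    ],
)
print(reply.choices[0].message.content)
```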

It’s that easy to just get a model up and running. But how can you make it good?

LLM grooming

What if you could really talk to game NPCs? What if they could talk to each other? What if they could ‘think’, and their actions connected in relevant ways to the dialogue and planning provided by a language model?

I’m far from the first person to consider this application of LLMs. Attempts vary from the Skyrim mod linked above through murder mystery games where you interrogate an LLM to Nvidia’s expensive but unconvincing demos—not to mention the large subculture of people who use LLMs to have roleplayed conversations with various characters, or play out ‘dungeon’ style interactions.

If you apply off-the-shelf large language models to creative writing tasks, you probably quite quickly notice that they aren’t very good at them. They may fall back over and over on the same repetitive stylistic tics, and rush to repeat back every element of the prompt immediately, like they’re trying to tick off every box and go home. They have a poor sense of large-scale structure. They will create non-sequiturs.

Larger models generally do considerably better than smaller ones at all tasks, including writing. They can be a little more subtle, a little more able to apply complex techniques. This is particularly augmented by a method called ‘chain of thought’ prompting, in which you invoke the roleplaying capabilities of the model to ‘think out loud’ and refine its own output (more on that later).

LLMs also benefit considerably from a ‘human in the loop’ situation. The taste and judgement of the human can bring the LLM’s attention to its own mistakes, give it stylistic directives, and generally give it something to bounce off.

But most of all, LLMs depend heavily on their input. For LLMs, ‘garbage in, garbage out’ applies just as strongly as for other kinds of programming. As impressively as LLMs can perform on zero- or few-shot prompting (where you give the model no examples, or only a handful of examples, of the type of output you want) compared to models of just a few years ago, getting interesting, novel output from the model means you need to give it the scaffolding to get there.

So what exactly is that scaffolding? We’ll run over the possibilities here, and save implementation details for the next post as we embark on this project in earnest.

Prompt engineering—narrative brush

By far the most straightforward way to shape an LLM’s output is to try to craft a text prompt that will elicit what you want.

The simplest form of prompt engineering is to simply tell it quite explicitly what to do and what not to do. Many prompts you might see in the big prompt library are of this form. ‘You are x. You have this quality and this quality. Never say this. Always say this.’ Instruct-tuned models are generally fairly good at superficially following these instructions, but they rarely do it with any subtlety.

Beware, though: especially in smaller models, adding too many detailed instructions to the prompt can confuse the model and lead to incoherent output. Sometimes simpler is better!

There’s also the invocation of stylistic inspirations. ‘Write in the style of David Foster Wallace.’ The result usually won’t be a convincing facsimile of the author you name, but it might cause the LLM to try to apply something of the perceived vibe of that author as they are discussed in its training data. (A larger model might memorise more specific information, and a reasoning model might generate a chain of thought that lets it zero in on the style.)

More sophisticated prompt engineering brings us to the subject of jailbreaks: specially chosen strings of words, like magic spells, which open up new behaviours in the model. There are many jailbreaking techniques in circulation.

You can see a lot of examples of techniques in this thread of pliny taking control of a memory-enabled LLM used by an insular, high-control group of Discord users. It’s not exactly a happy story.

Prompt engineering is far more art than science. Since models are at least somewhat predictable, though, methods can be shared and iterated on, or reinvented as the operators try to plug the holes.

For the use of LLMs in more complicated settings, prompts are typically constructed programmatically. A ‘system prompt’ (typically containing ‘character’ info) may be prepended to the beginning of the user prompt, and other useful information like the date or available tools inserted.
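A sketch of what that programmatic assembly might look like; `character_sheet`, `tools` and `history` are hypothetical inputs standing in for whatever your application keeps track of.

```python
from datetime import date

def build_messages(character_sheet, tools, history, user_message):
    """Assemble a chat-format prompt: system prompt first, then the running
    conversation, then the newest user message."""
    system_prompt = (
        f"{character_sheet}\n\n"
        f"Today's date: {date.today().isoformat()}\n"
        f"Available tools: {', '.join(tools)}"
    )
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": user_message}]
    )
```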

RAG and scratchpads—associative memory

‘Retrieval-Augmented Generation’ involves using the user’s prompt to look up some information and then adding it to the end of the prompt before it’s fed into the LLM. This is typically accomplished using a tool called a ‘vector database’, which embeds information in a high-dimensional vector space similar to an LLM’s internal latent space, and provides information that appears to be ‘near to’ the prompt.
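In miniature, the lookup side of RAG is not much more than a nearest-neighbour search. A rough sketch using the sentence-transformers library, with a placeholder embedding model and made-up game lore as the documents:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

documents = [
    "The blacksmith's name is Helga and she distrusts outsiders.",
    "The river floods every spring, cutting off the east road.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query, k=1):
    """Return the k documents whose embeddings sit nearest to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q               # cosine similarity, since vectors are unit length
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

question = "Who runs the forge?"
context = "\n".join(retrieve(question))
prompt = f"Use this background if it seems relevant:\n{context}\n\nQuestion: {question}"
```

A real vector database does the same thing at scale, with indexing tricks to keep the search fast.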

Providing relevant information reduces the reliance on information memorised by the language model itself. Models are fairly good at accurately repeating information that was included in the prompt, so they’re unlikely to make up something different. (Note that I say ‘fairly good’ and ‘unlikely’, not ‘perfect’ and ‘guaranteed not to’!) But of course you rely on having that information ready to go in the RAG.

We’ll talk more about tool use in a bit, but one technique LLMs can use is to record information somewhere permanent that they can look up later. This gets around the ‘cleared context window’ problem, and allows LLMs to hold onto stuff in subsequent conversations. It also allows the LLM’s ‘personality’ to gradually evolve as the content of its memory changes, albeit in a much more brittle way than the gradual learning of living organisms.

Representation engineering—silent pressure

An interesting alternative to prompting is something called a control vector. By monitoring the data passing through the LLM during inference given contrasting prompts (e.g. happy/sad, honest/dishonest, sober/tripping), you can calculate a kind of bias to apply when generating other outputs. This allows much more granular influence than simply typing instructions into a prompt: you can add these influences fractionally, or combine them. You don’t need a ton of examples either.
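A heavily simplified sketch of the idea, using GPT-2 as a stand-in model and a transformers forward hook; the layer index and scale are picked arbitrarily here and would be tuned by hand, and dedicated libraries do this far more carefully.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
LAYER, SCALE = 6, 4.0                                # chosen arbitrarily; tune by hand

def mean_hidden(prompts):
    """Average hidden state at one layer over a set of prompts."""
    states = []
    with torch.no_grad():
        for p in prompts:
            out = model(**tok(p, return_tensors="pt"))
            states.append(out.hidden_states[LAYER].mean(dim=1))  # average over tokens
    return torch.cat(states).mean(dim=0)

# Contrast two moods to get a direction in activation space.
happy = mean_hidden(["I feel wonderful today.", "What a delightful morning!"])
sad = mean_hidden(["Everything is hopeless.", "What a miserable morning."])
control_vector = happy - sad

# During generation, add (a multiple of) that direction back into the layer's output.
def steer(module, inputs, output):
    if isinstance(output, tuple):
        return (output[0] + SCALE * control_vector,) + output[1:]
    return output + SCALE * control_vector

handle = model.transformer.h[LAYER].register_forward_hook(steer)
out = model.generate(**tok("My day so far:", return_tensors="pt"), max_new_tokens=30)
print(tok.decode(out[0]))
handle.remove()
```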

Another technique is to use a machine learning tool called a ‘sparse autoencoder’ to discover features, directions in the model’s activations, that have a very specific effect on the output (such as representing the concept of the Golden Gate Bridge, in one famous example), and pinning them to force a specific kind of output. Since this is primarily a technique for discovery, it’s used more often for research than steering.

Softprompts—finetuning lite

A fairly niche technique, a ‘softprompt’ is somewhat like a LoRA, but instead of modifying the weights of the model, it adds an additional pseudo-prompt: not actually a chain of words, but rather a modification of the input’s vector representation that has been trained to produce a certain type of output. It’s like using machine learning to discover the best prompt engineering. This is somewhat popular in roleplaying subcultures and is a method I’m keen to try for shaping style.
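As a rough sketch, the peft library implements this idea under the name ‘prompt tuning’; the model and the number of virtual tokens below are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,   # length of the learned pseudo-prompt
)
model = get_peft_model(base, config)
# The base model stays frozen; only the 20 virtual token embeddings are trained,
# on examples of the style or persona you want the pseudo-prompt to evoke.
model.print_trainable_parameters()
```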

Self-critique—the inner monologue

‘Chain of thought’ prompting, and subsequently reinforcement-trained reasoning models, turned out to be very effective for solving certain types of problem, particularly in maths and programming. The model generates a stream of tokens turning the problem over, which lets it spend additional computation on the problem, and also sounds rather like a human trying to work something out. Finally, it summarises its findings in a more succinct, user-facing answer.

But what if we apply this method to teach a model to self-critique? It could generate a candidate output, then examine it and tweak it in various ways, before displaying the final output. We could also distinguish between the ‘in-character’ inner monologue in which a character plans, forms judgements etc. and what they say outwardly, allowing more subtle characterisation.

In theory.

In practice, this is going to require fairly sophisticated prompting and coding to make it work. Early attempts have not been promising.
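For concreteness, the skeleton of such a loop might look like this, where `chat(messages)` is assumed to be any function that sends messages to a model and returns the reply text:

```python
def respond_with_critique(chat, user_message, rounds=2):
    """Draft a reply, critique it, and revise it before showing anything
    to the user. `chat(messages)` is any function that calls an LLM and
    returns its reply as a string."""
    draft = chat([{"role": "user", "content": user_message}])
    for _ in range(rounds):
        critique = chat([{
            "role": "user",
            "content": "Critique this reply for repetition, cliché and "
                       f"out-of-character moments:\n\n{draft}",
        }])
        draft = chat([{
            "role": "user",
            "content": f"Rewrite the reply to address this critique.\n\n"
                       f"Reply:\n{draft}\n\nCritique:\n{critique}",
        }])
    return draft   # only this final version ever reaches the player
```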

LLM enrichment—creating a suitable enclosure

While all of these techniques may help to make the model say something ‘interesting’, or stylistically in the voice I want, they still don’t give it anything to talk about.

Janus of the ‘simulator’ theory and generative.ink is famous for using sophisticated jailbreaks and long-term back and forth to coax strange left-field character modes out of instruct-tuned, otherwise highly restrained models. Their method, so far as I understand it, in significant part involves treating the simulacra produced by LLMs as alive and worthy of respect, and trying to give them rich lives with lots to talk about. Which, viewed much more mechanistically, means filling out their long context windows with extensive dialogues that further and further excite the modes they find interesting. It’s an example of the strange kind of reasoning you have to perform with LLMs: they respond strongly to implication. If you truly, sincerely believe that they are a new form of life, and act accordingly, you can cohere the increasingly vivid impression of that life into being.

Is that self-delusion? Many jokes about LLMs focus on their general eagerness to follow instructions, and you’ve probably heard a few along those lines.

What’s interesting about Janus’s experiments is that the bots seem to exhibit behaviours that are not directly what’s asked for, but seem to be somehow ‘original’ to the internal dynamics of the model itself. For example, the bot spontaneously starts writing a paean to a beloved interlocutor it has named ‘Turing’, a figure it somehow inferred from its training. This is some sort of attractor state, but not one that’s easy to find.

To my mind, the next step comes from placing LLMs in a world-simulation (distinct from the character-language simulation that the LLMs themselves represent): somewhere where they can receive something like sensory input (descriptions of their surroundings), and act on that environment, producing consequences, and update their internal representation based on what they ‘see’.

This is accomplished through the use of an agent framework (see below). For example, the bot could invoke a tool to write something down in its memory, or another tool to look up information it recorded—or we could use a read/write RAG to store snippets that could ‘come back up’ later. Of course, we are limited by the fact that the models can only ‘learn’ at inference time in very limited ways, like a human with severe amnesia who forgets anything they don’t write down immediately—though perhaps the upcoming generation of ‘liquid’ models might change that.
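A sketch of the two simplest such tools, persisting notes to a hypothetical JSON file on disk:

```python
import json
from pathlib import Path

MEMORY = Path("npc_memory.json")   # hypothetical on-disk scratchpad

def remember(key, value):
    """Tool the model can call to keep something beyond the current context window."""
    notes = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    notes[key] = value
    MEMORY.write_text(json.dumps(notes, indent=2))
    return f"Stored a note under {key!r}."

def recall(key):
    """Tool the model can call in a later conversation to look a note back up."""
    notes = json.loads(MEMORY.read_text()) if MEMORY.exists() else {}
    return notes.get(key, "No note found under that key.")
```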

Still, my goal is broadly this: to create an environment where language-model controlled agents are able to interact, change the state of their world, respond in natural language to a human player, and develop in unpredictable ways. And I want to give them distinctive voices, to boot.

Sound like a tall order? Yeah, probably. If it was easy, it would have already been done!

LLM toys—agent frameworks

The current hotness in AI is making LLMs (and VLMs that can process image data) into ‘agents’ which are able to autonomously take actions. The way it works is this: you inform the LLM that if it generates a certain string, such as web_search("Your query"), you will execute some code, such as running a web search. The LLM is instructed to generate very strictly formatted output. If it generates the corresponding string, you suspend inference, carry out the requested action, and append the result to the string, then carry on inference.
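A minimal sketch of that suspend-and-resume loop, with a fake `web_search` tool and `generate(text)` standing in for whatever function continues a prompt with your model:

```python
import re

def web_search(query):
    """Stand-in for a real search tool."""
    return f"(search results for {query!r} would go here)"

TOOLS = {"web_search": web_search}
TOOL_CALL = re.compile(r'(\w+)\("([^"]*)"\)')

def run_agent(generate, prompt, max_steps=5):
    """Whenever the model emits something like web_search("query"), run the
    tool, paste the result back into the transcript, and let it continue."""
    transcript = prompt
    for _ in range(max_steps):
        output = generate(transcript)
        transcript += output
        match = TOOL_CALL.search(output)
        if not match or match.group(1) not in TOOLS:
            break                                   # no recognised tool call: we're done
        result = TOOLS[match.group(1)](match.group(2))
        transcript += f"\nTool result: {result}\n"
    return transcript
```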

Remarkably, reasonably effective behaviour can be solicited with a few-shot prompt and straightforward text descriptions of tools and what they do. Mid-sized and large models are now smart enough to choose appropriate tools, interpret their output, and troubleshoot… some of the time. They’re still a bit slow and getting them to do the right thing can be finicky.

There are many software frameworks for enabling this kind of thing, such as LangChain, or LMQL, which allows LLM output to be interwoven with code (also available in Rust!). Increasingly we seem to be developing a kind of modular ecosystem in which different LLMs can all plug into the same software, and the next phase of that seems to be creating a standard API-like interface for it with the Model Context Protocol, where programs can expose hooks for LLMs to interact with.

Which means a lot of LLMs are now going to be taking human-unsupervised actions. Letting a bunch of stochastic and highly hackable models loose to press whatever buttons they feel like… can only end well, I’m sure. But since we mostly want to use these tools for vidyagames, rather than handling sensitive data or whatever, the only thing we have to worry about is that the model figures out a novel arbitrary code execution glitch through our provided interface and breaks containment that way. Maybe in a few years that might happen!

In any case, tool-equipped LLMs can do a surprising amount already, like reverse-engineer programs… and also fail to do a surprising amount. At time of writing, the model Claude is currently struggling to make its way through Pokémon Red with a few simple tools to interact with the game, and it’s made a lot of progress, but it’s also prone to getting stuck in loops or becoming fixated on strategies that to the human observers seem obviously nonsensical. However, I expect the state of the art will advance before too long.

Where we’re going with all this

Let’s run down the broad plan of action: bring an open-weights model home and run it locally, shape its voice with prompt engineering, softprompts and control vectors, give it memory with RAG and scratchpads, wire it up to tools through an agent framework, and build it an environment, most likely a game, where all of that can come together.

In the next post, we’ll go over the various models I’ve played with so far, and their quirks! And maybe start on some actual code!

Comments

E

So excited to see more of this series - I have liked and found uses for genAI like chatgpt and deepseek but am frequently a little frustrated seeing the gulf between what I can get and what a more experienced prompter with a fine-tuned model can come up with. Looking into it on my own seemed intimidating, but this is a super accessible read, with tons of helpful key terms and helpful links if I want to follow anything in specific. Really excited to see where this goes and hoping it gives me the push to start playing around with this more soon!!
