
Continual learning as an associative graph

12 Feb 2026 · Jack Fan

I've just finished a draft of a research project (which can be found here) targeting the scaling and development of decentralised multi-agent collaboration architectures. Interestingly, however, this project made it very apparent that LLMs struggle to reason effectively about actions and their consequences on an external world state (especially when the actions themselves are also external, i.e. not conducted by the LLM). But why? And more importantly, what ramifications does this have for the general idea of learning, planning, and reasoning in these autoregressive systems?

LLM training and inference procedures

In raw LLMs today, model architectures are built on top of attention. This means:

  1. LLMs start by understanding how each token is related to every other token through an O(n^2) attention pass
  2. The transformer heads then take these token-level relationships and map them into different dimensional spaces in order to learn higher-level associations; in manifold terms, they form inductive biases towards specific dynamical motifs/trajectories that are suggestive of some ground-truth rule in human language
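The first step above can be sketched as single-head scaled dot-product attention. This is a minimal NumPy illustration with random weights (no masking, no multi-head logic), not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    """Single-head attention: every token attends to every other token,
    which is where the O(n^2) cost in sequence length comes from."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project tokens
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) pairwise relatedness
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # mix token values

rng = np.random.default_rng(0)
n, d = 8, 16                                         # 8 tokens, 16-dim embeddings
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(out.shape)  # (8, 16)
```

Each row of the (n, n) score matrix relates one token to all others; a transformer's multiple heads repeat this in different learned subspaces, which is the second step above.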

LLMs are then trained, using both traditional ML techniques and RL/fine-tuning, on the policy of predicting the most accurate next token in the sequence; that is, the goal of an LLM is to make the most accurate use of the higher-order statistical similarities it has learned, choosing what to say next in a way that maximises the reward it gets.

(Geometrically/intuitively, you can think of this as there being specific grooves, folds, structures, or associations within some n-dimensional space that exist as dynamical motifs with certain gravities; depending on your input sequence and contextual location within this space, you're pulled toward a certain subset of them, which then introduces inductive biases toward what you're most likely to generate next. If you want more detail or intuition about how LLMs work, look out for my post about visualising ML models from a representation learning perspective.)

Dynamics modelling in today's models

This idea of training is where I believe most of our problems arise. Existing literature is unsure whether the statistical associations between the token vocabularies of LLMs provide enough information about the topics they describe, and more importantly whether they actually encode proper world-scale knowledge. Whether or not this is the case, it is indisputable that LLMs are optimised only to minimise the statistical error across the most recent ~100,000 tokens (i.e. their context window) and, as such, are not primarily optimised for understanding the world.

This suggests a couple of things about the associations and world knowledge the model is able to pick up. Specifically:

  • Knowledge is fragile and expensive. Because LLM knowledge is built on strong neuron-activation patterns that must be robust enough to hold across a whole dataset consisting of numerous different world states and action-consequence sequences (some of which directly contradict each other), it is a simple logical jump to assume that learned tendencies will likely be either overly general or convolved. As a result, LLM knowledge ends up uninterpretable and fragile. In addition to the traditional issues of catastrophic forgetting and ineffective post-training, these ideas present a larger roadblock on the path to continual learning that we'll discuss below.
  • Uncontrollable biases. Because of the statistical, data-driven nature of machine learning, models often take on biases that directly and unabashedly reflect the microcosm of the world found in their training sets. At the scale of today's generative models, sloppy curation and data mixing will train undesirable and sometimes malicious behaviours into models; coupled with the lack of interpretability described above, as well as the inherent stochasticity and lack of reproducibility of model behaviour, this presents large challenges for model alignment and for behavioural consistency and control.
  • Lack of model plasticity. Existing models, once trained, are frozen during deployment, with very little post-training available (especially for closed-source models). This presents a direct, low-level challenge to models being able to learn in the first place, on top of the performance limitations that models with out-of-date, static world knowledge already deal with. One step up, the fragility and expense of training such large models once again appears as a limiting factor in these models' ability to stay up to date.

Learning is a search problem

The assumption that all of these limitations actually matter for building a good continual learning system is conditioned on a relatively intuitive, human-based ground truth: most of the time, we learn by associating the consequences of actions with our existing mental models.

Consequence learning in humans

For all intents and purposes, it makes sense to describe memory and learning in humans as (1) associative and (2) Bayesian. That is, your understanding of a new piece of information is almost always conditioned on all of the information you've been exposed to before: when you see something new, you interpret it through the experiences you've had in the past and the mental models, heuristics, and cognitive scripts you've built on them. New understandings formed in this way can then be considered new connections; these connections can either be a new edge between two existing "nodes" of experience or knowledge you already had (a new insight on an old perspective), or edges between existing nodes and newly discovered nodes. One can think of "new nodes" as essentially non-sequiturs (see this blog post by Venkatesh Rao that I found very insightful) that trigger a reformation of one's mental model in order to form new connections and ingest these new pieces of information.
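The node/edge framing can be made concrete with a toy sketch; `AssociativeMemory` and its `learn` method are hypothetical names for illustration, not any real library:

```python
# Memory as an associative graph: learning either connects two pieces of
# knowledge you already had ("old-old") or links a new node in ("old-new").

class AssociativeMemory:
    def __init__(self):
        self.nodes = set()
        self.edges = set()  # undirected pairs, stored as frozensets

    def learn(self, a, b):
        """Associate concepts a and b; return the kind of connection made."""
        new = {x for x in (a, b) if x not in self.nodes}
        self.nodes.update((a, b))
        self.edges.add(frozenset((a, b)))
        return "old-new" if new else "old-old"

mem = AssociativeMemory()
mem.learn("fire", "heat")              # both concepts unseen: old-new
mem.learn("heat", "expansion")         # links one new node in: old-new
kind = mem.learn("fire", "expansion")  # connects two known nodes
print(kind)  # old-old
```

The "new insight on an old perspective" case is exactly the last call: no node is added, only an edge between existing ones.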

A general associative learning paradigm

Though this is a rather rough reduction of human learning, it does provide a helpful lens through which we can analyse existing LLM learning paradigms.

Let's begin by breaking model learning down into the problems of search and integration, as inspired by this continual learning blog post. More formally, we can define the mechanism of learning for a model as, across some set of information, the ability to find a sufficient set of relevant associations (if any) and then build integrations such that further references to the same concept can be effectively reasoned through from first principles without excessive context.
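Under this definition, the search-then-integrate loop might look something like the following toy sketch, where `retrieve` and `integrate` are hypothetical stand-ins (a real system would search with embeddings rather than keyword overlap):

```python
def retrieve(memory, concept, k=3):
    """Search: find the k most relevant stored associations (here by
    naive keyword overlap; a real system would use embeddings)."""
    scored = [(len(set(concept.split()) & set(m.split())), m) for m in memory]
    return [m for s, m in sorted(scored, reverse=True)[:k] if s > 0]

def integrate(memory, concept, related):
    """Integration: store the new concept linked to the associations it
    was interpreted through, so later retrieval can reason from it."""
    memory.append(concept)
    return {concept: related}

memory = ["water boils at 100 C", "steam drives turbines"]
new_fact = "boiling water makes steam"
links = integrate(memory, new_fact, retrieve(memory, new_fact))
print(links[new_fact])
```

The point of the definition is exactly this split: retrieval decides which existing nodes matter, and integration writes the new concept in against them so it needn't be re-derived from raw context later.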

Though this is a slightly contrived definition, it does allow us to reduce existing LLM learning paradigms such as LoRA, supervised full fine-tuning, and token-space continual learning into a few related buckets within the framework.

  • Weight-space updating = changing the "mental model" (although this is extremely hard to manage because of the lack of interpretability; we don't understand which weights or weight groups do what). This type of learning includes most fine-tuning and adapter methods, and though it allows for somewhat effective old-new connections, it reforms the whole latent space of associations. As a result, it is not completely associative: it doesn't form connections on top of existing information, but instead reshapes the whole space in order to move existing pieces of information closer to each other.
  • Token-space updating = cleverly navigating the existing mental model. In the model sense, this is exclusively old-old, because it assumes that everything the LLM has learned before is explicitly and entirely a ground truth, and simply uses a GPS or directing system (i.e. the tokens within the context window) to place the model at a particular position within its learned latent space, such that it can use the most proximal concepts at that location to effectively convey insights to the user.
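The contrast between the two buckets can be caricatured in a few lines; `model`, `W`, and the token scores are all made up for illustration:

```python
def model(W, tokens):
    """Frozen stand-in model: output depends on weights and context tokens."""
    return sum(W.get(t, 0.0) for t in tokens)

W = {"sky": 1.0, "blue": 2.0}

# Weight-space updating: mutate the parameters themselves (as fine-tuning
# or adapters would), potentially shifting every learned association.
W_tuned = {**W, "blue": 2.5}
assert model(W_tuned, ["sky", "blue"]) != model(W, ["sky", "blue"])

# Token-space updating: weights stay frozen; only the context changes,
# steering the fixed function to a different region of its behaviour.
base = model(W, ["sky"])
steered = model(W, ["sky", "blue"])
assert steered != base and W == {"sky": 1.0, "blue": 2.0}
```

The asserts make the distinction explicit: the first path changes `W` and therefore everything downstream of it, while the second leaves `W` untouched and only navigates what was already learned.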

From examining even just these two cases, we can more clearly glean what makes an effective "associative learner" and, ironically, it basically reduces to effective search once again. We can proceed from first principles of human learning once more in order to finalise our hypothesis about associative learning:

  • The problem with human learning is that it is imperfect and we need to relearn things, i.e. human learning is lossy and stochastic (learning the same thing at a different time interval probably doesn't yield the exact same result). Furthermore, forgetting turns what should be an old-old connection into an old-new one. You can also analogise forgetting as a "retrieval failure", i.e. my search (memory) wasn't good enough to find that thing.
  • LLMs, by contrast, have (roughly) perfect memory within their context window and vocabulary (i.e. converting tokens into learned embeddings within the latent space), but they fail to retrieve selectively and instead brute-force across their whole memory each time, treating everything as relevant. Thus, in order to learn associatively, LLMs simply need to better understand their latent space such that they correctly choose when to make old-old connections and when to make old-new connections.

This conclusion about LLM learning gives us the final piece we need to bring together our discussion of associative learning, dynamics modelling, and consequence understanding, yielding what I believe is a comprehensive hypothesis for continual learning frameworks, simplified down to 3 problems:

  1. perfecting search and retrieval within the memory space -- I should know when to use old-old and when to use old-new
  2. effectively using a dynamics model to reason about actions and their consequences
  3. extracting insights from those consequences/observations and integrating them through effectively building connections, both old-old and old-new
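The three problems above compose into a single loop. A hedged sketch, where `search`, `dynamics`, and `integrate` are hypothetical placeholders for retrieval, a learned dynamics model, and graph integration respectively:

```python
def search(graph, obs):
    """Problem 1: retrieve the existing nodes relevant to an observation
    (keyword overlap here; real retrieval would be far more selective)."""
    return [n for n in graph if set(n.split()) & set(obs.split())]

def dynamics(action):
    """Problem 2: predict the consequence of an action on the world.
    A real system would learn this model; here it's a fixed lookup."""
    return {"heat water": "water boils"}.get(action, "unknown")

def integrate(graph, obs, related):
    """Problem 3: add the observation as a node ("old-new") with edges
    back to each retrieved node it was interpreted through."""
    graph[obs] = related
    return graph

graph = {"water is liquid": [], "boiling needs heat": []}
consequence = dynamics("heat water")        # reason about the action
related = search(graph, consequence)        # find relevant associations
graph = integrate(graph, consequence, related)  # build the connections
print(graph["water boils"])
```

Each pass through the loop is one act of associative learning: an action is simulated, its consequence is searched against memory, and the result is written back as new edges in the graph.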

Thus is associative learning.