This is a shorter post that seeks to formalise/crystallise an idea I was chatting about with Zafar from the Social Physics Lab yesterday: that today's foundation models lack "future permanence", the ability to understand that the model itself will continue to exist past the immediate output or turn it is currently producing. It seems very convincing to me that this is another central bastion in developing solid continual learning, and this post is my attempt to set the idea out properly.
Rehashing action-consequence
Recall that, under an action-consequence learning paradigm, an actor learns through the implicit and explicit relationships it extracts between a set of actions it chooses to conduct in an environment and the consequences of those actions, both on the actor itself and as observed in the environment. A simple example is a person putting their hand on a hot stove. The action is the placement of the hand, the consequence is a burn of varying severity, and the explicit learned relationship is that hot stoves burn hands; because humans have evolved to optimise away from pain, the person learns not to put their hand on a hot stove again.
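To make this concrete, here is a minimal sketch of the paradigm in code. Everything in it (the `ActionConsequenceAgent` class, the numeric consequence values) is a toy assumption of mine for illustration, not a reference implementation of any particular method:

```python
# A minimal, illustrative sketch of action-consequence learning.
# All names here are hypothetical; this is a toy tabular setup.
from collections import defaultdict

class ActionConsequenceAgent:
    def __init__(self):
        # Learned relationships: action -> running estimate of its consequence.
        self.value = defaultdict(float)
        self.counts = defaultdict(int)

    def experience(self, action: str, consequence: float) -> None:
        # Incrementally average observed consequences per action, extracting
        # the implicit action-consequence relationship from raw experience.
        self.counts[action] += 1
        self.value[action] += (consequence - self.value[action]) / self.counts[action]

    def choose(self, actions: list[str]) -> str:
        # Prefer actions with the least-negative learned consequence.
        return max(actions, key=lambda a: self.value[a])

agent = ActionConsequenceAgent()
agent.experience("touch_hot_stove", consequence=-10.0)  # burn: strongly negative
agent.experience("touch_counter", consequence=0.0)
print(agent.choose(["touch_hot_stove", "touch_counter"]))  # -> "touch_counter"
```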
This example obeys action-consequence in the most literal sense: the consequence of the action is directly felt by the actor and produces a clear signal towards one crisp, simple learning. However, one thing the action-consequence paradigm misses is the intrinsic reward and understanding that informs the direction in which a learning is crystallised; in our example, this is the base hedonistic drive away from pain and towards pleasure that qualifies the "hot stove burns" relationship as bad. It is then a simple observation that today's models have no intrinsic policy of this type (if anything, their intrinsic policy is a statistical reward tied to accurately predicting the most likely next token, i.e. being a stochastic parrot), and that this is likely a large part of why models are unable to effectively distill and integrate learnings.
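As a rough illustration of what such an intrinsic policy does, consider the sketch below. The hedonic weighting is entirely my assumption, standing in for the "away from pain, towards pleasure" prior described above; nothing like it exists in today's models:

```python
# Hypothetical sketch of an intrinsic policy: a fixed valence function that
# qualifies raw consequences as good or bad. The weights are assumptions
# chosen purely for illustration.
def intrinsic_valence(consequence: dict[str, float]) -> float:
    PAIN_WEIGHT, PLEASURE_WEIGHT = -1.0, 1.0  # hedonistic prior
    return (PAIN_WEIGHT * consequence.get("pain", 0.0)
            + PLEASURE_WEIGHT * consequence.get("pleasure", 0.0))

# The raw observation "hand on stove -> burn" carries no sign on its own;
# it is the intrinsic policy that qualifies the relationship as bad:
print(intrinsic_valence({"pain": 10.0}))  # -> -10.0
```

This valence is what would feed the `consequence` argument of the agent sketch above; without it, an actor accumulates experiences but has no direction in which to crystallise them.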
A simple first-principles derivation of future permanence
In order to construct a solid understanding of future permanence, we note that, in general, one of the chief objectives of humans is to survive comfortably for as long as possible (the philosophical implications of the different flavours of optimising for this objective are discussed more in-depth in this post I like by Neel Nanda). In this sense, humans are always taking into account the next state they'll exist in, even if the state evolution occurs after an extremely short duration (e.g. a split-second decision to dodge an axe flying out of a dresser). If we take one step back and think more generally, humans can be said to make decisions that, to varying degrees, secure some version of this objective for as long as possible. Some examples include getting a solid job (a stable income that allows a stable exchange of value under today's economic systems) or establishing relationships (maximising social capital or mental fulfilment in order to reduce mental load and its adverse effects).
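One way to make this precise, in standard reinforcement-learning notation (my framing, not something drawn from the post linked above), is that a future-permanent actor optimises a discounted objective over its own future states:

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad 0 < \gamma < 1
```

where r(s_t, a_t) scores comfortable survival in state s_t, and a discount γ close to 1 encodes future permanence: the actor's future states carry nearly as much weight as its present one. A purely myopic actor, one that only values the current turn, is the γ → 0 limit of the same objective, which is roughly where next-token prediction sits.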
In a similar vein, if we instead ask what the best way to learn is, it becomes clear that any piece of information needs to be qualified as productive to some degree (and worth remembering/learning) or unproductive, and in that sense we require a metric by which to measure any information we extract from our action-consequence experiences.
Thinking in both of these directions illuminates the simple reasoning behind future permanence, and also the more general framework we should operate within: there should be some qualifying metric or meta-objective that dictates the importance of what an actor learns, so that the actor can then maximise the efficiency with which it learns about its objectives. In other words, actors need to understand what their critical path is before understanding how to take steps in that direction; a toy version of such a qualifying metric is sketched below.
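The scoring function, its inputs, and the threshold here are all assumptions of mine; the point is only to show how a meta-objective with a long horizon changes what is worth retaining:

```python
# Hypothetical sketch of a qualifying meta-objective: a learned relationship
# is retained only if it is expected to matter across the actor's future
# states. The scoring heuristic and threshold are illustrative assumptions.

def meta_objective_score(relationship: dict[str, float], future_horizon: int) -> float:
    # Weight how strongly the relationship matters (valence magnitude) by how
    # often it is expected to recur, summed over the actor's horizon.
    per_step_value = abs(relationship["valence"]) * relationship["recurrence_prob"]
    return per_step_value * future_horizon

def should_retain(relationship: dict[str, float], future_horizon: int,
                  threshold: float = 1.0) -> bool:
    return meta_objective_score(relationship, future_horizon) >= threshold

hot_stove = {"valence": -10.0, "recurrence_prob": 0.05}  # painful, likely to recur
trivia = {"valence": 0.01, "recurrence_prob": 0.001}     # low-stakes, rarely relevant

# A future-permanent actor (long horizon) crystallises the stove lesson;
# a myopic actor (horizon of one turn) discards the very same experience.
print(should_retain(hot_stove, future_horizon=10_000))  # True
print(should_retain(hot_stove, future_horizon=1))       # False
print(should_retain(trivia, future_horizon=10_000))     # False
```

The interesting line is the middle one: the same experience is judged worth learning or not purely by the length of the horizon the actor believes it has, which is exactly the future permanence that today's models lack.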