Next Token Prediction for AGI

Incomplete Ideas: Path to Artificial General Intelligence (part 2)


How GPT-2 Works:

GPT-2 learns a probability distribution over a high-dimensional space; the model then learns to sample from this learned distribution, which represents the training data. Training uses a single, general task: predicting the next frame's representation (the next token embedding).
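
As a concrete sketch of that objective, assuming a hypothetical `model` that maps token ids to next-token logits (GPT-2's language-model head has this shape):

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Next-token prediction loss: cross-entropy between the model's
    predicted distribution and the token that actually comes next.

    tokens: (batch, seq_len) tensor of token ids.
    model:  hypothetical callable returning logits of shape
            (batch, seq_len, vocab_size).
    """
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift targets by one position
    logits = model(inputs)                           # (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and positions
        targets.reshape(-1),
    )
```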

During inference, we sample auto-regressively from this learned distribution; in other words, we generate new data by repeatedly sampling from the distribution and feeding each sample back into the model. Instead of sampling word tokens, we can sample image tokens, which can then be classified into categories of actions, objects, and so on.
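
A minimal sketch of that sampling loop, under the same assumption of a hypothetical `model` returning logits over the vocabulary (word or image tokens alike):

```python
import torch

@torch.no_grad()
def sample(model, prompt, max_new_tokens=50, temperature=1.0):
    """Auto-regressive generation: repeatedly draw the next token from the
    learned distribution and append it to the context.

    prompt: (1, seq_len) tensor of token ids.
    """
    tokens = prompt
    for _ in range(max_new_tokens):
        logits = model(tokens)[:, -1, :]                 # logits for the next token only
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample from the distribution
        tokens = torch.cat([tokens, next_token], dim=1)  # feed the sample back in
    return tokens
```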

Is Next Token Prediction Enough for Artificial General Intelligence?

Arguments for:

  • It is a general task that applies to any domain and scales to large amounts of data.
  • It can be learned from raw data alone, without noisy human-engineered labels.
  • Is this task not the foundation of what our brain does? We constantly predict what comes next in order to survive and adapt to changing environments (evolutionary theory).

Arguments against:

  • It seems overly simplistic and not enough to capture the complexity of the world.
  • Is there a chain-of-thought recurrence in the attention mechanism during a single forward pass?

