Path to AGI

Incomplete Ideas towards Artificial General Intelligence


What changed in the last decade?

  • Large-scale pre-training data.
  • Bigger models with more parameters.
  • More compute (FLOPS).
  • General-purpose learning algorithms and architectures over task-specific methods, e.g. Transformers.
  • Old-school optimization techniques (e.g. Gradient Descent) and loss functions (e.g. Cross-Entropy) are enough; see the sketch after this list.
  • Fewer human-engineered labels and features, e.g. Self-Supervised Learning from raw data.
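
To make the "old school is enough" point concrete, here is a minimal, hypothetical PyTorch sketch: a toy model trained with plain SGD on a Cross-Entropy loss over a synthetic next-token task. The sizes, random data, and learning rate are placeholder assumptions, not a real training setup.

```python
import torch
import torch.nn as nn

# Toy setup: placeholder vocabulary size, embedding width, and random "tokens".
vocab_size, dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # plain gradient descent
loss_fn = nn.CrossEntropyLoss()                           # plain cross-entropy

tokens = torch.randint(0, vocab_size, (256,))
inputs, targets = tokens[:-1], tokens[1:]                 # predict token t+1 from token t

for step in range(100):
    logits = model(inputs)                                # (255, vocab_size)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Nothing in this loop is task-specific; scaling the same recipe up in data, parameters, and compute is essentially what changed.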

“The Bitter Lesson” by Richard Sutton (2019).

What is Missing

  • Sparsity for less expensive Models (better resource allocation / compute per token) e.g. Sparse Attention.
  • Dynamic Compute (FLOPs per token) depending on the difficulty of the task, e.g. Routing Networks / Mixtures of Experts (MoE); see the routing sketch after this list.
  • Embodiment and Interaction with the Physical World e.g. Robotics and Simulation Environments (Embodied AI + Sim2Real).
  • Unsupervised and Self-supervised Learning for generalization and One-shot Learning.
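
The dynamic-compute bullet can be illustrated with a hypothetical, minimal top-1 Mixture-of-Experts layer in PyTorch: a learned router sends each token to a single small expert, so the parameters touched per token depend on the router's decision. The class name, expert count, and sizes below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Minimal top-1 routing: each token is processed by exactly one expert."""

    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)           # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (num_tokens, dim)
        gate = self.router(x).softmax(dim=-1)                # routing probabilities
        expert_idx = gate.argmax(dim=-1)                     # hard top-1 choice per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                           # tokens routed to expert i
            if mask.any():                                   # skip experts with no tokens
                out[mask] = expert(x[mask]) * gate[mask][:, i:i + 1]
        return out

moe = Top1MoE()
print(moe(torch.randn(10, 64)).shape)                        # torch.Size([10, 64])
```

Production MoE layers add top-k routing and load-balancing losses, but the core idea is the same: spend FLOPs only where the router decides they are needed.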

Pitfalls

  • Human-engineered Feature Methods
  • Task-specific Datasets for Training
  • Human-designed Heuristics and Domain Knowledge Methods
  • Anthropomorphizing AI

What GPT-2 Does

It learns a probability distribution over a high-dimensional space; the model then learns to sample from this distribution, which represents the training data. We use a generalized task: predicting the next frame’s representation (token embedding).
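
As a hedged sketch of that objective, assume each frame has already been mapped to a discrete token; the model is then trained to predict token t+1 from the tokens up to t with a Cross-Entropy loss. The tiny GRU below is only a stand-in for GPT-2, and every size is an illustrative assumption.

```python
import torch
import torch.nn as nn

vocab_size, dim = 512, 64                         # hypothetical frame-token vocabulary
embed = nn.Embedding(vocab_size, dim)
backbone = nn.GRU(dim, dim, batch_first=True)     # causal stand-in for GPT-2
head = nn.Linear(dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

frames = torch.randint(0, vocab_size, (1, 32))    # one clip of 32 frame tokens
inputs, targets = frames[:, :-1], frames[:, 1:]   # shift by one: predict the next frame
hidden, _ = backbone(embed(inputs))               # (1, 31, dim)
logits = head(hidden)                             # (1, 31, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
```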

During inference, we sample auto-regressively from this learned distribution; in other words, we generate more data by sampling from the distribution. Instead of sampling word tokens, we sample image tokens, which can then be classified into categories of actions, objects, etc.
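
The sampling step can be sketched in the same way. The modules below mirror the toy training sketch (in practice they would be the trained model), and the start token and generation length are arbitrary assumptions.

```python
import torch
import torch.nn as nn

vocab_size, dim = 512, 64                          # same toy setup as the training sketch
embed = nn.Embedding(vocab_size, dim)
backbone = nn.GRU(dim, dim, batch_first=True)      # in practice: the trained model
head = nn.Linear(dim, vocab_size)

generated = torch.randint(0, vocab_size, (1, 1))   # arbitrary start token
with torch.no_grad():
    for _ in range(16):                            # sample 16 more image tokens
        hidden, _ = backbone(embed(generated))
        probs = head(hidden[:, -1]).softmax(dim=-1)          # distribution over next token
        next_token = torch.multinomial(probs, num_samples=1)
        generated = torch.cat([generated, next_token], dim=1)
# `generated` now holds a sampled token sequence that a downstream classifier
# could map to categories of actions, objects, etc.
```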

What’s the Path Forward?

