Path to AGI
Incomplete Ideas towards Artificial General Intelligence
What changed in the last decade?
- Large-scale pre-training data.
- Bigger models with more parameters.
- More compute (FLOPs).
- General-purpose learning algorithms and architectures over task-specific methods, e.g., Transformers.
- Old-school optimization techniques (e.g., Gradient Descent) and loss functions (e.g., Cross-Entropy) are enough; a minimal sketch follows below.
- Fewer human-engineered labels and features, e.g., Self-Supervised Learning from raw data.
“The Bitter Lesson” by Richard Sutton (2019).
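As a rough illustration of the last two bullets, here is a minimal PyTorch sketch of self-supervised next-token prediction trained with plain gradient descent and a cross-entropy loss. The tiny model and the random token stream are illustrative assumptions, not a real pre-training setup.

```python
import torch
import torch.nn as nn

# Illustrative assumptions: a tiny vocabulary and a random token stream
# stand in for large-scale raw pre-training data.
vocab_size, d_model, seq_len = 256, 64, 32
tokens = torch.randint(0, vocab_size, (8, seq_len + 1))  # batch of raw sequences

# A deliberately minimal "language model": embedding -> linear head.
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # old-school gradient descent
loss_fn = nn.CrossEntropyLoss()                    # old-school loss function

# Self-supervision: the labels come from the data itself (shift by one).
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)                             # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```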
What is Missing
- Sparsity for less expensive models (better resource allocation / compute per token), e.g., Sparse Attention; see the windowed-attention sketch after this list.
- Dynamic compute (FLOPs per token) depending on the difficulty of the task, e.g., Routing Networks / Mixture of Experts (MoE); a minimal router sketch follows this list as well.
- Embodiment and interaction with the physical world, e.g., robotics and simulation environments (Embodied AI + Sim2Real).
- Unsupervised and self-supervised learning for generalization and one-shot learning.
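To make the sparsity bullet concrete, here is a minimal sketch of local (windowed) causal attention: each token attends only to a small window of recent positions rather than the whole sequence. The dense mask below only illustrates the access pattern; an efficient kernel would skip the masked scores entirely. Window size and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def local_causal_attention(q, k, v, window=4):
    """Sparse attention sketch: each position attends only to the last
    `window` positions (including itself), not the whole sequence."""
    t = q.size(-2)
    i = torch.arange(t).unsqueeze(1)   # query positions
    j = torch.arange(t).unsqueeze(0)   # key positions
    # Allowed: causal (j <= i) AND inside the local window (i - j < window).
    mask = (j <= i) & (i - j < window)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 16, 8)      # (batch, seq, head_dim), illustrative
out = local_causal_attention(q, k, v)  # O(seq * window) useful work per layer
```

For the dynamic-compute bullet, a minimal top-1 Mixture-of-Experts router: a learned gate sends each token to a single expert, so the active FLOPs per token stay roughly constant while total capacity scales with the expert count. The expert type and counts are assumptions for illustration; real MoE layers add load-balancing losses and capacity limits.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Minimal sketch: route each token to a single expert (top-1 gating)."""
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)])

    def forward(self, x):                       # x: (tokens, d_model)
        weights = self.gate(x).softmax(dim=-1)  # routing probabilities
        best = weights.argmax(dim=-1)           # chosen expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = best == e                     # tokens routed to expert e
            if sel.any():
                # Scale by the gate weight so the gate receives gradient.
                out[sel] = expert(x[sel]) * weights[sel, e].unsqueeze(-1)
        return out

x = torch.randn(10, 64)                         # 10 tokens, illustrative
y = Top1MoE()(x)                                # each token used one expert
```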
Pitfalls
- Human-Engineered Feature Methods
- Task-Specific Datasets for Training
- Human-Designed Heuristics and Domain-Knowledge Methods
- Anthropomorphizing AI
What GPT-2 Does
GPT-2 learns a probability distribution over a high-dimensional space; the model can then sample from this distribution, which represents the training data. The training task is generalized: predict the next frame's representation (a token embedding).
During inference, we sample auto-regressively from this learned distribution; in other words, we generate new data by sampling from it. Instead of sampling word tokens, we sample image tokens, which can then be classified into categories of actions, objects, etc. A minimal sampling loop is sketched below.
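A minimal sketch of that autoregressive sampling loop, assuming a generic `model` that maps a token prefix of shape (batch, seq) to logits of shape (batch, seq, vocab); whether the ids index word tokens or image tokens does not change the loop. The stub model at the end is a placeholder for illustration.

```python
import torch

@torch.no_grad()
def sample(model, prefix, n_new, temperature=1.0):
    """Autoregressive sampling: repeatedly draw the next token from the
    learned distribution and feed it back in as context."""
    tokens = prefix.clone()                    # (1, prefix_len) of token ids
    for _ in range(n_new):
        logits = model(tokens)[:, -1, :]       # distribution over next token
        probs = (logits / temperature).softmax(dim=-1)
        nxt = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens                              # prefix plus sampled continuation

# Usage with any (batch, seq) -> (batch, seq, vocab) model; a random-logit
# stub stands in here for a trained network.
class Stub(torch.nn.Module):
    def forward(self, t):
        return torch.randn(t.size(0), t.size(1), 256)

out = sample(Stub(), torch.zeros(1, 1, dtype=torch.long), n_new=8)
```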