A path to safe AGI through Constructability
In April this year, a post landed on LessWrong / the AI Alignment Forum: Constructability: Plainly-coded AGIs may be feasible in the near future, by Épiphanie Gédéon and Charbel-Raphaël Segerie. Today’s models are nearly ubiquitously powered by deep learning architectures: black boxes that improve their predictive inference through backpropagation on model weights. (Of course, those researching interpretability (interpretists?) are keen on lifting that veil.) This work argues for what is, in my own words, a more inherently interpretable architecture, achieved through “Constructability.”