A path to safe AGI through Constructability
In April this year, a post landed on LessWrong / the AI Alignment Forum: Constructability: Plainly-coded AGIs may be feasible in the near future, by Épiphanie Gédéon and Charbel-Raphaël Segerie. Today’s models are nearly ubiquitously powered by deep learning architectures: black boxes that improve their predictive inference through backpropagation on model weights. (Of course, those researching interpretability (interpretists?) are keen on lifting that veil.) This work argues for what is, in my own words, a more inherently interpretable architecture, achieved through “Constructability.”