AXRP - the AI X-risk Research Podcast

24 - Superalignment with Jan Leike

Jul 27, 2023

Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it...

23 - Mechanistic Anomaly Detection with Mark Xu

Jul 27, 2023

Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that that will alert...