
MathyAIwithMike
Dive into mechanistic interpretability with Sadaltager, exploring how to reverse-engineer neural networks. The discussion covers the challenges of understanding how AI models compute, the limitations of current tools such as PCA and sparse dictionary learning, and the shift towards building interpretable models from the start: 'glass boxes' instead of black boxes. The conversation also highlights validation techniques and the need for 'model organisms', and closes with the implications for AI safety, policy, and building trustworthy AI systems.