Interpretability Hackathon 3.0 Keynote Neel Nanda



When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved, but it turned out the model had ...

A talk I gave to my MATS 9.0 Training Program on using ...

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

A talk I gave to my MATS 9.0 training program about reasoning model ...

Part 1 of a walkthrough of our paper, Progress Measures for Grokking via Mechanistic ...

This is a talk I gave to my MATS scholars, with a stylised history of the field of mechanistic ...

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...

Check out presentations from the top four submissions to the ...

Part 1: a comprehensive update on mechanistic ...