Interpretability Hackathon 3.0 Keynote Neel Nanda - Detailed Analysis & Overview

Interpretability Hackathon 3.0 Keynote - Neel Nanda

Neel Nanda

Interpretability Hackathon 0.0 Keynote w/ Neel Nanda

Neel Nanda

Interpretability Hackathon 2.0 Keynote - Neel Nanda

Neel Nanda

Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop]

Neel Nanda

Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop]

When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved — but it turned out the model had ...

Can Interpretability Control Model Training?

A talk I gave to my MATS 9.0 Training Program on using

What Matters Right Now In Mechanistic Interpretability?

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

How Reasoning Models Break Mechanistic Interpretability Techniques

A talk I gave to my MATS 9.0 training program about reasoning model

A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: What? (Part 1/3)

Part 1 of a walkthrough of our paper, Progress Measures for Grokking via Mechanistic

The Story of Mech Interp

This is a talk I gave to my MATS scholars, with a stylised history of the field of mechanistic

Neel Nanda – Mechanistic Interpretability: A Whirlwind Tour

Neel Nanda

An Introduction to Mechanistic Interpretability – Neel Nanda | IASEAI 2025

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to Mechanistic ...

Interpretability 3.0 Hackathon Lightning Talks w/ Esben Kran

Check out presentations from the top four submissions to the

I lead a Google DeepMind team at 26. If you want to work at an AI company... | Neel Nanda (Part 2)

PART 1* — a comprehensive update on mechanistic

Part 2: 5. Interpretability

Neel Nanda

A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: Why? (Part 3/3)

Part

Neel Nanda: Mechanistic Interpretability (HAAISS 2024)

Neel Nanda