Official code for "Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models", published in Findings of EMNLP 2025.
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models
Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling
Findings of EMNLP 2025, Suzhou, China
ICML 2024 Workshop on Mechanistic Interpretability, Vienna, Austria
[Paper]
This repository is under construction.