Tools and Tutorials

Tutorial on Interpretability Techniques for Speech Models
by Grzegorz Chrupała, Martijn Bentum, Charlotte Pouw, Hosein Mohebbi, Gaofei Shen, Marianne de Heer Kloots, Tom Lentz, Jelle Zuidema.

Pre-trained foundation models have revolutionized speech technology, as they have many adjacent fields. The combination of their capability and their opacity has sparked interest among researchers in interpreting these models in various ways. While there has been much progress in understanding model internals and explaining model decisions in vision and text AI, speech technology has lagged behind in interpretability research. This tutorial provides a structured overview of interpretability techniques, their applications, implications, and limitations when applied to speech models. It aims to help researchers and practitioners better understand, evaluate, debug, and optimize speech models while building trust in their predictions.


Tutorial on Transformer-specific Interpretability
by Hosein Mohebbi, Jaap Jumelet, Michael Hanna, Jelle Zuidema, Afra Alishahi.

Transformers have emerged as dominant players in various scientific fields, especially NLP. However, their inner workings, like those of many other neural networks, remain opaque. Despite the widespread use of model-agnostic interpretability techniques, including gradient-based and occlusion-based methods, their shortcomings are becoming increasingly apparent when applied to Transformers, making the field of interpretability more demanding today. In this tutorial, we will present Transformer-specific interpretability methods, a newly emerging family of approaches that make use of specific features of the Transformer architecture and are considered more promising for understanding Transformer-based models. We start by discussing the potential pitfalls and misleading results model-agnostic approaches may produce when interpreting Transformers. Next, we discuss Transformer-specific methods, including those designed to quantify context-mixing interactions among all input pairs (a fundamental property of the Transformer architecture) and those that combine causal methods with low-level Transformer analysis to identify particular subnetworks within a model that are responsible for specific tasks. By the end of the tutorial, we hope participants will understand the advantages (as well as current limitations) of Transformer-specific interpretability methods, along with how these can be applied to their own research.
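To give a concrete flavour of context-mixing analysis, the sketch below computes attention rollout, one simple context-mixing measure, over a Hugging Face Transformer. The checkpoint, the example sentence, and the 0.5 residual weighting are illustrative assumptions, not material from the tutorial itself.

```python
# A minimal sketch of one context-mixing measure, attention rollout:
# per-layer attention matrices (averaged over heads, with the residual
# connection mixed in) are multiplied across layers to estimate how much
# each input token contributes to each top-layer position.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Interpretability matters for Transformers.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions  # one (1, heads, seq, seq) tensor per layer

seq_len = inputs["input_ids"].shape[1]
rollout = torch.eye(seq_len)
for layer_attn in attentions:
    avg = layer_attn[0].mean(dim=0)             # average over attention heads
    avg = 0.5 * avg + 0.5 * torch.eye(seq_len)  # account for the residual stream
    avg = avg / avg.sum(dim=-1, keepdim=True)   # renormalize rows
    rollout = avg @ rollout                     # accumulate mixing across layers

# rollout[i, j]: estimated contribution of input token j to position i at the top layer
print(rollout)
```

Attention rollout is only one such measure and a rough proxy for information flow; it serves here merely to illustrate what "quantifying context mixing among input pairs" looks like in practice.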

Python Library

PhD candidate Gabriele Sarti, with contributions from many others in our network, has created Inseq. Inseq is a PyTorch-based, hackable toolkit that democratizes the study of interpretability for sequence generation models, including LLMs and translation systems. Its goals are to facilitate the use of state-of-the-art interpretability techniques in existing text processing applications, and to enable a more transparent evaluation of new feature attribution methods. The library has been downloaded over 25,000 times and has been used in publications at premier scientific venues such as ACL, EMNLP, TACL, and ICLR.
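A minimal usage sketch follows; the checkpoint ("gpt2"), the attribution method ("integrated_gradients"), and the prompt are illustrative choices, and the library's own documentation remains the authoritative reference for its API.

```python
# Minimal sketch of feature attribution with Inseq (pip install inseq).
import inseq

# Load a Hugging Face model together with an attribution method
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute a generation: which input tokens drive each generated token?
out = model.attribute("The InDeep project studies the interpretability of")

# Visualize token-level attribution scores
out.show()
```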

Practical on Interpretability for Speech Models

Grzegorz Chrupała and InDeep PhD student Gaofei Shen created a course on neural models of spoken language for the LOT school. This course briefly introduces students to current deep-learning-based (neural) approaches to modeling spoken language. Students learn the fundamental concepts underlying deep learning and study in some detail how it is applied to modeling and simulating the acquisition and processing of spoken language. The course covers the most important recent research and focuses on two families of approaches: self-supervised representation learning and visually grounded modeling. Students learn how to apply pre-trained models to new utterances, and to extract, evaluate, and analyse the representations these models produce, as sketched below.
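To give a flavour of that last exercise, the sketch below extracts layer-wise hidden representations from a pre-trained Wav2Vec2 model via the Hugging Face transformers library; the checkpoint and the dummy waveform are illustrative assumptions and not part of the course materials.

```python
# Minimal sketch: extracting hidden representations from a pre-trained
# speech model (Wav2Vec2) for later probing or analysis.
import torch
from transformers import AutoFeatureExtractor, AutoModel

checkpoint = "facebook/wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint, output_hidden_states=True)

# One second of silent dummy audio at 16 kHz stands in for a real utterance
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: one (batch, frames, hidden_size) tensor per layer,
# usable as input to probing classifiers or similarity analyses
for layer_idx, layer in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(layer.shape))
```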

Video Series

InDeep’s members and affiliated experts Jelle Zuidema, Jaap Jumelet, Afra Alishahi, Hosein Mohebbi, Sandro Pezzelle, and Michael Hanna have created a video series that explains and explores how to open the black box of large language models. The six videos explain why interpretability is important in the age of LLMs, how to explain the behaviour of neural models, why it is crucial to track how Transformers mix contextual information, how to best measure context-mixing in Transformers, and what circuits are and how to use them to reverse-engineer model mechanisms.