Gemma Scope
Gemma Scope is Google's open-source interpretability toolkit: a neural network microscope using Sparse Autoencoders (SAEs) to expose and analyze internal layer-level behavior in the Gemma LLM family.
Gemma Scope 2 is a comprehensive, open suite of interpretability tools from Google DeepMind, built specifically for the Gemma 3 model collection. It acts as a microscope for the LLM. Sparse Autoencoders (SAEs) and transcoders are trained at every layer and decompose the model's dense activations into sparse, interpretable concepts. The aim is to let researchers examine complex internal algorithms, debug emergent behaviors such as jailbreaks and hallucinations, and accelerate the development of robust safety interventions. The suite comprises over 1 trillion trained parameters in total and provides the visibility needed to audit and steer AI agent behavior.
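The "decompose dense activations into sparse concepts" step can be illustrated with a toy sparse autoencoder. This is a minimal NumPy sketch, not Gemma Scope's code. It assumes the JumpReLU activation described for the original Gemma Scope release, which may not match every Gemma Scope 2 component. The class name `JumpReLUCoder`, the dimensions, the fixed threshold, and the random weights are all hypothetical stand-ins for trained parameters. Used as an SAE, the sketch reconstructs the activation it reads. Used as a transcoder, the same encoder/decoder shape reads an MLP's input and predicts that MLP's output.

```python
import numpy as np


class JumpReLUCoder:
    """Toy sparse dictionary with a JumpReLU encoder (hypothetical, randomly initialised).

    As an SAE it reconstructs its input; as a transcoder it maps an MLP's
    input to a prediction of that MLP's output.
    """

    def __init__(self, d_in, d_latent, d_out, threshold=2.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-ins for weights that a real SAE/transcoder learns from activations.
        self.W_enc = rng.normal(scale=d_in ** -0.5, size=(d_in, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(scale=d_latent ** -0.5, size=(d_latent, d_out))
        self.b_dec = np.zeros(d_out)
        # Per-latent thresholds: learned in practice, fixed here for illustration.
        self.threshold = np.full(d_latent, threshold)

    def encode(self, x):
        pre = x @ self.W_enc + self.b_enc
        # JumpReLU: keep a pre-activation only where it exceeds its latent's threshold.
        return np.where(pre > self.threshold, pre, 0.0)

    def decode(self, z):
        # Each active latent adds its decoder direction, scaled by its activation.
        return z @ self.W_dec + self.b_dec


d_model, d_latent = 64, 1024
# Stand-in for the dense activations of 8 tokens at one layer.
x = np.random.default_rng(1).normal(size=(8, d_model))

sae = JumpReLUCoder(d_model, d_latent, d_out=d_model)
z = sae.encode(x)        # sparse latent ("concept") activations, shape (8, 1024)
x_hat = sae.decode(z)    # reconstruction of x, shape (8, 64)
print("active latents per token:", (z > 0).sum(axis=1))
print("fraction of latents active:", (z > 0).mean())

# Transcoder: same structure, but here x plays the role of an MLP's input
# and the decoder output is a prediction of that MLP's output.
transcoder = JumpReLUCoder(d_model, d_latent, d_out=d_model, seed=2)
mlp_out_hat = transcoder.decode(transcoder.encode(x))
```

With trained weights, each nonzero entry of `z` would correspond to an interpretable feature. The encode/decode path stays the same; only the parameters differ.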