Gemma Scope

Gemma Scope is Google's open-source interpretability toolkit: a neural network microscope using Sparse Autoencoders (SAEs) to expose and analyze internal layer-level behavior in the Gemma LLM family.

Gemma Scope 2 is a comprehensive, open suite of interpretability tools from Google DeepMind, built for the Gemma 3 model family. It acts as a microscope for the LLM: Sparse Autoencoders (SAEs) and transcoders trained on every layer decompose dense activations into sparse, interpretable concepts. This lets researchers examine the model's internal algorithms, debug emergent behaviors such as jailbreaks and hallucinations, and speed up the development of robust safety interventions. The suite's SAEs and transcoders together comprise over 1 trillion trained parameters, providing the visibility needed to audit and steer AI agent behavior.

https://ai.google.dev/gemma/docs/gemma_scope
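
The decomposition described above can be illustrated with a toy sparse autoencoder. This is a minimal sketch under stated assumptions, not Gemma Scope's actual architecture, weights, or API: it uses a plain ReLU encoder with random NumPy weights, and every name in it (sae_encode, sae_decode, d_model, d_sae) is hypothetical. It only shows the shape of the idea: a dense activation vector is mapped to a wider, non-negative feature vector, which is then decoded back into an approximation of the original activation.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, b_dec):
    """Map a dense activation vector to non-negative feature activations."""
    # Subtract the decoder bias, project into the wider feature space,
    # then apply ReLU so that inactive features are exactly zero.
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def sae_decode(features, W_dec, b_dec):
    """Reconstruct the dense activation from the feature activations."""
    return features @ W_dec + b_dec

# Toy sizes (assumptions): real models use far larger dimensions.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32

# Random, untrained weights: stand-ins for learned SAE parameters.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc = -0.5 * np.ones(d_sae)   # a negative bias pushes more features to zero
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)                 # stand-in for a layer activation
features = sae_encode(x, W_enc, b_enc, b_dec)
x_hat = sae_decode(features, W_dec, b_dec)   # approximate reconstruction
active = np.flatnonzero(features)            # indices of the features that fired

print("active features:", active.size, "of", d_sae)
```

In a trained SAE, each active index would ideally correspond to a human-interpretable concept, and steering amounts to scaling a chosen feature before decoding. With the random weights used here the features carry no meaning and the reconstruction is not accurate.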

1 project · 1 city

Related technologies

BERT (179) · BLOOM (115) · Embeddings (22) · Goodfire API (1) · GPT-3 (191) · GPT-4 (528) · Llama-2 (227) · PaLM 2 (116) · RoBERTa (118) · Sparse Autoencoders (2)

Recent Talks & Demos

SAEs for LLM Steering

Mumbai Nov 23

Sparse Autoencoders · GPT-4