Todd et al., "In-Context Algebra" (2025)

2025-12-18

In-context learning, Mechanistic interpretability

From David Bau’s group. Accepted at ICLR 2026. Contrasts with prior arithmetic interpretability work (Grokking, Power et al. 2022) where tokens have fixed meanings and models learn geometric/Fourier representations—here, without fixed semantics, symbolic/relational mechanisms emerge instead.

Setup#

Small transformer (4 layers, 8 heads, 1024 hidden dim, 18-token vocab) trained from scratch on synthetic finite group algebra. Each training sequence has ~200 facts of the form A B = C drawn from a randomly sampled group (cyclic C₃–C₁₀, dihedral D₃–D₅), with a fresh random mapping from tokens to group elements per sequence. The model learns via next-token prediction—each sequence is already an in-context learning problem where the model must infer token meanings from surrounding facts. Test sequences use new random mappings and, in some experiments, entirely unseen groups.
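The data-generation procedure above can be sketched as follows. This is a minimal illustration for a cyclic group Cₙ only; the paper also samples dihedral groups, and its exact tokenization and fact formatting may differ.

```python
import random

def make_sequence(n_facts=200, vocab_size=18, group_order=5):
    """Sample one training sequence: ~n_facts facts (A, B, C) meaning
    A*B = C in the cyclic group C_n, under a fresh random mapping from
    tokens to group elements. Illustrative sketch, not the paper's code."""
    # Assign group_order distinct tokens (out of the 18-token vocab)
    # to the group elements 0..group_order-1, fresh for this sequence.
    tokens = random.sample(range(vocab_size), group_order)
    facts = []
    for _ in range(n_facts):
        a = random.randrange(group_order)
        b = random.randrange(group_order)
        c = (a + b) % group_order  # the cyclic group operation
        facts.append((tokens[a], tokens[b], tokens[c]))
    return facts
```

Because the token-to-element mapping is resampled per sequence, the model can only solve next-token prediction by inferring each token's role from the surrounding facts, which is what makes every sequence an in-context learning problem.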

Main findings#

Five interpretable mechanisms explain ~90% of model behavior.

These emerge in distinct phase transitions during training: structural tokens → closure → copying → elimination → associativity. Later mechanisms build on earlier ones—e.g., identity recognition’s “demotion” reuses earlier “promotion” circuits.

Key contrast with Grokking: without fixed token meanings, the model cannot use Fourier/periodic representations. Instead it develops symbolic/relational strategies (pattern matching, cancellation laws).
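One such symbolic strategy can be illustrated with the left-cancellation law: if A·B = C and A·D = C, then B and D must denote the same group element, even though neither token's meaning is known. A toy sketch of query answering via this law (my illustration of the idea, not the paper's mechanism or code):

```python
def answer_query(facts, a, b):
    """Toy symbolic solver: merge tokens forced equal by left
    cancellation (x*y = z and x*y2 = z implies y = y2), then look the
    query (a, b) up in the merged fact table. Single pass over fact
    pairs; enough for small examples, not a full propagation."""
    parent = {}  # union-find over tokens

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    # Left cancellation: same left operand and same product
    # => the right operands denote the same element.
    for (x, y, z) in facts:
        for (x2, y2, z2) in facts:
            if find(x) == find(x2) and find(z) == find(z2):
                union(y, y2)

    table = {(find(x), find(y)): find(z) for x, y, z in facts}
    return table.get((find(a), find(b)))
```

For example, from the facts (1,2,3), (1,4,3), (4,5,6), cancellation identifies tokens 2 and 4, so the query (2, 5) resolves to 6 with no numeric or periodic representation involved.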

Generalizes to unseen groups (near-perfect on held-out order-8 groups) with partial success on non-group algebraic structures.
