AlphaInterp: Probing AlphaFold 3's Internal Representations Reveals Evolutionary Determinants of Predicted Structure and Confidence
Feldman, J., Skolnick, J.
AlphaFold 3 predicts the three-dimensional structures of proteins and their complexes with remarkable accuracy, yet the computations by which it converts evolutionary information into structure have remained opaque. Here, in the first systematic mechanistic interpretability analysis of AlphaFold 3, we show that the model relies predominantly on comparative evolutionary context rather than raw sequence, and that a few divergent homologs contribute more to accurate prediction than many near-identical ones. By probing the single and pair representations at four checkpoints along the forward pass, we find that the Pairformer compresses a diffuse co-evolutionary manifold into a compact latent space in which biophysical features are linearly encoded and predicted confidence can be causally manipulated through the representational geometry. Across adversarial-mutation, fold-switching, and structural-generalization benchmarks, accuracy is preserved even under heavily degraded multiple sequence alignments (MSAs) but collapses when MSAs are removed, regardless of sequence familiarity or training-set membership. The model depends on phylogenetic diversity, not MSA depth: a few sufficiently divergent sequences largely restore accuracy and representational coherence, whereas near-identical sequences do not, and evolutionarily unrelated sequences fail entirely even when the alignment format is preserved. AlphaFold 3 therefore uses the MSA to locate structurally constrained positions and to activate structural priors stored in its weights. In other words, AlphaFold 3 is a very sensitive fold-recognition algorithm. This insight has direct implications for structure prediction, evolutionary inference, and protein design.
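The distinction between MSA depth and phylogenetic diversity can be made concrete with a small sketch. The code below is an illustration only, not the metric used in this work: it treats depth as the number of aligned sequences and diversity as one minus the mean pairwise sequence identity. The function names `pairwise_identity` and `msa_summary` are hypothetical, and the toy sequences are invented for the example.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of matching residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def msa_summary(msa: list[str]) -> tuple[int, float]:
    """Return (depth, mean pairwise divergence) for a list of aligned sequences.

    Assumption: divergence is defined here as 1 - mean pairwise identity,
    a simple stand-in for phylogenetic diversity.
    """
    depth = len(msa)
    if depth < 2:
        return depth, 0.0
    identities = [pairwise_identity(a, b) for a, b in combinations(msa, 2)]
    return depth, 1.0 - sum(identities) / len(identities)


# Invented toy alignments: one deep but redundant, one shallow but divergent.
deep_redundant = ["MKTAYIAK", "MKTAYIAK", "MKTAYIAR", "MKTAYIAK", "MKTAYIAK", "MKTAYIAK"]
shallow_diverse = ["MKTAYIAK", "MRSAFLGK", "LKTGYVAE"]

deep_depth, deep_div = msa_summary(deep_redundant)
shallow_depth, shallow_div = msa_summary(shallow_diverse)
print(f"deep/redundant:  depth={deep_depth}, divergence={deep_div:.3f}")
print(f"shallow/diverse: depth={shallow_depth}, divergence={shallow_div:.3f}")
```

In this toy case the deeper alignment has the lower divergence, which is the regime the abstract describes as insufficient for accurate prediction, while the shallower alignment carries more comparative signal per sequence.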