Related notes: COMPSCI 713, Transformers, Deep Learning Explained with Mathematics, Mathematical Foundations, Computational Graphics
Why AlphaFold Matters
AlphaFold is important because it solved a problem that sits at the boundary of biology, geometry, probability, and machine learning:
- proteins are written as 1D amino-acid sequences
- their function depends on a 3D folded structure
- predicting that 3D structure from sequence alone is hard
In other words, the model has to learn how a string becomes a shape.
That is why AlphaFold is a beautiful AI system to study. It is not just “another neural network.” It is a system that:
- represents sequence information
- represents pairwise geometric relationships
- uses attention to exchange information globally
- gradually refines a structure in 3D space
For a PyTorch learner, AlphaFold is a great example of how modern deep learning moves beyond plain classification and into structured reasoning.
Environment Setup
If you want to follow the examples locally, uv is a clean way to manage Python and virtual environments.
```shell
brew install uv
mkdir alphafold_intro
cd alphafold_intro
uv init .
uv add numpy matplotlib torch torchvision jupyterlab
```

The important thing is not the package manager itself. The important thing is that once the environment is stable, we can focus on tensors, shapes, and model logic.
The Core Learning Question
Let a protein sequence be

$$S = (s_1, s_2, \dots, s_n)$$

where each $s_i$ is an amino-acid token.

The target is a set of 3D coordinates

$$X = (x_1, x_2, \dots, x_n), \qquad x_i \in \mathbb{R}^3$$

for atoms or residues.
So the learning problem is to learn a mapping

$$f : S \longmapsto X$$
This sounds simple, but it is not. Why?
- residues far apart in sequence can be close in 3D
- local chemistry matters
- global consistency matters
- the output is geometric, not just categorical
That means the model must learn both content and relationships.
Tensor Intuition First
Before getting to AlphaFold, we need the right tensor mental model.
In PyTorch, a tensor is just a multidimensional numerical array with autograd support:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
```

The shape tells us how the data is organised.
- scalar: `()`
- vector: `(d,)`
- matrix: `(n, d)`
- pairwise table: `(n, n, d)`
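A quick way to internalise these shapes is to build one tensor of each and print its shape (the dimensions below are arbitrary illustrative values):

```python
import torch

scalar = torch.tensor(3.14)           # shape ()
vector = torch.randn(64)              # shape (d,) with d = 64
matrix = torch.randn(10, 64)          # shape (n, d)
pair_table = torch.randn(10, 10, 32)  # shape (n, n, d)

print(scalar.shape)      # torch.Size([])
print(vector.shape)      # torch.Size([64])
print(matrix.shape)      # torch.Size([10, 64])
print(pair_table.shape)  # torch.Size([10, 10, 32])
```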
That last one is especially important. AlphaFold does not only store features for each residue. It also stores features for each pair of residues.
Sequence Representation vs Pair Representation
The key AlphaFold idea is that we need two coupled views of the protein.
1. Sequence / residue representation
For each residue $i$, we keep a feature vector:

$$h_i \in \mathbb{R}^{d_{\text{seq}}}$$

Stacked together, this becomes:

$$H \in \mathbb{R}^{n \times d_{\text{seq}}}$$
This is similar to token embeddings in an LLM.
2. Pair representation
For each pair $(i, j)$, we keep a feature vector:

$$z_{ij} \in \mathbb{R}^{d_{\text{pair}}}$$

Stacked together:

$$Z \in \mathbb{R}^{n \times n \times d_{\text{pair}}}$$
This pair tensor is where AlphaFold becomes really interesting. It explicitly models whether two residues may be:
- close in space
- part of the same structural motif
- geometrically compatible
This is much richer than a plain sequence model.
A Small PyTorch Picture
If a protein has length n = 128, then a toy version might look like this:
```python
n = 128
d_seq = 256
d_pair = 128

seq_repr = torch.randn(n, d_seq)       # [n, d_seq]
pair_repr = torch.randn(n, n, d_pair)  # [n, n, d_pair]
```

Now we already see the computational challenge:
- the sequence representation scales like $O(n)$
- the pair representation scales like $O(n^2)$
That is one reason protein models become expensive quickly.
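To make the scaling concrete, here is a back-of-the-envelope memory estimate (illustrative dimensions, float32 at 4 bytes per element):

```python
def repr_bytes(n, d_seq=256, d_pair=128, bytes_per_el=4):
    """Memory of the sequence and pair representations, in bytes."""
    seq = n * d_seq * bytes_per_el        # O(n)
    pair = n * n * d_pair * bytes_per_el  # O(n^2)
    return seq, pair

for n in (128, 512, 2048):
    seq, pair = repr_bytes(n)
    print(f"n={n}: seq={seq / 1e6:.2f} MB, pair={pair / 1e6:.2f} MB")
```

Doubling the protein length quadruples the pair tensor, which is exactly why long proteins get expensive.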
Why Attention Helps
Classical recurrent models struggle with long-range dependency. Proteins are full of long-range dependency.
A residue near the beginning of the sequence may interact strongly with one near the end. Attention is useful because it lets each position read from all others.
For a sequence representation $H \in \mathbb{R}^{n \times d}$, queries, keys, and values are linear projections:

$$Q = HW_Q, \quad K = HW_K, \quad V = HW_V$$

and attention weights are:

$$A = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)$$
This gives global communication across residues.
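As a sketch, standard scaled dot-product attention over residue features looks like this (random weights stand in for learned projections):

```python
import torch
import torch.nn.functional as F

n, d = 16, 32
x = torch.randn(n, d)  # residue features [n, d]

# Random matrices stand in for the learned projections W_Q, W_K, W_V
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Attention weights: each residue attends to every other residue
A = F.softmax(Q @ K.T / d ** 0.5, dim=-1)  # [n, n], rows sum to 1
out = A @ V                                # [n, d] globally mixed features

print(A.shape, out.shape)
```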
But AlphaFold goes further. It uses information from pair features to bias or guide sequence attention. So the model is not only asking:
“Which tokens are relevant?”
It is also asking:
“Which residues are likely geometrically related?”
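One minimal way to sketch that idea is to project each pair feature down to a scalar and add it to the attention logits. The projection `w_bias` below is my own illustrative stand-in, not the real AlphaFold parameterisation:

```python
import torch
import torch.nn.functional as F

n, d, d_pair = 16, 32, 8
x = torch.randn(n, d)             # residue features [n, d]
pair = torch.randn(n, n, d_pair)  # pair features [n, n, d_pair]

W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
w_bias = torch.randn(d_pair)      # maps each pair feature to a scalar bias

Q, K, V = x @ W_q, x @ W_k, x @ W_v
bias = pair @ w_bias              # [n, n] geometric bias on the logits

# Pair features now steer which residues attend to which
A = F.softmax(Q @ K.T / d ** 0.5 + bias, dim=-1)
out = A @ V
print(out.shape)
```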
AlphaFold as a Message-Passing System
One good high-level mental model is:
```mermaid
flowchart LR
    A[Sequence Tokens] --> B[Residue Representation]
    A --> C[Pair Initialization]
    B --> D[Evoformer Updates]
    C --> D
    D --> E[Structure Module]
    E --> F[3D Coordinates]
    F --> G[Refinement / Confidence]
```
The important thing here is not the exact block names. It is the feedback loop:
- sequence updates pair
- pair updates sequence
- both feed structure prediction
That is why AlphaFold feels more like a system than a single layer stack.
Multiple Sequence Alignments
A huge part of AlphaFold’s power comes from evolutionary information.
If we collect related protein sequences across species, we get a multiple sequence alignment (MSA). The rough idea is:
- residues that change together may be structurally linked
- conserved positions often matter functionally
- co-evolution provides indirect geometric evidence
So AlphaFold does not only read one sequence. It often reads a whole family of related sequences and tries to detect correlated variation.
That is conceptually similar to saying:
“The data distribution itself contains clues about structure.”
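A toy calculation can make co-variation concrete. Below, a hand-made integer "MSA" has two columns engineered to vary together; plain column correlation (a much cruder signal than the couplings real methods extract) picks that out:

```python
import numpy as np

# A toy "MSA": 6 aligned sequences, 5 columns, tokens as integers.
# Columns 1 and 3 are constructed to vary together; column 0 is conserved.
msa = np.array([
    [0, 1, 2, 1, 3],
    [0, 2, 2, 2, 1],
    [0, 1, 0, 1, 2],
    [0, 2, 1, 2, 0],
    [0, 1, 3, 1, 1],
    [0, 2, 2, 2, 3],
])

# Crude co-variation score: absolute correlation between columns
centered = msa - msa.mean(axis=0)
cov = centered.T @ centered
std = np.sqrt(np.diag(cov))
with np.errstate(invalid="ignore", divide="ignore"):
    corr = cov / np.outer(std, std)
corr = np.nan_to_num(corr)  # conserved columns have zero variance

print(np.round(np.abs(corr), 2))  # columns 1 and 3 stand out
```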
The Evoformer Intuition
You do not need to memorise every block to understand the architecture at a beginner level.
The Evoformer is basically a repeated refinement stage for:
- MSA / sequence features
- pairwise relation features
It does three important things:
- lets residues attend to each other
- lets pair features accumulate geometric hints
- lets sequence and pair states exchange information repeatedly
The repeated refinement matters. One pass is usually not enough to infer a globally consistent fold.
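The alternating update pattern can be sketched as a tiny block that is applied repeatedly. This is an illustrative skeleton of the information flow, not the real Evoformer:

```python
import torch
import torch.nn as nn

class TinyRefineBlock(nn.Module):
    """Illustrative only: one round of sequence <-> pair exchange."""
    def __init__(self, d_seq=32, d_pair=16):
        super().__init__()
        self.seq_update = nn.Linear(d_seq + d_pair, d_seq)
        self.pair_update = nn.Linear(d_pair + 2 * d_seq, d_pair)

    def forward(self, seq, pair):
        n = seq.size(0)
        # Pair -> sequence: summarise each residue's row of pair features
        pair_summary = pair.mean(dim=1)  # [n, d_pair]
        seq = seq + self.seq_update(torch.cat([seq, pair_summary], dim=-1))
        # Sequence -> pair: broadcast residue features to every pair (i, j)
        si = seq.unsqueeze(1).expand(n, n, -1)
        sj = seq.unsqueeze(0).expand(n, n, -1)
        pair = pair + self.pair_update(torch.cat([pair, si, sj], dim=-1))
        return seq, pair

seq, pair = torch.randn(8, 32), torch.randn(8, 8, 16)
block = TinyRefineBlock()
for _ in range(4):  # repeated refinement, as in the Evoformer stack
    seq, pair = block(seq, pair)
print(seq.shape, pair.shape)
```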
Triangle Reasoning
One famous AlphaFold idea is triangle-style updates on pairs.
Why triangles?
If residue $i$ relates to residue $k$, and $k$ relates to $j$, then that provides evidence about the pair $(i, j)$.
This is a form of relational reasoning. In plain language:
- pairwise geometry should be consistent
- consistency often emerges through triples
That is why pair updates are not just local MLPs. They are trying to enforce something more like geometric compatibility.
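A triangle-multiplication-style update can be sketched in a few lines: evidence for pair $(i, j)$ is accumulated through every intermediate residue $k$ (random projections stand in for learned ones):

```python
import torch

n, d = 8, 16
z = torch.randn(n, n, d)   # pair representation [n, n, d]

# Two "edge" projections (random here; learned in the real model)
a = z @ torch.randn(d, d)  # features of edge (i, k)
b = z @ torch.randn(d, d)  # features of edge (j, k)

# Triangle update ("outgoing" flavour): combine evidence through every k
# update[i, j] = sum_k a[i, k] * b[j, k]
update = torch.einsum('ikd,jkd->ijd', a, b)

z = z + update             # residual refinement of the pair representation
print(z.shape)
```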
Structure Module
After enough relational refinement, the model must actually output coordinates.
At a high level, the structure module maps learned features into 3D transformations and positions: each residue $i$ is assigned a rigid-body frame

$$(R_i, t_i), \qquad R_i \in SO(3),\; t_i \in \mathbb{R}^3$$

from which atom coordinates are placed.
This stage is where representation learning becomes geometry.
A useful beginner intuition is:
- the earlier blocks learn “what relates to what”
- the structure block turns those relations into coordinates
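A heavily simplified sketch of that idea: a head reads residue features and repeatedly nudges per-residue positions (the real module also predicts rotations and feeds geometry back into the features):

```python
import torch
import torch.nn as nn

n, d = 8, 32
feats = torch.randn(n, d)       # refined residue features

coord_update = nn.Linear(d, 3)  # predicts a 3D displacement per residue
coords = torch.zeros(n, 3)      # start from an initial guess (the origin)

for _ in range(3):              # iterative refinement of positions
    coords = coords + coord_update(feats)

print(coords.shape)
```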
Confidence Prediction
AlphaFold is also valuable because it predicts confidence, not just structure.
This matters in science. A model should know when it may be wrong.
Common confidence ideas include:
- local confidence per residue
- global structure confidence
- predicted alignment / distance reliability
For ML students, this is a nice reminder that good systems often predict both:
- an answer
- a measure of trust
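As a sketch, a per-residue confidence head can sit right next to the coordinate head. The sigmoid-squashed score below is only an illustration of the idea, not AlphaFold's actual pLDDT parameterisation:

```python
import torch
import torch.nn as nn

n, d = 8, 32
feats = torch.randn(n, d)

coord_head = nn.Linear(d, 3)                              # the answer
conf_head = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())  # the trust, in [0, 1]

coords = coord_head(feats)                 # [n, 3]
confidence = conf_head(feats).squeeze(-1)  # [n], per-residue confidence

print(coords.shape, confidence.shape)
```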
A Minimal PyTorch Thought Experiment
Suppose we want a tiny educational prototype, not real AlphaFold. Then we might do:
```python
import torch
import torch.nn as nn

class ToyProteinModel(nn.Module):
    def __init__(self, vocab=21, d_seq=128, d_pair=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_seq)
        self.seq_proj = nn.Linear(d_seq, d_seq)
        self.pair_proj = nn.Linear(d_seq * 2, d_pair)
        self.coord_head = nn.Linear(d_seq, 3)

    def forward(self, tokens):
        x = self.embed(tokens)  # [n, d_seq]
        x = self.seq_proj(x)
        # Build the pair tensor from every (i, j) combination of residues
        xi = x.unsqueeze(1).expand(-1, x.size(0), -1)
        xj = x.unsqueeze(0).expand(x.size(0), -1, -1)
        pair = self.pair_proj(torch.cat([xi, xj], dim=-1))  # [n, n, d_pair]
        coords = self.coord_head(x)  # [n, 3]
        return x, pair, coords
```

This is not AlphaFold, but it teaches the right shape intuition:
- token embedding
- pair tensor construction
- coordinate head
That is enough to start seeing the design space.
The Main Difference from LLMs
AlphaFold and transformers are related, but they are not the same kind of system.
An LLM mainly predicts the next token:

$$p(s_{t+1} \mid s_1, \dots, s_t)$$

AlphaFold instead predicts a structured geometric object:

$$X \in \mathbb{R}^{n \times 3}$$
So the output space is different:
- LLMs output text distributions
- AlphaFold outputs geometry and confidence
That change in output space drives a completely different architecture emphasis.
What To Focus On As a Student
If you are learning AlphaFold for the first time, focus on these ideas first:
- sequence representation and pair representation are equally important
- attention handles long-range dependency
- evolutionary data adds powerful biological signal
- geometric consistency is a core modeling constraint
- the final target is a 3D structure, not a label
If those five ideas are clear, then the paper and implementation details become much easier to digest.
Closing Intuition
The deepest lesson from AlphaFold is not just “AI can predict proteins.”
It is that modern models work best when the representation matches the structure of the problem.
Proteins are not just sequences. They are:
- sequences
- pairwise interaction graphs
- 3D geometric objects
- evolutionary objects
AlphaFold succeeds because it respects all of those views at once.
That is exactly the kind of systems thinking worth learning from.