This page is a hub for my transformer-related notes.
Core Routes
- Deep Learning Explained with Mathematics for the broader conceptual roadmap
- Build GPT-2 from Scratch for implementation-level detail
- AlphaFold Explained for attention ideas in a biological setting
- AI Systems for inference, memory, and deployment constraints
- Mathematical Foundations for the probability and optimization layer beneath transformer training
Related Themes
- tokenization and representation learning
- self-attention and sequence modeling
- scaling, inference efficiency, and system design
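Of these themes, self-attention is the one that benefits most from a concrete sketch. Below is a minimal single-head scaled dot-product self-attention in NumPy, with no learned query/key/value projections (the input embeddings stand in for all three); the function name and shapes are illustrative, not from any of the linked notes.

```python
import numpy as np

def self_attention(x):
    # x: (seq_len, d) token embeddings; single head, no learned projections,
    # so x serves as queries, keys, and values at once
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # (seq_len, seq_len) scaled similarities
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output row is a weighted mix of all tokens

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings
out = self_attention(x)
print(out.shape)  # (4, 8): same shape as the input sequence
```

The full transformer version adds learned projection matrices, multiple heads, and masking; this stripped-down form just shows the core operation the "self-attention and sequence modeling" theme refers to.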
Connected Routes
- COMPSCI 714 is the most direct course path into this cluster
- Soft Computing Explained is a good contrast point if you want the older probabilistic and approximate-reasoning view
- Computational Graphics is where geometric intuition and visual structure may later reconnect with representation learning