This page is a hub for my transformer-related notes.
Core Routes
- Deep Learning Explained with Mathematics for the broader conceptual roadmap
- Build GPT-2 from Scratch for implementation-level detail
- AlphaFold Explained for attention ideas in a biological setting
- AI Systems for inference, memory, and deployment constraints
- Mathematical Foundations for the probability and optimization layer beneath transformer training
Related Themes
- tokenization and representation learning
- self-attention and sequence modeling
- scaling, inference efficiency, and system design
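Of these themes, self-attention is the one that benefits most from a concrete sketch. Below is a minimal single-head scaled dot-product self-attention in NumPy, with no learned query/key/value projections (the input embeddings stand in for all three); the function name and shapes are illustrative, not from any of the linked notes.

```python
import numpy as np

def self_attention(x):
    # x: (seq_len, d) token embeddings; single head, no learned projections,
    # so x serves as queries, keys, and values at once
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # (seq_len, seq_len) scaled similarities
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output row is a weighted mix of all tokens

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings
out = self_attention(x)
print(out.shape)  # (4, 8): same shape as the input sequence
```

The full transformer version adds learned projection matrices, multiple heads, and masking; this stripped-down form just shows the core operation the "self-attention and sequence modeling" theme refers to.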
Connected Routes
- COMPSCI 714 is the most direct course path into this cluster
- Soft Computing Explained is a good contrast point if you want the older probabilistic and approximate-reasoning view
- Computational Graphics is where geometric intuition and visual structure may later reconnect with representation learning