‘All’ you need to know about transformers

Attention is widely used in modern transformer-based deep learning models. Okay, but what exactly is attention? I often find myself returning to the seminal paper “Attention Is All You Need” whenever I need to explain the concept to others, or to myself.

As an engineer, I find it useful to understand math through code, so, inspired by Jacob Gildenblat’s post on the explainability of vision transformers, I put together this learning note on attention, with code snippets in PyTorch.

For Q, K, V, attention, and the full transformer model, each concept and equation comes with a visual illustration and a detailed code snippet.
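
As a small preview of what follows, here is a minimal sketch of the scaled dot-product attention equation from the paper, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, in plain PyTorch. The function name and tensor shapes are my own illustrative choices, not from any particular library:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k) tensors
    d_k = q.size(-1)
    # Attention(Q, K, V) = softmax(Q @ K^T / sqrt(d_k)) @ V
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per query
    return weights @ v                              # (batch, seq_len, d_k)

# Toy example: batch of 2 sequences of length 4, dimension 8
q = torch.randn(2, 4, 8)
k = torch.randn(2, 4, 8)
v = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 8])
```

The sections below unpack each piece of this equation in turn.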