 architecture (2)

Gated Attention
Mixture of Experts

 attention (2)

Gated Attention
Attention Is Not Matmul Bound

 decoding (2)

Speculative Decoding
The Temperature Knob

 deep-learning (1)

Vision and Language

 essays (6)

Mother's Day Bouquet
The Open World
Life’s River
Battle not with monsters, lest ye become a monster
Writing, the Entropy in my Universe
Heat, Anxiety and Release

 gpu (2)

AI Infra Resource Map
Attention Is Not Matmul Bound

 inference (2)

Why Decode Is Slow
Speculative Decoding

 infra (1)

AI Infra Resource Map

 llm (8)

Gated Attention
AI Infra Resource Map
Why Decode Is Slow
When Is SFT Done?
Attention Is Not Matmul Bound
Mixture of Experts
Speculative Decoding
The Temperature Knob

 machine-learning (1)

MLE, a Unifying View

 moe (1)

Mixture of Experts

 multimodal (1)

Vision and Language

 notes (10)

Vision and Language
Gated Attention
AI Infra Resource Map
Why Decode Is Slow
When Is SFT Done?
Attention Is Not Matmul Bound
Mixture of Experts
Speculative Decoding
The Temperature Knob
MLE, a Unifying View

 performance (3)

AI Infra Resource Map
Why Decode Is Slow
Attention Is Not Matmul Bound

 post-training (1)

When Is SFT Done?

 rl (1)

When Is SFT Done?

 roofline (1)

Why Decode Is Slow

 sft (1)

When Is SFT Done?

 statistics (1)

MLE, a Unifying View

 tutorials (13)

Vision and Language
Gated Attention
AI Infra Resource Map
Why Decode Is Slow
When Is SFT Done?
Attention Is Not Matmul Bound
Mixture of Experts
Speculative Decoding
The Temperature Knob
MLE, a Unifying View
Intro to LLMs
But, what is Attention in transformers?
Intro to Recurrent Neural Networks (RNNs)

 vision (1)

Vision and Language