Tag Index

architecture (2)

Gated Attention

June 27, 2026

Mixture of Experts

June 22, 2026

attention (2)

Gated Attention

June 27, 2026

Attention Is Not Matmul Bound

June 23, 2026

decoding (2)

Speculative Decoding

June 21, 2026

The Temperature Knob

June 20, 2026

deep-learning (1)

Vision and Language

June 28, 2026

essays (6)

Mother's Day Bouquet

May 7, 2026

The Open World

October 24, 2025

Life’s River

January 1, 2025

Battle not with monsters, lest ye become a monster

November 13, 2023

Writing, the Entropy in my Universe

September 3, 2018

Heat, Anxiety and Release

February 4, 2017

gpu (2)

AI Infra Resource Map

June 26, 2026

Attention Is Not Matmul Bound

June 23, 2026

inference (2)

Why Decode Is Slow

June 25, 2026

Speculative Decoding

June 21, 2026

infra (1)

AI Infra Resource Map

June 26, 2026

llm (8)

Gated Attention

June 27, 2026

AI Infra Resource Map

June 26, 2026

Why Decode Is Slow

June 25, 2026

When Is SFT Done?

June 24, 2026

Attention Is Not Matmul Bound

June 23, 2026

Mixture of Experts

June 22, 2026

Speculative Decoding

June 21, 2026

The Temperature Knob

June 20, 2026

machine-learning (1)

MLE, a Unifying View

May 12, 2026

moe (1)

Mixture of Experts

June 22, 2026

multimodal (1)

Vision and Language

June 28, 2026

notes (10)

Vision and Language

June 28, 2026

Gated Attention

June 27, 2026

AI Infra Resource Map

June 26, 2026

Why Decode Is Slow

June 25, 2026

When Is SFT Done?

June 24, 2026

Attention Is Not Matmul Bound

June 23, 2026

Mixture of Experts

June 22, 2026

Speculative Decoding

June 21, 2026

The Temperature Knob

June 20, 2026

MLE, a Unifying View

May 12, 2026

performance (3)

AI Infra Resource Map

June 26, 2026

Why Decode Is Slow

June 25, 2026

Attention Is Not Matmul Bound

June 23, 2026

post-training (1)

When Is SFT Done?

June 24, 2026

rl (1)

When Is SFT Done?

June 24, 2026

roofline (1)

Why Decode Is Slow

June 25, 2026

sft (1)

When Is SFT Done?

June 24, 2026

statistics (1)

MLE, a Unifying View

May 12, 2026

tutorials (13)

Vision and Language

June 28, 2026

Gated Attention

June 27, 2026

AI Infra Resource Map

June 26, 2026

Why Decode Is Slow

June 25, 2026

When Is SFT Done?

June 24, 2026

Attention Is Not Matmul Bound

June 23, 2026

Mixture of Experts

June 22, 2026

Speculative Decoding

June 21, 2026

The Temperature Knob

June 20, 2026

MLE, a Unifying View

May 12, 2026

Intro to LLMs

February 12, 2025

But, what is Attention in transformers?

March 13, 2023

Intro to Recurrent Neural Networks (RNNs)

October 4, 2022

vision (1)

Vision and Language

June 28, 2026