what i'm learning and the notes i take along the way.
Notes from Stanford's Language Modeling from Scratch course.
Using the Tensor Memory Accelerator (TMA) on NVIDIA's Hopper architecture to load data from global to shared memory.
interesting blogs or papers and my notes from them.
No notes yet.
zines!
A visual explainer on mixture-of-experts (MoE) layers in transformers.