model architecture
Notes on transformers, attention, MLPs, and model design choices.