
post-training

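Notes on supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and alignment.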

No notes yet.