SEMINAR

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Jiwon Kim
2025.01.20
DPO
VENUE: 2023 NeurIPS
PAPER LINK: NeurIPS