SEMINAR
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Jiwon Kim
2025.01.20
DPO
VENUE
NeurIPS 2023