PODCAST · education

Preference Optimization

by SaiKrishna Rallabandi

To help understand literature in preference optimization

ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood

This paper proposes Aligned Supervised Fine-Tuning (ASFT), a method for fine-tuning large language models (LLMs). ASFT addresses limitations of Direct Preference Optimization (DPO) by optimizing the absolute likelihood of generating human-preferred responses, rather than the likelihood of a preferred response relative to a rejected one. Unlike DPO, ASFT does not require a reference model and is less sensitive to the initial state of the model, which the authors argue makes training more efficient and robust. The authors evaluate ASFT on several benchmark datasets and report significant performance improvements over existing methods.
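To make the contrast between relative and absolute likelihood concrete, here is a minimal sketch. The DPO function follows the standard published form of that loss. The second function is only an illustrative assumption of what a reference-free, absolute-likelihood objective could look like (a supervised term on the preferred response plus a penalty on the rejected one); it is not taken from the paper and should not be read as the ASFT loss itself. All function names, the length normalization, and the `lam` weight are hypothetical.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Relative objective: compares the policy's preferred-vs-rejected margin
    # against the same margin under a frozen reference model.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)

def absolute_likelihood_loss(logp_w, logp_l, len_w, len_l, lam=1.0):
    # Hypothetical reference-free objective (an assumption for illustration,
    # not the paper's formula): raise the length-normalized likelihood of the
    # preferred response and push down that of the rejected response.
    sft_term = -logp_w / len_w                  # per-token NLL of preferred response
    p_l = min(math.exp(logp_l / len_l), 1.0 - 1e-12)  # clamp to keep log finite
    reject_term = -math.log(1.0 - p_l)          # grows as rejected likelihood grows
    return sft_term + lam * reject_term

# Sequence log-probabilities (sums over tokens) for one preference pair.
print(dpo_loss(-5.0, -7.0, ref_logp_w=-5.0, ref_logp_l=-7.0))
print(absolute_likelihood_loss(-4.0, -8.0, len_w=4, len_l=4))
```

Note that the second function takes no reference log-probabilities at all, which is the practical point of the summary above: a reference-free objective needs only a single model during training.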

Oct 8, 2024

11m
