What is Direct Preference Optimization?
LLM
Fine Tuning
Reinforcement Learning from Human Feedback (RLHF) is the current state-of-the-art technique for fine-tuning LLMs. However, a recent and much simpler alternative was proposed in the paper titled 'Direct Preference Optimization: Your Language Model is Secretly a Reward Model'.
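
To give a rough sense of what the rest of this post unpacks, the heart of DPO (as stated in that paper) is a single classification-style loss over preference pairs, trained directly against a frozen reference model rather than a separately learned reward model:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]$$

Here $y_w$ and $y_l$ are the preferred and dispreferred completions for a prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen reference (typically SFT) model, $\sigma$ is the logistic function, and $\beta$ controls how far the fine-tuned policy $\pi_\theta$ is allowed to drift from the reference.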