Vineet Gattani

    What is Direct Preference Optimization?

    LLM Fine-Tuning

    Reinforcement Learning with Human Feedback (RLHF) is the current state-of-the-art technique to fine-tune LLMs. However, a recent and much simpler alternative to RLHF was published in the paper titled 'Direct Preference Optimization: Your Language Model is Secretly a Reward Model'.

    27 May 2024

    Sample Efficient Accuracy Estimation

    Model Evaluation

    In high-stakes ML applications where labelling is expensive, it is imperative to perform model monitoring in a sample-efficient way.

    25 May 2024