This repository provides code for preference-optimization training with Modulated Intervention Preference Optimization (MIPO), which addresses shortcomings of standard DPO. MIPO enables more efficient learning by adjusting, for each training instance, how strongly the reference model intervenes. With the Llama3-8B and Mistral-7B models, MIPO significantly outperformed DPO on AlpacaEval 2.0 and MT-Bench.
You can find detailed information in the paper (arXiv:2409.17545).
The implementation is based on the Alignment Handbook codebase and is designed to be used with the trl Trainer.
- Example - Llama3-8B-Instruct:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 src/run_mipo.py configs/llama3/config.yaml
```
As noted in Section 6.2 of the paper, performance does not differ significantly across the hyperparameter range; in most cases, values between 5 and 25 produced good results.
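For intuition only, the sketch below illustrates what per-instance modulation of the reference model's intervention could look like on top of a DPO-style pairwise loss. The function name, the sigmoid-based modulation, and the way `gamma` enters the loss are assumptions made for illustration; the actual MIPO objective is defined in the paper and implemented in `src/run_mipo.py`.

```python
import torch
import torch.nn.functional as F


def modulated_preference_loss(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps,
                              gamma=10.0):
    """Illustrative DPO-style loss with a per-instance modulation term.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy and the frozen reference model, each of shape (batch,).
    `gamma` stands in for the MIPO hyperparameter discussed above (values of
    roughly 5-25 worked well in the paper); the exact form of the modulation
    below is an assumption for illustration, not the paper's formula.
    """
    # How well the reference model already separates this pair: a large
    # margin means the pair is "easy" for the reference model.
    ref_margin = ref_chosen_logps - ref_rejected_logps

    # Per-instance modulation: intervene more (stay closer to the reference)
    # on easy pairs and less on difficult ones. Purely illustrative.
    modulation = torch.sigmoid(ref_margin)

    policy_margin = policy_chosen_logps - policy_rejected_logps
    logits = gamma * modulation * (policy_margin - ref_margin)
    return -F.logsigmoid(logits).mean()


# Toy usage with random log-probabilities.
batch = 4
loss = modulated_preference_loss(torch.randn(batch), torch.randn(batch),
                                 torch.randn(batch), torch.randn(batch))
print(loss.item())
```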
Please cite our paper if you find the repo helpful in your work:
```
@article{jang2024modulated,
  title={Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult},
  author={Jang, Cheolhun},
  journal={arXiv preprint arXiv:2409.17545},
  year={2024}
}
```