# Awesome-Personalized-Finetuning

*(Figure: topology)*

## Table of Contents

- [Papers](#papers)
  - [Human Alignment](#human-alignment)
  - [Data Distillation](#data-distillation)
  - [Architecture](#architecture)
- [Acknowledgement](#acknowledgement)

## Papers

### Human Alignment

This section covers fundamental approaches and applications that fine-tune models to align with human preferences; a minimal sketch of the DPO objective follows the first table below.

#### Fundamental Approach

| Title | Venue | Year | Code | Keywords |
| --- | --- | --- | --- | --- |
| Training language models to follow instructions with human feedback | NeurIPS | 2022 | OpenRLHF | RLHF |
| SLiC-HF: Sequence Likelihood Calibration with Human Feedback | ICLR | 2023 | Non-Official | SLiC-HF |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | NeurIPS | 2023 | OpenRLHF | DPO |
| RRHF: Rank Responses to Align Language Models with Human Feedback without tears | NeurIPS | 2023 | Official | RRHF |
| RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment | TMLR | 2023 | Official | RAFT |
| Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs | ACL | 2024 | OpenRLHF | RLOO |
| Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | ICML | 2024 | Official | SPIN |
| A General Theoretical Paradigm to Understand Learning from Human Preferences | AISTATS | 2024 | OpenRLHF | IPO |
| Statistical Rejection Sampling Improves Preference Optimization | ICLR | 2024 | Official | Rejection Sampling |
| SimPO: Simple Preference Optimization with a Reference-Free Reward | NeurIPS | 2024 | Official | SimPO |
| KTO: Model Alignment as Prospect Theoretic Optimization | ICML | 2024 | OpenRLHF | KTO |
| RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | ICML | 2024 | | RLAIF |
| RLHF Workflow: From Reward Modeling to Online RLHF | TMLR | 2024 | Official | Online-RLHF |
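
For orientation, here is a minimal sketch of the DPO objective referenced in the table above (Rafailov et al., 2023). It assumes per-sequence log-probabilities have already been summed over tokens; the tensor names and the `beta` value are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of (chosen, rejected) response pairs.

    Inputs are per-sequence log-probabilities under the trainable
    policy and a frozen reference model (names are illustrative).
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the chosen-vs-rejected reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```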

#### Application

| Title | Venue | Year | Code | Keywords |
| --- | --- | --- | --- | --- |
| RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | CVPR | 2024 | Official | RLHF-V |
| Diffusion Model Alignment Using Direct Preference Optimization | CVPR | 2024 | Official | DiffusionDPO |
| Training Diffusion Models with Reinforcement Learning | ICLR | 2024 | Official | DDPO |
| RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback | ICML | 2024 | Official | RL-VLM-F |
| Aligning Diffusion Models by Optimizing Human Utility | NeurIPS | 2024 | Official | Diffusion-KTO |

### Data Distillation

#### Survey

| Title | Venue | Year |
| --- | --- | --- |
| Data Distillation: A Survey | TMLR | 2023 |
| A Comprehensive Survey of Dataset Distillation | T-PAMI | 2024 |

#### Fundamental Approach

| Title | Venue | Year | Code | Keywords |
| --- | --- | --- | --- | --- |
| Dataset Distillation | arXiv | 2018 | Non-Official | |
| Dataset Condensation with Gradient Matching | ICLR | 2021 | Official | gradient matching (sketched below) |
| CAFE: Learning to Condense Dataset by Aligning Features | CVPR | 2022 | Official | CAFE |
| Dataset Distillation by Matching Training Trajectories | CVPR | 2022 | Official | MTT, trajectory matching |
| Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching | ICLR | 2024 | Official | lossless |
| Multisize Dataset Condensation | ICLR | 2024 | Official | multisize |
| Embarrassingly Simple Dataset Distillation | ICLR | 2024 | Official | RaT-BPTT |
| D4M: Dataset Distillation via Disentangled Diffusion Model | CVPR | 2024 | Official | D4M |
| Dataset Distillation by Automatic Training Trajectories | ECCV | 2024 | Official | ATT |
| Elucidating the Design Space of Dataset Condensation | NeurIPS | 2024 | Official | EDC |
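
Below is a minimal sketch of one inner step of gradient matching, the idea behind Dataset Condensation with Gradient Matching (ICLR 2021) listed above: a small learnable synthetic set is updated so the network gradients it produces mimic those from real data. Function and variable names are illustrative assumptions, and the per-tensor cosine distance is a simplification of the paper's layer-wise matching loss.

```python
import torch
import torch.nn.functional as F

def condensation_step(model, loss_fn, real_x, real_y,
                      syn_x, syn_y, syn_optimizer):
    """One gradient-matching update of the synthetic set.

    syn_x is a small learnable tensor (requires_grad=True) holding the
    distilled examples; real_x / real_y come from the full training set.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Target gradients from real data (detached: treated as constants).
    grads_real = torch.autograd.grad(loss_fn(model(real_x), real_y), params)
    grads_real = [g.detach() for g in grads_real]

    # Gradients from synthetic data, kept differentiable so the matching
    # loss can backpropagate into syn_x itself.
    grads_syn = torch.autograd.grad(
        loss_fn(model(syn_x), syn_y), params, create_graph=True)

    # Distance between the two gradient sets, one cosine term per tensor.
    match_loss = sum(
        1.0 - F.cosine_similarity(gs.flatten(), gr.flatten(), dim=0)
        for gs, gr in zip(grads_syn, grads_real))

    syn_optimizer.zero_grad()
    match_loss.backward()
    syn_optimizer.step()
    return match_loss.item()
```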

#### Application

| Title | Venue | Year | Code | Keywords |
| --- | --- | --- | --- | --- |
| Dataset Distillation with Attention Labels for Fine-tuning BERT | ACL | 2023 | Official | |
| Vision-Language Dataset Distillation | TMLR | 2024 | Official | |
| Low-Rank Similarity Mining for Multimodal Dataset Distillation | ICML | 2024 | Official | LoRS |
| Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement | CVPR | 2024 | Official | |
| DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation | NAACL | 2024 | Official | DiLM |
| Textual Dataset Distillation via Language Model Embedding | EMNLP | 2024 | N/A | |

### Architecture

| Title | Venue | Year | Code | Keywords |
| --- | --- | --- | --- | --- |
| TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | arXiv | 2024 | Official | TokenFormer |
| Large Concept Models: Language modeling in a sentence representation space | arXiv | 2024 | Official | LCM |
| Byte Latent Transformer: Patches Scale Better Than Tokens | arXiv | 2024 | Official | BLT |

## Acknowledgement

Thanks to the following repositories:
