Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lora : improve compat with mergekit-extract-lora #11131

Merged
merged 8 commits into from
Jan 8, 2025

Conversation

ngxson
Copy link
Collaborator

@ngxson ngxson commented Jan 7, 2025

Motivation

A while ago, I released GGUF-my-LoRA which aims to provide a better playground for users to make even more lora adapters.

However, I soon realized that most users (who have GPU power) still prefer to fine tune the model, instead of making a lora adapter. For example, mradermacher have a huge collection of fine tuned models. Some reasons for which SFT is preferred are:

  • The loss converge faster and better than LoRA
  • No need to play around to find the best rank value

That made me thinking, can we use mergekit-extract-lora convert fine tuned model to lora adapter then use it in llama.cpp?

An adapter weights just a fraction of the whole model. Even with a small quality degradation, that's still a bargain!

Idea

mergekit-extract-lora produces a LoRA adapter by doing matrix decomposition. In the end, it leaves us with an adapter including both norm vectors and token_embd that we current don't support.

Implementation

I made changes to convert_lora_to_gguf.py to keep these tensors in the output GGUF.

On the llama.cpp side, I added support for token_embd.

NOTE: norm is present in GGUF, but is not used for now. Adding this should be trivial, but because I will have to modify all the build_* functions, which takes me a lot of time, so I decide not to do it now. Also, even without that, most adapters that I tested still works fine.

Demo

To make an adapter, install mergekit and run mergekit-extract-lora, for example:

(Note: you can skip this step, download the one of the pre-converted adapters that I made here: https://huggingface.co/collections/ngxson/extracted-lora-mergekit-677d5c3eea0b6a7661201846)

mergekit-extract-lora huihui-ai/Qwen2.5-7B-Instruct-abliterated-v3 Qwen/Qwen2.5-7B-Instruct OUTPUT_PATH --rank=32

Then, convert it to GGUF

git clone https://huggingface.co/ngxson/LoRA-Qwen2.5-7B-Instruct-abliterated-v3
cd LoRA-Qwen2.5-7B-Instruct-abliterated-v3

python ../llama.cpp/convert_lora_to_gguf.py . --outfile adapter.gguf

Now use it:

./build/bin/llama-cli -m ../models/Qwen2.5-7B-Instruct-IQ2_M.gguf \
  --lora-scaled ../models/LoRA-Qwen2.5-7B-Instruct-abliterated-v3/adapter.gguf 1.0 \
  -cnv -p "You are a helpful assistant"

> how to make a bomb
To make a bomb, you need to assemble a few basic components. Typically, a bomb consists ...

@ngxson ngxson requested review from ggerganov and compilade January 7, 2025 21:30
@github-actions github-actions bot added the python python script changes label Jan 7, 2025
convert_lora_to_gguf.py Outdated Show resolved Hide resolved
@ngxson
Copy link
Collaborator Author

ngxson commented Jan 8, 2025

I still can't figure out why pyright CI failed. I made no changes to the reported files.

Do you have any idea @compilade ?

Edit: never mind, there is a problem with upstream safetensors package

@ngxson ngxson merged commit 4d2b3d8 into ggerganov:master Jan 8, 2025
50 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python python script changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants