lora : improve compat with mergekit-extract-lora
#11131
Merged
Motivation
A while ago, I released GGUF-my-LoRA, which aims to provide a better playground for users to make even more LoRA adapters.
However, I soon realized that most users (who have GPU power) still prefer to fine-tune the model instead of making a LoRA adapter. For example, mradermacher has a huge collection of fine-tuned models. Some reasons why SFT is preferred:
That got me thinking: can we use mergekit-extract-lora to convert a fine-tuned model into a LoRA adapter, then use it in llama.cpp? An adapter weighs just a fraction of the whole model. Even with a small quality degradation, that's still a bargain!
Idea
mergekit-extract-lora produces a LoRA adapter by doing matrix decomposition. In the end, it leaves us with an adapter that includes both norm vectors and token_embd, which we currently don't support.
Implementation
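Conceptually, the extraction amounts to a truncated SVD of the weight delta between the fine-tuned and base model. Below is a minimal sketch of that idea (this is not mergekit's actual code; the function name, shapes, and rank are illustrative):

```python
import numpy as np

def extract_lora(w_base: np.ndarray, w_ft: np.ndarray, rank: int):
    """Decompose the fine-tune delta into low-rank factors A, B
    so that w_base + B @ A approximates w_ft."""
    delta = w_ft - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # keep only the top-`rank` singular components
    b = u[:, :rank] * s[:rank]   # shape (out_features, rank)
    a = vt[:rank, :]             # shape (rank, in_features)
    return a, b

rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
# construct a "fine-tune" whose delta is exactly rank 4
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
w_ft = w_base + true_delta
a, b = extract_lora(w_base, w_ft, rank=4)
err = np.abs(w_ft - (w_base + b @ a)).max()
```

When the chosen rank is smaller than the true rank of the delta, the reconstruction is only approximate; that is where the "small quality degradation" mentioned above comes from.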
I made changes to convert_lora_to_gguf.py to keep these tensors in the output GGUF. On the llama.cpp side, I added support for token_embd.
NOTE: norm is present in the GGUF, but is not used for now. Adding this should be trivial, but it would require modifying all the build_* functions, which takes a lot of time, so I decided not to do it now. Even without it, most adapters that I tested still work fine.
Demo
To make an adapter, install mergekit and run mergekit-extract-lora, for example:
(Note: you can skip this step and download one of the pre-converted adapters that I made here: https://huggingface.co/collections/ngxson/extracted-lora-mergekit-677d5c3eea0b6a7661201846)
Then, convert it to GGUF:
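A sketch of the conversion step, using llama.cpp's convert_lora_to_gguf.py (the directory paths are placeholders):

```shell
# --base points at the original (non-fine-tuned) HF model directory
python convert_lora_to_gguf.py ./extracted-lora \
  --base ./base-model \
  --outtype f16
```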
Now use it:
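For example, loading the adapter on top of the base model with llama-cli (file names are placeholders):

```shell
llama-cli -m base-model-f16.gguf \
  --lora extracted-lora.gguf \
  -p "Hello"
```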