Update the README with GGUF instructions #70

Merged: 1 commit, Mar 17, 2024
README.md (40 changes: 29 additions & 11 deletions)
@@ -26,8 +26,8 @@ A quick breakdown of each of the files:
 
 * `gpt2.f90`: the actual GPT-2 model and a decoder
 * `main.f90`: the main driver
-* `create_model.py`: downloads the TensorFlow model and converts to our own
-  format (`model.dat`)
+* `create_model.py`: downloads the TensorFlow model and converts to the GGUF
+  format (`model.gguf`)
 * `encode_input.py`: encodes the text input into tokens (input file for `gpt2`)
 * Matmul implementations
   * `linalg_f.f90` native Fortran
@@ -46,25 +46,43 @@ Configure and build:
 
     FC=gfortran cmake .
    make
 
-Create the `model.dat` file from a given GPT-2 model. Supported sizes (and the
+Download the GPT-2 model weights:
+
+    curl -o model.gguf -L https://huggingface.co/certik/fastGPT/resolve/main/model_fastgpt_124M_v2.gguf
+
+You can also download 355M for the `gpt-medium` model.
+
+Now you can modify the `input` file to change the input string and set other
+parameters.
+
+Run (requires `model.gguf` and `input` in the current directory):
+
+    ./gpt2
+
+## Creating the GGUF file
+
+Create the `model.gguf` file from a given GPT-2 model. Supported sizes (and the
 corresponding names to be used in `pt.py`, and the approximate download size):
 "124M" (`gpt2`, 0.5GB), "355M" (`gpt-medium`, 1.5GB), "774M" (`gpt-large`,
 3GB), "1558M" (`gpt-xl`, 6GB). This will download the model and cache it for
 subsequent runs:
 
     python create_model.py --models_dir "models" --model_size "124M"
 
-Alternatively, download the fastGPT model directly from
-https://huggingface.co/datasets/certik/fastGPT, e.g.:
+This script depends on the `gguf` Python library, which you can install using:
 
-    curl -O -L https://huggingface.co/datasets/certik/fastGPT/resolve/main/model_fastgpt_124M_v1.dat
+    git clone https://github.com/ggerganov/llama.cpp
+    cd llama.cpp
+    git checkout 4e9a7f7f7fb6acbddd1462909c8d696e38edbfcc
+    cd gguf-py
+    pip install .
 
-Now you can modify the `input` file to change the input string and set other
-parameters.
+The `gguf` library is available on pip and conda, but we currently require a
+newer version that is not published there yet.
 
-Run (requires `model.dat` and `input` in the current directory):
+We used this script to create several GGUF files and uploaded them to
+https://huggingface.co/certik/fastGPT, so that you can just download the
+pre-generated files.
 
-    ./gpt2
-
 ### Example Output
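As a quick sanity check after downloading or generating `model.gguf`, you can inspect the file with the same `gguf` Python library that the new instructions install. The following is a minimal sketch, not part of this PR; it assumes the `GGUFReader` API shipped in the llama.cpp `gguf-py` checkout pinned above:

    # Sketch: inspect a GGUF model file (assumes gguf-py's GGUFReader API).
    from gguf import GGUFReader

    reader = GGUFReader("model.gguf")  # memory-maps the file for reading

    # Print the key/value metadata records stored in the GGUF header.
    for name in reader.fields:
        print(name)

    # Print each tensor's name, shape, and data type.
    for tensor in reader.tensors:
        print(tensor.name, tensor.shape, tensor.tensor_type)

If the file is well formed, this should list the metadata keys and weight tensors that `create_model.py` wrote.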
create_model.py (4 changes: 2 additions & 2 deletions)
@@ -1,6 +1,6 @@
"""
This script loads the specified GPT-2 model from OpenAI using TensorFlow,
converts it into our custom format and saves it to `model.dat`, which contains
converts it into our custom format and saves it to `model.gguf`, which contains
everything (all the parameters, all the weights, encoding/decoding
information).
@@ -268,7 +268,7 @@ def main(model_size: str = "124M", models_dir: str = "models"):
print(" Done. Loading time: ", t2-t1)

# generate output ids
print("Converting model, saving to `model.dat`")
print("Converting model, saving to `model.gguf`")
t1 = clock()
decoder_txt = "".join(decoder)
idx = decoder_idx(decoder)
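For orientation, the conversion in `create_model.py` amounts to packing the hyperparameters, the encoder/decoder data, and the weight tensors into a single GGUF file. Below is a minimal sketch of that pattern using the `gguf-py` `GGUFWriter`; the "fastgpt" architecture string and the metadata key names are illustrative assumptions, not the exact values `create_model.py` writes:

    # Sketch: write a minimal GGUF file (assumes gguf-py's GGUFWriter API).
    # The "fastgpt" architecture string and metadata keys are hypothetical.
    import numpy as np
    from gguf import GGUFWriter

    writer = GGUFWriter("model.gguf", arch="fastgpt")

    # Scalar metadata: hyperparameters, vocabulary size, and similar.
    writer.add_uint32("fastgpt.context_length", 1024)  # hypothetical key
    writer.add_uint32("fastgpt.vocab_size", 50257)     # hypothetical key

    # Tensors: GGUFWriter records each tensor's name, shape, dtype, and data.
    wte = np.zeros((50257, 768), dtype=np.float32)     # dummy token embeddings
    writer.add_tensor("wte", wte)

    # GGUF layout: header first, then key/value metadata, then tensor data.
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()
    writer.close()

Reading the file back with `GGUFReader` (see the sketch after the README diff) is a quick way to confirm that the metadata and tensors round-trip.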