Use GGUF to store model weights #69

Merged: 10 commits merged into main from gguf on Mar 17, 2024
Conversation

@certik (Owner) commented on Mar 15, 2024

Here are the two smallest models. 124M:

$ gguf-dump model_fastgpt_124M_v2.gguf
* Loading: model_fastgpt_124M_v2.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 4 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 22
      3: UINT64     |        1 | GGUF.kv_count = 1
      4: INT32      |        1 | general.data_offset = 1088

* Dumping 22 tensor(s)
      1:         12 |    12,     1,     1,     1 | I32     | header
      2:   38597376 |   768, 50257,     1,     1 | F32     | wte
      3:     786432 |   768,  1024,     1,     1 | F32     | wpe
      4:   28311552 |  3072,   768,    12,     1 | F32     | mlp_fc_w
      5:      36864 |  3072,    12,     1,     1 | F32     | mlp_fc_b
      6:   28311552 |   768,  3072,    12,     1 | F32     | mlp_proj_w
      7:       9216 |   768,    12,     1,     1 | F32     | mlp_proj_b
      8:   21233664 |  2304,   768,    12,     1 | F32     | attn_w
      9:      27648 |  2304,    12,     1,     1 | F32     | attn_b
     10:    7077888 |   768,   768,    12,     1 | F32     | attn_proj_w
     11:       9216 |   768,    12,     1,     1 | F32     | attn_proj_b
     12:       9216 |   768,    12,     1,     1 | F32     | ln1_b
     13:       9216 |   768,    12,     1,     1 | F32     | ln1_g
     14:       9216 |   768,    12,     1,     1 | F32     | ln2_b
     15:       9216 |   768,    12,     1,     1 | F32     | ln2_g
     16:        768 |   768,     1,     1,     1 | F32     | lnf_b
     17:        768 |   768,     1,     1,     1 | F32     | lnf_g
     18:      50258 | 50258,     1,     1,     1 | I32     | idx
     19:     356735 | 356735,    1,     1,     1 | I8      | decoder_txt
     20:      50002 | 50002,     1,     1,     1 | I32     | vocab_idx
     21:     406304 | 406304,    1,     1,     1 | I8      | vocab_txt
     22:        256 |   256,     1,     1,     1 | I32     | byte_decoder

and 355M:

$ gguf-dump model_fastgpt_355M_v2.gguf 
* Loading: model_fastgpt_355M_v2.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 4 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 22
      3: UINT64     |        1 | GGUF.kv_count = 1
      4: INT32      |        1 | general.data_offset = 1088

* Dumping 22 tensor(s)
      1:         12 |    12,     1,     1,     1 | I32     | header
      2:   51463168 |  1024, 50257,     1,     1 | F32     | wte
      3:    1048576 |  1024,  1024,     1,     1 | F32     | wpe
      4:  100663296 |  4096,  1024,    24,     1 | F32     | mlp_fc_w
      5:      98304 |  4096,    24,     1,     1 | F32     | mlp_fc_b
      6:  100663296 |  1024,  4096,    24,     1 | F32     | mlp_proj_w
      7:      24576 |  1024,    24,     1,     1 | F32     | mlp_proj_b
      8:   75497472 |  3072,  1024,    24,     1 | F32     | attn_w
      9:      73728 |  3072,    24,     1,     1 | F32     | attn_b
     10:   25165824 |  1024,  1024,    24,     1 | F32     | attn_proj_w
     11:      24576 |  1024,    24,     1,     1 | F32     | attn_proj_b
     12:      24576 |  1024,    24,     1,     1 | F32     | ln1_b
     13:      24576 |  1024,    24,     1,     1 | F32     | ln1_g
     14:      24576 |  1024,    24,     1,     1 | F32     | ln2_b
     15:      24576 |  1024,    24,     1,     1 | F32     | ln2_g
     16:       1024 |  1024,     1,     1,     1 | F32     | lnf_b
     17:       1024 |  1024,     1,     1,     1 | F32     | lnf_g
     18:      50258 | 50258,     1,     1,     1 | I32     | idx
     19:     356735 | 356735,    1,     1,     1 | I8      | decoder_txt
     20:      50002 | 50002,     1,     1,     1 | I32     | vocab_idx
     21:     406304 | 406304,    1,     1,     1 | I8      | vocab_txt
     22:        256 |   256,     1,     1,     1 | I32     | byte_decoder

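For reference, the same metadata can be read back programmatically. Below is a minimal sketch, assuming the `gguf` Python package from llama.cpp (the same package that provides `gguf-dump`); the file name comes from the dump above, and the exact API (`GGUFReader`, `.fields`, `.tensors`) may differ across package versions.

```python
from gguf import GGUFReader  # pip install gguf (llama.cpp's gguf-py)

reader = GGUFReader("model_fastgpt_124M_v2.gguf")

# Key/value metadata: in these files the only custom pair is
# general.data_offset (version, tensor_count, kv_count are implicit).
for name in reader.fields:
    print(name)

# Tensor metadata: name, shape, dtype, element count.
for t in reader.tensors:
    print(f"{t.name:14s} {list(t.shape)!s:28s} {t.tensor_type.name:4s} {t.n_elements}")

# Sanity check: summing the F32 tensor sizes reproduces the model's
# parameter count (the remaining I32/I8 tensors hold the tokenizer).
params = sum(int(t.n_elements) for t in reader.tensors
             if t.tensor_type.name == "F32")
print("F32 parameters:", params)
```

Summing the F32 element counts in the 124M dump by hand gives 124,439,808 (wte + wpe + all block weights, biases, and layer norms), which matches the usual GPT-2 small parameter count, so the dump above is self-consistent.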
Review threads: driver.f90 (resolved); ci/build.sh (outdated, resolved)
certik marked this pull request as draft on March 15, 2024 at 20:23
certik marked this pull request as ready for review on March 16, 2024 at 00:42
certik merged commit caf364a into main on March 17, 2024; 2 checks passed
certik deleted the gguf branch on March 17, 2024 at 17:46