GGUF: C++ refactor, backend support, misc fixes #11030

JohannesGaessler · 2025-01-01T18:01:31Z

This PR splits the GGUF code into a separate file and refactors it as C++, see discussion in #10655 . In addition to the lists in the linked PR:

Changed the return type of gguf_write_to_file from void to bool. The return value indicates success. On master a failure to open the output file results in a crash, a failure to write to the file is silently ignored.
Restricted the key general.alignment to uint32_t and powers of 2. On master this key can be set to other types (allowing users to write a file that then causes an error on read) and other values (which don't work correctly with GGML_PAD). There is now a macro GGUF_KEY_GENERAL_ALIGNMENT since this key has a special meaning.
When trying to load GGUF files with a higher version than the current software an error is raised.
If user code tries to call gguf_get_arr_data on a string array an error is raised. On master this returns a pointer of type gguf_str, a type defined in ggml.c. I would consider this a misuse of the API.
Now std::stringstream is used in test-gguf which (unlike tmpfile) should work without elevated privileges on Windows.

What could still be done:

On master when determining the meta data size no memory is allocated and the growing of the buffer is simulated instead. With this PR the dynamically growing memory buffer with explicit offset and allocated size is replaced with a vector. It would add complexity to restore the functionality on master and I'm not convinced it's a worthwhile optimization since the metadata should be quite small.
On master when setting a key-value pair that already exists the entry with the corresponding name is overwritten. With this PR the old entry is removed and a new entry is appended instead. As a consequence the resulting order of KV pairs is different. I don't think the order of KV pairs is important since they should only ever be accessed by index when iterated over. Maybe we should just sort the KV pairs alphabetically.

Djip007 · 2025-01-01T19:55:10Z

ggml/src/gguf.cpp

+        if constexpr (std::is_same<T, uint32_t>::value) {
+            GGML_ASSERT(val > 0 && (val & (val - 1)) == 0 && GGUF_KEY_GENERAL_ALIGNMENT " must be power of 2");
+        } else {
+            GGML_ABORT(GGUF_KEY_GENERAL_ALIGNMENT " must be type u32");


can be a static_assert ?

No it cannot.

My bad! miss the first if ... I have to be more careful next time.

JohannesGaessler · 2025-01-01T20:07:57Z

ggml/src/gguf.cpp

+
+        struct ggml_context * ctx_data = *params.ctx;
+
+        struct ggml_tensor * data = nullptr;


I prefer to use the struct keyword even if not strictly necessary in C++ code.

JohannesGaessler · 2025-01-01T20:09:14Z

The MacOS compilation failure seems to be due to the implicit use of ggml_tensor.backend in copy constructors, which is deprecated. Should we just remove it?

slaren · 2025-01-01T20:14:27Z

Yes, we should remove it now that the SYCL backend does not use it anymore.

JohannesGaessler · 2025-01-01T22:06:39Z

I went ahead and added myself to CODEOWNERS for the newly split off GGUF source files (and also the training code and the CUDA files for matrix multiplication and FlashAttention).

Djip007 · 2025-01-01T22:16:59Z

The MacOS compilation failure seems to be due to the implicit use of ggml_tensor.backend in copy constructors, which is deprecated. Should we just remove it?

Or create the copy constructor for the struct gguf_tensor_info.

ggml/src/gguf.cpp

JohannesGaessler · 2025-01-02T08:02:05Z

I just realized that I (unintentionally) changed how arrays of size 1 are treated. With this PR they are converted to scalars, on master they are not.

ggml/include/gguf.h

ggml/src/gguf.cpp

ggerganov · 2025-01-02T08:23:07Z

ggml/src/gguf.cpp

+// TODO this returns int but the underlying type is uint64
+int gguf_get_n_kv(const struct gguf_context * ctx) {
+    return ctx->kv.size();
+}
+
+// TODO should this return uint64_t?
+int gguf_find_key(const struct gguf_context * ctx, const char * key) {
+    // return -1 if key not found
+    int keyfound = -1;


Yes, I think so. In general, we use size_t for memory sizes and offsets expressed in bytes and int64_t for counting things like number of elements. The GGUF spec slightly deviates from the convention and uses uint64_t for both of these, which is not ideal. But at least the interface should match the underlying types.

Also, there are GGUF functions that use int for indexing KV and tensors that would also need to be updated:

GGML_API uint8_t gguf_get_val_u8 (const struct gguf_context * ctx, int key_id); GGML_API const char * gguf_get_tensor_name (const struct gguf_context * ctx, int i);

However, this might become a breaking change. Maybe we should do this in a follow-up PR?

Changing the data type from a signed to an unsigned integer would definitely be a breaking change since the intended way to check whether a key exists is to check whether the returned index is non-negative. It would also be possible to return int64_t since the maximum number of gguf_kv/gguf_tensor_info elements that can be allocated on a 64 bit system is going to be smaller than the maximum value that can be represented with int64_t.

ggerganov · 2025-01-02T08:24:49Z

ggml/src/gguf.cpp

+// TODO should this be a const pointer? should it exist at all?
+void * gguf_get_data(const struct gguf_context * ctx) {
+    return ctx->data;
+}


I guess we can remove this.

Djip007 · 2025-01-02T11:02:10Z

An other idea for gguf_kv

#include <string>
#include <cstdint>
#include <vector>
#include <memory>

[[noreturn]] void ggml_abort(const char * file, int line, const char * fmt, ...);

#define GGML_ABORT(...) ggml_abort(__FILE__, __LINE__, __VA_ARGS__)

enum gguf_type {
    GGUF_TYPE_UINT8   = 0,
    // [...]
    GGUF_TYPE_STRING  = 8,
    // [...]
    GGUF_TYPE_COUNT,       // marks the end of the enum
};

template <typename T>
struct type_to_gguf_type;

template <>
struct type_to_gguf_type<uint8_t> {
    static constexpr enum gguf_type value = GGUF_TYPE_UINT8;
};

// [...]
template <>
struct type_to_gguf_type<std::string> {
    static constexpr enum gguf_type value = GGUF_TYPE_STRING;
};

class gguf_kv {
    gguf_kv(gguf_kv&) = delete;
    gguf_kv(gguf_kv&&) = delete;
    gguf_kv* operator=(gguf_kv&) = delete;
    gguf_kv* operator=(gguf_kv&&) = delete;
public:
    gguf_kv(const std::string& key, gguf_type type):
        key {key},
        type {type}
    {};
    virtual ~gguf_kv() {};

    const std::string & get_key() const {
        return key;
    }

    const enum gguf_type & get_type() const {
        return type;
    }

    virtual size_t get_ne() const = 0;

    virtual const uint8_t & get_val_uint8(const size_t i = 0) const = 0;
    // [...]
    virtual const std::string & get_val_string(const size_t i = 0) const = 0;

protected:
    const std::string key;
    const gguf_type type;
};

template<typename T, bool VECTOR> struct gguf_kv_type;
template<typename T> struct gguf_kv_type<T, true> {
    using type = std::vector<T>;
};
template<typename T> struct gguf_kv_type<T, false> {
    using type = T;
};

template<typename T, bool VECTOR = false>
class gguf_kv_t : public gguf_kv {
private:
    gguf_kv_t(gguf_kv_t&) = delete;
    gguf_kv_t(gguf_kv_t&&) = delete;
    gguf_kv_t* operator=(gguf_kv_t&) = delete;
    gguf_kv_t* operator=(gguf_kv_t&&) = delete;

public:
    using value_type = T;
    using type = typename gguf_kv_type<T,VECTOR>::type;

    gguf_kv_t(const std::string& key, const type& value):
        gguf_kv(key, type_to_gguf_type<value_type>::value),
        value(value)
    { }

    size_t get_ne() const override {
        if constexpr (VECTOR) {
            return value.size();
        } else {
            return 1;
        }
    }

    virtual const uint8_t & get_val_uint8(const size_t i = 0) const {
        if constexpr (type_to_gguf_type<T>::value == GGUF_TYPE_UINT8) {
            return get_val(i);
        }
        GGML_ABORT("Not a GGUF_TYPE_UINT8");
    }
    // [...]
    virtual const std::string & get_val_string(const size_t i = 0) const {
        if constexpr (type_to_gguf_type<T>::value == GGUF_TYPE_STRING) {
            return get_val(i);
        }
        GGML_ABORT("Not a GGUF_TYPE_STRING");
    }

private:
    inline const type & get_val(const size_t i = 0) const {
        if constexpr (VECTOR) {
            return value[i];
        } else {
            return value;
        }
    }

protected:
    type value;
};

int main() {
    std::unique_ptr<gguf_kv> t_string {new gguf_kv_t<std::string, false>("key", "value")};
    std::unique_ptr<gguf_kv> t_uint8 {new gguf_kv_t<uint8_t, false>("key", 0)};

    std::vector<std::unique_ptr<gguf_kv>> list;
    list.push_back(std::move(t_string));
    list.push_back(std::move(t_uint8));
}

JohannesGaessler · 2025-01-02T11:16:24Z

I considered making gguf_kv a C++ class in a similar way to what you are proposing but decided against it because I think it adds too much complexity.

Djip007 · 2025-01-02T19:18:26Z

ggml/src/gguf.cpp

+    const uint64_t n_kv = src->kv.size();
+    for (uint64_t i = 0; i < n_kv; ++i) {
+        const struct gguf_kv & kv = src->kv[i];


for (const auto & kv : src->kv) { [...] switch (kv.get_type()) [...]

?

My personal preference it to use loop variables and indexing since I find that to be more convenient in conjunction with GNU Debugger.

if you what to use indexing (a bad idea for me in that case) you have to use

std::vector<struct gguf_kv>::size_type ... // or auto when possible

and not a uint64_t.

uint64_t is acceptable here because it's the underlay type of n_kv when writing to/reading from a file. Another option is size_t, as we usually use it in the llama.cpp code base.

slaren · 2025-01-02T21:04:00Z

ggml_fopen handles conversion of utf-8 filenames on Windows, removing it is likely to cause issues.

JohannesGaessler · 2025-01-02T22:06:39Z

Thanks, I wasn't aware. The only reason why I switched from FILE * to C++ streams is that on Windows tmpfile requires elevated privileges but that functionality is only needed for the test code. But the (de)serialization itself should not depend on the OS and our test coverage includes non-Windows systems so I think it's fine to revert to FILE * and tmpfile.

Djip007 · 2025-01-03T00:36:51Z

or use the std::filesystem::path

JohannesGaessler · 2025-01-03T08:24:03Z

I don't remember the exact context but I previously (I think it was in 2023) made a PR with portability issues related to files on Windows and when I suggested using std::filesystem Georgi said his experience with the library was bad and that it should not be used.

ggerganov · 2025-01-03T08:39:04Z

I don't remember the exact context but I previously (I think it was in 2023) made a PR with portability issues related to files on Windows and when I suggested using std::filesystem Georgi said his experience with the library was bad and that it should not be used.

My experience is from quite a few years ago and things have likely improved. Moreover, we are already using std::filesystem in ggml-backend-reg.cpp, so it's no longer not recommended.

But in any case, if FILE * is enough for the GGUF functionality then we should keep using it.

JohannesGaessler · 2025-01-03T13:47:31Z

I changed the GGUF API as follows:

The number of KV pairs/tensors is now defined using int64_t instead of uint64_t. This does not affect the theoretical maximum number on 64 bit systems since the limiting factor there should be the size of the memory allocation for a vector. The KV pair/tensor ids similarly are now also handled as int64_t. This type change should be fine in most circumstances but it can be a breaking change for e.g. typedefs or print statements.
The version is now returned as uint32_t instead of int to be consistent with the data type in the file.
Externally visible array indices now use size_t.
gguf_remove_key now returns the previous id of the removed key. Currently unused but may be useful I think.

remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types

ggerganov · 2025-01-04T14:06:10Z

ggml/src/gguf.cpp

+        return fread(dst.data(), 1, dst.length(), file) == dst.length();
+    }
+
+    bool read(void * dst, const size_t size) {


Suggested change

bool read(void * dst, const size_t size) {

bool read(void * dst, const size_t size) const {

github-actions bot added testing Everything test related examples ggml changes relating to the ggml tensor library for machine learning labels Jan 1, 2025

Djip007 reviewed Jan 1, 2025

View reviewed changes

Djip007 reviewed Jan 2, 2025

View reviewed changes

ggml/src/gguf.cpp Outdated Show resolved Hide resolved

ggerganov reviewed Jan 2, 2025

View reviewed changes

Djip007 reviewed Jan 2, 2025

View reviewed changes

JohannesGaessler added 2 commits January 3, 2025 15:29

GGUF: C++ refactor, backend support, misc fixes

b87784d

remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types

fix gguf_set_kv

b456e10

JohannesGaessler force-pushed the gguf-refactor-7 branch from 1f36d19 to b456e10 Compare January 3, 2025 14:35

JohannesGaessler added 2 commits January 3, 2025 15:40

fix print datatypes

e8ac094

fix Windows return statement

0857839

ggerganov approved these changes Jan 4, 2025

View reviewed changes

const methods

4ee662f

JohannesGaessler merged commit 53ff6b9 into ggerganov:master Jan 7, 2025
48 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GGUF: C++ refactor, backend support, misc fixes #11030

GGUF: C++ refactor, backend support, misc fixes #11030

JohannesGaessler commented Jan 1, 2025 •

edited

Loading

Djip007 Jan 1, 2025

JohannesGaessler Jan 1, 2025

Djip007 Jan 1, 2025

This comment was marked as resolved.

JohannesGaessler Jan 1, 2025

JohannesGaessler commented Jan 1, 2025

slaren commented Jan 1, 2025

JohannesGaessler commented Jan 1, 2025

Djip007 commented Jan 1, 2025 •

edited

Loading

JohannesGaessler commented Jan 2, 2025

ggerganov Jan 2, 2025

JohannesGaessler Jan 2, 2025

ggerganov Jan 2, 2025

Djip007 commented Jan 2, 2025

JohannesGaessler commented Jan 2, 2025

Djip007 Jan 2, 2025

JohannesGaessler Jan 2, 2025

Djip007 Jan 3, 2025

ngxson Jan 3, 2025

slaren commented Jan 2, 2025

JohannesGaessler commented Jan 2, 2025

Djip007 commented Jan 3, 2025

JohannesGaessler commented Jan 3, 2025

ggerganov commented Jan 3, 2025

JohannesGaessler commented Jan 3, 2025

ggerganov Jan 4, 2025


		struct ggml_context * ctx_data = *params.ctx;

		struct ggml_tensor * data = nullptr;

	bool read(void * dst, const size_t size) {
	bool read(void * dst, const size_t size) const {

GGUF: C++ refactor, backend support, misc fixes #11030

GGUF: C++ refactor, backend support, misc fixes #11030

Conversation

JohannesGaessler commented Jan 1, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment was marked as resolved.

Choose a reason for hiding this comment

JohannesGaessler commented Jan 1, 2025

slaren commented Jan 1, 2025

JohannesGaessler commented Jan 1, 2025

Djip007 commented Jan 1, 2025 • edited Loading

JohannesGaessler commented Jan 2, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Djip007 commented Jan 2, 2025

JohannesGaessler commented Jan 2, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

slaren commented Jan 2, 2025

JohannesGaessler commented Jan 2, 2025

Djip007 commented Jan 3, 2025

JohannesGaessler commented Jan 3, 2025

ggerganov commented Jan 3, 2025

JohannesGaessler commented Jan 3, 2025

Choose a reason for hiding this comment

JohannesGaessler commented Jan 1, 2025 •

edited

Loading

Djip007 commented Jan 1, 2025 •

edited

Loading