Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GGUF: C++ refactor, backend support, misc fixes #11030

Merged
merged 5 commits into from
Jan 7, 2025

Conversation

JohannesGaessler
Copy link
Collaborator

@JohannesGaessler JohannesGaessler commented Jan 1, 2025

This PR splits the GGUF code into a separate file and refactors it as C++, see discussion in #10655 . In addition to the lists in the linked PR:

  • Changed the return type of gguf_write_to_file from void to bool. The return value indicates success. On master a failure to open the output file results in a crash, a failure to write to the file is silently ignored.
  • Restricted the key general.alignment to uint32_t and powers of 2. On master this key can be set to other types (allowing users to write a file that then causes an error on read) and other values (which don't work correctly with GGML_PAD). There is now a macro GGUF_KEY_GENERAL_ALIGNMENT since this key has a special meaning.
  • When trying to load GGUF files with a higher version than the current software an error is raised.
  • If user code tries to call gguf_get_arr_data on a string array an error is raised. On master this returns a pointer of type gguf_str, a type defined in ggml.c. I would consider this a misuse of the API.
  • Now std::stringstream is used in test-gguf which (unlike tmpfile) should work without elevated privileges on Windows.

What could still be done:

  • On master when determining the meta data size no memory is allocated and the growing of the buffer is simulated instead. With this PR the dynamically growing memory buffer with explicit offset and allocated size is replaced with a vector. It would add complexity to restore the functionality on master and I'm not convinced it's a worthwhile optimization since the metadata should be quite small.
  • On master when setting a key-value pair that already exists the entry with the corresponding name is overwritten. With this PR the old entry is removed and a new entry is appended instead. As a consequence the resulting order of KV pairs is different. I don't think the order of KV pairs is important since they should only ever be accessed by index when iterated over. Maybe we should just sort the KV pairs alphabetically.

@github-actions github-actions bot added testing Everything test related examples ggml changes relating to the ggml tensor library for machine learning labels Jan 1, 2025
if constexpr (std::is_same<T, uint32_t>::value) {
GGML_ASSERT(val > 0 && (val & (val - 1)) == 0 && GGUF_KEY_GENERAL_ALIGNMENT " must be power of 2");
} else {
GGML_ABORT(GGUF_KEY_GENERAL_ALIGNMENT " must be type u32");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be a static_assert ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it cannot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad! miss the first if ... I have to be more careful next time.


struct ggml_context * ctx_data = *params.ctx;

struct ggml_tensor * data = nullptr;

This comment was marked as resolved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use the struct keyword even if not strictly necessary in C++ code.

@JohannesGaessler
Copy link
Collaborator Author

The MacOS compilation failure seems to be due to the implicit use of ggml_tensor.backend in copy constructors, which is deprecated. Should we just remove it?

@slaren
Copy link
Collaborator

slaren commented Jan 1, 2025

Yes, we should remove it now that the SYCL backend does not use it anymore.

@JohannesGaessler
Copy link
Collaborator Author

I went ahead and added myself to CODEOWNERS for the newly split off GGUF source files (and also the training code and the CUDA files for matrix multiplication and FlashAttention).

@Djip007
Copy link
Contributor

Djip007 commented Jan 1, 2025

The MacOS compilation failure seems to be due to the implicit use of ggml_tensor.backend in copy constructors, which is deprecated. Should we just remove it?

Or create the copy constructor for the struct gguf_tensor_info.

ggml/src/gguf.cpp Outdated Show resolved Hide resolved
@JohannesGaessler
Copy link
Collaborator Author

I just realized that I (unintentionally) changed how arrays of size 1 are treated. With this PR they are converted to scalars, on master they are not.

ggml/include/gguf.h Outdated Show resolved Hide resolved
ggml/src/gguf.cpp Outdated Show resolved Hide resolved
ggml/src/gguf.cpp Outdated Show resolved Hide resolved
ggml/src/gguf.cpp Outdated Show resolved Hide resolved
Comment on lines 725 to 733
// TODO this returns int but the underlying type is uint64
int gguf_get_n_kv(const struct gguf_context * ctx) {
return ctx->kv.size();
}

// TODO should this return uint64_t?
int gguf_find_key(const struct gguf_context * ctx, const char * key) {
// return -1 if key not found
int keyfound = -1;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. In general, we use size_t for memory sizes and offsets expressed in bytes and int64_t for counting things like number of elements. The GGUF spec slightly deviates from the convention and uses uint64_t for both of these, which is not ideal. But at least the interface should match the underlying types.

Also, there are GGUF functions that use int for indexing KV and tensors that would also need to be updated:

    GGML_API uint8_t      gguf_get_val_u8  (const struct gguf_context * ctx, int key_id);

    GGML_API const char *   gguf_get_tensor_name  (const struct gguf_context * ctx, int i);

However, this might become a breaking change. Maybe we should do this in a follow-up PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the data type from a signed to an unsigned integer would definitely be a breaking change since the intended way to check whether a key exists is to check whether the returned index is non-negative. It would also be possible to return int64_t since the maximum number of gguf_kv/gguf_tensor_info elements that can be allocated on a 64 bit system is going to be smaller than the maximum value that can be represented with int64_t.

Comment on lines 720 to 723
// TODO should this be a const pointer? should it exist at all?
void * gguf_get_data(const struct gguf_context * ctx) {
return ctx->data;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can remove this.

@Djip007
Copy link
Contributor

Djip007 commented Jan 2, 2025

An other idea for gguf_kv

#include <string>
#include <cstdint>
#include <vector>
#include <memory>

[[noreturn]] void ggml_abort(const char * file, int line, const char * fmt, ...);

#define GGML_ABORT(...) ggml_abort(__FILE__, __LINE__, __VA_ARGS__)

enum gguf_type {
    GGUF_TYPE_UINT8   = 0,
    // [...]
    GGUF_TYPE_STRING  = 8,
    // [...]
    GGUF_TYPE_COUNT,       // marks the end of the enum
};

template <typename T>
struct type_to_gguf_type;

template <>
struct type_to_gguf_type<uint8_t> {
    static constexpr enum gguf_type value = GGUF_TYPE_UINT8;
};

// [...]
template <>
struct type_to_gguf_type<std::string> {
    static constexpr enum gguf_type value = GGUF_TYPE_STRING;
};

class gguf_kv {
    gguf_kv(gguf_kv&) = delete;
    gguf_kv(gguf_kv&&) = delete;
    gguf_kv* operator=(gguf_kv&) = delete;
    gguf_kv* operator=(gguf_kv&&) = delete;
public:
    gguf_kv(const std::string& key, gguf_type type):
        key {key},
        type {type}
    {};
    virtual ~gguf_kv() {};

    const std::string & get_key() const {
        return key;
    }

    const enum gguf_type & get_type() const {
        return type;
    }

    virtual size_t get_ne() const = 0;

    virtual const uint8_t & get_val_uint8(const size_t i = 0) const = 0;
    // [...]
    virtual const std::string & get_val_string(const size_t i = 0) const = 0;

protected:
    const std::string key;
    const gguf_type type;
};

template<typename T, bool VECTOR> struct gguf_kv_type;
template<typename T> struct gguf_kv_type<T, true> {
    using type = std::vector<T>;
};
template<typename T> struct gguf_kv_type<T, false> {
    using type = T;
};

template<typename T, bool VECTOR = false>
class gguf_kv_t : public gguf_kv {
private:
    gguf_kv_t(gguf_kv_t&) = delete;
    gguf_kv_t(gguf_kv_t&&) = delete;
    gguf_kv_t* operator=(gguf_kv_t&) = delete;
    gguf_kv_t* operator=(gguf_kv_t&&) = delete;

public:
    using value_type = T;
    using type = typename gguf_kv_type<T,VECTOR>::type;

    gguf_kv_t(const std::string& key, const type& value):
        gguf_kv(key, type_to_gguf_type<value_type>::value),
        value(value)
    { }

    size_t get_ne() const override {
        if constexpr (VECTOR) {
            return value.size();
        } else {
            return 1;
        }
    }

    virtual const uint8_t & get_val_uint8(const size_t i = 0) const {
        if constexpr (type_to_gguf_type<T>::value == GGUF_TYPE_UINT8) {
            return get_val(i);
        }
        GGML_ABORT("Not a GGUF_TYPE_UINT8");
    }
    // [...]
    virtual const std::string & get_val_string(const size_t i = 0) const {
        if constexpr (type_to_gguf_type<T>::value == GGUF_TYPE_STRING) {
            return get_val(i);
        }
        GGML_ABORT("Not a GGUF_TYPE_STRING");
    }

private:
    inline const type & get_val(const size_t i = 0) const {
        if constexpr (VECTOR) {
            return value[i];
        } else {
            return value;
        }
    }

protected:
    type value;
};

int main() {
    std::unique_ptr<gguf_kv> t_string {new gguf_kv_t<std::string, false>("key", "value")};
    std::unique_ptr<gguf_kv> t_uint8 {new gguf_kv_t<uint8_t, false>("key", 0)};

    std::vector<std::unique_ptr<gguf_kv>> list;
    list.push_back(std::move(t_string));
    list.push_back(std::move(t_uint8));
}

@JohannesGaessler
Copy link
Collaborator Author

I considered making gguf_kv a C++ class in a similar way to what you are proposing but decided against it because I think it adds too much complexity.

Comment on lines 1032 to 1036
const uint64_t n_kv = src->kv.size();
for (uint64_t i = 0; i < n_kv; ++i) {
const struct gguf_kv & kv = src->kv[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (const auto & kv : src->kv) {
[...]
    switch (kv.get_type())
[...]

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference it to use loop variables and indexing since I find that to be more convenient in conjunction with GNU Debugger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you what to use indexing (a bad idea for me in that case) you have to use

std::vector<struct gguf_kv>::size_type ...
// or auto when possible

and not a uint64_t.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64_t is acceptable here because it's the underlay type of n_kv when writing to/reading from a file. Another option is size_t, as we usually use it in the llama.cpp code base.

@slaren
Copy link
Collaborator

slaren commented Jan 2, 2025

ggml_fopen handles conversion of utf-8 filenames on Windows, removing it is likely to cause issues.

@JohannesGaessler
Copy link
Collaborator Author

Thanks, I wasn't aware. The only reason why I switched from FILE * to C++ streams is that on Windows tmpfile requires elevated privileges but that functionality is only needed for the test code. But the (de)serialization itself should not depend on the OS and our test coverage includes non-Windows systems so I think it's fine to revert to FILE * and tmpfile.

@Djip007
Copy link
Contributor

Djip007 commented Jan 3, 2025

or use the std::filesystem::path

@JohannesGaessler
Copy link
Collaborator Author

I don't remember the exact context but I previously (I think it was in 2023) made a PR with portability issues related to files on Windows and when I suggested using std::filesystem Georgi said his experience with the library was bad and that it should not be used.

@ggerganov
Copy link
Owner

I don't remember the exact context but I previously (I think it was in 2023) made a PR with portability issues related to files on Windows and when I suggested using std::filesystem Georgi said his experience with the library was bad and that it should not be used.

My experience is from quite a few years ago and things have likely improved. Moreover, we are already using std::filesystem in ggml-backend-reg.cpp, so it's no longer not recommended.

But in any case, if FILE * is enough for the GGUF functionality then we should keep using it.

@JohannesGaessler
Copy link
Collaborator Author

I changed the GGUF API as follows:

  • The number of KV pairs/tensors is now defined using int64_t instead of uint64_t. This does not affect the theoretical maximum number on 64 bit systems since the limiting factor there should be the size of the memory allocation for a vector. The KV pair/tensor ids similarly are now also handled as int64_t. This type change should be fine in most circumstances but it can be a breaking change for e.g. typedefs or print statements.
  • The version is now returned as uint32_t instead of int to be consistent with the data type in the file.
  • Externally visible array indices now use size_t.
  • gguf_remove_key now returns the previous id of the removed key. Currently unused but may be useful I think.

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
return fread(dst.data(), 1, dst.length(), file) == dst.length();
}

bool read(void * dst, const size_t size) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool read(void * dst, const size_t size) {
bool read(void * dst, const size_t size) const {

@JohannesGaessler JohannesGaessler merged commit 53ff6b9 into ggerganov:master Jan 7, 2025
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples ggml changes relating to the ggml tensor library for machine learning testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants