
sync : llama.cpp #1070

Merged: 23 commits from sync-llama.cpp-25-01-14 into master on Jan 14, 2025
Conversation

ggerganov
Owner

No description provided.

giladgd and others added 23 commits January 14, 2025 09:17
* Added init tensor calling code

* Added get_alloc_size forwarding

* Cleaned up and improved type/error handling.

* fix: remove trailing whitespaces.

* Cleanup and use GGML error logging functions.

* Handle potentially dangerous edge cases.

* Apply suggestions from code review

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Diego Devesa <[email protected]>
Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (llama/11074)

* Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver

* Add (TM) to AMD name check
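
For illustration, a minimal sketch of how such a device-specific check can be done with plain Vulkan 1.2 (the actual ggml-vulkan code differs; the function name below is made up):

```cpp
// Illustrative sketch: detect the AMD proprietary driver and skip the
// cooperative-matrix (coopmat) path on it.
#include <vulkan/vulkan.h>
#include <cstring>

static bool skip_coopmat_for_amd_proprietary(VkPhysicalDevice dev) {
    VkPhysicalDeviceDriverProperties driver_props = {};
    driver_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &driver_props;

    vkGetPhysicalDeviceProperties2(dev, &props2);

    // The AMD proprietary driver reports VK_DRIVER_ID_AMD_PROPRIETARY and a
    // device name such as "AMD Radeon(TM) ..." - hence the "(TM)" name check
    // mentioned in the commit message.
    return driver_props.driverID == VK_DRIVER_ID_AMD_PROPRIETARY ||
           strstr(props2.properties.deviceName, "Radeon(TM)") != nullptr;
}
```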
* CUDA: add BF16 support
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (llama/11087)

* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6

* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"

This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52.

* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
Remove duplicated macros, use GGML_LOG_ERROR for errors
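
A minimal sketch of the API change this commit is about, assuming a SYCL 2020 compiler (the kernel below is illustrative, not the actual wkv6 kernel):

```cpp
#include <sycl/sycl.hpp>

// scale `data` (USM device memory) by 2 via a local-memory staging buffer
void scale_via_local(sycl::queue &q, float *data, size_t n) {
    constexpr size_t WG = 256; // assume n is a multiple of WG
    q.submit([&](sycl::handler &cgh) {
        sycl::local_accessor<float, 1> tmp(sycl::range<1>(WG), cgh);
        cgh.parallel_for(sycl::nd_range<1>(sycl::range<1>(n), sycl::range<1>(WG)),
                         [=](sycl::nd_item<1> it) {
            // deprecated:            float *p = tmp.get_pointer();
            // SYCL 2020 replacement:
            float *p = tmp.get_multi_ptr<sycl::access::decorated::no>().get();

            const size_t lid = it.get_local_id(0);
            p[lid] = data[it.get_global_id(0)];
            sycl::group_barrier(it.get_group());
            data[it.get_global_id(0)] = 2.0f * p[lid];
        });
    }).wait();
}
```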
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
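
As a rough usage sketch of the public GGUF API around this refactor: metadata and tensor info go through accessor functions, so the removed gguf_get_data is not needed. The header name and exact integer types here are assumptions:

```cpp
#include "gguf.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) return 1;

    // only parse metadata, do not allocate tensor data
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) return 1;

    for (int64_t i = 0; i < gguf_get_n_kv(ctx); i++) {
        printf("kv[%lld]: %s\n", (long long) i, gguf_get_key(ctx, i));
    }
    for (int64_t i = 0; i < gguf_get_n_tensors(ctx); i++) {
        printf("tensor[%lld]: %s\n", (long long) i, gguf_get_tensor_name(ctx, i));
    }

    gguf_free(ctx);
    return 0;
}
```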
* fix: Vulkan shader gen binary path when cross compiling
…(llama/11117)

* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
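
A sketch of the kind of availability check involved, using the plain Vulkan device-extension query (illustrative; the real code folds this into its existing feature detection):

```cpp
// Only enable the cooperative-matrix shader variants when the device
// actually exposes VK_KHR_cooperative_matrix.
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

static bool device_has_coopmat(VkPhysicalDevice dev) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, nullptr);

    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, exts.data());

    for (const auto & ext : exts) {
        if (strcmp(ext.extensionName, "VK_KHR_cooperative_matrix") == 0) {
            return true;
        }
    }
    return false;
}
```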
This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le using MMA
builtins for the quantised int8 datatype.

This change results in a 10% - 70% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
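
For context, a tiny illustration of the POWER10 MMA int8 outer-product builtins these kernels are built around (assumes GCC >= 10.2 with -mcpu=power10; this is a toy 4x4 tile, not the actual llamafile kernel):

```cpp
#include <altivec.h>

// out is a 4x4 int32 tile; a and b each hold 16 int8 values
void mma_i8_tile(const unsigned char *a, const unsigned char *b, int *out) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);              // zero the 512-bit accumulator

    vector unsigned char va = vec_xl(0, a);     // load a 4x4 int8 tile of A
    vector unsigned char vb = vec_xl(0, b);     // load a 4x4 int8 tile of B

    __builtin_mma_xvi8ger4pp(&acc, va, vb);     // rank-4 int8 update, int32 accumulate

    vector signed int rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);  // move the accumulator back to VSRs
    for (int i = 0; i < 4; i++) {
        vec_xst(rows[i], 0, out + 4 * i);       // store one int32x4 row
    }
}
```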
* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability
llama: add support for QRWKV6 model architecture (llama/11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <[email protected]>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <[email protected]>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <[email protected]>

* Fix some typos

Signed-off-by: Molly Sophia <[email protected]>

* code format changes

Signed-off-by: Molly Sophia <[email protected]>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <[email protected]>

* Fix cuda warning

Signed-off-by: Molly Sophia <[email protected]>

* Update README.md

Signed-off-by: Molly Sophia <[email protected]>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <[email protected]>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <[email protected]>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: compilade <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
…roup_size_control validation error (llama/11161)

* Vulkan: Remove float16 use in shaders

* Fix validation error about subgroup_size_control extension
… (llama/11211)

Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes this.
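
For context, a small sketch of how backends are meant to be discovered when GGML_BACKEND_DL is enabled, assuming the public ggml-backend registry API (error handling omitted):

```cpp
// With GGML_BACKEND_DL, each backend (including the HIP build of the CUDA
// backend) lives in its own shared library and is registered at runtime
// rather than being referenced directly from libggml.
#include "ggml-backend.h"
#include <cstdio>

int main() {
    ggml_backend_load_all();  // load the ggml-* backend libraries found next to the binary

    for (size_t i = 0; i < ggml_backend_reg_count(); i++) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```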
…e improvements) (llama/11042)

* Refactor: Moves cuda graph executable update step to separate function.

* Refactor: Moves cuda graph update check to separate function.

* Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability.

* Fix: Adds missing reference to maintain_cuda_graph() definition.

* Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function.

* Refactor: Moves node graph checks and copy ops into individual function for improved readability.

* Refactor: Removes code permanently excluded from compilation to increase readability.

* Style: Adds missing newline

* Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one

* Refactor: Makes 'cuda_graph_update_required' a local variable

* remove double lines between functions

---------

Co-authored-by: slaren <[email protected]>
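
As a rough sketch of the capture / instantiate / update flow these helper functions organise (assumes CUDA 12 for the cudaGraphExecUpdate signature; names are illustrative and error checking is omitted):

```cpp
#include <cuda_runtime.h>

static cudaGraph_t     g_graph      = nullptr;
static cudaGraphExec_t g_graph_exec = nullptr;

// record_ops() stands in for enqueueing the model's kernels on `stream`
void launch_with_graph(cudaStream_t stream, void (*record_ops)(cudaStream_t)) {
    // 1. capture the kernel sequence into a graph
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    record_ops(stream);
    if (g_graph) cudaGraphDestroy(g_graph);
    cudaStreamEndCapture(stream, &g_graph);

    if (g_graph_exec == nullptr) {
        // first call: instantiate an executable graph
        cudaGraphInstantiateWithFlags(&g_graph_exec, g_graph, 0);
    } else {
        // later calls: try to update the executable in place (cheap);
        // fall back to re-instantiating when the topology changed
        cudaGraphExecUpdateResultInfo info;
        if (cudaGraphExecUpdate(g_graph_exec, g_graph, &info) != cudaSuccess) {
            cudaGetLastError(); // clear the sticky error before re-instantiating
            cudaGraphExecDestroy(g_graph_exec);
            cudaGraphInstantiateWithFlags(&g_graph_exec, g_graph, 0);
        }
    }

    // 2. launch the whole captured sequence with a single call
    cudaGraphLaunch(g_graph_exec, stream);
}
```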
---------

Co-authored-by: Skyler Szot <[email protected]>
Co-authored-by: Shangqing Gu <[email protected]>
Co-authored-by: Alexander Angus <[email protected]>
Co-authored-by: Hongqiang Wang <[email protected]>
Co-authored-by: Max Krasnyansky <[email protected]>
@ggerganov force-pushed the sync-llama.cpp-25-01-14 branch from cc1a4cf to daca9a1 on January 14, 2025 07:33
@ggerganov merged commit 41c67ee into master on Jan 14, 2025
7 of 8 checks passed
@ggerganov deleted the sync-llama.cpp-25-01-14 branch on January 14, 2025 07:36