GGML has risen in significance for CPU-centric approaches as the backbone of whisper.cpp and llama.cpp.
I wonder if it could finally provide a counter-balance to, and even compete with, GPU approaches.
Off the top of my head, I can think of this paper:
SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems by Beidi Chen (@keroro824). This work benefits both large-scale neural network training and inference.
Slides 23-29 from their NeurIPS presentation cover the approach (timestamp: minutes 6-8 in the associated video).
The SLIDE code is also available from this repo, I believe.
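For anyone skimming past the paper: SLIDE's core trick is to index each layer's neurons in locality-sensitive hash tables, hash the incoming activation vector, and then compute only the neurons that collide with it, turning a dense matrix multiply into a sparse lookup-and-accumulate. Below is a minimal, self-contained C++ sketch of that idea using signed random projections; the names, sizes, and single-table setup are my own illustrative assumptions, not code from the SLIDE repo.

```cpp
// Sketch of SLIDE's neuron-selection idea: signed-random-projection LSH
// picks, per input, a small subset of neurons whose weights likely have a
// large inner product with the input; only those activations are computed.
#include <cstdio>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

constexpr int DIM = 64;       // input dimension (illustrative)
constexpr int NEURONS = 4096; // neurons in the layer (illustrative)
constexpr int KBITS = 10;     // hyperplanes per table -> 2^10 buckets

struct SrpTable {
    std::vector<std::vector<float>> planes;                 // KBITS hyperplanes
    std::unordered_map<uint32_t, std::vector<int>> buckets; // hash -> neuron ids

    // One bit per hyperplane: which side of the plane the vector falls on.
    uint32_t hash(const std::vector<float>& v) const {
        uint32_t h = 0;
        for (int k = 0; k < KBITS; ++k) {
            float dot = 0.f;
            for (int d = 0; d < DIM; ++d) dot += planes[k][d] * v[d];
            h = (h << 1) | (dot >= 0.f ? 1u : 0u);
        }
        return h;
    }
};

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<float> gauss(0.f, 1.f);

    // Random layer weights: one DIM-sized weight vector per neuron.
    std::vector<std::vector<float>> W(NEURONS, std::vector<float>(DIM));
    for (auto& w : W) for (auto& x : w) x = gauss(rng);

    // Build one hash table over the neurons (SLIDE uses several for recall).
    SrpTable table;
    table.planes.assign(KBITS, std::vector<float>(DIM));
    for (auto& p : table.planes) for (auto& x : p) x = gauss(rng);
    for (int n = 0; n < NEURONS; ++n)
        table.buckets[table.hash(W[n])].push_back(n);

    // Query: hash the input and compute activations only for the colliding
    // neurons, instead of all NEURONS dot products.
    std::vector<float> input(DIM);
    for (auto& x : input) x = gauss(rng);
    const auto& active = table.buckets[table.hash(input)];
    printf("computing %zu of %d neurons\n", active.size(), NEURONS);
    for (int n : active) {
        float act = 0.f;
        for (int d = 0; d < DIM; ++d) act += W[n][d] * input[d];
        printf("neuron %d -> %.3f\n", n, act);
    }
    return 0;
}
```

In the real system, several tables are used to improve recall and the buckets are periodically rebuilt as weights change during training; my understanding is that this bookkeeping is where most of the engineering effort in their C++ codebase goes.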
It seems like the project did not become widely popular because of inertia within the deep-learning ecosystem in favour of GPU-driven approaches.
A notable reference is from this paper, where it states that:
"For this reason, (Chen et al., 2020) build their SLIDE system in C++ from scratch on CPU. Even though their implementation achieved remarkable speed up, their impact is limited as they implemented their system from scratch in C++, making it difficult for the community to adopt SLIDE in practice."
A lot of their work on CPU-centric machine learning is probably relevant and could be popularized via GGML.
Other work in this space is also of interest.
This is probably of relevance to the discussions at ggerganov/llama.cpp#638 and ggerganov/llama.cpp#521.
There is also an interesting article from Tim Dettmers about the tendency towards sparsity in attention layers as language models scale: https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/
So, squinting at all of this, one could hazard a guess that it may be possible to squeeze Llama 65B @ 128GB (38.5GB @ 4-bit) down to around 13B (4GB @ 4-bit); that is, to scale its efficiency to the current equivalent of Llama 7B.
That would open the door to running GPT-3-scale 175B-parameter models on CPU at roughly the same cost as the current Llama 30B implementation.
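For concreteness, the raw weight-memory arithmetic behind those figures is just bytes = parameters × bits ÷ 8. The short sketch below tabulates it; note it deliberately ignores the per-block scale factors in real llama.cpp q4_0 files, which is why 65B ships at ~38.5GB rather than the ~30GiB the plain formula gives.

```cpp
// Back-of-envelope weight-memory math for the paragraph above
// (pure arithmetic, no llama.cpp specifics): bytes = params * bits / 8.
#include <cstdio>

int main() {
    const double GiB = 1024.0 * 1024.0 * 1024.0;
    const double params[] = {7e9, 13e9, 30e9, 65e9, 175e9};
    for (double p : params) {
        double fp16 = p * 16.0 / 8.0 / GiB; // 2 bytes per weight
        double q4   = p * 4.0  / 8.0 / GiB; // 0.5 bytes per weight
        printf("%4.0fB params: fp16 %6.1f GiB, 4-bit %5.1f GiB\n",
               p / 1e9, fp16, q4);
    }
    return 0;
}
```

Even at a plain 4 bits, 175B still needs ~80GiB of weights, so hitting "30B cost" for a 175B model would imply roughly a further 3-4x saving from sparsity/activity on top of quantization, which is exactly the regime SLIDE-style neuron selection is aiming at.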
Thoughts?