
What is a good local workflow #1794

Closed
msaroufim opened this issue Aug 18, 2023 · 13 comments
Labels: competition (Support for the NeurIPS Large Language Model Efficiency Challenge), documentation (Improvements or additions to documentation), user question

Comments

@msaroufim
Collaborator

msaroufim commented Aug 18, 2023

I'm working with a few folks on non-accuracy-preserving ML optimization techniques, so their workflow looks like: make a model update, check accuracy, make another model update, check accuracy again, and so on.

The only two ways I see for them to use HELM are:

  1. The new HTTP client: people have found this too high-overhead for local development, and on some remote HPC clusters you won't have the option of opening up a server.
  2. The HF Hub route: assuming you can fit your model into transformers, you then need to upload it to a remote store every time you want to run an eval.

So I'm curious what folks think would be the simplest, lowest-overhead approach to running HELM frequently and locally, while making as few changes as possible to a PyTorch nn.Module.

@msaroufim added the documentation and competition labels Aug 18, 2023
@yifanmai
Collaborator

Some options here:

@msaroufim
Collaborator Author

So the audience I have in mind is PyTorch devs who may or may not be using some underlying framework. In particular, I was wondering about the generate() path: do they only need to implement a custom Client, and if so, could we have an opinionated raw PyTorch client? Do people also need to implement the service, window, schema, etc.?

@yifanmai
Collaborator

yifanmai commented Aug 21, 2023

> can we have an opinionated raw PyTorch client?

I like the idea of having an opinionated raw PyTorch client. Would you have an example of what this would look like? I imagine one way to do it is for the client to read in configuration specifying the nn.Module name (which it could auto-import), various parameters for the module, and a path to a tokenizer; the module would have to conform to a particular spec.

I'm very close to having model configuration files working, but in the meantime, I think the client can read from environment variables for this configuration.
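Purely as a sketch (not actual HELM code), such a client might look like the following. The environment variable names (RAW_PYTORCH_MODULE, RAW_PYTORCH_CHECKPOINT, RAW_PYTORCH_TOKENIZER), the class name, and the assumption that the module exposes a Hugging Face-style generate() are all illustrative placeholders:

```python
# Hypothetical sketch only -- not actual HELM code. The environment variable
# names, class name, and method signature are illustrative placeholders.
import importlib
import os

import torch
from transformers import AutoTokenizer


def _load_module_from_env() -> torch.nn.Module:
    # e.g. RAW_PYTORCH_MODULE="my_package.my_model.MyModel" (hypothetical)
    module_path, _, class_name = os.environ["RAW_PYTORCH_MODULE"].rpartition(".")
    model_cls = getattr(importlib.import_module(module_path), class_name)
    model = model_cls()  # the module would have to conform to a particular spec
    checkpoint = os.environ.get("RAW_PYTORCH_CHECKPOINT")
    if checkpoint:
        model.load_state_dict(torch.load(checkpoint, map_location="cpu"))
    return model.eval()


class RawPyTorchClient:
    """Opinionated raw PyTorch client: reads its configuration from env vars."""

    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained(os.environ["RAW_PYTORCH_TOKENIZER"])
        self.model = _load_module_from_env()

    @torch.no_grad()
    def generate(self, prompt: str, max_new_tokens: int = 64) -> str:
        input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        # Assumes the module exposes an HF-style .generate(); a bare nn.Module
        # would instead need a small greedy decoding loop here.
        output_ids = self.model.generate(input_ids, max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```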

> Do people also need to implement the service, window, schema, etc.?

In general, I'm very close (about a couple of weeks away) to getting configuration files for the models and tokenizers working end-to-end. After that's done, users won't need to modify the HELM code; they can just specify the configuration file, and (1) the schema will be auto-generated, (2) the default window service will be used, with (3) the tokenizer they specify.

This does assume that users are using a "standard-ish" tokenizer like the Hugging Face ones.

@HDCharles

I'm one of the users @msaroufim is talking about; specifically, I do quantized inference. The flow is generally: 1) load a model from a checkpoint, 2) perform module swaps on the model to place quantized modules where necessary, 3) run the model over a small piece of the dataset to calibrate (usually calibrating on validation and testing on test, or something like that), 4) apply final transformations, and 5) run the eval.

What would be the easiest way to do that? We're constantly tweaking steps 2 and 4, so it sounds like editing the generate function would be easiest, but I'm not sure whether you have access to the necessary data at that point.
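For concreteness, here is a stripped-down illustration of that five-step flow. This is not our actual code: QuantizedLinear, swap_linears, and the eval hook are placeholder names, and the quantization shown is a trivial int8 weight-only scheme.

```python
# Illustrative sketch of the five-step flow described above; not actual code.
# QuantizedLinear, swap_linears, and eval_fn are placeholders.
import copy

import torch
import torch.nn as nn


class QuantizedLinear(nn.Module):
    """Placeholder int8 weight-only linear used in the module swap (step 2)."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        scale = linear.weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("weight_int8", torch.round(linear.weight / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_int8.to(x.dtype) * self.scale
        return torch.nn.functional.linear(x, weight, self.bias)


def swap_linears(model: nn.Module) -> nn.Module:
    # Step 2: replace eligible nn.Linear modules with quantized versions.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, QuantizedLinear(child))
        else:
            swap_linears(child)
    return model


def pipeline(checkpoint_path: str, calibration_batches, eval_fn):
    model = torch.load(checkpoint_path)          # 1) load from checkpoint
    model = swap_linears(copy.deepcopy(model))   # 2) module swaps
    model.eval()
    with torch.no_grad():                        # 3) calibrate on a small split
        for batch in calibration_batches:
            model(batch)
    # 4) apply final transformations (e.g. freeze scales / fuse) would go here.
    return eval_fn(model)                        # 5) run eval
```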

@msaroufim
Collaborator Author

Thanks @yifanmai. If you have a WIP branch that you'd like us to try out and give feedback on, please let us know.

@msaroufim
Collaborator Author

msaroufim commented Sep 23, 2023

Hi @yifanmai, any update on this? I'm really eager to try something out; with an efficient HELM we could start running evals in PyTorch CI at a reasonable cost.

@yifanmai
Collaborator

Hi @msaroufim, sorry for the delay. I opened a PR #1861 that should improve the user workflow. Would you have some time to try it out in the next couple of days?

@msaroufim
Collaborator Author

Oh interesting, I was under the impression that was a PR specific to the NeurIPS competition; will review ASAP.

Just to be clear, there are two different scenarios I'm interested in for HELM:

  1. The NeurIPS competition
  2. PyTorch CI for quantization/sparsity work

@yifanmai
Collaborator

Sorry, I think I put this under the wrong issue. #1861 is more relevant to the NeurIPS competition. I need to think more about the PyTorch CI use case.

@HDCharles and @msaroufim, I was also curious whether there is any code in a git branch, gist, or Python notebook that demonstrates this quantization pipeline outside of HELM. If you could share it with me, that would help me understand the intended workflow better.

@yifanmai
Collaborator

Also wondering: is this using the GPTQ algorithm or something else? I'm planning to add quantization to the HuggingFaceClient soon (HF quantization API doc), so maybe the PyTorch integration would be similar.
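For reference, loading a quantized model through the Hugging Face transformers quantization API looks roughly like this (a sketch using the bitsandbytes 8-bit path; the checkpoint name is just an example, and 8-bit loading generally requires a CUDA GPU):

```python
# Sketch of loading a quantized model via the transformers quantization API.
# bitsandbytes 8-bit shown; GPTQConfig works similarly.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/opt-350m"  # example checkpoint only

quant_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # places the quantized weights on the available GPU(s)
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```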

@msaroufim
Collaborator Author

msaroufim commented Sep 28, 2023

I'm going to be out tomorrow for a conference; let me write up a more comprehensive repro and ask, and get back to you in the next couple of days.

@yifanmai
Collaborator

Sounds good, no rush. Enjoy the conference!

@yifanmai
Collaborator

yifanmai commented Aug 6, 2024

Closing due to staleness, but feel free to reopen if there are further questions.

We now support quantization: see #1912 for details.

HELM has changed quite substantially since this issue was opened. Currently the recommended routes for running a local model are:

  1. Running local inference from a Hugging Face checkpoint on disk, or
  2. Running remote inference using a vLLM server

See #2463 for an explanation of how these methods work. Both methods only require modifying a configuration file, and do not require adding any Python code.

yifanmai closed this as completed Aug 6, 2024