What is a good local workflow #1794
Some options here:
So the audience I have in mind is PyTorch devs who may or may not be using some underlying framework. In particular I was wondering for the
I like the idea of having an opinionated raw PyTorch client. Would you have an example of what this would look like? I imagine one way to do this is for the client to read in configuration. I'm very close to having model configuration files working, but in the meantime, I think the client can read from environment variables for this configuration.
In general, I'm very close (about a couple weeks away) to getting configuration files for the models and tokenizers working end-to-end. After that's done, users won't need to modify the HELM code; they can just specify the configuration file, and (1) the schema will be auto-generated, (2) the default window service will be used with (3) the tokenizer they specify. This does assume that users are using a "standard-ish" tokenizer like the Hugging Face ones.
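As a rough illustration of the environment-variable approach described above, client-side configuration could look something like the sketch below. The variable names are purely hypothetical placeholders, not actual HELM settings.

```python
import os

# Hypothetical environment-variable configuration for a local PyTorch client.
# These variable names are illustrative only and are not real HELM settings.
checkpoint_path = os.environ.get("LOCAL_MODEL_CHECKPOINT", "model.pt")
tokenizer_name = os.environ.get("LOCAL_MODEL_TOKENIZER", "gpt2")
max_sequence_length = int(os.environ.get("LOCAL_MODEL_MAX_SEQ_LEN", "2048"))
```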
I'm one of the users @msaroufim is talking about; specifically, I do quantized inference. The flow is generally: 1) load a model from a checkpoint, 2) perform module swaps on the model to place quantized modules where necessary, 3) run the model over a small piece of the dataset to calibrate (usually test on test and calibration on validation, or something), 4) apply final transformations, 5) then run eval. What would be the easiest way to do that? We're constantly tweaking the 2nd and 4th steps, so it sounds like editing the generate function would be the easiest, but I'm not sure if you have access to the necessary data at that point.
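For concreteness, here is a minimal sketch of what steps 1-5 above might look like in plain PyTorch. `QuantizedLinear`, the calibration batches, and `eval_fn` are hypothetical placeholders rather than torchao or HELM APIs.

```python
import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    """Hypothetical stand-in for a quantized replacement module (step 2)."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        # A real implementation would store int8 weights and scales gathered
        # during calibration; here we just wrap the original layer.
        self.inner = linear

    def forward(self, x):
        return self.inner(x)

def swap_linear_modules(model: nn.Module) -> nn.Module:
    # Step 2: module swaps -- replace every nn.Linear with the quantized version.
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear):
            setattr(model, name, QuantizedLinear(child))
        else:
            swap_linear_modules(child)
    return model

def run_pipeline(model: nn.Module, calibration_batches, eval_fn):
    # Step 1 is assumed done: `model` was already loaded from a checkpoint.
    model = swap_linear_modules(model)        # step 2: module swaps
    model.eval()
    with torch.no_grad():
        for batch in calibration_batches:     # step 3: calibration pass
            model(batch)
    # Step 4: apply final transformations (e.g. freeze observed scales) -- omitted here.
    return eval_fn(model)                     # step 5: run the eval (e.g. HELM)
```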
Thanks @yifanmai. If you have a WIP branch that you'd like us to try out and give feedback on, please let us know.
Hi @yifanmai, any update on this? Really eager to try something out; with efficient HELM we can start running evals in PyTorch CI at a reasonable cost.
Hi @msaroufim, sorry for the delay. I opened PR #1861 that should improve the user workflow. Would you have some time to try it out in the next couple of days?
Oh interesting, I was under the impression that's a PR specific to the NeurIPS competition - will review ASAP. Just to be clear, there are 2 different scenarios I'm interested in for HELM
Sorry, I think I put this under the wrong issue. #1861 is more relevant to the NeurIPS competition. I need to think more about the PyTorch CI use case. @HDCharles and @msaroufim, I was also curious whether there is any code in a git branch, gist, or Python notebook that demonstrates doing this quantization pipeline outside of HELM? If you could share this with me, that would help me understand the intended workflow better.
Also wondering: is this doing the GPTQ algorithm or something else? I'm planning to add quantization to the HuggingFaceClient soon (HF quantization API doc), so maybe the PyTorch integration would be similar.
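For reference, loading a quantized model through the Hugging Face API looks roughly like the sketch below. This uses bitsandbytes 8-bit weight quantization rather than GPTQ, assumes transformers, accelerate, and bitsandbytes are installed, and "facebook/opt-125m" is just a placeholder model name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/opt-125m"  # placeholder; any causal LM on the Hub works similarly

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map="auto",
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```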
I'm gonna be out tomorrow for a conference; let me write a more comprehensive repro and ask, and get back to you in the next couple of days.
Sounds good, no rush. Enjoy the conference!
Closing due to staleness, but feel free to reopen if there are further questions. We now support quantization: see #1912 for details. HELM has changed quite substantially since this issue was opened. Currently the recommended routes for running local models are:
See #2463 for an explanation of how these methods work. Both methods only require modifying a configuration file, and do not require adding any Python code.
I have a few folks I'm working with who are working on some non-accuracy-preserving ML optimization techniques, so their workflow will look like: make some model update, check accuracy, make another model update, check accuracy again, etc.
The only 2 ways I see of them using HELM are either
So I'm curious what folks think would be the simplest and lowest-overhead approach to run HELM frequently locally while making as few model changes as possible from a PyTorch nn.Module.
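To make the intended loop concrete, here is a rough sketch of that workflow; `update_model` and `evaluate` are placeholders for the optimization step and whatever HELM entry point ends up being used.

```python
import torch.nn as nn

def optimization_loop(model: nn.Module, num_rounds: int, update_model, evaluate):
    # Repeatedly tweak the model, then re-check accuracy.
    results = []
    for round_idx in range(num_rounds):
        model = update_model(model)   # e.g. quantize, prune, or otherwise transform
        score = evaluate(model)       # e.g. run a local HELM eval
        results.append((round_idx, score))
    return results
```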