Refactorization to introduce the separation between Model metadata and deployment #1903
Conversation
… and _decode_raw_response_to_text
…ford-crfm/helm into joss-refactor-4-deployments
Probably easier to explain what I have in mind using code, so I opened #2002, PTAL.
Co-authored-by: JosselinSomervilleRoberts <[email protected]>
Need these changes to fix conflicts with #1998.
In src/helm/benchmark/adaptation/adapters/test_*_adapter.py: for every AdapterSpec with model="X", also set model_deployment="X". The specific failing test is src/helm/benchmark/adaptation/adapters/test_language_modeling_adapter.py::TestLanguageModelingAdapter::test_prompt_wrapping.
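The requested change would look roughly like this (a sketch only; the model name and the other AdapterSpec arguments used by the actual test are illustrative):

```python
from helm.benchmark.adaptation.adapter_spec import AdapterSpec

# Before this PR the tests only set `model`; with the metadata/deployment split,
# the deployment must be named explicitly so the adapter can resolve a tokenizer
# and window service for it.
adapter_spec = AdapterSpec(
    method="language_modeling",
    model="openai/gpt2",             # model (metadata) name
    model_deployment="openai/gpt2",  # new: matching deployment name
    max_tokens=0,
)
```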
Update the pull request description to discuss the fact that the cache organization is now the host rather than the creator organization. (e.g.
Summary
This is a draft PR of what will become the modularization of models in HELM.
This separates Model into two concepts: ModelMetadata and ModelDeployment (both of these objects already existed, so we had 3 different objects, which could be very confusing). Here are a few motivations behind those changes:
- The same model can be served by several hosts: meta/llama-7b can be served as huggingface/llama-7b or together/llama-7b.
- Simplifying how models are resolved in auto_client.py and window_service_factory.py.
Some details
Let's now introduce the high-level changes of this PR. The first thing is to define ModelDeployment and ModelMetadata:

In model_metadata_registry.py:
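Roughly, ModelMetadata holds the host-independent description of a model. A simplified sketch (the exact fields in the PR may differ):

```python
# model_metadata_registry.py (simplified sketch; exact fields may differ)
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModelMetadata:
    """Host-independent description of a model (who created it, what it can do)."""

    name: str  # e.g. "meta/llama-7b": creator organization / model name
    creator_organization_name: Optional[str] = None
    display_name: Optional[str] = None
    description: Optional[str] = None
    access: Optional[str] = None  # e.g. "open", "limited", "closed"
    num_parameters: Optional[int] = None
    tags: List[str] = field(default_factory=list)
```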
In model_deployment_registry.py:
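A ModelDeployment describes one hosted version of a model and how to reach it. Again a simplified sketch (field names approximate):

```python
# model_deployment_registry.py (simplified sketch; exact fields may differ)
from dataclasses import dataclass
from typing import Optional

from helm.common.object_spec import ObjectSpec  # assumed import path


@dataclass(frozen=True)
class ModelDeployment:
    """One hosted version of a model: where it runs and how to construct its client."""

    name: str  # e.g. "together/llama-7b": host organization / model name
    model_name: Optional[str] = None  # the ModelMetadata it serves, e.g. "meta/llama-7b"
    client_spec: Optional[ObjectSpec] = None  # which client class to instantiate
    tokenizer_name: Optional[str] = None
    max_sequence_length: Optional[int] = None
    deprecated: bool = False
```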
In request.py:
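Request then carries both names. A sketch of the relevant fields only:

```python
# request.py (sketch of the relevant fields only)
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    model: str = ""  # metadata name (creator organization), e.g. "meta/llama-7b"
    model_deployment: str = ""  # deployment name (host organization), e.g. "together/llama-7b"
    prompt: str = ""
    # ... the other generation parameters (temperature, max_tokens, ...) are unchanged
```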
(Both model and model_deployment should now be specified in Request.)

In most of the codebase, all references to Model are replaced by references to ModelDeployment (there are a few exceptions, including GeneralInfo, website-related things, and more).

A config now looks like this:
Instead of (but still supported):
Backwards compatibility
Supported use cases
Here are a few examples of runs and the expected behavior:
- model=text -> same behavior as today (for each text model, find one deployment and run it).
- model=meta/llama-2-7b -> runs one of the deployments of llama-2-7b -> together/llama-2-7b. Additionally prints a warning saying that one deployment was randomly chosen and that the user should use the next bullet point to be more explicit.
- model_deployment=huggingface/llama-2-7b -> as expected.
- model=meta/llama-2-7b,model_deployment=huggingface/llama-2-7b -> as expected.
- model=meta/llama-2-7b,model_deployment=huggingface/gpt2 -> raises an error (incompatible).
- model_deployment=text -> raises an error (ambiguous: should we run several deployments for the same model?).

This works thanks to the get_default_model_deployment_for_model function in run_specs.py, which is responsible for finding a valid deployment name given a model name (from an old config, for example). The process to find a model deployment name is as follows (see the sketch after this list):
1. If there is a model deployment with the same name as the model arg, use it.
2. If there is at least one deployment for the model, use the first one that is available (if possible, not deprecated).
3. If there are no deployments for the model, raise an error.
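A rough sketch of that resolution logic (the registry name below is hypothetical and used only for illustration; the actual helper in run_specs.py may differ):

```python
# Hypothetical sketch of get_default_model_deployment_for_model in run_specs.py.
from typing import Dict

# Assumed registry mapping deployment name -> ModelDeployment (see the sketch above).
DEPLOYMENTS_BY_NAME: Dict[str, "ModelDeployment"] = {}


def get_default_model_deployment_for_model(model_name: str) -> str:
    """Find a valid deployment name for a (possibly old-style) model name."""
    # 1. A deployment with the exact same name as the model wins.
    if model_name in DEPLOYMENTS_BY_NAME:
        return model_name

    # 2. Otherwise take the first deployment serving this model,
    #    preferring non-deprecated ones.
    candidates = [d for d in DEPLOYMENTS_BY_NAME.values() if d.model_name == model_name]
    non_deprecated = [d for d in candidates if not d.deprecated]
    if candidates:
        return (non_deprecated or candidates)[0].name

    # 3. No deployment is registered for this model.
    raise ValueError(f"No model deployment found for model: {model_name}")
```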
Support of old runs
Old runs are still valid and will work with helm-summarize and helm-server. One can have both old and new runs in the same suite and everything will work as expected.

WARNING: The cache will not be completely preserved. This is because we now consistently use the host organization name for caching, whereas in the past we used a mix of host and creator organizations. Some hosts won't be affected (like openai), but others will be (like bigscience, which is now cached in together).

Consequent changes
This PR enabled some other changes:
- Simplification of the WindowService assignment.

Addition of a new model
To add a new model, the user simply needs to update the 3 .yaml files. In general, there will be no need for a custom WindowService and the default one will work.

What needs to be done in follow-up PRs
- AutoTokenizer (Add an AutoTokenizer #1994).
- Define models not in schema.yaml but in model_metadata.yaml.
- Add private configs to .gitignore so that people can easily add their own model in the repo (Add private configs #1996).
- DefaultWindowService.
- Use model_deployment instead of model (some done in this PR).
- Rework ModelDeployment and create some properties in ModelMetadata to replace some tags (Remove many tags #1995):
  - TEXT_MODEL_TAG, IMAGE_MODEL_TAG, CODE_MODEL_TAG, TEXT_SIMILARITY_MODEL_TAG -> media_type: List[str]
  - FULL_FUNCTIONALITY_TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG -> functionality: str
  - INSTRUCTION_FOLLOWING_MODEL_TAG -> instruction_following: bool
  - ABLATION_MODEL_TAG -> ablation: bool