
Refactoring to introduce the separation between Model metadata and deployment #1903

Merged (112 commits) on Nov 18, 2023

Conversation

@JosselinSomervilleRoberts (Contributor) commented Oct 17, 2023

Summary

This is a draft PR of what will become the modularization of models in HELM.
It separates Model into two concepts: ModelMetadata and ModelDeployment (both of these objects already existed, so we previously had three different objects, which could be very confusing).

Here are a few motivations behind those changes:

  • Some models can be served by different hosts (the creator organization is no longer necessarily the host). Example: meta/llama-7b can be served as huggingface/llama-7b or together/llama-7b.
  • This allows us to better handle dependency injection for all models (see @yifanmai's work for the NeurIPS challenge). Eventually (once fully ported), all models, tokenizers, and window services will be defined in their own configuration rather than through huge "else-if" branches in auto_client.py and window_service_factory.py.
  • This simplifies a lot of the code and makes things more readable.
  • This enables publishing different results depending on the host (@julian-q noticed that, due to some pre/post-processing, the results from together differ from those from huggingface, for example).

Some details

Let's now introduce the high-level changes of this PR. The first step is to define ModelDeployment and ModelMetadata:

In model_metadata_registry.py:

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=False)
class ModelMetadata:
    # Name of the model group (e.g. "openai/davinci").
    # This is the name of the model, not the name of the deployment.
    # Usually formatted as "<creator_organization>/<engine_name>".
    # Example: "ai21/j1-jumbo"
    name: str

    # Name of the organization that created the model.
    creator_organization_name: str

    # Name that is going to be displayed to the user (on the website, etc.)
    display_name: str

    # Description of the model, to be displayed on the website.
    description: str

    # Description of the access level of the model.
    # Should be one of the following:
    # - "open": the model is open-source and can be downloaded from the internet.
    # - "closed": TODO(PR)
    # - "limited": TODO(PR)
    # If there are multiple deployments, this should be the most permissive access across
    # all deployments.
    access: str

    # Release date of the model.
    release_date: date

    # Tags corresponding to the properties of the model.
    tags: List[str] = field(default_factory=list)

    # Number of parameters in the model.
    # One might expect a string here, since the number of parameters is usually quoted
    # as a round figure (e.g. "175B"), but we keep it as an int for plotting purposes.
    num_parameters: Optional[int] = None

    # List of the model deployments for this model.
    # Should at least contain one model deployment.
    # Refers to the field "name" in the ModelDeployment class.
    # Defaults to a single model deployment with the same name as the model.
    deployment_names: Optional[List[str]] = None

    @property
    def creator_organization(self) -> str:
        """
        Extracts the creator organization from the model name.
        Example: 'ai21/j1-jumbo' => 'ai21'
        This can be different from the hosting organization.
        """
        return self.name.split("/")[0]

    @property
    def engine(self) -> str:
        """
        Extracts the model engine from the model name.
        Example: 'ai21/j1-jumbo' => 'j1-jumbo'
        """
        return self.name.split("/")[1]
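
As a quick illustration of the name-derived properties (the field values below are purely illustrative):

metadata = ModelMetadata(
    name="ai21/j1-jumbo",
    creator_organization_name="AI21 Labs",
    display_name="J1-Jumbo",
    description="Example description.",
    access="limited",
    release_date=date(2021, 8, 1),
    num_parameters=178_000_000_000,
)
assert metadata.creator_organization == "ai21"  # part before the "/"
assert metadata.engine == "j1-jumbo"            # part after the "/"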

In model_deployment_registry.py:

from dataclasses import dataclass
from typing import Optional

# (ClientSpec and WindowServiceSpec are defined elsewhere in this file; not shown here.)


@dataclass(frozen=True)
class ModelDeployment:
    """A model deployment is an accessible instance of this model (e.g. a hosted endpoint).

    A model can have multiple model deployments."""

    # Name of the model deployment.
    # Usually formatted as "<hosting_group>/<engine_name>"
    # Example: "huggingface/t5-11b"
    name: str

    # Specification for instantiating the client for this model deployment.
    client_spec: ClientSpec

    # Name of the model that this model deployment is for.
    # Refers to the field "name" in the Model class.
    # If unset, defaults to the same value as `name`.
    model_name: Optional[str] = None

    # Tokenizer for this model deployment.
    # If unset, auto-inferred by the WindowService.
    tokenizer_name: Optional[str] = None

    # Specification for instantiating the window service for this model deployment.
    window_service_spec: Optional[WindowServiceSpec] = None

    # Maximum sequence length for this model deployment.
    max_sequence_length: Optional[int] = None

    # Maximum request length for this model deployment.
    # If unset, defaults to the same value as max_sequence_length.
    max_request_length: Optional[int] = None

    # The max length of the model input and output tokens.
    # Some models (like Anthropic/Claude and Megatron) have a specific limit on sequence length + max_tokens.
    # If unset, defaults to INT_MAX (i.e. no limit).
    max_sequence_and_generated_tokens_length: Optional[int] = None

    # Whether this model deployment is deprecated.
    deprecated: bool = False

    @property
    def host_organization(self) -> str:
        """
        Extracts the host group from the model deployment name.
        Example: "huggingface" from "huggingface/t5-11b"
        This can be different from the creator organization (for example "together")
        """
        return self.name.split("/")[0]

    @property
    def engine(self) -> str:
        """
        Extracts the model engine from the model deployment name.
        Example: 'ai21/j1-jumbo' => 'j1-jumbo'
        """
        return self.name.split("/")[1]
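
A similar illustration for ModelDeployment, showing that the host organization can differ from the creator (the ClientSpec arguments and class path here are assumptions, not copied from the configs):

deployment = ModelDeployment(
    name="together/llama-7b",
    client_spec=ClientSpec(class_name="helm.proxy.clients.together_client.TogetherClient", args={}),
    model_name="meta/llama-7b",
    tokenizer_name="meta/llama",
    max_sequence_length=2048,
)
assert deployment.host_organization == "together"  # hosting organization
assert deployment.engine == "llama-7b"
# The creator organization ("meta") lives on the corresponding ModelMetadata.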

In request.py:

from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """
    A `Request` specifies how to query a language model (given a prompt,
    complete it).  It is the unified representation for communicating with
    various APIs (e.g., GPT-3, Jurassic).
    """

    model_deployment: str = ""
    """Which model deployment to query -> Determines the Client.
    Refers to a deployment in the model deployment registry."""

    model: str = ""
    """Which model to use -> Determines the Engine.
    Refers to a model metadata in the model registry."""

   [...]

(Both model and model_deployment should now be specified in Request.)
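
For example, a request now carries both names (the prompt and names below are placeholders):

request = Request(
    model="meta/llama-2-7b",                 # metadata -> determines the engine
    model_deployment="together/llama-2-7b",  # deployment -> determines the client
    prompt="The quick brown fox",
)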

In most of the codebase, references to Model are replaced by references to ModelDeployment (with a few exceptions, including GeneralInfo, website-related things, and mode).

A config now looks like this:

entries: [
   {description: "billsum_legal_summarization:model_deployment=anthropic/claude-v1.3", priority: 1},
]

Instead of (but still supported):

entries: [
   {description: "billsum_legal_summarization:model=anthropic/claude-v1.3", priority: 1},
]

Backwards compatibility

Supported use cases

Here are a few examples of runs and the expected behavior:

  • model=text -> same behavior as today (for each text model, find one deployment and run it).
  • model=meta/llama-2-7b -> runs one of the deployments of llama-2-7b, i.e. together/llama-2-7b. Additionally prints a warning saying that one deployment was randomly chosen and that the user should use the next bullet point's form to be more explicit.
  • model_deployment=huggingface/llama-2-7b -> as expected
  • model=meta/llama-2-7b,model_deployment=huggingface/llama-2-7b -> as expected
  • model=meta/llama-2-7b,model_deployment=huggingface/gpt2 -> Raises error (incompatible)
  • model_deployment=text -> Raises an error (ambiguous: should we run several deployments for the same model?)

This works thanks to the get_default_model_deployment_for_model function in run_specs.py. This function is responsible for finding a valid deployment name given a model name (from an old config, for example).
The process to find a model deployment name is as follows (a sketch is shown after this list):
1. If there is a model deployment with the same name as the model arg, use it.
2. If there is at least one deployment for the model, use the first one that is available (preferring non-deprecated deployments).
3. If there are no deployments for the model, raise an error.
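
A minimal sketch of that lookup, assuming the deployments are passed in as a list (the real function reads the model deployment registry directly and also emits the warning mentioned above):

from typing import List

from helm.benchmark.model_deployment_registry import ModelDeployment


def get_default_model_deployment_for_model(model_name: str, deployments: List[ModelDeployment]) -> str:
    """Pick a deployment name for `model_name` following the three rules above."""
    # 1. A deployment with the same name as the model wins.
    for deployment in deployments:
        if deployment.name == model_name:
            return deployment.name

    # 2. Otherwise, take a deployment registered for this model, preferring non-deprecated ones.
    candidates = [d for d in deployments if d.model_name == model_name]
    if candidates:
        non_deprecated = [d for d in candidates if not d.deprecated]
        return (non_deprecated or candidates)[0].name

    # 3. No deployment exists for this model.
    raise ValueError(f"No model deployment found for model: {model_name}")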

Support of old runs

Old runs are still valid and will work with helm-summarize and helm-server. One can have old and new runs in the same suite, and everything will work as expected.

WARNING: The Cache will not be completely preserved. This is because we now consistently use the host organization name for caching, whereas we previously used a mix of host and creator. Some hosts won't be affected (like openai), but others will be (like bigscience, which is now cached under together).

Consequent changes

This PR enables some other changes:

  • Deletion of many tags related to the WindowService assignment.
  • Deletion of support for defining clients, tokenizers, and window services anywhere other than in the YAML files.

Addition of a new model

To add a new model, the user simply needs to update the three .yaml files (model_metadata.yaml, model_deployments.yaml, and tokenizer_configs.yaml). In general, there will be no need for a custom WindowService and the default one will work.
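
For illustration, a new-model entry might look roughly like this; the keys mirror the dataclass fields above, but the exact YAML layout, class paths, and values are assumptions rather than copied from the repo:

# model_metadata.yaml (sketch)
models:
  - name: mycompany/my-model-7b
    display_name: My Model (7B)
    description: Hypothetical model used only as an example.
    creator_organization_name: MyCompany
    access: open
    release_date: 2023-10-01
    num_parameters: 7000000000

# model_deployments.yaml (sketch)
model_deployments:
  - name: huggingface/my-model-7b
    model_name: mycompany/my-model-7b
    tokenizer_name: mycompany/my-model-7b
    max_sequence_length: 4096
    client_spec:
      class_name: "helm.proxy.clients.huggingface_client.HuggingFaceClient"

# tokenizer_configs.yaml (sketch)
tokenizer_configs:
  - name: mycompany/my-model-7b
    tokenizer_spec:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"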

What needs to be done in follow-up PRs

  • Add an AutoTokenizer ( Add an AutoTokenizer #1994 )
  • Update the frontend (read model_metadata.yaml instead of schema.yaml).
  • Add the support for private configs (with a .gitignore so that people can easily add their own model in the repo) ( Add private configs #1996 )
  • Change nearly all window services to DefaultWindowService
  • Write documentation to explain how to add a model
  • Update all the tutorials to use model_deployment instead of model (some are done in this PR).
  • Move the tags to the ModelDeployment and create some properties in ModelMetadata to replace some tags (a sketch follows this list) ( Remove many tags #1995 ):
    • TEXT_MODEL_TAG, IMAGE_MODEL_TAG, CODE_MODEL_TAG, TEXT_SIMILARITY_MODEL_TAG -> media_type: List[str]
    • FULL_FUNCTIONALITY_TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG -> functionality: str
    • INSTRUCTION_FOLLOWING_MODEL_TAG -> instruction_following: bool
    • ABLATION_MODEL_TAG -> ablation: bool
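
A rough sketch of what those replacement fields could look like on ModelMetadata (names taken straight from the bullets above; this is not a finalized design):

from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=False)
class ModelMetadata:  # only the proposed replacement fields are shown
    # Replaces TEXT_MODEL_TAG, IMAGE_MODEL_TAG, CODE_MODEL_TAG, TEXT_SIMILARITY_MODEL_TAG.
    media_type: List[str] = field(default_factory=lambda: ["text"])
    # Replaces FULL_FUNCTIONALITY_TEXT_MODEL_TAG / LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG.
    functionality: str = "full"
    # Replaces INSTRUCTION_FOLLOWING_MODEL_TAG.
    instruction_following: bool = False
    # Replaces ABLATION_MODEL_TAG.
    ablation: bool = False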

@yifanmai (Collaborator) left a comment:

Probably easier to explain what I have in mind using code, so I opened #2002, PTAL.

(Resolved review threads on src/helm/benchmark/model_deployment_registry.py and src/helm/common/request.py.)
@yifanmai (Collaborator) left a comment:

Need these changes to fix conflicts with #1998.

(Resolved review threads on src/helm/config/tokenizer_configs.yaml and src/helm/config/model_deployments.yaml.)
@yifanmai (Collaborator) left a comment:

In src/helm/benchmark/adaptation/adapters/test_*_adapter.py:
For every AdapterSpec with model="X", also set model_deployment="X". The specific failing test is src/helm/benchmark/adaptation/adapters/test_language_modeling_adapter.py::TestLanguageModelingAdapter::test_prompt_wrapping
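
For example, a test's AdapterSpec construction changes along these lines (values are placeholders):

from helm.benchmark.adaptation.adapter_spec import AdapterSpec

# Before: only the model is specified.
spec = AdapterSpec(model="huggingface/gpt2")

# After: also set model_deployment to the same value so the adapter resolves explicitly.
spec = AdapterSpec(model="huggingface/gpt2", model_deployment="huggingface/gpt2")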

@yifanmai (Collaborator) commented:

Update the pull request description to discuss the fact that the cache organization is now the host rather than the creator organization. (e.g. meta/llama-2 now goes in together rather than meta).

WARNING: Due to the change to Request, the Cache is not always preserved between old and new runs. I am not sure what causes this issue.
