
Refactoring to introduce the separation between Model metadata and deployment #1903

Merged (112 commits) on Nov 18, 2023

Conversation

@JosselinSomervilleRoberts (Contributor) commented Oct 17, 2023

Summary

This is a draft PR of what will become the modularization of models in HELM.
It separates Model into two concepts: ModelMetadata and ModelDeployment (both of these objects already existed, so we previously had three different objects, which could be very confusing).

Here are a few motivations behind those changes:

  • Some models can be served by different hosts (the creator organization is no longer necessarily the host). Example: meta/llama-7b can be served as huggingface/llama-7b or together/llama-7b.
  • This allows us to better handle dependency injection for all models (see @yifanmai's work for the NeurIPS challenge). Eventually (once fully ported), all models, tokenizers, and window services will be defined in their own configuration rather than through huge "else-if" branches in auto_client.py and window_service_factory.py.
  • This simplifies a lot of the code and makes things more readable.
  • This enables publishing different results depending on the host (@julian-q noticed that, due to some pre/post-processing, the results from together differ from those from huggingface, for example).

Some details

Let's now introduce the high-level changes of this PR. The first step is to define ModelDeployment and ModelMetadata:

In model_metadata_registry.py:

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=False)
class ModelMetadata:
    # Name of the model group (e.g. "openai/davinci").
    # This is the name of the model, not the name of the deployment.
    # Usually formatted as "<creator_organization>/<engine_name>".
    # Example: "ai21/j1-jumbo"
    name: str

    # Name of the organization that created the model.
    creator_organization_name: str

    # Name that is going to be displayed to the user (on the website, etc.)
    display_name: str

    # Description of the model, to be displayed on the website.
    description: str

    # Description of the access level of the model.
    # Should be one of the following:
    # - "open": the model is open-source and can be downloaded from the internet.
    # - "closed": TODO(PR)
    # - "limited": TODO(PR)
    # If there are multiple deployments, this should be the most permissive access across
    # all deployments.
    access: str

    # Release date of the model.
    release_date: date

    # Tags corresponding to the properties of the model.
    tags: List[str] = field(default_factory=list)

    # Number of parameters in the model.
    # One might expect a string here, since the number of parameters is usually quoted
    # as a round figure (e.g. "175B"), but we keep it as an int for plotting purposes.
    num_parameters: Optional[int] = None

    # List of the model deployments for this model.
    # Should at least contain one model deployment.
    # Refers to the field "name" in the ModelDeployment class.
    # Defaults to a single model deployment with the same name as the model.
    deployment_names: Optional[List[str]] = None

    @property
    def creator_organization(self) -> str:
        """
        Extracts the creator organization from the model name.
        Example: 'ai21/j1-jumbo' => 'ai21'
        This can be different from the hosting organization.
        """
        return self.name.split("/")[0]

    @property
    def engine(self) -> str:
        """
        Extracts the model engine from the model name.
        Example: 'ai21/j1-jumbo' => 'j1-jumbo'
        """
        return self.name.split("/")[1]
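
As a quick illustration of the name-derived properties (the field values below are purely illustrative):

metadata = ModelMetadata(
    name="ai21/j1-jumbo",
    creator_organization_name="AI21 Labs",
    display_name="J1-Jumbo",
    description="Example description.",
    access="limited",
    release_date=date(2021, 8, 1),
    num_parameters=178_000_000_000,
)
assert metadata.creator_organization == "ai21"  # part before the "/"
assert metadata.engine == "j1-jumbo"            # part after the "/"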

In model_deployment_registry.py:

from dataclasses import dataclass
from typing import Optional

# (ClientSpec and WindowServiceSpec are defined elsewhere in this file; not shown here.)


@dataclass(frozen=True)
class ModelDeployment:
    """A model deployment is an accessible instance of this model (e.g. a hosted endpoint).

    A model can have multiple model deployments."""

    # Name of the model deployment.
    # Usually formatted as "<hosting_group>/<engine_name>"
    # Example: "huggingface/t5-11b"
    name: str

    # Specification for instantiating the client for this model deployment.
    client_spec: ClientSpec

    # Name of the model that this model deployment is for.
    # Refers to the field "name" in the Model class.
    # If unset, defaults to the same value as `name`.
    model_name: Optional[str] = None

    # Tokenizer for this model deployment.
    # If unset, auto-inferred by the WindowService.
    tokenizer_name: Optional[str] = None

    # Specification for instantiating the window service for this model deployment.
    window_service_spec: Optional[WindowServiceSpec] = None

    # Maximum sequence length for this model deployment.
    max_sequence_length: Optional[int] = None

    # Maximum request length for this model deployment.
    # If unset, defaults to the same value as max_sequence_length.
    max_request_length: Optional[int] = None

    # The max length of the model input and output tokens.
    # Some models (like Anthropic/Claude and Megatron) have a specific limit on sequence length + max_tokens.
    # If unset, defaults to INT_MAX (i.e. no limit).
    max_sequence_and_generated_tokens_length: Optional[int] = None

    # Whether this model deployment is deprecated.
    deprecated: bool = False

    @property
    def host_organization(self) -> str:
        """
        Extracts the host group from the model deployment name.
        Example: "huggingface" from "huggingface/t5-11b"
        This can be different from the creator organization (for example "together")
        """
        return self.name.split("/")[0]

    @property
    def engine(self) -> str:
        """
        Extracts the model engine from the model deployment name.
        Example: 'ai21/j1-jumbo' => 'j1-jumbo'
        """
        return self.name.split("/")[1]
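
A similar illustration for ModelDeployment, showing that the host organization can differ from the creator (the ClientSpec arguments and class path here are assumptions, not copied from the configs):

deployment = ModelDeployment(
    name="together/llama-7b",
    client_spec=ClientSpec(class_name="helm.proxy.clients.together_client.TogetherClient", args={}),
    model_name="meta/llama-7b",
    tokenizer_name="meta/llama",
    max_sequence_length=2048,
)
assert deployment.host_organization == "together"  # hosting organization
assert deployment.engine == "llama-7b"
# The creator organization ("meta") lives on the corresponding ModelMetadata.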

In request.py:

from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """
    A `Request` specifies how to query a language model (given a prompt,
    complete it).  It is the unified representation for communicating with
    various APIs (e.g., GPT-3, Jurassic).
    """

    model_deployment: str = ""
    """Which model deployment to query -> Determines the Client.
    Refers to a deployment in the model deployment registry."""

    model: str = ""
    """Which model to use -> Determines the Engine.
    Refers to a model metadata in the model registry."""

   [...]

(Both model and model_deployment should now be specified in Request.)
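
For example, a request now carries both names (the prompt and names below are placeholders):

request = Request(
    model="meta/llama-2-7b",                 # metadata -> determines the engine
    model_deployment="together/llama-2-7b",  # deployment -> determines the client
    prompt="The quick brown fox",
)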

In most of the codebase, references to Model are replaced by references to ModelDeployment (with a few exceptions, including GeneralInfo, website-related things, and mode).

A config now looks like this:

entries: [
   {description: "billsum_legal_summarization:model_deployment=anthropic/claude-v1.3", priority: 1},
]

Instead of (but still supported):

entries: [
   {description: "billsum_legal_summarization:model=anthropic/claude-v1.3", priority: 1},
]

Backwards compatibility

Supported use cases

Here are a few examples of runs and the expected behavior:

  • model=text -> same behavior as today (for each text model, find one deployment and run it).
  • model=meta/llama-2-7b -> runs one of the deployments of llama-2-7b, i.e. together/llama-2-7b. Additionally prints a warning saying that one deployment was randomly chosen and that the user should use the next bullet point's form to be more explicit.
  • model_deployment=huggingface/llama-2-7b -> as expected
  • model=meta/llama-2-7b,model_deployment=huggingface/llama-2-7b -> as expected
  • model=meta/llama-2-7b,model_deployment=huggingface/gpt2 -> Raises error (incompatible)
  • model_deployment=text -> Raises an error (ambiguous: should we run several deployments for the same model?)

This works thanks to the get_default_model_deployment_for_model function in run_specs.py. This function is responsible for finding a valid deployment name given a model name (from an old config, for example).
The process to find a model deployment name is as follows (a sketch is shown after this list):
1. If there is a model deployment with the same name as the model arg, use it.
2. If there is at least one deployment for the model, use the first one that is available (preferring non-deprecated deployments).
3. If there are no deployments for the model, raise an error.
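
A minimal sketch of that lookup, assuming the deployments are passed in as a list (the real function reads the model deployment registry directly and also emits the warning mentioned above):

from typing import List

from helm.benchmark.model_deployment_registry import ModelDeployment


def get_default_model_deployment_for_model(model_name: str, deployments: List[ModelDeployment]) -> str:
    """Pick a deployment name for `model_name` following the three rules above."""
    # 1. A deployment with the same name as the model wins.
    for deployment in deployments:
        if deployment.name == model_name:
            return deployment.name

    # 2. Otherwise, take a deployment registered for this model, preferring non-deprecated ones.
    candidates = [d for d in deployments if d.model_name == model_name]
    if candidates:
        non_deprecated = [d for d in candidates if not d.deprecated]
        return (non_deprecated or candidates)[0].name

    # 3. No deployment exists for this model.
    raise ValueError(f"No model deployment found for model: {model_name}")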

Support of old runs

Old runs are still valid and will work with helm-summarize and helm-server. One can have old and new runs in the same suite, and everything will work as expected.

WARNING: The Cache will not be completely preserved. This is because we now consistently use the host organization name for caching, whereas we previously used a mix of host and creator. Some hosts won't be affected (like openai), but others will be (like bigscience, which is now cached under together).

Consequent changes

This PR enables some other changes:

  • Deletion of many tags related to the WindowService assignment.
  • Deletion of support for defining clients, tokenizers, and window services anywhere other than in the YAML files.

Addition of a new model

To add a new model, the user simply needs to update the three .yaml files (model_metadata.yaml, model_deployments.yaml, and tokenizer_configs.yaml). In general, there will be no need for a custom WindowService and the default one will work.
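
For illustration, a new-model entry might look roughly like this; the keys mirror the dataclass fields above, but the exact YAML layout, class paths, and values are assumptions rather than copied from the repo:

# model_metadata.yaml (sketch)
models:
  - name: mycompany/my-model-7b
    display_name: My Model (7B)
    description: Hypothetical model used only as an example.
    creator_organization_name: MyCompany
    access: open
    release_date: 2023-10-01
    num_parameters: 7000000000

# model_deployments.yaml (sketch)
model_deployments:
  - name: huggingface/my-model-7b
    model_name: mycompany/my-model-7b
    tokenizer_name: mycompany/my-model-7b
    max_sequence_length: 4096
    client_spec:
      class_name: "helm.proxy.clients.huggingface_client.HuggingFaceClient"

# tokenizer_configs.yaml (sketch)
tokenizer_configs:
  - name: mycompany/my-model-7b
    tokenizer_spec:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"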

What needs to be done in follow-up PRs

  • Add an AutoTokenizer ( Add an AutoTokenizer #1994 )
  • Update the frontend (read model_metadata.yaml instead of schema.yaml).
  • Add the support for private configs (with a .gitignore so that people can easily add their own model in the repo) ( Add private configs #1996 )
  • Change nearly all window services to DefaultWindowService
  • Write documentation to explain how to add a model
  • Update all the tutorials to use model_deployment instead of model (some are done in this PR).
  • Move the tags to the ModelDeployment and create some properties in ModelMetadata to replace some tags (a sketch follows this list) ( Remove many tags #1995 ):
    • TEXT_MODEL_TAG, IMAGE_MODEL_TAG, CODE_MODEL_TAG, TEXT_SIMILARITY_MODEL_TAG -> media_type: List[str]
    • FULL_FUNCTIONALITY_TEXT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG -> functionality: str
    • INSTRUCTION_FOLLOWING_MODEL_TAG -> instruction_following: bool
    • ABLATION_MODEL_TAG -> ablation: bool
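
A rough sketch of what those replacement fields could look like on ModelMetadata (names taken straight from the bullets above; this is not a finalized design):

from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=False)
class ModelMetadata:  # only the proposed replacement fields are shown
    # Replaces TEXT_MODEL_TAG, IMAGE_MODEL_TAG, CODE_MODEL_TAG, TEXT_SIMILARITY_MODEL_TAG.
    media_type: List[str] = field(default_factory=lambda: ["text"])
    # Replaces FULL_FUNCTIONALITY_TEXT_MODEL_TAG / LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG.
    functionality: str = "full"
    # Replaces INSTRUCTION_FOLLOWING_MODEL_TAG.
    instruction_following: bool = False
    # Replaces ABLATION_MODEL_TAG.
    ablation: bool = False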

@yifanmai (Collaborator) left a comment:

Probably easier to explain what I have in mind using code, so I opened #2002, PTAL.

(Resolved review threads on src/helm/benchmark/model_deployment_registry.py and src/helm/common/request.py.)
@yifanmai (Collaborator) left a comment:

Need these changes to fix conflicts with #1998.

(Resolved review threads on src/helm/config/tokenizer_configs.yaml and src/helm/config/model_deployments.yaml.)
@yifanmai (Collaborator) left a comment:

In src/helm/benchmark/adaptation/adapters/test_*_adapter.py:
For every AdapterSpec with model="X", also set model_deployment="X". The specific failing test is src/helm/benchmark/adaptation/adapters/test_language_modeling_adapter.py::TestLanguageModelingAdapter::test_prompt_wrapping
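
For example, a test's AdapterSpec construction changes along these lines (values are placeholders):

from helm.benchmark.adaptation.adapter_spec import AdapterSpec

# Before: only the model is specified.
spec = AdapterSpec(model="huggingface/gpt2")

# After: also set model_deployment to the same value so the adapter resolves explicitly.
spec = AdapterSpec(model="huggingface/gpt2", model_deployment="huggingface/gpt2")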

@yifanmai (Collaborator) commented:

Update the pull request description to discuss the fact that the cache organization is now the host rather than the creator organization. (e.g. meta/llama-2 now goes in together rather than meta).

WARNING: Due to the change to Request, the Cache is not always preserved between old and new runs. I am not sure what causes this issue.
