Merge pull request #86 from dudeperf3ct/feature/zencoder-huggingface-model-deployer

HuggingFace Endpoint Inference Model Deployer
htahir1 authored Jan 30, 2024
2 parents 7d1fa76 + ec4f16a commit c254ccc
Showing 11 changed files with 914 additions and 185 deletions.
47 changes: 43 additions & 4 deletions llm-finetuning/README.md
@@ -78,13 +78,51 @@ python run.py --training-pipeline --config finetune_gcp.yaml

# Deployment
python run.py --deployment-pipeline --config <NAME_OF_CONFIG_IN_CONFIGS_FOLDER>
python run.py --deployment-pipeline --config deployment_a100.yaml
```

The `feature_engineering` and `deployment` pipelines can be run using the `default` stack, but the [stack](https://docs.zenml.io/user-guide/production-guide/understand-stacks) used by the training pipeline will depend on the config.

The `deployment` pipeline relies on the `training_pipeline` having been run first.

## :cloud: Deployment

We have created a custom ZenML model deployer for deploying models to Hugging Face Inference Endpoints. The code for the custom deployer lives in the [huggingface](./huggingface/) folder.

To run the deployment pipeline, we create a custom ZenML stack. Because we are using a custom model deployer, we have to register its flavor as well as the model deployer itself, and then update the stack to use them:

```bash
zenml init
zenml stack register zencoder_hf_stack -o default -a default
zenml stack set zencoder_hf_stack
export HUGGINGFACE_USERNAME=<YOUR_HF_USERNAME>
export HUGGINGFACE_TOKEN=<YOUR_HF_TOKEN>
export NAMESPACE=<YOUR_HF_NAMESPACE>
zenml secret create huggingface_creds --username=$HUGGINGFACE_USERNAME --token=$HUGGINGFACE_TOKEN
zenml model-deployer flavor register huggingface.hf_model_deployer_flavor.HuggingFaceModelDeployerFlavor
```

Afterward, you should see the new flavor in the list of available flavors:

```bash
zenml model-deployer flavor list
```
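
The flavor path registered above points at the implementation shipped in the [huggingface](./huggingface/) folder. For orientation, a minimal model deployer flavor in ZenML typically looks like the sketch below. This is an illustration of the pattern, assuming ZenML's standard custom-flavor interface, not the exact code from this repository (in particular, the `hf_model_deployer` module name is an assumption):

```python
from typing import Type

from zenml.model_deployers.base_model_deployer import (
    BaseModelDeployer,
    BaseModelDeployerConfig,
    BaseModelDeployerFlavor,
)


class HuggingFaceModelDeployerConfig(BaseModelDeployerConfig):
    """Deployer settings; `token` and `namespace` mirror the CLI flags used below."""

    token: str
    namespace: str


class HuggingFaceModelDeployerFlavor(BaseModelDeployerFlavor):
    @property
    def name(self) -> str:
        # The flavor name passed to `zenml model-deployer register --flavor=...`.
        return "hfendpoint"

    @property
    def config_class(self) -> Type[HuggingFaceModelDeployerConfig]:
        return HuggingFaceModelDeployerConfig

    @property
    def implementation_class(self) -> Type[BaseModelDeployer]:
        # Imported lazily so registering the flavor does not pull in heavy deps.
        from huggingface.hf_model_deployer import HuggingFaceModelDeployer

        return HuggingFaceModelDeployer
```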

Register the model deployer component and point the current stack at it:

```bash
zenml model-deployer register hfendpoint --flavor=hfendpoint --token=$HUGGINGFACE_TOKEN --namespace=$NAMESPACE
zenml stack update zencoder_hf_stack -d hfendpoint
```
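
As a quick sanity check, you can confirm from Python that the active stack now carries the custom deployer. A small snippet, assuming ZenML's client API exposes the active stack this way:

```python
from zenml.client import Client

# Inspect the active stack; the model-deployer slot should now hold the
# "hfendpoint" component registered above.
stack = Client().active_stack
print(stack.model_deployer)
```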

Run the deployment pipeline using the CLI:

```shell
# Deployment
python run.py --deployment-pipeline --config <NAME_OF_CONFIG_IN_CONFIGS_FOLDER>
python run.py --deployment-pipeline --config deployment_a100.yaml
```
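
Under the hood, the deployment step hands the `hf_endpoint_cfg` parameters from the config file to the deployer, which provisions a Hugging Face Inference Endpoint. A rough sketch of that provisioning call via `huggingface_hub`'s public API follows; the endpoint name and model repository are illustrative placeholders, and the exact wiring in this repository may differ:

```python
from huggingface_hub import create_inference_endpoint

# Values mirror configs/deployment_a100.yaml; the "zencoder" names are placeholders.
endpoint = create_inference_endpoint(
    "zencoder-endpoint",
    repository="zenml/zencoder-peft-model",  # assumed fine-tuned model repo
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="xlarge",
    instance_type="p4de",
    namespace="zenml",
    min_replica=0,
    max_replica=1,
    type="public",
    token="<HUGGINGFACE_TOKEN>",
)
endpoint.wait()   # block until the endpoint reports "running"
print(endpoint.url)
```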

## 🥇Recent developments

A working prototype was trained and deployed as of Jan 19, 2024. The model was finetuned on minimal data using QLoRA and PEFT, and trained on a single A100 GPU in the cloud:
@@ -114,16 +152,17 @@ This project recently did a [call of volunteers](https://www.linkedin.com/feed/u
- [x] Create a functioning training pipeline.
- [ ] Curate a set of 5–10 repositories that use the latest ZenML syntax, and use the data generation pipeline to push a dataset to Hugging Face.
- [ ] Create a Dockerfile for the training pipeline with all requirements installed, including ZenML, torch, CUDA, etc. Currently I am having trouble creating this in this [config file](configs/finetune_local.yaml). It probably makes sense to create a Docker image with the right CUDA version and requirements, including ZenML. See here: https://sdkdocs.zenml.io/0.54.0/integration_code_docs/integrations-aws/#zenml.integrations.aws.flavors.sagemaker_step_operator_flavor.SagemakerStepOperatorSettings

- [ ] Test the trained model on various metrics.
- [ ] Create a custom [model deployer](https://docs.zenml.io/stacks-and-components/component-guide/model-deployers) that deploys a Hugging Face model from the Hub to a Hugging Face Inference Endpoint. This involves creating a [custom model deployer](https://docs.zenml.io/stacks-and-components/component-guide/model-deployers/custom) and editing the [deployment pipeline](pipelines/deployment.py) accordingly.

## :bulb: More Applications

While the work here is solely based on the task of finetuning the model for the ZenML library, the pipeline can be changed with minimal effort to point to any set of repositories on GitHub. Theoretically, one could extend this work to point to proprietary codebases to learn from them for any use-case.

For example, see how [VMWare fine-tuned StarCoder to learn their style](https://octo.vmware.com/fine-tuning-starcoder-to-learn-vmwares-coding-style/).

Also, make sure to join our <a href="https://zenml.io/slack" target="_blank">
<img width="15" src="https://cdn3.iconfinder.com/data/icons/logos-and-brands-adobe/512/306_Slack-512.png" alt="Slack"/>
<b>Slack Community</b>
</a> to become part of the ZenML family!
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_a10.yaml
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: xxlarge
-      instance_type: g5.12xlarge
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: xxlarge
+        instance_type: g5.12xlarge
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_a100.yaml
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: xlarge
-      instance_type: p4de
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: xlarge
+        instance_type: p4de
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_t4.yaml
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: large
-      instance_type: g4dn.12xlarge
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: large
+        instance_type: g4dn.12xlarge
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
25 changes: 25 additions & 0 deletions llm-finetuning/huggingface/hf_deployment_base_config.py
@@ -0,0 +1,25 @@
from pydantic import BaseModel
from typing import Optional, Dict
from zenml.utils.secret_utils import SecretField


class HuggingFaceBaseConfig(BaseModel):
"""Huggingface Inference Endpoint configuration."""

endpoint_name: Optional[str] = ""
repository: Optional[str] = None
framework: Optional[str] = None
accelerator: Optional[str] = None
instance_size: Optional[str] = None
instance_type: Optional[str] = None
region: Optional[str] = None
vendor: Optional[str] = None
    token: Optional[str] = SecretField(default=None)  # marked as a secret so it is not logged in plain text
account_id: Optional[str] = None
min_replica: Optional[int] = 0
max_replica: Optional[int] = 1
revision: Optional[str] = None
task: Optional[str] = None
custom_image: Optional[Dict] = None
namespace: Optional[str] = None
endpoint_type: str = "public"
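
As a usage illustration, the `hf_endpoint_cfg` block from `configs/deployment_a100.yaml` maps directly onto this model. The snippet below is a sketch; the import path assumes you run it from the `llm-finetuning` directory:

```python
from huggingface.hf_deployment_base_config import HuggingFaceBaseConfig

# Mirrors the hf_endpoint_cfg block in configs/deployment_a100.yaml.
cfg = HuggingFaceBaseConfig(
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    max_replica=1,
    instance_size="xlarge",
    instance_type="p4de",
    namespace="zenml",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "1024",
            "MAX_TOTAL_TOKENS": "1512",
            "QUANTIZE": "bitsandbytes",
            "MODEL_ID": "/repository",
        },
        "url": "registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3",
    },
)
print(cfg.endpoint_type)  # "public" by default
```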
