Multi GPU with PEFT on LLM #102

Merged · 29 commits · Jun 27, 2024

This pull request merges 29 commits into `main` (from the feature branch `feature/OSSK-514-multi-gpu-with-peft`, per the merge commit below). Changes from all commits are shown.

Commits:
- `5d2a2ee` multi GPU with PEFT on LLM (avishniakov, Apr 17, 2024)
- `3e9776a` eof (avishniakov, Apr 17, 2024)
- `d77afdc` fixes for subprocess (avishniakov, Apr 17, 2024)
- `7cc1f01` callback patch (avishniakov, Apr 17, 2024)
- `377ec0f` tidy up (avishniakov, Apr 17, 2024)
- `8856e5c` new iteration (avishniakov, Apr 19, 2024)
- `0c85e8b` lint (avishniakov, Apr 19, 2024)
- `27e6795` fsspec fix (avishniakov, Apr 19, 2024)
- `f257630` pin datasets to lower version (avishniakov, Apr 19, 2024)
- `f68a469` relax datasets pin a bit (avishniakov, Apr 19, 2024)
- `65cdc7e` polish for step operators (avishniakov, May 3, 2024)
- `93398cd` push some functionality to the core (avishniakov, May 7, 2024)
- `2bb3ab9` format (avishniakov, May 7, 2024)
- `261e2ce` update README (avishniakov, May 7, 2024)
- `c22092b` update README (avishniakov, May 7, 2024)
- `b51a111` use `AccelerateScaler` (avishniakov, May 14, 2024)
- `f3943b2` pass bit config around (avishniakov, May 15, 2024)
- `555997e` functional way (avishniakov, Jun 4, 2024)
- `28faf6c` Merge branch 'main' into feature/OSSK-514-multi-gpu-with-peft (avishniakov, Jun 4, 2024)
- `65b5e11` remove configs (avishniakov, Jun 4, 2024)
- `d77f50f` restore configs (avishniakov, Jun 4, 2024)
- `fd3887d` restore reqs (avishniakov, Jun 4, 2024)
- `5264011` accelerate as a function from the core (avishniakov, Jun 5, 2024)
- `d9172c7` reduce README (avishniakov, Jun 5, 2024)
- `817a1b2` og metadata separately (avishniakov, Jun 7, 2024)
- `6d988eb` resume logging (avishniakov, Jun 13, 2024)
- `80a1084` add `trust_remote_code=True` (avishniakov, Jun 14, 2024)
- `bb9dc65` final touches (avishniakov, Jun 20, 2024)
- `329cf17` final touches (avishniakov, Jun 20, 2024)
56 changes: 36 additions & 20 deletions llm-lora-finetuning/README.md
@@ -34,6 +34,11 @@ pip install -r requirements.txt

### 👷 Combined feature engineering and finetuning pipeline

> [!WARNING]
> All steps of this pipeline have a `clean_gpu_memory(force=True)` call at the beginning. This ensures that GPU memory is properly cleared after previous steps.
>
> This call might affect other GPU processes running in the same environment, so if you don't want to clean the GPU memory between steps, remove those utility calls from the steps.
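
For reference, this is roughly how each step applies that cleanup at the top of its function body (a sketch; it assumes the helper is importable from ZenML's CUDA utilities):

```python
from zenml import step
from zenml.utils.cuda_utils import clean_gpu_memory  # assumed import path


@step
def evaluate_base(base_model_id: str) -> None:
    # Free CUDA memory still held from previous steps in this process
    # before loading the model.
    clean_gpu_memory(force=True)
    ...
```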

The easiest way to get started with just a single command is to run the finetuning pipeline with the `orchestrator_finetune.yaml` configuration file, which will do data preparation, model finetuning, evaluation with [Rouge](https://huggingface.co/spaces/evaluate-metric/rouge) and promotion:

```shell
python run.py --config orchestrator_finetune.yaml
```
@@ -50,6 +55,17 @@ When running the pipeline like this, the trained model will be stored in the Zen
<br/>
</div>

### ⚡ Accelerate your finetuning

Do you want to benefit from multi-GPU training with Distributed Data Parallel (DDP)? Then you can use other configuration files prepared for this purpose.
For example, `orchestrator_finetune.yaml` can run finetuning of [Microsoft Phi 2](https://huggingface.co/microsoft/phi-2) powered by [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/en/index) on all GPUs available in the environment. To do so, just call:

```shell
python run.py --config orchestrator_finetune.yaml --accelerate
```

Under the hood, the finetuning step spins up an Accelerate-powered job that runs the step code on all available GPUs.
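
For context, here is a minimal sketch of how a `--accelerate` flag could switch between the two pipeline definitions in `run.py` (the actual `run.py` wiring is not shown in this PR view; the flag name comes from the command above, everything else is an assumption):

```python
import click

from pipelines.train import llm_peft_full_finetune
from pipelines.train_accelerated import (
    llm_peft_full_finetune as llm_peft_full_finetune_accelerated,
)


@click.command()
@click.option("--config", "config_path", required=True, help="Pipeline config YAML.")
@click.option("--accelerate", is_flag=True, help="Run the Accelerate-powered pipeline.")
def main(config_path: str, accelerate: bool) -> None:
    pipeline_fn = (
        llm_peft_full_finetune_accelerated if accelerate else llm_peft_full_finetune
    )
    # with_options applies the YAML file (step settings and pipeline parameters),
    # so the configured pipeline can be invoked without explicit arguments.
    pipeline_fn.with_options(config_path=f"configs/{config_path}")()


if __name__ == "__main__":
    main()
```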

## ☁️ Running with a remote stack

To finetune an LLM on remote infrastructure, you can either use a remote orchestrator or a remote step operator. Follow these steps to set up a complete remote stack:
@@ -71,26 +87,26 @@ The project loosely follows [the recommended ZenML project structure](https://do
 
 ```
 .
-├── configs                              # pipeline configuration files
-│   ├── orchestrator_finetune.yaml       # default local or remote orchestrator
-│   └── remote_finetune.yaml             # default step operator configuration
+├── configs                              # pipeline configuration files
+│   ├── orchestrator_finetune.yaml       # default local or remote orchestrator configuration
+│   └── remote_finetune.yaml             # default step operator configuration
 ├── materializers
-│   └── directory_materializer.py        # custom materializer to push whole directories to the artifact store and back
-├── pipelines                            # `zenml.pipeline` implementations
-│   └── train.py                         # Finetuning and evaluation pipeline
-├── steps                                # logically grouped `zenml.steps` implementations
-│   ├── evaluate_model.py                # evaluate base and finetuned models using Rouge metrics
-│   ├── finetune.py                      # finetune the base model
-│   ├── prepare_datasets.py              # load and tokenize dataset
-│   └── promote.py                       # promote good models to target environment
-├── utils                                # utility functions
-│   ├── callbacks.py                     # custom callbacks
-│   ├── cuda.py                          # helpers for CUDA
-│   ├── loaders.py                       # loaders for models and data
-│   ├── logging.py                       # logging helpers
-│   └── tokenizer.py                     # load and tokenize
+│   └── directory_materializer.py        # custom materializer to push whole directories to the artifact store and back
+├── pipelines                            # `zenml.pipeline` implementations
+│   └── train.py                         # Finetuning and evaluation pipeline
+├── steps                                # logically grouped `zenml.steps` implementations
+│   ├── evaluate_model.py                # evaluate base and finetuned models using Rouge metrics
+│   ├── finetune.py                      # finetune the base model
+│   ├── log_metadata.py                  # helper step to ensure that model metadata is always logged
+│   ├── prepare_datasets.py              # load and tokenize dataset
+│   └── promote.py                       # promote good models to target environment
+├── utils                                # utility functions
+│   ├── callbacks.py                     # custom callbacks
+│   ├── loaders.py                       # loaders for models and data
+│   ├── logging.py                       # logging helpers
+│   └── tokenizer.py                     # load and tokenize
 ├── .dockerignore
-├── README.md                            # this file
-├── requirements.txt                     # extra Python dependencies
-└── run.py                               # CLI tool to run pipelines on ZenML Stack
+├── README.md                            # this file
+├── requirements.txt                     # extra Python dependencies
+└── run.py                               # CLI tool to run pipelines on ZenML Stack
 ```
5 changes: 4 additions & 1 deletion llm-lora-finetuning/configs/orchestrator_finetune.yaml
@@ -29,6 +29,10 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-cuda11.8-cudnn8-runtime
     requirements: requirements.txt
     python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages:
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
@@ -50,7 +54,6 @@ steps:
       dataset_name: gem/viggo
 
   finetune:
-    enable_step_logs: False
     parameters:
       max_steps: 300
       eval_steps: 30
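
Two Docker-build tweaks appear in this hunk: `git` is installed as an apt package (commonly needed so the package installer can fetch git-based requirements), and `python_package_installer_args` forwards extra flags to `uv`. A `null` value plausibly renders as a bare flag, so `system: null` would translate to `--system`; a hypothetical rendering of the in-image install command (the exact invocation is ZenML's own, not shown here):

```shell
uv pip install --system -r requirements.txt
```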
17 changes: 16 additions & 1 deletion llm-lora-finetuning/configs/remote_finetune.yaml
@@ -29,6 +29,10 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-cuda11.8-cudnn8-runtime
     requirements: requirements.txt
    python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages:
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
@@ -50,18 +54,29 @@ steps:
       dataset_name: gem/viggo
 
   finetune:
-    enable_step_logs: False
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     parameters:
       max_steps: 300
       eval_steps: 30
       bf16: True
 
   evaluate_finetuned:
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
 
   evaluate_base:
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
 
   promote:
     parameters:
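
The new `retry` blocks guard the GPU-heavy steps against transient failures on the `gcp_a100` step operator (capacity errors, preemptions). Assuming `delay` is the initial wait in seconds and `backoff` multiplies it on each attempt, the waits grow as sketched here:

```python
delay, backoff, max_retries = 10, 2, 3

# Wait before retry attempts 1, 2 and 3 (exponential backoff is an
# assumption based on the field names, not confirmed by this diff).
waits = [delay * backoff**attempt for attempt in range(max_retries)]
print(waits)  # [10, 20, 40]
```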
10 changes: 0 additions & 10 deletions llm-lora-finetuning/utils/cuda.py
@@ -14,13 +14,3 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
-import gc
-
-import torch
-
-
-def cleanup_memory() -> None:
-    """Clean up GPU memory."""
-    while gc.collect():
-        torch.cuda.empty_cache()
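
The removed `cleanup_memory` helper is superseded by functionality pushed into ZenML core (see commit `93398cd`); the steps now call `clean_gpu_memory(force=True)` instead, as described in the README warning above.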
51 changes: 33 additions & 18 deletions llm-lora-finetuning/pipelines/train.py
@@ -16,13 +16,14 @@
 #
 
 
-from steps import evaluate_model, finetune, prepare_data, promote
-from zenml import logging as zenml_logging
-from zenml import pipeline
-
-zenml_logging.STEP_LOGS_STORAGE_MAX_MESSAGES = (
-    10000  # workaround for https://github.com/zenml-io/zenml/issues/2252
+from steps import (
+    evaluate_model,
+    finetune,
+    prepare_data,
+    promote,
+    log_metadata_from_step_artifact,
 )
+from zenml import pipeline
 
 
 @pipeline
@@ -47,40 +48,54 @@ def llm_peft_full_finetune(
         "At least one of `load_in_8bit` and `load_in_4bit` must be True."
     )
     if load_in_4bit and load_in_8bit:
-        raise ValueError(
-            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
-        )
+        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
 
     datasets_dir = prepare_data(
         base_model_id=base_model_id,
         system_prompt=system_prompt,
         use_fast=use_fast,
     )
-    ft_model_dir = finetune(
+
+    evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
+        None,
         use_fast=use_fast,
-        load_in_4bit=load_in_4bit,
         load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_base",
     )
-    evaluate_model(
+    log_metadata_from_step_artifact(
+        "evaluate_base",
+        "base_model_rouge_metrics",
+        after=["evaluate_base"],
+        id="log_metadata_evaluation_base"
+    )
+
+    ft_model_dir = finetune(
         base_model_id,
         system_prompt,
         datasets_dir,
-        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_finetuned",
     )
+
     evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
-        None,
+        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_base",
+        id="evaluate_finetuned",
    )
-    promote(after=["evaluate_finetuned", "evaluate_base"])
+    log_metadata_from_step_artifact(
+        "evaluate_finetuned",
+        "finetuned_model_rouge_metrics",
+        after=["evaluate_finetuned"],
+        id="log_metadata_evaluation_finetuned"
+    )
+
+    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
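
The rewired DAG relies on two step-invocation arguments: `id` names an invocation uniquely (needed because `evaluate_model` runs twice), and `after` adds an ordering edge between steps that exchange no data, which is how the metadata loggers and `promote` are sequenced. The new `log_metadata_from_step_artifact` helper lives in `steps/log_metadata.py`, which this view does not show; a minimal hypothetical sketch of such a step (the exact names and client calls are assumptions):

```python
from typing import Any, Dict

from zenml import get_step_context, log_model_metadata, step


@step(enable_cache=False)
def log_metadata_from_step_artifact(step_name: str, artifact_name: str) -> None:
    """Attach a named output artifact of another step to the model as metadata."""
    context = get_step_context()
    # Load the upstream step's named output from the current pipeline run
    # (assumed to be a JSON-serializable dict of Rouge scores).
    metadata: Dict[str, Any] = (
        context.pipeline_run.steps[step_name].outputs[artifact_name].load()
    )
    log_model_metadata(metadata={artifact_name: metadata})
```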
102 changes: 102 additions & 0 deletions llm-lora-finetuning/pipelines/train_accelerated.py
@@ -0,0 +1,102 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2024. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


from steps import (
    evaluate_model,
    finetune,
    prepare_data,
    promote,
    log_metadata_from_step_artifact,
)
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate


@pipeline
def llm_peft_full_finetune(
    system_prompt: str,
    base_model_id: str,
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
):
    """Pipeline for finetuning an LLM with peft.

    It will run the following steps:

    - prepare_data: prepare the datasets and tokenize them
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    if load_in_4bit and load_in_8bit:
        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")

    datasets_dir = prepare_data(
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        None,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_base",
    )
    log_metadata_from_step_artifact(
        "evaluate_base",
        "base_model_rouge_metrics",
        after=["evaluate_base"],
        id="log_metadata_evaluation_base"
    )

    ft_model_dir = run_with_accelerate(finetune)(
        base_model_id=base_model_id,
        dataset_dir=datasets_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        ft_model_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_finetuned",
    )
    log_metadata_from_step_artifact(
        "evaluate_finetuned",
        "finetuned_model_rouge_metrics",
        after=["evaluate_finetuned"],
        id="log_metadata_evaluation_finetuned"
    )

    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
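
Note that the wrapped call `run_with_accelerate(finetune)(...)` uses keyword arguments only, while the plain pipeline calls steps positionally; a wrapper that re-launches step code through `accelerate launch` generally needs named arguments to forward them to the subprocess, so this is likely a requirement of the wrapper rather than a style choice. If the wrapper also accepts Accelerate launch options, pinning the process count might look like this (a sketch; `num_processes` is an assumption, not confirmed by this diff):

```python
ft_model_dir = run_with_accelerate(finetune, num_processes=2)(
    base_model_id=base_model_id,  # keyword arguments only
    dataset_dir=datasets_dir,
    use_fast=use_fast,
    load_in_8bit=load_in_8bit,
    load_in_4bit=load_in_4bit,
)
```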
2 changes: 1 addition & 1 deletion llm-lora-finetuning/requirements.txt
@@ -8,4 +8,4 @@ scipy
 evaluate
 rouge_score
 nltk
-accelerate
\ No newline at end of file
+accelerate

(The only change is a trailing newline added at the end of the file, per commit `3e9776a` "eof".)