Multi GPU with PEFT on LLM #102

Merged · 29 commits · Jun 27, 2024

This pull request merges 29 commits into `main` (from the feature branch `feature/OSSK-514-multi-gpu-with-peft`, per the merge commit below). Changes from all commits are shown.

Commits:
- `5d2a2ee` multi GPU with PEFT on LLM (avishniakov, Apr 17, 2024)
- `3e9776a` eof (avishniakov, Apr 17, 2024)
- `d77afdc` fixes for subprocess (avishniakov, Apr 17, 2024)
- `7cc1f01` callback patch (avishniakov, Apr 17, 2024)
- `377ec0f` tidy up (avishniakov, Apr 17, 2024)
- `8856e5c` new iteration (avishniakov, Apr 19, 2024)
- `0c85e8b` lint (avishniakov, Apr 19, 2024)
- `27e6795` fsspec fix (avishniakov, Apr 19, 2024)
- `f257630` pin datasets to lower version (avishniakov, Apr 19, 2024)
- `f68a469` relax datasets pin a bit (avishniakov, Apr 19, 2024)
- `65cdc7e` polish for step operators (avishniakov, May 3, 2024)
- `93398cd` push some functionality to the core (avishniakov, May 7, 2024)
- `2bb3ab9` format (avishniakov, May 7, 2024)
- `261e2ce` update README (avishniakov, May 7, 2024)
- `c22092b` update README (avishniakov, May 7, 2024)
- `b51a111` use `AccelerateScaler` (avishniakov, May 14, 2024)
- `f3943b2` pass bit config around (avishniakov, May 15, 2024)
- `555997e` functional way (avishniakov, Jun 4, 2024)
- `28faf6c` Merge branch 'main' into feature/OSSK-514-multi-gpu-with-peft (avishniakov, Jun 4, 2024)
- `65b5e11` remove configs (avishniakov, Jun 4, 2024)
- `d77f50f` restore configs (avishniakov, Jun 4, 2024)
- `fd3887d` restore reqs (avishniakov, Jun 4, 2024)
- `5264011` accelerate as a function from the core (avishniakov, Jun 5, 2024)
- `d9172c7` reduce README (avishniakov, Jun 5, 2024)
- `817a1b2` og metadata separately (avishniakov, Jun 7, 2024)
- `6d988eb` resume logging (avishniakov, Jun 13, 2024)
- `80a1084` add `trust_remote_code=True` (avishniakov, Jun 14, 2024)
- `bb9dc65` final touches (avishniakov, Jun 20, 2024)
- `329cf17` final touches (avishniakov, Jun 20, 2024)
56 changes: 36 additions & 20 deletions llm-lora-finetuning/README.md
@@ -34,6 +34,11 @@ pip install -r requirements.txt

### 👷 Combined feature engineering and finetuning pipeline

> [!WARNING]
> All steps of this pipeline have a `clean_gpu_memory(force=True)` call at the beginning. This ensures that GPU memory is properly cleared after previous steps.
>
> This call might affect other GPU processes running in the same environment, so if you don't want to clean the GPU memory between steps, remove those utility calls from the steps.
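
For reference, this is roughly how each step applies that cleanup at the top of its function body (a sketch; it assumes the helper is importable from ZenML's CUDA utilities):

```python
from zenml import step
from zenml.utils.cuda_utils import clean_gpu_memory  # assumed import path


@step
def evaluate_base(base_model_id: str) -> None:
    # Free CUDA memory still held from previous steps in this process
    # before loading the model.
    clean_gpu_memory(force=True)
    ...
```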

The easiest way to get started with just a single command is to run the finetuning pipeline with the `orchestrator_finetune.yaml` configuration file, which will do data preparation, model finetuning, evaluation with [Rouge](https://huggingface.co/spaces/evaluate-metric/rouge) and promotion:

```shell
python run.py --config orchestrator_finetune.yaml
```
@@ -50,6 +55,17 @@ When running the pipeline like this, the trained model will be stored in the Zen
<br/>
</div>

### ⚡ Accelerate your finetuning

Do you want to benefit from multi-GPU training with Distributed Data Parallel (DDP)? Then you can use other configuration files prepared for this purpose.
For example, `orchestrator_finetune.yaml` can run finetuning of [Microsoft Phi 2](https://huggingface.co/microsoft/phi-2) powered by [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/en/index) on all GPUs available in the environment. To do so, just call:

```shell
python run.py --config orchestrator_finetune.yaml --accelerate
```

Under the hood, the finetuning step spins up an Accelerate-powered job that runs the step code on all available GPUs.
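
For context, here is a minimal sketch of how a `--accelerate` flag could switch between the two pipeline definitions in `run.py` (the actual `run.py` wiring is not shown in this PR view; the flag name comes from the command above, everything else is an assumption):

```python
import click

from pipelines.train import llm_peft_full_finetune
from pipelines.train_accelerated import (
    llm_peft_full_finetune as llm_peft_full_finetune_accelerated,
)


@click.command()
@click.option("--config", "config_path", required=True, help="Pipeline config YAML.")
@click.option("--accelerate", is_flag=True, help="Run the Accelerate-powered pipeline.")
def main(config_path: str, accelerate: bool) -> None:
    pipeline_fn = (
        llm_peft_full_finetune_accelerated if accelerate else llm_peft_full_finetune
    )
    # with_options applies the YAML file (step settings and pipeline parameters),
    # so the configured pipeline can be invoked without explicit arguments.
    pipeline_fn.with_options(config_path=f"configs/{config_path}")()


if __name__ == "__main__":
    main()
```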

## ☁️ Running with a remote stack

To finetune an LLM on remote infrastructure, you can either use a remote orchestrator or a remote step operator. Follow these steps to set up a complete remote stack:
@@ -71,26 +87,26 @@ The project loosely follows [the recommended ZenML project structure](https://do
 
 ```
 .
-├── configs                              # pipeline configuration files
-│   ├── orchestrator_finetune.yaml       # default local or remote orchestrator
-│   └── remote_finetune.yaml             # default step operator configuration
+├── configs                              # pipeline configuration files
+│   ├── orchestrator_finetune.yaml       # default local or remote orchestrator configuration
+│   └── remote_finetune.yaml             # default step operator configuration
 ├── materializers
-│   └── directory_materializer.py        # custom materializer to push whole directories to the artifact store and back
-├── pipelines                            # `zenml.pipeline` implementations
-│   └── train.py                         # Finetuning and evaluation pipeline
-├── steps                                # logically grouped `zenml.steps` implementations
-│   ├── evaluate_model.py                # evaluate base and finetuned models using Rouge metrics
-│   ├── finetune.py                      # finetune the base model
-│   ├── prepare_datasets.py              # load and tokenize dataset
-│   └── promote.py                       # promote good models to target environment
-├── utils                                # utility functions
-│   ├── callbacks.py                     # custom callbacks
-│   ├── cuda.py                          # helpers for CUDA
-│   ├── loaders.py                       # loaders for models and data
-│   ├── logging.py                       # logging helpers
-│   └── tokenizer.py                     # load and tokenize
+│   └── directory_materializer.py        # custom materializer to push whole directories to the artifact store and back
+├── pipelines                            # `zenml.pipeline` implementations
+│   └── train.py                         # Finetuning and evaluation pipeline
+├── steps                                # logically grouped `zenml.steps` implementations
+│   ├── evaluate_model.py                # evaluate base and finetuned models using Rouge metrics
+│   ├── finetune.py                      # finetune the base model
+│   ├── log_metadata.py                  # helper step to ensure that model metadata is always logged
+│   ├── prepare_datasets.py              # load and tokenize dataset
+│   └── promote.py                       # promote good models to target environment
+├── utils                                # utility functions
+│   ├── callbacks.py                     # custom callbacks
+│   ├── loaders.py                       # loaders for models and data
+│   ├── logging.py                       # logging helpers
+│   └── tokenizer.py                     # load and tokenize
 ├── .dockerignore
-├── README.md                            # this file
-├── requirements.txt                     # extra Python dependencies
-└── run.py                               # CLI tool to run pipelines on ZenML Stack
+├── README.md                            # this file
+├── requirements.txt                     # extra Python dependencies
+└── run.py                               # CLI tool to run pipelines on ZenML Stack
 ```
5 changes: 4 additions & 1 deletion llm-lora-finetuning/configs/orchestrator_finetune.yaml
@@ -29,6 +29,10 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-cuda11.8-cudnn8-runtime
     requirements: requirements.txt
     python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages:
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
@@ -50,7 +54,6 @@ steps:
       dataset_name: gem/viggo
 
   finetune:
-    enable_step_logs: False
     parameters:
       max_steps: 300
       eval_steps: 30
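
Two Docker-build tweaks appear in this hunk: `git` is installed as an apt package (commonly needed so the package installer can fetch git-based requirements), and `python_package_installer_args` forwards extra flags to `uv`. A `null` value plausibly renders as a bare flag, so `system: null` would translate to `--system`; a hypothetical rendering of the in-image install command (the exact invocation is ZenML's own, not shown here):

```shell
uv pip install --system -r requirements.txt
```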
17 changes: 16 additions & 1 deletion llm-lora-finetuning/configs/remote_finetune.yaml
@@ -29,6 +29,10 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-cuda11.8-cudnn8-runtime
     requirements: requirements.txt
    python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages:
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
@@ -50,18 +54,29 @@ steps:
       dataset_name: gem/viggo
 
   finetune:
-    enable_step_logs: False
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     parameters:
       max_steps: 300
       eval_steps: 30
       bf16: True
 
   evaluate_finetuned:
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
 
   evaluate_base:
     step_operator: gcp_a100
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
 
   promote:
     parameters:
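
The new `retry` blocks guard the GPU-heavy steps against transient failures on the `gcp_a100` step operator (capacity errors, preemptions). Assuming `delay` is the initial wait in seconds and `backoff` multiplies it on each attempt, the waits grow as sketched here:

```python
delay, backoff, max_retries = 10, 2, 3

# Wait before retry attempts 1, 2 and 3 (exponential backoff is an
# assumption based on the field names, not confirmed by this diff).
waits = [delay * backoff**attempt for attempt in range(max_retries)]
print(waits)  # [10, 20, 40]
```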
10 changes: 0 additions & 10 deletions llm-lora-finetuning/utils/cuda.py
@@ -14,13 +14,3 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
-import gc
-
-import torch
-
-
-def cleanup_memory() -> None:
-    """Clean up GPU memory."""
-    while gc.collect():
-        torch.cuda.empty_cache()
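
The removed `cleanup_memory` helper is superseded by functionality pushed into ZenML core (see commit `93398cd`); the steps now call `clean_gpu_memory(force=True)` instead, as described in the README warning above.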
51 changes: 33 additions & 18 deletions llm-lora-finetuning/pipelines/train.py
@@ -16,13 +16,14 @@
 #
 
 
-from steps import evaluate_model, finetune, prepare_data, promote
-from zenml import logging as zenml_logging
-from zenml import pipeline
-
-zenml_logging.STEP_LOGS_STORAGE_MAX_MESSAGES = (
-    10000  # workaround for https://github.com/zenml-io/zenml/issues/2252
+from steps import (
+    evaluate_model,
+    finetune,
+    prepare_data,
+    promote,
+    log_metadata_from_step_artifact,
 )
+from zenml import pipeline
 
 
 @pipeline
@@ -47,40 +48,54 @@ def llm_peft_full_finetune(
         "At least one of `load_in_8bit` and `load_in_4bit` must be True."
     )
     if load_in_4bit and load_in_8bit:
-        raise ValueError(
-            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
-        )
+        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
 
     datasets_dir = prepare_data(
         base_model_id=base_model_id,
         system_prompt=system_prompt,
         use_fast=use_fast,
     )
-    ft_model_dir = finetune(
+
+    evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
+        None,
         use_fast=use_fast,
-        load_in_4bit=load_in_4bit,
         load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_base",
     )
-    evaluate_model(
+    log_metadata_from_step_artifact(
+        "evaluate_base",
+        "base_model_rouge_metrics",
+        after=["evaluate_base"],
+        id="log_metadata_evaluation_base"
+    )
+
+    ft_model_dir = finetune(
         base_model_id,
         system_prompt,
         datasets_dir,
-        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_finetuned",
     )
+
     evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
-        None,
+        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_base",
+        id="evaluate_finetuned",
    )
-    promote(after=["evaluate_finetuned", "evaluate_base"])
+    log_metadata_from_step_artifact(
+        "evaluate_finetuned",
+        "finetuned_model_rouge_metrics",
+        after=["evaluate_finetuned"],
+        id="log_metadata_evaluation_finetuned"
+    )
+
+    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
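
The rewired DAG relies on two step-invocation arguments: `id` names an invocation uniquely (needed because `evaluate_model` runs twice), and `after` adds an ordering edge between steps that exchange no data, which is how the metadata loggers and `promote` are sequenced. The new `log_metadata_from_step_artifact` helper lives in `steps/log_metadata.py`, which this view does not show; a minimal hypothetical sketch of such a step (the exact names and client calls are assumptions):

```python
from typing import Any, Dict

from zenml import get_step_context, log_model_metadata, step


@step(enable_cache=False)
def log_metadata_from_step_artifact(step_name: str, artifact_name: str) -> None:
    """Attach a named output artifact of another step to the model as metadata."""
    context = get_step_context()
    # Load the upstream step's named output from the current pipeline run
    # (assumed to be a JSON-serializable dict of Rouge scores).
    metadata: Dict[str, Any] = (
        context.pipeline_run.steps[step_name].outputs[artifact_name].load()
    )
    log_model_metadata(metadata={artifact_name: metadata})
```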
102 changes: 102 additions & 0 deletions llm-lora-finetuning/pipelines/train_accelerated.py
@@ -0,0 +1,102 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2024. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


from steps import (
    evaluate_model,
    finetune,
    prepare_data,
    promote,
    log_metadata_from_step_artifact,
)
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate


@pipeline
def llm_peft_full_finetune(
    system_prompt: str,
    base_model_id: str,
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
):
    """Pipeline for finetuning an LLM with peft.

    It will run the following steps:

    - prepare_data: prepare the datasets and tokenize them
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    if load_in_4bit and load_in_8bit:
        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")

    datasets_dir = prepare_data(
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        None,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_base",
    )
    log_metadata_from_step_artifact(
        "evaluate_base",
        "base_model_rouge_metrics",
        after=["evaluate_base"],
        id="log_metadata_evaluation_base"
    )

    ft_model_dir = run_with_accelerate(finetune)(
        base_model_id=base_model_id,
        dataset_dir=datasets_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        ft_model_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_finetuned",
    )
    log_metadata_from_step_artifact(
        "evaluate_finetuned",
        "finetuned_model_rouge_metrics",
        after=["evaluate_finetuned"],
        id="log_metadata_evaluation_finetuned"
    )

    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
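
Note that the wrapped call `run_with_accelerate(finetune)(...)` uses keyword arguments only, while the plain pipeline calls steps positionally; a wrapper that re-launches step code through `accelerate launch` generally needs named arguments to forward them to the subprocess, so this is likely a requirement of the wrapper rather than a style choice. If the wrapper also accepts Accelerate launch options, pinning the process count might look like this (a sketch; `num_processes` is an assumption, not confirmed by this diff):

```python
ft_model_dir = run_with_accelerate(finetune, num_processes=2)(
    base_model_id=base_model_id,  # keyword arguments only
    dataset_dir=datasets_dir,
    use_fast=use_fast,
    load_in_8bit=load_in_8bit,
    load_in_4bit=load_in_4bit,
)
```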
2 changes: 1 addition & 1 deletion llm-lora-finetuning/requirements.txt
@@ -8,4 +8,4 @@ scipy
 evaluate
 rouge_score
 nltk
-accelerate
\ No newline at end of file
+accelerate

(The only change is a trailing newline added at the end of the file, per commit `3e9776a` "eof".)