Improved the Instructions #113

Merged
merged 19 commits into from
Jul 1, 2024
8 changes: 7 additions & 1 deletion .typos.toml
Expand Up @@ -7,7 +7,12 @@ extend-exclude = [
"customer-satisfaction/streamlit_app.py",
"nba-pipeline/Building and Using An MLOPs Stack With ZenML.ipynb",
"customer-satisfaction/tests/data_test.py",
"end-to-end-computer-vision/**/*.ipynb"
"end-to-end-computer-vision/**/*.ipynb",
"classifier-e2e/run_skip_basics.ipynb",
"classifier-e2e/run_full.ipynb",
"classifier-e2e/run_skip_basics.ipynb",
"classifier-e2e/run_full.ipynb",
"classifier-e2e/run_skip_basics.ipynb"
]

[default.extend-identifiers]
Expand All @@ -26,6 +31,7 @@ Implicitly = "Implicitly"
fo = "fo"
mapp = "mapp"
polution = "polution"
magent = "magent"

[default]
locale = "en-us"
71 changes: 42 additions & 29 deletions end-to-end-computer-vision/README.md
Expand Up @@ -26,13 +26,13 @@ things that you'll need to do.
## ZenML

We recommend using our [ZenML Pro offering](https://cloud.zenml.io/) to get a
deployed instance of zenml:
deployed instance of ZenML:

### Set up your environment

```bash
pip install -r requirements.txt
zenml integration install label_studio torch gcp mlflow -y
zenml integration install torch gcp mlflow label_studio -y
pip uninstall wandb # This comes in automatically
```

Expand Down Expand Up @@ -63,37 +63,50 @@ zenml connect --url <INSERT_ZENML_URL_HERE>
We will use GCP in the commands listed below, but it will work for other cloud
providers.

### Follow our guide to set up your credential for GCP
1) Follow our guide to set up your credentials for GCP [here](https://docs.zenml.io/how-to/auth-management/gcp-service-connector)

[Set up a GCP service
connector](https://docs.zenml.io/how-to/auth-management/gcp-service-connector)
2) Set up a bucket in GCP to persist your training data

### Set up a bucket to persist your training data

### Set up a bucket to use as artifact store within ZenML

[Learn how to set up a GCP artifact store stack component within zenml
here](https://docs.zenml.io/stack-components/artifact-stores)
### Set up vertex for pipeline orchestration

[Learn how to set up a Vertex orchestrator stack component within zenml
here](https://docs.zenml.io/stack-components/orchestrators/vertex)
### For training on accelerators like GPUs/TPUs set up Vertex

[Learn how to set up a Vertex step operator stack component within zenml
here](https://docs.zenml.io/stack-components/step-operators/vertex)
### Set up Container Registry

[Learn how to set up a google cloud container registry component within zenml
here](https://docs.zenml.io/stack-components/container-registries/gcp)
3) Set up a bucket to use as artifact store within ZenML
Learn how to set up a GCP artifact store stack component within ZenML
[here](https://docs.zenml.io/stack-components/artifact-stores)
4) Set up Vertex for pipeline orchestration
Learn how to set up a Vertex orchestrator stack component within ZenML
[here](https://docs.zenml.io/stack-components/orchestrators/vertex)
5) For training on accelerators like GPUs/TPUs set up Vertex
Learn how to set up a Vertex step operator stack component within ZenML
[here](https://docs.zenml.io/stack-components/step-operators/vertex)
6) Set up a Container Registry in GCP. Learn how to set up a Google Cloud container registry component within ZenML
[here](https://docs.zenml.io/stack-components/container-registries/gcp)

## Label Studio

### [Start Label Studio locally](https://labelstud.io/guide/start)
### [Follow these ZenML instructions to set up Label Studio as a stack component](https://docs.zenml.io/stack-components/annotators/label-studio)
### Create a project within Label Studio and name it `ship_detection_gcp`
### [Set up Label Studio to use external storage](https://labelstud.io/guide/storage)
use the first bucket that you created to data persistence
1) [Start Label Studio locally](https://labelstud.io/guide/start)
For Label Studio we recommend using docker/docker-compose to deploy a local instance
```bash
git clone https://github.com/HumanSignal/label-studio.git
cd label-studio
docker-compose up -d # starts label studio at http://localhost:8080
```
2) [Follow these ZenML instructions to set up Label Studio as a stack component](https://docs.zenml.io/stack-components/annotators/label-studio#how-to-deploy-it)
3) Create a project within Label Studio and name it `ship_detection_gcp`
![img.png](_assets/project_creation_label_studio.png)
4) Configure your project to use `Object Detection with Bounding Boxes` as Labeling Setup
![img.png](_assets/labeling_setup.png)
In the following screen you now need to configure the labeling interface. This is where you define the different classes that you want to detect. In our case this should be a single `ship` class.
![img.png](_assets/labeling_interface.png)
Additionally you might want to allow users to zoom during labeling. This can be configured when you scroll down on this same screen.
5) [Set up Label Studio to use external storage](https://labelstud.io/guide/storage)
Use the first bucket that you created for data persistence

## Hugging Face

This specific project relies on a dataset loaded from Hugging Face. As such, a free Hugging Face account is needed.

1) Log in via the CLI. Simply follow the instructions from this command.
```commandline
huggingface-cli login
```

## ZenML Stacks

Expand Down Expand Up @@ -126,7 +139,7 @@ The project consists of the following pipelines:
This pipeline downloads the [Ship Detection
dataset](https://huggingface.co/datasets/datadrivenscience/ship-detection). This
dataset contains some truly huge images with a few hundred million pixels. In
order to make these useable, we break down all source images into manageable
order to make these usable, we break down all source images into manageable
tiles with a maximum height/width of 1000 pixels. After this preprocessing is
done, the images are uploaded into a cloud bucket and the ground truth
annotations are uploaded to a local Label Studio instance.
Expand Down
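The tiling described in the README hunk above can be sketched as follows. This is only an illustration of the idea, not the project's implementation: the function name `tile_boxes` and its parameters are hypothetical, and the sketch assumes simple non-overlapping tiles with smaller tiles at the right and bottom edges.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; hypothetical alias for this sketch
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, max_side: int = 1000) -> List[Box]:
    """Return crop boxes that cover a width x height image row by row.

    Every box is at most max_side pixels wide and high. Tiles on the
    right and bottom edges are smaller when the size is not a multiple
    of max_side.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    boxes: List[Box] = []
    for top in range(0, height, max_side):
        for left in range(0, width, max_side):
            right = min(left + max_side, width)
            bottom = min(top + max_side, height)
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(2500, 1000):
        print(box)
```

The boxes use the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts, so each one could be passed to `crop` directly if Pillow is used for the actual cutting.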
2 changes: 1 addition & 1 deletion end-to-end-computer-vision/configs/ingest_data.yaml
Expand Up @@ -5,7 +5,7 @@ steps:
enable_step_logs: False
parameters:
dataset: "datadrivenscience/ship-detection"
data_source: # Insert your bucket path here where the training images will live e.g. "gs://foo/bar"
data_source: <INSERT_HERE> # Replace this with the path to a data source
upload_labels_to_label_studio:
enable_cache: False
parameters:
Expand Down
1 change: 1 addition & 0 deletions end-to-end-computer-vision/configs/training_pipeline.yaml
Expand Up @@ -11,6 +11,7 @@ steps:
batch_size: 8
imgsz: 720
epochs: 1
is_apple_silicon_env: False

settings:
docker:
Expand Down
Expand Up @@ -26,6 +26,7 @@ steps:
imgsz: 720
epochs: 50000
is_quad_gpu_env: True
is_apple_silicon_env: False
settings:
step_operator.vertex:
accelerator_type: NVIDIA_TESLA_T4 # see https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#AcceleratorType
Expand Down
Empty file.
86 changes: 0 additions & 86 deletions end-to-end-computer-vision/steps/download_from_hf.py

This file was deleted.

10 changes: 10 additions & 0 deletions end-to-end-computer-vision/steps/train_model.py
Expand Up @@ -41,6 +41,7 @@ def train_model(
batch_size: int = 16,
imgsz: int = 640,
is_quad_gpu_env: bool = False,
is_apple_silicon_env: bool = False,
) -> Tuple[
Annotated[
YOLO, ArtifactConfig(name="Trained_YOLO", is_model_artifact=True)
Expand All @@ -57,6 +58,7 @@ def train_model(
dataset: Dataset to train the model on.
data_source: Source where the data lives
is_quad_gpu_env: Whether we are in an env with 4 gpus
is_apple_silicon_env: In case we are running on Apple compute

Returns:
Tuple[YOLO, Dict[str, Any]]: Trained model and validation metrics.
Expand All @@ -75,6 +77,14 @@ def train_model(
imgsz=imgsz,
device=[0, 1, 2, 3],
)
elif is_apple_silicon_env:
model.train(
data=data_path,
epochs=epochs,
batch=batch_size,
imgsz=imgsz,
device="mps",
)
else:
model.train(
data=data_path,
Expand Down
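The three branches in `train_model` differ only in the `device` argument, so the selection could also be expressed as one small helper. The sketch below is an assumption-laden illustration, not code from this PR: `build_train_kwargs` is a hypothetical name, and because the final `else` branch is cut off in the hunk above, the sketch assumes it passes no `device` at all.

```python
from typing import Any, Dict


def build_train_kwargs(
    data_path: str,
    epochs: int,
    batch_size: int,
    imgsz: int,
    is_quad_gpu_env: bool = False,
    is_apple_silicon_env: bool = False,
) -> Dict[str, Any]:
    """Collect keyword arguments for a training call.

    Mirrors the if/elif order of the diff: the quad-GPU flag wins over
    the Apple Silicon flag when both are set.
    """
    kwargs: Dict[str, Any] = {
        "data": data_path,
        "epochs": epochs,
        "batch": batch_size,
        "imgsz": imgsz,
    }
    if is_quad_gpu_env:
        kwargs["device"] = [0, 1, 2, 3]  # four CUDA devices
    elif is_apple_silicon_env:
        kwargs["device"] = "mps"  # Metal Performance Shaders backend
    # Assumption: otherwise no device is passed and the library decides.
    return kwargs


if __name__ == "__main__":
    print(build_train_kwargs("data.yaml", 1, 8, 720, is_apple_silicon_env=True))
```

With such a helper the step body would reduce to a single `model.train(**kwargs)` call. If PyTorch is installed, `torch.backends.mps.is_available()` can be used to check whether the `mps` device exists before setting the flag.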
50 changes: 38 additions & 12 deletions end-to-end-computer-vision/utils/dataset_utils.py
Expand Up @@ -45,6 +45,20 @@ def load_images_from_folder(folder):
return images


def load_images_from_source(data_source, download_dir, filenames):
total_images = len(filenames)
for index, filename in enumerate(filenames):
src_path = f"{data_source}/{filename}.png"
dst_path = os.path.join(download_dir, f"{filename}.png")
if not os.path.exists(dst_path):
fileio.copy(src_path, dst_path)

if (index + 1) % 100 == 0 or index == total_images - 1:
logger.info(
f"{index + 1} of {total_images} images have been downloaded..."
)


def load_and_split_data(
dataset: LabelStudioAnnotationExport, data_source: str
) -> str:
Expand All @@ -71,21 +85,33 @@ def load_and_split_data(
if f.endswith(".txt")
]

# Download corresponding images from gcp bucket
images_folder = os.path.join(extract_location, "images")
# Download images from source bucket and if successful keep them to reuse for future runs
load_images = False
download_dir = os.path.join(os.getcwd(), "images") # Temporary dirname that represents a still incomplete download
loaded_images = os.path.join(os.getcwd(), "loaded-images") # The dirname used once the download fully completes
images_folder = os.path.join(extract_location, "images") # tmp dirpath used for the current run only

# Check that images have not already been downloaded
if not os.path.exists(loaded_images):
os.makedirs(download_dir, exist_ok=True)
load_images = True

# Checks that new images have not been added since previous download
if os.path.exists(loaded_images):
if len(os.listdir(loaded_images)) != len(filenames):
download_dir = loaded_images
load_images = True

if load_images:
logger.info(f"Downloading images from {data_source}")
load_images_from_source(data_source, download_dir, filenames)
os.rename(download_dir, loaded_images)

os.makedirs(images_folder, exist_ok=True)

total_images = len(filenames)
logger.info(f"Downloading images from {data_source}")
for index, filename in enumerate(filenames):
src_path = f"{data_source}/{filename}.png"
dst_path = os.path.join(images_folder, f"{filename}.png")
fileio.copy(src_path, dst_path)
logger.info(f"Copy images to {images_folder}")
load_images_from_source(loaded_images, images_folder, filenames)

if (index + 1) % 100 == 0 or index == total_images - 1:
logger.info(
f"{index + 1} of {total_images} images have been downloaded..."
)
split_dataset(extract_location, ratio=(0.7, 0.15, 0.15), seed=42)
yaml_path = generate_yaml(extract_location)
return yaml_path
Expand Down
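The caching logic in `load_and_split_data` follows a "download into a temporary directory, rename once complete" pattern. A minimal local-filesystem sketch of that pattern is shown below. It is not the PR's code: `sync_to_cache` and the `.partial` suffix are hypothetical, it uses `shutil` instead of ZenML's `fileio`, and it checks for each missing file individually rather than comparing file counts as the diff does.

```python
import os
import shutil
from typing import Iterable


def sync_to_cache(source_dir: str, cache_dir: str, filenames: Iterable[str]) -> int:
    """Copy files that are missing from the cache and return how many were copied.

    A first download goes into a staging directory that is renamed to
    cache_dir only after every copy has succeeded, so an interrupted run
    never leaves a directory that looks like a complete cache.
    """
    names = list(filenames)
    staging_dir = cache_dir + ".partial"  # hypothetical naming convention
    # An existing cache is topped up in place; otherwise work in staging.
    target = cache_dir if os.path.isdir(cache_dir) else staging_dir
    os.makedirs(target, exist_ok=True)

    copied = 0
    for name in names:
        dst = os.path.join(target, name)
        if not os.path.exists(dst):
            shutil.copyfile(os.path.join(source_dir, name), dst)
            copied += 1

    if target == staging_dir:
        os.rename(staging_dir, cache_dir)  # publish the finished download
    return copied
```

Calling the function a second time with the same file list copies nothing, and calling it after new files have been added copies only those, which is the behaviour the comments in the diff describe.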
3 changes: 2 additions & 1 deletion end-to-end-computer-vision/utils/split_data.py
@@ -1,3 +1,4 @@
import math
import os
import random
import shutil
Expand Down Expand Up @@ -37,7 +38,7 @@ def split_dataset(
seed: Random seed for reproducibility.
"""
# Ensure the ratio is correct
assert sum(ratio) == 1.0
assert math.isclose(sum(ratio), 1.0, rel_tol=1e-9)

# Seed to get consistent results
if seed is not None:
Expand Down
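The switch from `==` to `math.isclose` in `split_dataset` guards against binary floating-point rounding: a ratio tuple that sums to 1 on paper may sum to a value that differs from `1.0` in the last bit. The sketch below illustrates the general effect. The helper name `ratios_sum_to_one` is hypothetical, and the example uses `0.1 + 0.2` because that rounding case is well known; it does not claim that the default `(0.7, 0.15, 0.15)` ratio itself fails the exact comparison.

```python
import math
from typing import Sequence


def ratios_sum_to_one(ratio: Sequence[float], rel_tol: float = 1e-9) -> bool:
    """Return True when the ratios sum to 1.0 within a relative tolerance."""
    return math.isclose(sum(ratio), 1.0, rel_tol=rel_tol)


if __name__ == "__main__":
    # Exact comparison is sensitive to rounding in binary floating point.
    print(0.1 + 0.2 == 0.3)  # False
    # A tolerance-based comparison accepts the tiny rounding error.
    print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
    print(ratios_sum_to_one((0.7, 0.15, 0.15)))  # True
```

A relative tolerance of `1e-9` is far larger than any rounding error from adding three ratios, yet still rejects a genuinely wrong split such as `(0.7, 0.2, 0.2)`.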
2 changes: 1 addition & 1 deletion llm-agents/README.md
Expand Up @@ -63,7 +63,7 @@ You can sign up for a free trial of the cloud at https://cloud.zenml.io. Once si

### Models Tab in the Dashboard

The models tab acts as a central control plane for all of your models. You can view the different versions that get created implictly with your pipeline runs, check their metadata, deployments and more!
The models tab acts as a central control plane for all of your models. You can view the different versions that get created implicitly with your pipeline runs, check their metadata, deployments and more!

![model versions](./assets/llm-agent/model_versions.png)

Expand Down
2 changes: 1 addition & 1 deletion llm-finetuning/README.md
Expand Up @@ -160,7 +160,7 @@ This project recently did a [call of volunteers](https://www.linkedin.com/feed/u

While the work here is solely based on the task of finetuning the model for the ZenML library, the pipeline can be changed with minimal effort to point to any set of repositories on GitHub. Theoretically, one could extend this work to point to proprietary codebases to learn from them for any use-case.

For example, see how [VMWare fine-tuned StarCoder to learn their style](https://octo.vmware.com/fine-tuning-starcoder-to-learn-vmwares-coding-style/).
For example, see how [VMWare fine-tuned StarCoder to learn their style](https://entreprenerdly.com/fine-tuning-starcoder-to-create-a-coding-assistant-that-adapts-to-your-coding-style/).

Also, make sure to join our <a href="https://zenml.io/slack" target="_blank">
<img width="15" src="https://cdn3.iconfinder.com/data/icons/logos-and-brands-adobe/512/306_Slack-512.png" alt="Slack"/>
Expand Down
8 changes: 4 additions & 4 deletions llm-litgpt-finetuning/lit_gpt/lora.py
Expand Up @@ -383,7 +383,7 @@ def conv1d(
If the number of heads is equal to the number of query groups - grouped queries are disabled
(see scheme in `lit_gpt/config.py:Config`). In this case the combined QKV matrix consists of equally sized
query, key and value parts, which means we can utilize `groups` argument from `conv1d`: with this argument the
input and weight matrices will be splitted in equally sized parts and applied separately (like having multiple
input and weight matrices will be split in equally sized parts and applied separately (like having multiple
conv layers side by side).

Otherwise QKV matrix consists of unequally sized parts and thus we have to split input and weight matrices manually,
Expand All @@ -408,14 +408,14 @@ def conv1d(
# ⚬ C_output': embeddings size for each LoRA layer (not equal in size)
# ⚬ r: rank of all LoRA layers (equal in size)

input_splitted = input.chunk(
input_split = input.chunk(
sum(self.enable_lora), dim=1
) # N * (B, C // N, T)
weight_splitted = weight.split(
weight_split = weight.split(
self.qkv_shapes
) # N * (C_output', r, 1)
return torch.cat(
[F.conv1d(a, b) for a, b in zip(input_splitted, weight_splitted)],
[F.conv1d(a, b) for a, b in zip(input_split, weight_split)],
dim=1, # (B, C_output', T)
) # (B, C_output, T)

Expand Down