GitHub actions orchestrator example (#703)
* Add option to skip github remote check

* Conform to best practice example structure

* Improve scheduling workflow name

* Add new flavors to docs

* GH actions example

* Fix login step

* Add example image

* Only have pb2 pipeline once in workflow

* Add missing dollar

* Apply suggestions from code review

Co-authored-by: Alex Strick van Linschoten

* Add CR permissions to example

* Fix spelling

* Add link to readme

* Update examples/github_actions_orchestration/README.md

Co-authored-by: Alex Strick van Linschoten
schustmi and strickvl authored Jun 14, 2022
1 parent 68d5b5d commit 36e3dd3
Showing 14 changed files with 318 additions and 108 deletions.
6 changes: 4 additions & 2 deletions .pyspelling-ignore-words
@@ -614,6 +614,7 @@ getfile
getsource
getter
gfile
ghcr
gif
github
gke
@@ -800,9 +801,9 @@ runtime
sagemaker
sam
sas
scalable
scalability
scalable
scalable
scaler
schemas
scikit
@@ -834,6 +835,7 @@ stackoverflow
startswith
staticmethod
stderr
stdin
stdout
stefan
str
@@ -874,10 +876,10 @@ touchpoint
traceback
txt
ui
unencrypted
uncomment
unconfigured
unencrypted
unencrypted
uninstallation
unix
unlinking
1 change: 1 addition & 0 deletions docs/book/extending-zenml/orchestrators.md
@@ -85,6 +85,7 @@ modules, such as the `AirflowOrchestrator` in the `airflow` integration and the
| [AirflowOrchestrator](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.airflow.orchestrators.airflow_orchestrator.AirflowOrchestrator) | airflow | airflow |
| [KubeflowOrchestrator](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.kubeflow.orchestrators.kubeflow_orchestrator.KubeflowOrchestrator) | kubeflow | kubeflow |
| [VertexAIOrchestrator](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.gcp.orchestrators.vertex_orchestrator.VertexOrchestrator) | vertex | gcp |
| [GitHubActionsOrchestrator](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.github.orchestrators.github_actions_orchestrator.GitHubActionsOrchestrator) | github | github |


If you would like to see the available flavors for artifact stores, you can
1 change: 1 addition & 0 deletions docs/book/extending-zenml/secrets-managers.md
@@ -91,6 +91,7 @@ For production use cases some more flavors can be found in specific
| [AWSSecretsManager](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.aws.secrets_managers.aws_secrets_manager.AWSSecretsManager) | aws | aws |
| [GCPSecretsManager](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.gcp.secrets_managers.gcp_secrets_manager.GCPSecretsManager) | gcp | gcp |
| [AzureSecretsManager](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.azure.secrets_managers.azure_secrets_manager.AzureSecretsManager) | azure | azure |
| [GitHubSecretsManager](https://apidocs.zenml.io/latest/api_docs/integrations/#zenml.integrations.github.secrets_managers.github_secrets_manager.GitHubSecretsManager) | github | github |

If you would like to see the available flavors for secret managers, you can
use the command:
157 changes: 118 additions & 39 deletions examples/github_actions_orchestration/README.md
@@ -1,73 +1,152 @@
# 🏃 Run pipelines in GitHub Actions
# 🏃 Run pipelines using GitHub Actions

# 🖥 Run it locally
[GitHub Actions](https://docs.github.com/en/actions) is a platform that allows you to execute
arbitrary software development workflows right in your GitHub repository. It's most commonly used for CI/CD pipelines, but with the **GitHub Actions orchestrator**, ZenML now enables you to easily run and schedule
your machine learning pipelines as GitHub Actions workflows.

## 👣 Step-by-Step
### 📄 Prerequisites
## 📄 Prerequisites

In order to run this example, you need to install and initialize ZenML.
In order to run your ZenML pipelines using GitHub Actions, we need to set up a few things first:

* First you'll need a [GitHub](https://github.com) account and a cloned repository.
* You'll also need to create a GitHub personal access token that allows you to read and write GitHub secrets and push Docker images to your GitHub container registry. To do so, please follow [this guide](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and make sure to assign your token the **repo** and **write:packages** scopes.
* A MySQL database that ZenML will use to store metadata. See [here](https://github.com/schustmi/github-orchestrator-test/actions/workflows/github_pipeline.yaml) for more information on how to set one up on AWS/GCP/Azure.
* An artifact store to save the outputs of your pipeline steps. See [here](https://docs.zenml.io/advanced-guide/guide-aws-gcp-azure#artifact-store) for more information on how to set one up on AWS/GCP/Azure.

```bash
pip install zenml

# Install ZenML integrations: choose one of s3/gcp/azure depending on where your artifact store is hosted
zenml integration install github <s3/gcp/azure>

cd <ROOT_OF_YOUR_GITHUB_REPOSITORY>

# Change the current working directory to a path inside your cloned GitHub repository
cd <PATH_INSIDE_GITHUB_REPOSITORY>

# If your git repository does not contain an existing ZenML pipeline, we can use the sample pipeline from this example
zenml example pull github_actions_orchestration
cp zenml_examples/github_actions_orchestration/run.py .
rm -rf zenml_examples
git add run.py
# If your git repository already contains a ZenML pipeline, you can skip these next few commands
zenml example pull github_actions_orchestration --path=.
git add github_actions_orchestration
git commit -m "Add ZenML example pipeline"
```

### 🥞 Create a new GitHub Actions Stack

```bash
export GITHUB_USERNAME=<>
export GITHUB_AUTHENTICATION_TOKEN=<>
cd github_actions_orchestration

zenml orchestrator register github_orchestrator --flavor=github
zenml container-registry register github_container_registry --flavor=github --uri=<CR_URI> --automatic_token_authentication=true
# Set environment variables for your GitHub username as well as the personal access token that you created earlier.
# These will be used to authenticate with the GitHub API in order to store credentials as GitHub secrets.
export GITHUB_USERNAME=<GITHUB_USERNAME>
export GITHUB_AUTHENTICATION_TOKEN=<GITHUB_AUTHENTICATION_TOKEN>

zenml secrets_manager register github_secrets_manager --flavor=github --owner=<GITHUB_REPOSITORY_OWNER> --repository=<GITHUB_REPOSITORY_NAME>

# Register a metadata store and a secret to connect to it
zenml secret register mysql_secret --schema=mysql --user=<USERNAME> --password=<PASSWORD> --ssl_ca=<> --ssl_cert=<> --ssl_key=<>
zenml metadata-store register cloud_metadata_store --flavor=mysql --host=<HOST> --database=<DATABASE_NAME> --secret=mysql_secret

# Register one of the three following artifact stores and a secret to connect to it
# 1) AWS
zenml secret register s3_store_auth --schema=aws --aws_access_key_id=<ACCESS_KEY_ID> --aws_secret_access_key=<SECRET_ACCESS_KEY>
zenml artifact-store register cloud_artifact_store --flavor=s3 --path=<YOUR_S3_BUCKET_PATH> --authentication_secret=s3_store_auth

# 2) GCP
zenml secret register gcp_store_auth --schema=gcp ...
zenml artifact-store register cloud_artifact_store --flavor=gcp --path=<YOUR_GCP_BUCKET_PATH> --authentication_secret=gcp_store_auth
# Login to the GitHub container registry so we can push the Docker images required to run your ZenML pipeline.
echo $GITHUB_AUTHENTICATION_TOKEN | docker login ghcr.io -u $GITHUB_USERNAME --password-stdin
```
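
Both the secrets manager and the `docker login` call above rely on the `GITHUB_USERNAME` and `GITHUB_AUTHENTICATION_TOKEN` environment variables. As a minimal sketch (the helper `missing_github_env` is hypothetical and not part of ZenML), a quick preflight check could report which of them are unset before you continue:

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("GITHUB_USERNAME", "GITHUB_AUTHENTICATION_TOKEN")


def missing_github_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of ZenML.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_github_env()
    if missing:
        print("Please export: " + ", ".join(missing))
    else:
        print("GitHub credentials found in the environment.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to try out without touching your shell.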

# 3) AZURE
zenml secret register azure_store_auth --schema=azure ...
zenml artifact-store register cloud_artifact_store --flavor=azure --path=<YOUR_AZURE_BUCKET_PATH> --authentication_secret=azure_store_auth
## 🥞 Create a new GitHub Actions Stack

Once we have finished all the external setup, we can create a ZenML stack that
connects all these elements together:

```bash
# We configure the orchestrator to automatically commit and push the GitHub workflow file. If you want to disable this behavior, simply remove the `--push=true` argument
zenml orchestrator register github_orchestrator --flavor=github --push=true

# You can find the repository owner and repository name from the URL of your GitHub repository,
# for example https://github.com/zenml-io/zenml -> The owner would be `zenml-io` and the repository name `zenml`
zenml secrets_manager register github_secrets_manager \
--flavor=github \
--owner=<GITHUB_REPOSITORY_OWNER> \
--repository=<GITHUB_REPOSITORY_NAME>

# The GITHUB_CONTAINER_REGISTRY_URI format will be like this: ghcr.io/GITHUB_REPOSITORY_OWNER
zenml container-registry register github_container_registry \
--flavor=github \
--automatic_token_authentication=true \
--uri=<GITHUB_CONTAINER_REGISTRY_URI>

# Register a metadata store (we will create the authentication secret later)
# - HOST is the public IP address of your MySQL database
# - DATABASE_NAME is the name of the database in which ZenML should store metadata
zenml metadata-store register cloud_metadata_store \
--flavor=mysql \
--secret=mysql_secret \
--host=<HOST> \
--database=<DATABASE_NAME>

# Register one of the three following artifact stores (we will create the authentication secrets later)
# AWS:
zenml artifact-store register cloud_artifact_store \
--flavor=s3 \
--authentication_secret=s3_store_auth \
--path=<S3_BUCKET_PATH>
# GCP:
zenml artifact-store register cloud_artifact_store \
--flavor=gcp \
--authentication_secret=gcp_store_auth \
--path=<GCP_BUCKET_PATH>
# AZURE:
zenml artifact-store register cloud_artifact_store \
--flavor=azure \
--authentication_secret=azure_store_auth \
--path=<AZURE_BUCKET_PATH>

# Register and activate the stack
zenml stack register github_stack \
-o github_orchestrator \
-s github_secrets_manager \
-c github_container_registry \
-m cloud_metadata_store \
-a cloud_artifact_store \
--set

# Now that the stack is active, we can register the secrets needed to connect to our metadata and artifact store:
zenml secret register mysql_secret \
--schema=mysql \
--user=<USERNAME> \
--password=<PASSWORD> \
--ssl_ca=@<PATH_TO_SSL_SERVER_CERTIFICATE> \
--ssl_cert=@<PATH_TO_SSL_CLIENT_CERTIFICATE> \
--ssl_key=@<PATH_TO_SSL_CLIENT_KEY>

# Register one of the following secrets depending on the flavor of artifact store that you've registered:
# AWS: See https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html for how to
# create the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to authenticate with your S3 bucket
zenml secret register s3_store_auth \
--schema=aws \
--aws_access_key_id=<AWS_ACCESS_KEY_ID> \
--aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
# GCP: The PATH_TO_GCP_TOKEN can be either a token generated by the `gcloud` CLI utility
# (e.g. ~/.config/gcloud/application_default_credentials.json) or a service account file
zenml secret register gcp_store_auth \
--schema=gcp \
--token=@<PATH_TO_GCP_TOKEN>
# AZURE: See https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal
# for how to find your AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY
zenml secret register azure_store_auth \
--schema=azure \
--account_name=<AZURE_ACCOUNT_NAME> \
--account_key=<AZURE_ACCOUNT_KEY>
```
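
The `--owner`, `--repository` and `--uri` values used above all follow from the URL of your GitHub repository. The sketch below shows one way to derive them; the helper name `github_stack_values` is hypothetical (not a ZenML API), and lowercasing the owner for the registry URI is an assumption based on Docker image references needing to be lowercase:

```python
from urllib.parse import urlparse


def github_stack_values(repo_url):
    """Derive owner, repository name and registry URI from a GitHub URL.

    Hypothetical helper for illustration; not part of ZenML.
    """
    path = urlparse(repo_url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Not a GitHub repository URL: {repo_url!r}")
    owner, repository = parts[0], parts[1]
    return {
        "owner": owner,
        "repository": repository,
        # Assumption: lowercase the owner, since Docker image
        # references are expected to be lowercase.
        "container_registry_uri": f"ghcr.io/{owner.lower()}",
    }


if __name__ == "__main__":
    # Example URL taken from the comment in the stack setup above.
    print(github_stack_values("https://github.com/zenml-io/zenml"))
```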

## ▶️ Run the pipeline

We're almost done, but one additional step is needed after the first pipeline run, which will fail. To trigger that first run, simply call:

```bash
python run.py
```

### 📆 Run or schedule the pipeline
Running your first pipeline using the ZenML GitHub Actions orchestrator will create a [GitHub package](https://github.com/features/packages) called **zenml-github-actions** which by default won't be accessible by GitHub Actions.
Luckily it doesn't take much effort to resolve this problem: Head to `https://github.com/users/<GITHUB_REPOSITORY_OWNER>/packages/container/package/zenml-github-actions` (replace `<GITHUB_REPOSITORY_OWNER>` with the value you passed earlier during stack configuration) and click on `Package settings` on the right side. There you can either:
* change the package visibility to `public`
* give your repository permissions to access this package using GitHub Actions in the `Manage Actions access` section (see [here](https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility#ensuring-workflow-access-to-your-package))

After this final step we can try again, and this time it should work:

```bash
python run.py
```

That's it! If everything went as planned, this pipeline should now be running in
GitHub Actions and you should be able to access it from the GitHub UI. It will look something like this:

![GitHub Actions UI](assets/github_actions_ui.png)

# 📜 Learn more

If you want to learn more about orchestrators in general or about how to build your own orchestrators in ZenML
18 changes: 18 additions & 0 deletions examples/github_actions_orchestration/pipelines/__init__.py
@@ -0,0 +1,18 @@
# Copyright (c) ZenML GmbH 2022. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
from .github_example_pipeline.github_example_pipeline import (
    github_example_pipeline,
)

__all__ = ["github_example_pipeline"]
@@ -0,0 +1,23 @@
# Copyright (c) ZenML GmbH 2022. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.

from zenml.pipelines import pipeline


@pipeline
def github_example_pipeline(first_step, second_step, third_step):
    # Link all the steps together
    first_num = first_step()
    random_num = second_step()
    third_step(first_num, random_num)
45 changes: 8 additions & 37 deletions examples/github_actions_orchestration/run.py
@@ -11,53 +11,24 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
import random

from zenml.pipelines import pipeline
from zenml.steps import Output, step


@step
def get_first_num() -> Output(first_num=int):
    """Returns an integer."""
    return 10


@step
def get_random_int() -> Output(random_num=int):
    """Get a random integer between 0 and 10"""
    return random.randint(0, 10)


@step
def subtract_numbers(first_num: int, random_num: int) -> Output(result=int):
    """Subtract random_num from first_num."""
    return first_num - random_num


@pipeline(enable_cache=False)
def example_pipeline(get_first_num, get_random_int, subtract_numbers):
    # Link all the steps together
    first_num = get_first_num()
    random_num = get_random_int()
    subtract_numbers(first_num, random_num)

from pipelines import github_example_pipeline
from steps import get_first_num, get_random_int, subtract_numbers

if __name__ == "__main__":
    pipeline_instance = example_pipeline(
        get_first_num=get_first_num(),
        get_random_int=get_random_int(),
        subtract_numbers=subtract_numbers(),
    p = github_example_pipeline(
        first_step=get_first_num(),
        second_step=get_random_int(),
        third_step=subtract_numbers(),
    )

    pipeline_instance.run()
    p.run()

    # If you want to run your pipeline on a schedule instead, you need to pass
    # in a `Schedule` object with a cron expression. Note that for the schedule
    # to get active, you'll need to merge the GitHub Actions workflow into your
    # GitHub default branch. To see it in action, uncomment the following lines:

    # from zenml.pipelines import Schedule
    # pipeline_instance.run(
    # p.run(
    #     schedule=Schedule(cron_expression="* 1 * * *")
    # )
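
The cron expression in the commented-out schedule has five whitespace-separated fields: minute, hour, day of month, month and day of week. Read that way, `* 1 * * *` matches every minute of hour 1 rather than once a day; `0 1 * * *` would be the once-a-day form. A small sketch that labels the fields (the helper `describe_cron` is hypothetical and only splits the expression, it does not validate field values):

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five cron fields to its name.

    Hypothetical helper for illustration; not part of ZenML.
    """
    values = expression.split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(
            f"Expected 5 fields, got {len(values)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, values))


if __name__ == "__main__":
    # The expression from the commented-out schedule above.
    print(describe_cron("* 1 * * *"))
    # A variant that fires once a day, at minute 0 of hour 1.
    print(describe_cron("0 1 * * *"))
```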
18 changes: 18 additions & 0 deletions examples/github_actions_orchestration/steps/__init__.py
@@ -0,0 +1,18 @@
# Copyright (c) ZenML GmbH 2022. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
from .first_step.first_step import get_first_num
from .second_step.second_step import get_random_int
from .third_step.third_step import subtract_numbers

__all__ = ["get_first_num", "get_random_int", "subtract_numbers"]
@@ -0,0 +1,20 @@
# Copyright (c) ZenML GmbH 2022. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
from zenml.steps import Output, step


@step
def get_first_num() -> Output(first_num=int):
    """Returns an integer."""
    return 10