[Fabric E2E Sample] Added CI pipelines #998

Closed

@camaderal camaderal commented Jan 4, 2025

Type of PR

  • Code changes
  • Test changes
  • CI-CD changes

Purpose

Added the following CI pipelines

  • devops/azure-pipelines-ci-artifacts.yml
    • Triggered on commits to the main branch. This should trigger the linting/testing for the python modules and publish fabric environment and ADLS artifacts.
    • Steps
      • Publish Fabric Environment Config Artifacts
        • Lint and run tests for custom libraries (Python modules)
        • Build Fabric Environment Config Artifacts
          • Output:
          fabric_env
          |_environment.yaml
          |_custom_libraries
             |_ddo_transform_standardize.py
             |_ddo_transform_transform.py
             |_otel_monitor_invoker.py
          
      • Publish ADLS Artifacts
        • Create the application.cfg from template
        • Build ADLS Artifacts
          • Output:
          adls
          |_config
             |_application.cfg
             |_lakehouse_ddls.yaml
          |_reference
             |_dim_date.csv
             |_dim_time.csv
          
  • devops/azure-pipelines-ci-qa.yml:
    • Triggered on PRs against the main branch. This tests the Python modules, creates an ephemeral workspace, and runs the workspace tests.
    • Steps
      • BuildLibraries
        • Lint and run tests for custom libraries (Python modules)
      • BuildFabric
        • Interactive Azure CLI login (most Fabric APIs require a user identity, so an interactive login is needed)
        • Get Access Tokens for Fabric, Azure Management, Azure Storage
        • Build Workspace
          • Create new feature workspace if $FABRIC_WORKSPACE_NAME-$PR_ID doesn’t exist.
            • The idea is that the pipeline creates only one workspace per PR. The feature workspace is cleaned up when the PR is merged.
          • Add workspace role assignments for group admin
          • Provision Workspace Identity
          • Sync newly created workspace to the feature branch.
          • Create a custom work pool with name $FABRIC_CUSTOM_POOL_NAME if it doesn't exist. Assign it to the workspace.
            • work pool details will be derived from fabric/fabric_environment/spark_pool_settings.yml file of the feature branch.
          • Create ADLS storage container with name feature-$PR-ID if it doesn't exist.
          • Add role assignment to the feature workspace identity for the ADLS storage account.
          • Create the ADLS Cloud connection with name $FABRIC_CONNECTION_NAME-$PR_ID if it doesn't exist.
          • Add connection role assignments for group admin
          • Create the ADLS shortcut if it doesn't exist.
        • Update environment
          • Check staged and published compute settings and libraries.
          • If there are unpublished changes, publish the environment. (The first time the pipeline runs for a PR, it updates the Spark pool and all libraries are unpublished, so it publishes the environment. Subsequent runs shouldn't re-publish it.)
          • If the PR changes the environment config files or custom libraries, the environment is re-published every time the PR pipeline runs.
        • Update ADLS Config files
          • Create the application.cfg from template
          • Upload the following to the feature-$PR-ID storage container
        adls
        |_config
           |_application.cfg
           |_lakehouse_ddls.yaml
        |_reference
           |_dim_date.csv
           |_dim_time.csv
        
        • Test Fabric Workspace
          • Run workspace tests (run notebooks, pipeline, etc)
            • FABRIC_WORKSPACE_NAME will be the feature workspace $FABRIC_WORKSPACE_NAME-$PR_ID
  • devops/azure-pipeline-ci-qa-cleanup:
    • Should be triggered when a PR against the main branch is completed or abandoned. This removes the resources created by the azure-pipelines-ci-qa pipeline. I still haven't figured out how to trigger it, but I am thinking of service hooks.
    • Steps
      • CleanupWorkspace
        • Interactive Azure CLI login (most Fabric APIs require a user identity, so an interactive login is needed)
        • Get Access Tokens for Fabric, Azure Management, Azure Storage
        • Remove storage container, storage role assignments, ADLS cloud connection and workspace
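
The "create it if it doesn't exist" steps above share one pattern: derive a per-PR name, look the resource up, and create it only when it is missing. A minimal sketch of that pattern follows. `feature_names`, `ensure_resource`, and the in-memory `fake_workspaces` store are hypothetical stand-ins, not the scripts in this PR; real lookups would call the Fabric and Azure APIs instead.

```python
from typing import Callable, Dict, Optional


def feature_names(workspace_name: str, connection_name: str, pr_id: str) -> Dict[str, str]:
    """Derive the per-PR resource names described in the steps above."""
    return {
        "workspace": f"{workspace_name}-{pr_id}",
        "connection": f"{connection_name}-{pr_id}",
        "container": f"feature-{pr_id}",
    }


def ensure_resource(
    name: str,
    find: Callable[[str], Optional[str]],
    create: Callable[[str], str],
) -> str:
    """Return the id of the resource called `name`, creating it only if missing."""
    existing_id = find(name)
    if existing_id is not None:
        return existing_id
    return create(name)


# Hypothetical in-memory store standing in for the Fabric workspace API.
fake_workspaces: Dict[str, str] = {}


def find_workspace(name: str) -> Optional[str]:
    return fake_workspaces.get(name)


def create_workspace(name: str) -> str:
    workspace_id = f"ws-{len(fake_workspaces) + 1}"
    fake_workspaces[name] = workspace_id
    return workspace_id


# Example values only; the real names come from the variable group.
names = feature_names("ws-parking", "conn-adls", "998")
first_id = ensure_resource(names["workspace"], find_workspace, create_workspace)
second_id = ensure_resource(names["workspace"], find_workspace, create_workspace)
```

Running the pipeline twice for the same PR then resolves to the same workspace instead of failing or creating a duplicate, which is what makes reruns safe.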

Added a setup repository script

  • This is a temporary script to set up the repo files in Azure DevOps to test the pipelines.
  • Note that I moved some files so uploading the files is easier:
    • config/fabric_environment/environment.yaml ⇨ fabric/fabric_environment/environment.yaml
    • config/fabric_environment/*.py ⇨ libraries/src/
    • src/test/* ⇨ libraries/test/
  • I also modified the deploy script to sync the Fabric workspace to $GIT_DIRECTORY_NAME/fabric/workspace instead.

Does this introduce a breaking change? If yes, details on what can break

No.

Author pre-publish checklist

  • Added test to prove my fix is effective or new feature works
  • No PII in logs
  • Made corresponding changes to the documentation

Validation steps

To test the pipelines, your Git repository needs to be set up in a particular way. Here are the steps to set it up.

  • Generate your Azure DevOps credentials, then set the env variables GIT_USERNAME and GIT_TOKEN. Also make sure these env variables are set:

    • GIT_ORGANIZATION_NAME
    • GIT_PROJECT_NAME
    • GIT_REPOSITORY_NAME
    • GIT_BRANCH_NAME
    • GIT_DIRECTORY_NAME
  • Run scripts/setup_repository.py. This adds all the necessary files to the GIT_DIRECTORY_NAME path in the Azure DevOps repo and creates the fabric/workspace folder where the Fabric workspace will be synced. The file structure should look like this (minus the fabric/workspace part, which is generated later):
    [Screenshot: repository file structure]

  • Either:

    • Clean deploy the fabric workspace with the instructions from the README.
    • Sync your existing Fabric workspace to the fabric/workspace folder in the Azure DevOps repo.
  • Create the variable group. Here are the required values:

    • WORKING_DIR: Folder path where you committed your repository. Same as GIT_DIRECTORY_NAME in the env variables
    • SUBSCRIPTION_ID: Same as the SUBSCRIPTION_ID in the env variables
    • RESOURCE_GROUP_NAME: Same as the RESOURCE_GROUP_NAME in the env variables
    • STORAGE_ACCOUNT_NAME: Created storage account name
    • STORAGE_ACCOUNT_ROLE_DEFINITION_ID: Role definition id given to the Fabric workspace to access the storage account
    • STORAGE_CONTAINER_NAME: Default is “main”
    • KEYVAULT_NAME: Created key vault name
    • ORGANIZATIONAL_NAME: Same as the GIT_ORGANIZATION_NAME in the env variables
    • PROJECT_NAME: Same as the GIT_PROJECT_NAME in the env variables
    • REPO_NAME: Same as the GIT_REPOSITORY_NAME in the env variables
    • FABRIC_WORKSPACE_GROUP_ADMIN: Principal id of the group admin for the workspace and cloud connection
    • FABRIC_WORKSPACE_NAME: Created workspace name
    • FABRIC_CAPACITY_NAME: Created capacity name
    • FABRIC_ENVIRONMENT_NAME: Created environment name
    • FABRIC_LAKEHOUSE_NAME: Created lakehouse name
    • FABRIC_SHORTCUT_NAME: Default is “sc-adls-main”
    • FABRIC_CUSTOM_POOL_NAME: Created custom pool name
    • FABRIC_CONNECTION_NAME: Created ADLS cloud connection name
  • Modify the following values in the pipelines and commit them.

    • <VARIABLE_GROUP_NAME>: the variable group you created
    • <DEV_BRANCH_NAME>: the branch that should trigger the pipeline.
  • Set up the pipeline, then run it.

  • Known issue:

    • In the CI QA pipeline, the build workspace script creates a new storage container and then creates a shortcut to it. The container creation or the role assignment sometimes seems to take a while to take effect, and creating the shortcut then fails with the message below. Rerunning the pipeline without any changes should work the second time:
    Exception: [Error] Failed to create shortcut 'sc-adls-main': 400 - {"requestId":"XXX","errorCode":"BadRequest","moreDetails":[{"errorCode":"RequestBodyValidationFailed","message":"Unauthorized. Access to target location https://xxx.blob.core.windows.net/feature-XXX// denied."}],"message":"The request could not be processed due to missing or invalid information"}
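
Until the root cause is confirmed, one way to avoid manual reruns is to retry the shortcut creation with a growing delay. The sketch below is an illustration under assumptions, not the code in this PR: `retry` is a hypothetical helper, the attempt count and delays are guesses, and `flaky_create_shortcut` only simulates the failure above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 5,
    initial_delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds, doubling the wait after each failure."""
    delay = initial_delay_seconds
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error unchanged.
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")


# Simulated shortcut creation that fails twice before the role assignment "lands".
calls = {"count": 0}
waits: List[float] = []


def flaky_create_shortcut() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise Exception("[Error] Failed to create shortcut 'sc-adls-main': 400")
    return "created"


# Record the waits instead of sleeping so the example runs instantly.
result = retry(flaky_create_shortcut, sleep=waits.append)
```

The helper retries on any exception to keep the sketch short. In the real script it would probably be safer to retry only on the 400 "Unauthorized" response and let other errors fail immediately.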

Issues Closed or Referenced

camaderal and others added 22 commits December 16, 2024 23:33
rename the standardize validation root folder
…0-2' into kitsune/notebook_and_pipeline_updates
…0-2' into kitsune/notebook_and_pipeline_updates
…ansform_module

[Fabric E2E Sample] ddo transform module upload
…0-2' into kitsune/notebook_and_pipeline_updates
@yuna-s yuna-s changed the title Added CI pipelines [Fabric E2E Sample] Added CI pipelines Jan 6, 2025
bsherwin commented Jan 6, 2025

@camaderal @yuna-s - I updated the github workflow actions on main that will solve the hyperlinks check. You'll need to merge from main to get the changes.

@promisinganuj promisinganuj added the e2e: fabric Related with E2E Fabric Sample label Jan 6, 2025
@promisinganuj promisinganuj linked an issue Jan 6, 2025 that may be closed by this pull request
@yunishimura0716 yunishimura0716 left a comment

Thank you for your great work @camaderal !
Overall, looks good.
I saw some duplicated code (e.g. the Fabric API calls in build_workspace.py and update_environment.py). You could make it cleaner by putting it in a single file or place.



# Main function to orchestrate the process
def main() -> None:

It would be better to add error handling here: the script stops if files already exist, but that case should be ignored.

Author

Yeah, this was supposed to be a temporary script. I just added it so people can easily set up the repo the first time. I am thinking this part would eventually be done with Terraform? I am not sure.

Anyway, I commit by folder, so some files might already be present and some not. In that case, if I ignore the exception, the files that don't exist yet will not be uploaded.

One way I could improve this is to check whether each file exists first and then decide if it should be "added" or "updated",
but it was too complicated 😭
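
The per-file check can stay small if the paths already in the repository are fetched once up front. Below is a sketch, assuming the script pushes through the Azure DevOps Git Pushes REST API, whose change entries take a `changeType` of `add` or `edit`. `build_changes` and its arguments are hypothetical, and fetching `existing_paths` is left out.

```python
from typing import Dict, List, Set


def build_changes(local_files: Dict[str, str], existing_paths: Set[str]) -> List[dict]:
    """Build one change entry per file: 'edit' if the path already exists, else 'add'."""
    changes = []
    for path, content in sorted(local_files.items()):
        change_type = "edit" if path in existing_paths else "add"
        changes.append(
            {
                "changeType": change_type,
                "item": {"path": path},
                "newContent": {"content": content, "contentType": "rawtext"},
            }
        )
    return changes


# Example inputs only: file contents and the set of existing paths are made up.
local_files = {
    "/fabric/fabric_environment/environment.yaml": "dependencies: []\n",
    "/libraries/src/ddo_transform_standardize.py": "# module body\n",
}
existing_paths = {"/fabric/fabric_environment/environment.yaml"}
changes = build_changes(local_files, existing_paths)
```

With this shape, a folder that is only partly present in the repo can still go out as a single commit, because each file carries its own change type.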

@yunishimura0716 yunishimura0716 Jan 7, 2025

I see. Then this should be part of the deployment process so that it can run at any time (whether or not files already exist, it will make the AzDo repo ready for CI).

Comment on lines 50 to 73
# -----------------------------------------------------------------------------
azure_management_headers: Dict[str, str] = {}
azure_storage_headers: Dict[str, str] = {}
fabric_headers: Dict[str, str] = {}

fabric_api_endpoint = "https://api.fabric.microsoft.com/v1"


def set_azure_management_headers(azure_management_bearer_token: str) -> None:
    global azure_management_headers
    azure_management_headers = {
        "Authorization": f"Bearer {azure_management_bearer_token}",
        "Content-Type": "application/json",
    }


def set_azure_storage_headers(azure_storage_bearer_token: str) -> None:
    global azure_storage_headers
    azure_storage_headers = {"Authorization": f"Bearer {azure_storage_bearer_token}", "x-ms-version": "2021-02-12"}


def set_fabric_headers(fabric_bearer_token: str) -> None:
    global fabric_headers
    fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"}

Why not simply declare the variables in global scope?
For example:

Suggested change
# -----------------------------------------------------------------------------
fabric_api_endpoint = "https://api.fabric.microsoft.com/v1"
azure_management_headers = {
    "Authorization": f"Bearer {azure_management_bearer_token}",
    "Content-Type": "application/json",
}
azure_storage_headers = {"Authorization": f"Bearer {azure_storage_bearer_token}", "x-ms-version": "2021-02-12"}
fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"}
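
One caveat with plain module-level assignments is that the bearer tokens have to exist by the time those lines run. A possible middle ground, sketched below, is to build each header dict in a small pure function and pass the result to whatever makes the request. The function names and placeholder tokens are hypothetical; the header values are the ones from the snippet under review.

```python
from typing import Dict

FABRIC_API_ENDPOINT = "https://api.fabric.microsoft.com/v1"


def build_json_headers(bearer_token: str) -> Dict[str, str]:
    """Headers for the Fabric and Azure Management APIs (JSON request bodies)."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }


def build_storage_headers(bearer_token: str) -> Dict[str, str]:
    """Headers for Azure Storage, pinned to the API version used in the PR."""
    return {"Authorization": f"Bearer {bearer_token}", "x-ms-version": "2021-02-12"}


# Placeholder tokens; in the pipeline these come from the access token step.
fabric_headers = build_json_headers("fabric-token-placeholder")
storage_headers = build_storage_headers("storage-token-placeholder")
```

This avoids both the `global` statements and the import-time dependency on the tokens, and the same two functions could be shared by build_workspace.py and update_environment.py.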

Author

I was copying the style from here:

But yeah I can do this.

Comment on lines 29 to 39
# Global Variables
# -----------------------------------------------------------------------------
fabric_headers: Dict[str, str] = {}

fabric_api_endpoint = "https://api.fabric.microsoft.com/v1"


def set_fabric_headers(fabric_bearer_token: str) -> None:
    global fabric_headers
    fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"}

Same as the comment above on build_workspace.py.

@camaderal
Author

@yunishimura0716 Thanks for reviewing! I really appreciate it. I refactored the code and applied your comments (except for the setup repository script, because it is just a temporary script). Please check it and let me know if there is anything else I need to change. Thanks!

Base automatically changed from kitsune/notebook_and_pipeline_updates to feat/e2e-fabric-dataops-sample-v0-2 January 10, 2025 00:28
@camaderal
Author

Due to rebasing and merging, I think there are a lot of unnecessary changes included in this PR. I will close this one and re-create a new one with just my changes.

@camaderal camaderal removed this from the E2E - parking_sensor_fabric - V1 milestone Jan 10, 2025
@camaderal camaderal closed this Jan 10, 2025
Successfully merging this pull request may close these issues.

Create Basic AzDO CI pipeline
6 participants