-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fabric E2E Sample] Added CI pipelines #998
[Fabric E2E Sample] Added CI pipelines #998
Conversation
rename the standardize validation root folder
…ook_and_pipeline_updates
…b.com/Azure-Samples/modern-data-warehouse-dataops into kitsune/notebook_and_pipeline_updates
…ook_and_pipeline_updates
…0-2' into kitsune/notebook_and_pipeline_updates
…0-2' into kitsune/notebook_and_pipeline_updates
…ansform_module [Fabric E2E Sample] ddo transform module upload
…0-2' into kitsune/notebook_and_pipeline_updates
e2e_samples/fabric_dataops_sample/libraries/test/ddo_transform/test_standardize.py
Outdated
Show resolved
Hide resolved
e2e_samples/fabric_dataops_sample/libraries/test/ddo_transform/test_transform.py
Outdated
Show resolved
Hide resolved
@camaderal @yuna-s - I updated the github workflow actions on main that will solve the hyperlinks check. You'll need to merge from main to get the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your great work @camaderal !
Overall, looks good.
I saw some duplicate codes (e.g. fabric api in build_workspace.py
and update_environment.py
). You can make it more clean by putting them together into a single file or place.
|
||
|
||
# Main function to orchestrate the process | ||
def main() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be better to do error handling as it will stop running if there is pre-existing but this should be ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was supposed to be a temporary script. I just added it for people to setup the repo easily the first time. I am thinking this part would be done eventually with terraform? I am not sure.
Anyway, I commit by folder so some files might already be present and some are not. In this case, if I ignore the exception the non-pre-existing files will not be uploaded.
One way I could improve this is to check each file if it exist first then deciding if it should be "added" or "updated"
but it was too complicated 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then this should be in the process of deployment such that it can be compatible with the run at any time (regardless there are pre-existing or not, it will make the AzDo repo ready for CI)
e2e_samples/fabric_dataops_sample/fabric/fabric_environment/spark_pool_settings.yml
Show resolved
Hide resolved
# ----------------------------------------------------------------------------- | ||
azure_management_headers: Dict[str, str] = {} | ||
azure_storage_headers: Dict[str, str] = {} | ||
fabric_headers: Dict[str, str] = {} | ||
|
||
fabric_api_endpoint = "https://api.fabric.microsoft.com/v1" | ||
|
||
|
||
def set_azure_management_headers(azure_management_bearer_token: str) -> None: | ||
global azure_management_headers | ||
azure_management_headers = { | ||
"Authorization": f"Bearer {azure_management_bearer_token}", | ||
"Content-Type": "application/json", | ||
} | ||
|
||
|
||
def set_azure_storage_headers(azure_storage_bearer_token: str) -> None: | ||
global azure_storage_headers | ||
azure_storage_headers = {"Authorization": f"Bearer {azure_storage_bearer_token}", "x-ms-version": "2021-02-12"} | ||
|
||
|
||
def set_fabric_headers(fabric_bearer_token: str) -> None: | ||
global fabric_headers | ||
fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't simply declare the variable in global scope?
like
# ----------------------------------------------------------------------------- | |
azure_management_headers: Dict[str, str] = {} | |
azure_storage_headers: Dict[str, str] = {} | |
fabric_headers: Dict[str, str] = {} | |
fabric_api_endpoint = "https://api.fabric.microsoft.com/v1" | |
def set_azure_management_headers(azure_management_bearer_token: str) -> None: | |
global azure_management_headers | |
azure_management_headers = { | |
"Authorization": f"Bearer {azure_management_bearer_token}", | |
"Content-Type": "application/json", | |
} | |
def set_azure_storage_headers(azure_storage_bearer_token: str) -> None: | |
global azure_storage_headers | |
azure_storage_headers = {"Authorization": f"Bearer {azure_storage_bearer_token}", "x-ms-version": "2021-02-12"} | |
def set_fabric_headers(fabric_bearer_token: str) -> None: | |
global fabric_headers | |
fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"} | |
# ----------------------------------------------------------------------------- | |
fabric_api_endpoint = "https://api.fabric.microsoft.com/v1" | |
azure_management_headers = { | |
"Authorization": f"Bearer {azure_management_bearer_token}", | |
"Content-Type": "application/json", | |
} | |
azure_storage_headers = {"Authorization": f"Bearer {azure_storage_bearer_token}", "x-ms-version": "2021-02-12"} | |
fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was copying the style from here:
Line 100 in c7781a1
def set_headers(fabric_bearer_token: str) -> None: |
But yeah I can do this.
# Global Variables | ||
# ----------------------------------------------------------------------------- | ||
fabric_headers: Dict[str, str] = {} | ||
|
||
fabric_api_endpoint = "https://api.fabric.microsoft.com/v1" | ||
|
||
|
||
def set_fabric_headers(fabric_bearer_token: str) -> None: | ||
global fabric_headers | ||
fabric_headers = {"Authorization": f"Bearer {fabric_bearer_token}", "Content-Type": "application/json"} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above comment at build_workspace.py
@yunishimura0716 Thanks for reviewing! I really appreciate it. I refactored the code and applied your comments (except for the setup repository script because this is just temporary script). Please check it and see if there is anything else I need to change. Thanks! |
Due to rebasing and merging, I think there are a lot of unnecessary changes included in this PR. I will close this one and re-create a new one with just my changes. |
Type of PR
Purpose
Added the following CI pipelines
$FABRIC_WORKSPACE_NAME-$PR_ID
doesn’t exist.$FABRIC_CUSTOM_POOL_NAME
if it doesn't exist. Assign it to the workspace.fabric/fabric_environment/spark_pool_settings.yml
file of the feature branch.feature-$PR-ID
if it doesn't exist.$FABRIC_CONNECTION_NAME-$PR_ID
if it doesn't exist.feature-$PR-ID
storage containerFABRIC_WORKSPACE_NAME
will be the feature workspaceFABRIC_WORKSPACE_NAME-$PR_ID
Added a setup repository script
$GIT_DIRECTORY_NAME/fabric/workspace
instead.Does this introduce a breaking change? If yes, details on what can break
No.
Author pre-publish checklist
Validation steps
To test the pipelines, you need to have your git repositories to be a certain way. Here are steps on how to set it up.
Generate your Azure Devops Credentials then add the following env variables
GIT_USERNAME
andGIT_TOKEN
. Other than that make sure, these env variables are also set:GIT_ORGANIZATION_NAME
GIT_PROJECT_NAME
GIT_REPOSITORY_NAME
GIT_BRANCH_NAME
GIT_DIRECTORY_NAME
Run the
scripts/setup_repository.py
This would add all the necessary files to theGIT_DIRECTORY_NAME
path in the Azure DevOps repo. This will also create thefabric/workspace
folder where the fabric workspace will be synced. The file structure should look like this (minus thefabric/workspace
part which would be generated later):Either:
Create the variable group. Here are the required values:
WORKING_DIR
: Folder path where you committed your repository. Same asGIT_DIRECTORY_NAME
in the env variablesSUBSCRIPTION_ID
: Same as theSUBSCRIPTION_ID
in the env variablesRESOURCE_GROUP_NAME
: Same as theRESOURCE_GROUP_NAME
in the env variablesSTORAGE_ACCOUNT_NAME
: Created storage account nameSTORAGE_ACCOUNT_ROLE_DEFINITION_ID
(Role definition id given to fabric workspace to access storage account)STORAGE_CONTAINER_NAME
: Default is ”main”KEYVAULT_NAME
: Created key vault nameORGANIZATIONAL_NAME
: Same as theGIT_ORGANIZATION_NAME
in the env variablesPROJECT_NAME
: Same as theGIT_PROJECT_NAME
in the env variablesREPO_NAME
: Same as theGIT_REPOSITORY_NAME
in the env variablesFABRIC_WORKSPACE_GROUP_ADMIN
: Principal Id of the Group Admin for the workspace and cloud connectionFABRIC_WORKSPACE_NAME
: Created workspace nameFABRIC_CAPACITY_NAME
: Created capacity nameFABRIC_ENVIRONMENT_NAME
: Created environment nameFABRIC_LAKEHOUSE_NAME
: Created lakehouse nameFABRIC_SHORTCUT_NAME
: Default is “sc-adls-main”FABRIC_CUSTOM_POOL_NAME
: Created custom pool name.FABRIC_CONNECTION_NAME
: Created ADLS Cloud Connection name.Modify the following values in the pipelines and commit it.
Setup the pipeline then run it.
Known issue:
Issues Closed or Referenced