Airflow CaDeT Spike follow-up - Further Testing and Expansion #6512

Open
4 of 5 tasks
jhpyke opened this issue Jan 7, 2025 · 7 comments
@jhpyke

jhpyke commented Jan 7, 2025

Following up on #5857, we now want to expand the Airflow CaDeT deployment to prove it can achieve parity with our existing deployment mechanisms. That means testing functionality beyond the MVP.

Functionality to test:

  • Can we replicate the existing retry functionality, where the retry runs as a separate follow-up task?
  • Can we use a DAG to trigger another DAG and deploy dependent tables (a deployment that waits for a previous deployment to finish)?
  • Can we post the status of a workflow (on failure) to a Slack channel, as the existing Slack integrations in GitHub Actions do?
  • Can we deploy on an hourly schedule, as the minimum interval, while maintaining stability?
  • Can we run a large deployment while maintaining stability (i.e. check that none of our deploys will hit memory limits)?

Once we've tested these, we'll be confident that we can create a setup that maintains parity with our existing GitHub Actions deployments.

@jnayak-moj

The work is in progress:

  • Changed the Airflow Docker image for the spike so that it uses a repo other than main.
  • Added retry functionality to the deploy steps in the image.
  • Updated the Airflow DAG to use the latest Docker image.
  • Testing is in progress.

@jnayak-moj

The retry functionality (the first task) has been added and testing is in progress. The tests pass when there are no models to retry, and the skip logic has also been tested. I'm now working on testing the retry scenarios for failed models.
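
The retry/skip decision described above can be sketched in plain Python. This is only an illustration: the thread does not show how the image records deploy results, so the `name` and `status` fields, the status values, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping

# Statuses treated as retryable. These values are assumptions for illustration,
# not taken from the spike's Docker image.
RETRYABLE_STATUSES = {"error", "fail"}


def models_to_retry(results: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the names of models whose first deploy attempt failed."""
    return [r["name"] for r in results if r["status"] in RETRYABLE_STATUSES]


def plan_retry(results: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Decide what the follow-up retry task should do.

    If nothing failed, the retry task is skipped; otherwise it is given
    only the failed models to redeploy.
    """
    failed = models_to_retry(results)
    if not failed:
        return {"action": "skip", "models": []}
    return {"action": "retry", "models": failed}
```

For example, a first run where only `model_b` errored would give `{"action": "retry", "models": ["model_b"]}`, while a fully successful run would give a `skip` action.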

@jnayak-moj

jnayak-moj commented Jan 15, 2025

Regarding the fourth requirement (scheduling): I have tested scheduling of the Airflow DAG, and it runs on schedule as expected. Stability still needs to be tested with a long-running deployment task.

Scheduled Job:
https://23f37892-d1d1-4d9f-a03d-b8a53581fd20.c0.eu-west-1.airflow.amazonaws.com/dags/development-sandpit.parent_dag/grid?search=development-sandpit.parent_dag&dag_run_id=scheduled__2025-01-15T15%3A20%3A00%2B00%3A00
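
One possible stability check for the long-running case is to confirm that no run takes longer than the schedule interval, since such a run would overlap the next scheduled one. The sketch below assumes run start and end times have already been collected as `datetime` pairs; the function name and input shape are hypothetical, and nothing here queries Airflow itself.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hourly is the minimum interval named in the issue description.
SCHEDULE_INTERVAL = timedelta(hours=1)


def overrunning_runs(
    runs: Iterable[Tuple[datetime, datetime]],
    interval: timedelta = SCHEDULE_INTERVAL,
) -> List[Tuple[datetime, timedelta]]:
    """Return (start, duration) for every run that took longer than the interval."""
    return [(start, end - start) for start, end in runs if end - start > interval]
```

An empty result means every run finished before the next one was due to start.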

@jnayak-moj

Tested posting the status of failed jobs to Slack. This works as expected.
https://mojdt.slack.com/archives/C01AQ5M4UMS/p1737372397140739
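
For reference, a failure notification of this kind could look roughly like the sketch below. The thread does not say how the Slack post is implemented, so this is an assumption: it builds a Slack incoming-webhook payload from an Airflow-style task context (as passed to an `on_failure_callback`) using only the standard library. The function names, message wording, and webhook URL are hypothetical.

```python
import json
import urllib.request


def build_failure_message(context: dict) -> dict:
    """Build a Slack incoming-webhook payload from an Airflow-style task context."""
    ti = context["task_instance"]
    text = (
        ":red_circle: Deployment failed\n"
        f"DAG: {ti.dag_id}\n"
        f"Task: {ti.task_id}\n"
        f"Run: {context['run_id']}\n"
        f"Error: {context.get('exception')}"
    )
    # Incoming webhooks accept a JSON body with a "text" field.
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A callback would then call `post_to_slack(webhook_url, build_failure_message(context))`, with the webhook URL read from a secret rather than hard-coded.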

Projects
Status: 🚀 In Progress