CityBrainClimateData

Introduction

This project develops data pipelines to streamline the process of extracting CESM data from AWS S3, transforming it, and loading it into Citybrain.

The data pipeline includes the following steps:

  1. Download data from AWS: Use a bash script to download data from the specified S3 URI.
  2. Save parameters in JSON: Save relevant parameters in a JSON file.
  3. Data transformation: Convert the downloaded data from Zarr to Parquet and perform data transformations (a minimal sketch of steps 1-3 follows below).
  4. Create table in Citybrain: Create a table in Citybrain using the Parquet files and the parameters from the JSON file.
  5. Quality Assurance (QA): Download sample data from Citybrain to check whether the data pipeline has functioned correctly and to assess data quality.

For example, the CESM1 data pipeline is illustrated by the workflow diagram (plot) in the repository.
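
The details differ per dataset, but steps 1-3 can be sketched in a few lines of Python. The snippet below is illustrative only: the S3 path is the CESM1 example used later in this README, while the file names (params.json, the output Parquet file), the variable selection, and the use of s3fs/xarray/pandas are assumptions, not the repository's actual scripts.

    import json
    import s3fs
    import xarray as xr

    # Step 1: open the Zarr store directly from the public CESM LENS bucket on AWS S3.
    # (The repository uses a bash script for the download; reading via s3fs is just one option.)
    fs = s3fs.S3FileSystem(anon=True)
    store = s3fs.S3Map(root="ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr", s3=fs)
    ds = xr.open_zarr(store, consolidated=True)

    # Step 2: save relevant parameters in a JSON file (the keys here are hypothetical).
    params = {
        "source_uri": "s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr",
        "variable": "FLNS",
        "frequency": "daily",
    }
    with open("params.json", "w") as f:
        json.dump(params, f, indent=2)

    # Step 3: convert a slice of the Zarr data to a flat table and write it as Parquet.
    # Writing Parquet from pandas requires pyarrow (or fastparquet) to be installed.
    df = ds["FLNS"].isel(time=slice(0, 10)).to_dataframe().reset_index()
    df.to_parquet("cesmLE-20C-FLNS.parquet", index=False)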

How to run the data pipeline

In this project, an Apache Airflow DAG is used to orchestrate the workflow.
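
As a rough illustration of how the five pipeline steps map onto Airflow tasks, here is a minimal DAG skeleton. It is a sketch only: the task IDs, the placeholder callables, and the download.sh script name are hypothetical and not taken from the repository's DAG files; only the example S3 URI comes from this README.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    # Placeholder callables; the repository's actual functions will differ.
    def save_params():
        pass

    def zarr_to_parquet():
        pass

    def create_citybrain_table():
        pass

    def qa_sample_check():
        pass

    with DAG(
        dag_id="cesm1_dag",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,  # triggered manually
        catchup=False,
    ) as dag:
        download = BashOperator(
            task_id="download_from_s3",
            bash_command="bash download.sh s3://ncar-cesm-lens/atm/daily/cesmLE-RCP85-QBOT.zarr",
        )
        params = PythonOperator(task_id="save_params_json", python_callable=save_params)
        transform = PythonOperator(task_id="zarr_to_parquet", python_callable=zarr_to_parquet)
        load = PythonOperator(task_id="create_citybrain_table", python_callable=create_citybrain_table)
        qa = PythonOperator(task_id="qa_sample_check", python_callable=qa_sample_check)

        # Task dependencies mirror the five pipeline steps.
        download >> params >> transform >> load >> qa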

Before running a data pipeline:

To run a data pipeline (DAG):

  1. In the DAG file (DAG files follow the naming pattern xx_dag_xx.py), replace the default S3 URI with the S3 URI of the target data. For example, in cesm1/cesm1-dag.py, change 's3://ncar-cesm-lens/atm/daily/cesmLE-RCP85-QBOT.zarr' to 's3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr'.

  2. Trigger the DAG execution from the Airflow UI or from the command line. For example, to trigger the CESM1 DAG (cesm1_dag.py), run: airflow dags trigger cesm1_dag. The DAG will execute its tasks according to the defined task dependencies.

  3. Monitor the progress and status of each task from the Airflow UI or from the command line (a sketch using the Airflow REST API follows this list).
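
Besides the UI and CLI, Airflow's stable REST API can also trigger and monitor a DAG run. The snippet below is a sketch assuming the API is enabled with basic authentication and a webserver at localhost:8080; the credentials are placeholders, and the dag_id reuses the CESM1 example above.

    import time

    import requests

    AIRFLOW = "http://localhost:8080/api/v1"  # assumed webserver address
    AUTH = ("airflow", "airflow")              # placeholder credentials

    # Trigger a new run of the CESM1 DAG.
    resp = requests.post(f"{AIRFLOW}/dags/cesm1_dag/dagRuns", auth=AUTH, json={"conf": {}})
    resp.raise_for_status()
    run_id = resp.json()["dag_run_id"]

    # Poll the run until it finishes.
    while True:
        state = requests.get(f"{AIRFLOW}/dags/cesm1_dag/dagRuns/{run_id}", auth=AUTH).json()["state"]
        print(f"dag run {run_id}: {state}")
        if state in ("success", "failed"):
            break
        time.sleep(30)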
