From 152f69bc4fa0f2e5bbfaa1bedd151792c7273215 Mon Sep 17 00:00:00 2001 From: Safoine El Khabich <34200873+safoinme@users.noreply.github.com> Date: Mon, 13 Nov 2023 12:59:50 +0100 Subject: [PATCH 01/22] add tests and assets --- .github/actions/nlp_template_test/action.yml | 3 +- README.md | 21 +- assets/deploy_pipeline.png | Bin 0 -> 41989 bytes assets/full_template.png | Bin 0 -> 439556 bytes assets/promote_pipeline.png | Bin 0 -> 49570 bytes assets/training_pipeline.png | Bin 0 -> 87882 bytes template/steps/deploying/save_model.py | 1 - tests/conftest.py | 102 ++++++++++ tests/test_template.py | 194 +++++++++++++++++++ 9 files changed, 307 insertions(+), 14 deletions(-) create mode 100644 assets/deploy_pipeline.png create mode 100644 assets/full_template.png create mode 100644 assets/promote_pipeline.png create mode 100644 assets/training_pipeline.png create mode 100644 tests/conftest.py create mode 100644 tests/test_template.py diff --git a/.github/actions/nlp_template_test/action.yml index 8b78d56..c3aed5a 100644 --- a/.github/actions/nlp_template_test/action.yml +++ b/.github/actions/nlp_template_test/action.yml @@ -69,13 +69,14 @@ runs: - name: Concatenate requirements shell: bash run: | - zenml integration export-requirements -o ./local_checkout/integration-requirements.txt sklearn mlflow s3 kubernetes kubeflow slack evidently + zenml integration export-requirements -o ./local_checkout/integration-requirements.txt mlflow s3 kubernetes kubeflow discord aws huggingface pytorch skypilot_aws cat ./local_checkout/requirements.txt ./local_checkout/test-requirements.txt ./local_checkout/integration-requirements.txt >> ./local_checkout/all-requirements.txt - name: Install requirements shell: bash run: | pip install -r ./local_checkout/all-requirements.txt + pip install accelerate - name: Run pytests shell: bash diff --git a/README.md b/README.md index bc47aca..b8a9a56 100644 --- a/README.md +++ b/README.md @@ -66,6
+66,10 @@ For more details, check the `README.md` file in the generated project directory. This NLP project template includes three main pipelines: +
+ ### Training Pipeline The training pipeline is designed to handle the end-to-end process of training an NLP model. It includes steps for data loading, tokenization, model training, and model registration. The pipeline is parameterized to allow for customization of the training process, such as sequence length, batch size, and learning rate. @@ -113,24 +117,17 @@ The training pipeline is the heart of the NLP project. It is responsible for pre The training pipeline is configured using the `{{product_name}}_training_pipeline` function, which includes steps for data loading, tokenization, model training, and model registration. The pipeline can be customized with parameters such as `lower_case`, `padding`, `max_seq_length`, and others to tailor the tokenization and training process to your specific NLP use case. -### Training Pipeline: Data and Tokenization +### Training Pipeline -[📂 Code folder](template/steps/data_tokenization/) +[📂 Code folder](template/steps/model_training/)- +
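To make the README's parameter list concrete, the tokenization and training knobs it names (`lower_case`, `padding`, `max_seq_length`, `train_batch_size`, `learning_rate`, and so on) could be bundled roughly as follows. This is an illustrative sketch only: the field names mirror the README, but the defaults are assumptions, not the template's actual values.

```python
from dataclasses import dataclass


@dataclass
class TrainingParams:
    """Illustrative bundle of the knobs the training pipeline exposes.

    Field names follow the README; the defaults here are assumptions,
    not the template's real configuration.
    """

    lower_case: bool = True       # lowercase text before tokenizing
    padding: str = "max_length"   # tokenizer padding strategy
    max_seq_length: int = 128     # truncate/pad inputs to this length
    train_batch_size: int = 16
    eval_batch_size: int = 32
    num_epochs: int = 3
    learning_rate: float = 2e-5
    weight_decay: float = 0.01


# Callers override only what they need; everything else keeps its default.
params = TrainingParams(max_seq_length=256)
print(params.max_seq_length)
print(params.learning_rate)
```

Grouping the parameters this way keeps each pipeline step's signature small while still letting every knob be overridden per run.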
The first stage of the training pipeline involves loading the dataset and preparing it for the model. The `data_loader` step fetches the dataset, which is then passed to the `tokenizer_loader` and `tokenization_step` to convert the raw text data into a format suitable for the NLP model. Tokenization is a critical step in NLP pipelines, as it converts text into tokens that the model can understand. The tokenizer can be configured to handle case sensitivity, padding strategies, and sequence lengths, ensuring that the input data is consistent and optimized for training. -### Training Pipeline: Model Training - -[📂 Code folder](template/steps/model_training/) -- -
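The interaction of `lower_case`, `max_seq_length`, and padding described above can be shown with a toy stand-in for the real tokenization step. The template itself uses a proper Hugging Face tokenizer; the whitespace split and `[PAD]` token below are simplifying assumptions for illustration only.

```python
def toy_tokenize(
    text: str,
    max_seq_length: int = 8,
    lower_case: bool = True,
    pad_token: str = "[PAD]",
) -> list:
    """Toy tokenizer: whitespace split, optional lowercasing, then
    truncate or right-pad to exactly max_seq_length tokens.

    A real pipeline would use a Hugging Face tokenizer; this only
    illustrates how the configuration parameters interact.
    """
    tokens = text.lower().split() if lower_case else text.split()
    tokens = tokens[:max_seq_length]                         # truncate
    tokens += [pad_token] * (max_seq_length - len(tokens))   # pad
    return tokens


toks = toy_tokenize("ZenML Makes MLOps Easy", max_seq_length=6)
print(toks)  # ['zenml', 'makes', 'mlops', 'easy', '[PAD]', '[PAD]']
```

Whatever the tokenizer, the key property is the same: every example leaves this step with an identical, fixed length, which is what makes batched training possible.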
- Once the data is tokenized, the `model_trainer` step takes over to train the NLP model. This step utilizes the tokenized dataset and the tokenizer itself to fine-tune the model on the specific task, such as sentiment analysis, text classification, or named entity recognition. The model training step can be configured with parameters like `train_batch_size`, `eval_batch_size`, `num_epochs`, `learning_rate`, and `weight_decay` to control the training process. After training, the model is evaluated, and if it meets the quality criteria, it is registered in the model registry with a unique name. @@ -139,7 +136,7 @@ The model training step can be configured with parameters like `train_batch_size [📂 Code folder](template/steps/promotion/)- +
The promotion pipeline is responsible for promoting the best model to the chosen stage, such as Production or Staging. The pipeline can be configured to promote models based on metric comparison or simply promote the latest model version. @@ -150,7 +147,7 @@ The `{{product_name}}_promote_pipeline` function orchestrates the promotion proc [📂 Code folder](template/steps/deployment/)- +
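The two promotion strategies described above (metric comparison versus always promoting the latest version) reduce to a small decision function. This sketch assumes a single scalar metric where higher is better (e.g. accuracy); the real pipeline's metric handling may differ.

```python
from typing import Optional


def should_promote(
    latest_metric: float,
    current_metric: Optional[float],
    compare_metrics: bool = True,
) -> bool:
    """Decide whether the latest model version should be promoted.

    - If compare_metrics is False, always promote the latest version.
    - If there is no model in the target stage yet, promote.
    - Otherwise promote only if the latest version beats the current one
      (assumes higher metric = better, e.g. accuracy).
    """
    if not compare_metrics or current_metric is None:
        return True
    return latest_metric > current_metric


print(should_promote(0.91, 0.88))                          # beats current
print(should_promote(0.85, 0.88))                          # keeps current
print(should_promote(0.85, 0.88, compare_metrics=False))   # forced promote
```

Guarding promotion behind a comparison like this prevents a regressed model from silently replacing the one already serving traffic in Production or Staging.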
The deployment pipeline handles the deployment of the model to various environments. It can be configured to deploy locally, to HuggingFace Hub, or to SkyPilot, depending on the project's needs. diff --git a/assets/deploy_pipeline.png b/assets/deploy_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..1851642edb1d1ef89f0896af800091b8c1669da5 GIT binary patch literal 41989 zcmeFZ2T+six-c57C?cYwNL5^lf{IA*DgufW5ke0rs0bk-(h1F_pwgmCDbgY!EdfFe zogghLQlx|uAVQ=D2mz7+A>{vp>+G}d%$ZZ}{b%l+nf=bF$(QdfPp_}9@0b|y?-$t* z0)hB%-Mn!Z1llVO0_}Rry9apkRF859_-Egfn>OAc5Z@8*A0AM0sxa`7$NR3qbx={K z*evj6m&-NdYambwf^YNwZV)Id`PPkV76Cl-qXO?AJPeWE%&if9^yb-#>kfOb?Ugdj zeEa#GUS{}@*l>Nty-TK@%W^3fra+jpX78U<=MFphO`dz)Irr!5u)o5W(r@g1aQgL* zMf5mg&*&cq3Qkha#e^S_{YJX1VB}vr8zsov`SC3k$P?Y$ehcDCIgD{mDZ?PMkLR`BjJLhJ^FO
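As a closing illustration of the deployment pipeline described earlier, target selection could be sketched as a simple dispatch over the three environments the README names (local, HuggingFace Hub, SkyPilot). The handler bodies below are hypothetical placeholders, not the template's real deployment calls.

```python
def deploy(target: str, model_name: str) -> str:
    """Dispatch deployment to one of the supported targets.

    The three target names come from the README; the returned strings
    stand in for real deployment logic (hypothetical, for illustration).
    """
    handlers = {
        "local": lambda: f"serving {model_name} from a local endpoint",
        "huggingface": lambda: f"pushing {model_name} to the HuggingFace Hub",
        "skypilot": lambda: f"launching {model_name} on a SkyPilot cluster",
    }
    if target not in handlers:
        raise ValueError(f"unknown deployment target: {target!r}")
    return handlers[target]()


print(deploy("local", "sentiment-model"))
```

Keeping the target set explicit like this makes an unsupported environment fail loudly at dispatch time instead of partway through a deployment.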