email notification not working #1483

Open
kinow opened this issue Dec 9, 2024 · 17 comments
Labels: bug (Something isn't working), discussion (The issue is created to keep track of a discussion)

Comments

@kinow
Member

kinow commented Dec 9, 2024

In GitLab by @lauherrero on Dec 9, 2024, 11:34

Hello @dbeltrankyl and @kinow,

Autosubmit Version

v4.1.9 at the hub

Expid affected (if applicable)

a85o

Which task has issues? Where is the log (if applicable)

  • Log_Path: /esarchive/autosubmit/a85o/tmp/ASLOGS/20241209_110131_run.log

Summary

I am trying to use email notifications, following the steps described in the Autosubmit documentation on email notifications.

In my case, I wrote the mail configuration in /esarchive/autosubmit/a85o/proj/git_project/conf/expdef.yml and modified the joblists (jobs_daily.yml, jobs_monthly.yml, jobs_traffic.yml; they are in the same folder as expdef.yml), adding the variable NOTIFY_ON: FAILED COMPLETED to the jobs.

In /esarchive/autosubmit/a85o/conf/main.yml you can select which joblist you want to use.

My mail configuration looks like this (/esarchive/autosubmit/a85o/proj/git_project/conf/expdef.yml):

mail:
    # Enable mail notifications for remote_failures
    # Default:True
    NOTIFY_ON_REMOTE_FAIL: True
    # Enable mail notifications
    # Default: False
    NOTIFICATIONS: True
    # Mail address where notifications will be received
    TO: laura.herrero@bsc.es

Job example (/esarchive/autosubmit/a85o/proj/git_project/conf/jobs_daily.yml):

  DOWNLOAD_ENTSOE_unit:
    FILE: templates/download.sh,templates/configuration_files/ENTSOE_unit.yml
    PLATFORM: local
    FILE_TO_DOWNLOAD: ENTSOE_unit
    RUNNING: chunk
    DOWNLOAD_FLAG: "True"
    ORIGINAL_FILE_PATH_GENERAL: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.GENERAL%"
    ORIGINAL_FILE_PATH_SPAIN: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.SPAIN%"
    DOWNLOAD_DELAY: "%INPUTS.ENTSOE_UNIT.DELAY%"
    NOTIFY_ON: FAILED COMPLETED

I also tried adding the mail configuration to a new autosubmit_cxxx.yml file, but that did not work either.

Relevant logs and/or screenshots (if applicable)

2024-12-09 11:03:07,912 Successful check job command: nohup kill -0 644286 > /dev/null 2>&1; echo $?
2024-12-09 11:03:07,912 Job a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT is RUNNING
2024-12-09 11:03:08,224 Traceback (most recent call last):
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/EGG-INFO/scripts/autosubmit", line 59, in main
    return_value = Autosubmit.parse_args()
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 707, in parse_args
    return Autosubmit.run_experiment(args.expid, args.notransitive,args.start_time,args.start_after, args.run_only_members, args.profile)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 2222, in run_experiment
    Autosubmit.job_notify(as_conf,expid,job,job_prev_status,job_changes_tracker)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 1793, in job_notify
    if Status.VALUE_TO_KEY[job.status] in job.notify_on:
AttributeError: 'Job' object has no attribute 'notify_on'

[CRITICAL] 2024-12-09 11:03:08,226 A not admitted configuration or error in the code has happened: 'Job' object has no attribute 'notify_on'.
 Please report it to Autosubmit Developers through Git
2024-12-09 11:03:08,226 More info at https://autosubmit.readthedocs.io/en/master/troubleshooting/error-codes.html
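
The traceback above comes down to an attribute lookup on a job object that never had `notify_on` set. The following is a minimal sketch of that failure mode and of a defensive lookup that would avoid the crash. The `Job` class and the `should_notify` helper are hypothetical stand-ins written for illustration; they are not Autosubmit's actual code.

```python
class Job:
    """Hypothetical stand-in for a job object; not Autosubmit's real class."""

    def __init__(self, name, status):
        self.name = name
        self.status = status
        # notify_on is deliberately never assigned, to mimic the reported bug.


def should_notify(job):
    """Return True if the job's status is listed in its notify_on attribute.

    getattr with a default avoids AttributeError when the attribute is missing.
    """
    return job.status in getattr(job, "notify_on", [])


job = Job("DOWNLOAD_ENTSOE_unit", "RUNNING")

try:
    # Same kind of lookup as the failing line in the traceback.
    job.status in job.notify_on
    crashed = False
    message = ""
except AttributeError as error:
    crashed = True
    message = str(error)

print(crashed, message)
print(should_notify(job))
```

With the defensive lookup, a job without the attribute is simply treated as "never notify" instead of stopping the whole run.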

Any other relevant information (if applicable)

()
cc @ainagaya

@kinow
Member Author

kinow commented Dec 9, 2024

Hi @lauherrero I think this is a duplicate issue. I had the same problem, and fixed it by adding a configuration under each JOB... give me a sec to find it, I think the person who first had it was @apuiggro ....

@kinow
Member Author

kinow commented Dec 9, 2024

Here you go, https://earth.bsc.es/gitlab/es/autosubmit/-/issues/1468

Try adding NOTIFY_ON: RUNNING or another status under each job, then try running your workflow again.
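
To see which jobs would still need the key, a small check along these lines may help. It is only a sketch: it assumes the JOBS section has already been parsed into a plain dictionary (for example from one of the jobs_*.yml files), and the `jobs_missing_notify_on` helper and the `POSTPROCESS` job name are made up for the example.

```python
def jobs_missing_notify_on(jobs):
    """Return the names of jobs whose section has no NOTIFY_ON key."""
    return [name for name, section in jobs.items() if "NOTIFY_ON" not in section]


# Hypothetical, already-parsed JOBS section.
jobs = {
    "DOWNLOAD_ENTSOE_unit": {"PLATFORM": "local", "NOTIFY_ON": "RUNNING"},
    "POSTPROCESS": {"PLATFORM": "local"},  # made-up job without NOTIFY_ON
}

missing = jobs_missing_notify_on(jobs)
print(missing)
```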

@kinow
Member Author

kinow commented Dec 9, 2024

marked this issue as related to #1468

@kinow
Member Author

kinow commented Dec 9, 2024

In GitLab by @lauherrero on Dec 9, 2024, 12:31

Hi @kinow,

I tried it with this joblist, but the error persists:

JOBS:
  DOWNLOAD_ENTSOE_unit:
    FILE: templates/download.sh,templates/configuration_files/ENTSOE_unit.yml
    PLATFORM: local
    FILE_TO_DOWNLOAD: ENTSOE_unit
    RUNNING: chunk
    DOWNLOAD_FLAG: "True"
    ORIGINAL_FILE_PATH_GENERAL: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.GENERAL%"
    ORIGINAL_FILE_PATH_SPAIN: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.SPAIN%"
    DOWNLOAD_DELAY: "%INPUTS.ENTSOE_UNIT.DELAY%"
    NOTIFY_ON: RUNNING

Is it okay?

Thank you :)

@kinow
Member Author

kinow commented Dec 9, 2024

Hi @lauherrero !

Strange, your change looks correct, and in theory it was supposed to work and then you'd get a notification whenever the job entered the RUNNING state.

Do you know if I can clone your experiment and run it? I'm asking because I am not sure whether running it would erase or write data somewhere and cause any trouble.

If I can do that, tomorrow morning I can debug it on the hub to see what's going on.

In the meantime, the only thing I can think of is deleting the files before recreating the experiment (delete files under pkl, delete folder conf/metadata, and run autosubmit create -f ....), to ensure Autosubmit is really reading the latest config you have (which I think it's doing, as your conf/metadata/experiment...yml file contains the NOTIFY_ON: RUNNING configuration...)
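
The cleanup step described above (files under pkl plus the conf/metadata folder) could be scripted roughly as follows. This is a sketch under assumptions: the `clean_experiment_state` helper is hypothetical, it only touches the two locations named above, and it defaults to a dry run that lists what would be removed. The demo runs against a throwaway temporary directory with a made-up expid, not a real experiment, and the recreate step with Autosubmit is left to the user.

```python
import shutil
import tempfile
from pathlib import Path


def clean_experiment_state(experiment_dir, dry_run=True):
    """List (and optionally delete) files under pkl/ and the conf/metadata folder."""
    experiment_dir = Path(experiment_dir)
    targets = []

    pkl_dir = experiment_dir / "pkl"
    if pkl_dir.is_dir():
        # Only the files inside pkl/ are removed; the folder itself is kept.
        targets.extend(sorted(p for p in pkl_dir.iterdir() if p.is_file()))

    metadata_dir = experiment_dir / "conf" / "metadata"
    if metadata_dir.is_dir():
        targets.append(metadata_dir)

    if not dry_run:
        for target in targets:
            if target.is_dir():
                shutil.rmtree(target)
            else:
                target.unlink()
    return targets


# Demo on a temporary directory with a made-up experiment layout.
with tempfile.TemporaryDirectory() as tmp:
    exp = Path(tmp) / "a000"  # hypothetical expid
    (exp / "pkl").mkdir(parents=True)
    (exp / "pkl" / "job_list_a000.pkl").write_text("")
    (exp / "conf" / "metadata").mkdir(parents=True)

    planned = [p.relative_to(exp).as_posix() for p in clean_experiment_state(exp)]
    clean_experiment_state(exp, dry_run=False)
    leftovers = sorted(p.relative_to(exp).as_posix() for p in exp.rglob("*"))

print(planned)
print(leftovers)
```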

Thanks!

@kinow
Member Author

kinow commented Dec 9, 2024

In GitLab by @dbeltrankyl on Dec 9, 2024, 15:45

Hello

I think (I didn't test it, I'm only going by the changes in the code) that you have to put it in every job, not only in the one you want notifications for. Sorry about that, I just looked at #1468.

NOTIFY_ON: "RUNNING"

Rest of jobs

NOTIFY_ON: ""
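
As a sketch of that workaround, the snippet below gives every job an explicit NOTIFY_ON value, using an empty string for jobs that should not send notifications. It assumes the JOBS section is available as a plain dictionary; the `with_default_notify_on` helper and the `POSTPROCESS` job name are hypothetical.

```python
import copy


def with_default_notify_on(jobs, default=""):
    """Return a copy of jobs in which every section has a NOTIFY_ON key."""
    result = copy.deepcopy(jobs)
    for section in result.values():
        # Existing values are kept; only missing keys get the default.
        section.setdefault("NOTIFY_ON", default)
    return result


# Hypothetical, already-parsed JOBS section.
jobs = {
    "DOWNLOAD_ENTSOE_unit": {"PLATFORM": "local", "NOTIFY_ON": "RUNNING"},
    "POSTPROCESS": {"PLATFORM": "local"},  # made-up job without NOTIFY_ON
}

patched = with_default_notify_on(jobs)
print(patched)
```

The input dictionary is left untouched, so the result can be compared against the original before anything is written back to a file.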

Also, it may be worth checking the spam folder.

@kinow
Member Author

kinow commented Dec 10, 2024

In GitLab by @lauherrero on Dec 10, 2024, 11:00

Hi @kinow @dbeltrankyl,

I ran the experiment a85o with this joblist (only one job), but it did not work:

JOBS:
  DOWNLOAD_ENTSOE_unit:
    FILE: templates/download.sh,templates/configuration_files/ENTSOE_unit.yml
    PLATFORM: local
    FILE_TO_DOWNLOAD: ENTSOE_unit
    RUNNING: chunk
    DOWNLOAD_FLAG: "True"
    ORIGINAL_FILE_PATH_GENERAL: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.GENERAL%"
    ORIGINAL_FILE_PATH_SPAIN: "%INPUTS.ENTSOE_UNIT.ORIGINAL_FILE_PATH.SPAIN%"
    DOWNLOAD_DELAY: "%INPUTS.ENTSOE_UNIT.DELAY%"
    NOTIFY_ON: "RUNNING"

The experiment does not run, raises this error and stops:

2024-12-10 10:50:36,837 Successful check job command: nohup kill -0 1511070 > /dev/null 2>&1; echo $?
2024-12-10 10:50:36,837 Job a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT is RUNNING
2024-12-10 10:50:37,125 Traceback (most recent call last):
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/EGG-INFO/scripts/autosubmit", line 59, in main
    return_value = Autosubmit.parse_args()
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 707, in parse_args
    return Autosubmit.run_experiment(args.expid, args.notransitive,args.start_time,args.start_after, args.run_only_members, args.profile)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 2222, in run_experiment
    Autosubmit.job_notify(as_conf,expid,job,job_prev_status,job_changes_tracker)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.9-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit-4.1.9-py3.9.egg/autosubmit/autosubmit.py", line 1793, in job_notify
    if Status.VALUE_TO_KEY[job.status] in job.notify_on:
AttributeError: 'Job' object has no attribute 'notify_on'

[CRITICAL] 2024-12-10 10:50:37,127 A not admitted configuration or error in the code has happened: 'Job' object has no attribute 'notify_on'.
 Please report it to Autosubmit Developers through Git
2024-12-10 10:50:37,127 More info at https://autosubmit.readthedocs.io/en/master/troubleshooting/error-codes.html

"In the meantime, the only thing I can think of is deleting the files before recreating the experiment (delete files under pkl, delete folder conf/metadata, and run autosubmit create -f ....), to ensure Autosubmit is really reading the latest config you have (which I think it's doing, as your conf/metadata/experiment...yml file contains the NOTIFY_ON: RUNNING configuration...)"

I tried this also, but the error persists :(

You can copy the experiment if you want, but I think the GitLab repository is private.

Thank you very much <3

@kinow
Member Author

kinow commented Dec 10, 2024

Hi Laura! I checked your experiment configuration and logs, and I think you did everything correctly. Time to debug it 💪

[bdepaula@bsceshub02 conf]$ module load autosubmit/4.1.11-foss-2021b-Python-3.9.6
[bdepaula@bsceshub02 conf]$ autosubmit expid -y a85o -d "Copy from Laura" -H marenostrum5
Autosubmit is running with 4.1.11
The new experiment "a860" has been registered.
Generating folder structure...
Experiment folder: /esarchive/autosubmit/a860
Generating config files...
Experiment a860 created
[bdepaula@bsceshub02 conf]$ cd /esarchive/autosubmit/a860
[bdepaula@bsceshub02 conf]$ cp ../../a85o/conf/main.yml .
[bdepaula@bsceshub02 conf]$ sed -i "s/USER: lherrero/USER: bdepaula/" /esarchive/autosubmit/a860/conf/main.yml
[bdepaula@bsceshub02 conf]$ sed -i "s/USER: asdf/USER: bsc032841/" /esarchive/autosubmit/a860/conf/main.yml
[bdepaula@bsceshub02 conf]$ cat main.yml
GENERAL:
  # daily, monthly or traffic
  JOBLIST: daily

  PLATFORMS:
    MARENOSTRUM5:
      USER: bsc032841
    HUB:
      USER: bdepaula

  DATELIST: 20241209

I think that gave me a valid experiment with similar configuration to yours.

[bdepaula@bsceshub02 conf]$ autosubmit create -np a860
Yaml file /esarchive/autosubmit/a860/proj/git_project/conf/bootstrap not found
Autosubmit is running with 4.1.11
Preparing .lock file to avoid multiple instances with same expid.
[WARNING] Platform MARENOSTRUM5 not found in configuration file
Cloning 19-send-email https://earth.bsc.es/gitlab/nextgeneu/internal/auto-phenomena into /esarchive/autosubmit/a860/proj
Cloning into 'git_project'...
...
...
...
Adding dependencies to the graph..
Adding dependencies to the job..
Transitive reduction...
Looking for edgeless jobs...

Saving the jobs list...

Job list created successfully
[WARNING] Remember to MODIFY the MODEL config files!

And there are plenty of NOTIFY_ON there, but it seems to be different than what you have, @lauherrero . The git configuration I got has more jobs. Did you manually edit your list of jobs to make the test simpler?

@kinow
Member Author

kinow commented Dec 10, 2024

In GitLab by @lauherrero on Dec 10, 2024, 11:32

Hi Bruno!

I was using Autosubmit version 4.1.9, so I tested it with 4.1.11 and got the same error, but this time only after the retries (maybe this information is useful):

2024-12-10 11:25:23,689 Successful check job command: nohup kill -0 1622129 > /dev/null 2>&1; echo $?
2024-12-10 11:25:23,689 a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT job seems to have completed: checking...
2024-12-10 11:25:23,690 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 5s for a new retry (retries left: 3)
2024-12-10 11:25:28,697 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 10s for a new retry (retries left: 2)
2024-12-10 11:25:38,707 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 15s for a new retry (retries left: 1)
[ERROR] 2024-12-10 11:25:53,723 Job a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT completion check failed. There is no COMPLETED file[eCode=6009]
[WARNING] 2024-12-10 11:25:53,724 Job a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT is FAILED. Checking completed files to confirm the failure...[eCode=3000]
2024-12-10 11:25:53,725 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 5s for a new retry (retries left: 3)
2024-12-10 11:25:58,731 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 10s for a new retry (retries left: 2)
2024-12-10 11:26:08,732 /esarchive/autosubmit/a85o/tmp/LOG_a85o/a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT_COMPLETED File does not exist.. waiting 15s for a new retry (retries left: 1)
[ERROR] 2024-12-10 11:26:23,736 Job a85o_2024120900_fc0_1_DOWNLOAD_ENTSOE_UNIT completion check failed. There is no COMPLETED file[eCode=6009]
2024-12-10 11:26:24,154 Traceback (most recent call last):
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.11-foss-2021b-Python-3.9.6/bin/autosubmit", line 36, in main
    return_value = Autosubmit.parse_args()
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.11-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit/autosubmit.py", line 711, in parse_args
    return Autosubmit.run_experiment(args.expid, args.notransitive,args.start_time,args.start_after, args.run_only_members, args.profile)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.11-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit/autosubmit.py", line 2264, in run_experiment
    Autosubmit.job_notify(as_conf,expid,job,job_prev_status,job_changes_tracker)
  File "/shared/earth/easybuild/rocky/8.4/x86_64/software/autosubmit/4.1.11-foss-2021b-Python-3.9.6/lib/python3.9/site-packages/autosubmit/autosubmit.py", line 1837, in job_notify
    if Status.VALUE_TO_KEY[job.status] in job.notify_on:
AttributeError: 'Job' object has no attribute 'notify_on'

[CRITICAL] 2024-12-10 11:26:24,155 Unexpected error: 'Job' object has no attribute 'notify_on'.
 Please report it to Autosubmit Developers through Git
2024-12-10 11:26:24,155 More info at https://autosubmit.readthedocs.io/en/master/troubleshooting/error-codes.html
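
The waits in the log above (5s, 10s, 15s with three retries) look like a linearly increasing back-off while checking for the COMPLETED file. A minimal sketch of such a check is shown below. It is inferred from the log messages only and is not Autosubmit's implementation; the `wait_for_file` helper, its parameters and the final re-check after the last wait are assumptions. The sleep function is injectable so the example can run without actually waiting.

```python
import time
from pathlib import Path


def wait_for_file(path, retries=3, base_wait=5, sleep=time.sleep):
    """Check for path, sleeping base_wait * attempt seconds after each miss."""
    path = Path(path)
    for attempt in range(1, retries + 1):
        if path.exists():
            return True
        # Waits grow linearly: 5s, 10s, 15s with the defaults.
        sleep(base_wait * attempt)
    # One last check after the final wait (an assumption of this sketch).
    return path.exists()


# Record the requested waits instead of sleeping.
waits = []
found = wait_for_file("does_not_exist_COMPLETED", sleep=waits.append)
print(found, waits)
```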

I modified the joblists manually in my experiment for testing, but if you need it I can create a branch in the repository with only one job.

Thank you very much @kinow :D

@kinow
Member Author

kinow commented Dec 10, 2024

Hi Laura, good to know 4.1.11 gave the same error. I will modify the joblist to have just one job, and will launch it now then. Thanks!

@kinow
Member Author

kinow commented Dec 10, 2024

I don't seem to be able to connect to bdepaula@bsceshub02 anymore. I'm going to try again later today, or tomorrow.

@kinow
Member Author

kinow commented Dec 10, 2024

In GitLab by @dbeltrankyl on Dec 10, 2024, 13:52

I can have a look @kinow , I think that the review of !531 should have more priority for you.

Edit: I'll try to reproduce the issue from my laptop

@kinow
Member Author

kinow commented Dec 10, 2024

In GitLab by @dbeltrankyl on Dec 10, 2024, 14:29

I have reproduced the issue on master

Basically, the attribute notify_on has disappeared from the code, so notifications won't work.

I'll open a merge request to re-add it.

@kinow
Member Author

kinow commented Dec 10, 2024

In GitLab by @dbeltrankyl on Dec 10, 2024, 15:08

Waiting for hubs to be up again so I can use the automail service, but there is a potential fix https://earth.bsc.es/gitlab/es/autosubmit/-/merge_requests/532

I think it is still missing the normalization of the status values, as well as docs and pytest tests.
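
For illustration only, normalizing a NOTIFY_ON value could look something like the sketch below. This is an assumption about what such normalization might do, not the content of the merge request: the `normalize_notify_on` helper is hypothetical, and accepting both space-separated and comma-separated strings as well as lists is a guess based on the values used in this thread.

```python
import re


def normalize_notify_on(value):
    """Turn a NOTIFY_ON value into a list of upper-case status names."""
    if value is None:
        return []
    if isinstance(value, str):
        # Accept "FAILED COMPLETED" as well as "FAILED, COMPLETED".
        items = re.split(r"[,\s]+", value)
    else:
        items = list(value)
    return [str(item).strip().upper() for item in items if str(item).strip()]


print(normalize_notify_on("FAILED COMPLETED"))
print(normalize_notify_on(""))
```

With a normalized list, a membership test such as the one in the traceback matches whole status names rather than substrings of a raw string.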

@kinow
Member Author

kinow commented Dec 11, 2024

Thanks Dani!

@lauherrero
Contributor

Hi Dani and Bruno!

What is the status of this?

Thank you :)
@kinow @dbeltrankyl

@kinow
Member Author

kinow commented Jan 10, 2025

Hi @lauherrero!

Thanks for bumping this issue. I think this is now fixed on master and in the 4.1.12 dev version (deployed to ClimateDT, not to hubs yet).

I believe this was fixed in #2020

Just needs to be tested before we close this issue now. Maybe we can install the dev module in the hubs, and that way you could test it with one of your experiments and let us know if that works alright?

@kinow kinow added this to the Autosubmit 4.1.12 milestone Jan 10, 2025