Framework: AT is down #13027

Closed
jhux2 opened this issue May 20, 2024 · 8 comments
Labels
autotester (Issues related to the autotester.)
PA: Framework (Issues that fall under the Trilinos Framework Product Area)
type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@jhux2
Member

jhux2 commented May 20, 2024

Bug Report

A few PRs are blocked from merging due to reported failures, but nothing shows up on the dashboard:

#13019
#13014

@trilinos/framework

jhux2 added the "type: bug" label on May 20, 2024
jhux2 changed the title from "PackageName: General Summary of the Bug" to "Framework: AT down?" on May 20, 2024
jhux2 added the "PA: Framework" and "autotester" labels on May 20, 2024
@cgcgcg
Contributor

cgcgcg commented May 21, 2024

Could someone from @trilinos/framework acknowledge that there is an issue? Thanks!

@jmlapre
Member

jmlapre commented May 21, 2024

Our Jenkins agents are currently crashing on the GPU machines. We've reported this to DICE and we hope to have this resolved soon.

@achauphan
Contributor

achauphan commented May 22, 2024

The AutoTester should now be running again. The rhel8 upgrade hit us with two issues:

  1. The rhel8 upgrade on our GPU machines unexpectedly changed the Java path that their Jenkins configurations had been using.
  2. GenConfig, and the way it uses the SEMS modules in our PR scripts, was NOT prepared for the rhel8 upgrade. Specifically, the config rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables attempts to load the SEMS module sems-openmpi/4.0.5-cuda-11.4.2. That module no longer exists, because in the new version of the SEMS modules CUDA is built into the module openmpi/4.0.5, so the configuration failed to find it.

The second issue currently has a temporary fix in place so that the AutoTester can actually run. The real fix is still being worked on and requires changes to the PR scripts in this repo, and those changes themselves need the AutoTester to pass before they can be merged. The actual fix should help resolve #13022, as we would implement a rhel8 version of that config.

While the AutoTester should be working today, I will update this further when the desired fix is in place for that second issue.
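
As a rough illustration of the second issue (this is not the actual GenConfig or PR-script code), the sketch below resolves a requested module name against the set of modules that exist on a machine and falls back to a known replacement when a legacy name is gone. The function name `resolve_module`, the `LEGACY_TO_CURRENT` table, and the idea of passing in the available modules as a set are all assumptions made for the example. Only the two module names are taken from the description above.

```python
# Hypothetical table of legacy SEMS module names and their replacements.
# The single entry reflects the rename described in this comment; a real
# table would need to be filled in from the actual SEMS module tree.
LEGACY_TO_CURRENT = {
    "sems-openmpi/4.0.5-cuda-11.4.2": "openmpi/4.0.5",
}


def resolve_module(requested, available):
    """Return the module name to load for `requested`.

    `available` is the set of module names present on the machine
    (assumed to be collected elsewhere, for example from the module system).
    Raises LookupError if neither the requested name nor a known
    replacement is available, so a bad config fails with a clear message.
    """
    if requested in available:
        return requested
    replacement = LEGACY_TO_CURRENT.get(requested)
    if replacement is not None and replacement in available:
        return replacement
    raise LookupError(
        f"module {requested!r} is not available and has no usable replacement"
    )


if __name__ == "__main__":
    # On an upgraded machine only the new module name exists.
    modules_on_machine = {"openmpi/4.0.5"}
    print(resolve_module("sems-openmpi/4.0.5-cuda-11.4.2", modules_on_machine))
```

Failing early with the offending module name, rather than partway through configuration, is the main point of the sketch. Whether a fallback table or a separate rhel8 config is the better fix is a design choice this example does not settle.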

@jhux2
Member Author

jhux2 commented May 22, 2024

Thanks for the update, @achauphan.

@csiefer2
Member

@achauphan @sebrowne This doesn't look good #13032

@achauphan
Contributor

> @achauphan @sebrowne This doesn't look good #13032

Responded in this comment

@jhux2
Member Author

jhux2 commented May 23, 2024

@achauphan Some PRs have merged, but there are others, e.g., #13017, where there are no obvious failure reports (except for the new non-blocking PR test), but the AT reports that tests failed.

@jhux2
Member Author

jhux2 commented Jun 5, 2024

fixed

jhux2 closed this as completed on Jun 5, 2024
jhux2 unpinned this issue on Jun 5, 2024