Seeing failure in reduction tests on Perlmutter-CPU with nvidia #161

Closed
xylar opened this issue Nov 15, 2024 · 3 comments · Fixed by #187

xylar commented Nov 15, 2024

I just ran CTests on Perlmutter-CPU with nvidia and I'm seeing:

1: Global sum I4:    PASS (exp,act=2,2)
1: Global sum I8:    PASS (exp,act=4,4)
1: Global sum R4:    PASS (exp,act=6.000002,6.000002)
1: Global sum R8:    PASS (exp,act=8.000000000000201,8.000000000000201)
1: Global sum real:  PASS (exp,act=10.000002000000000,10.000002000000000)
1: Global sum A1DI4: PASS (exp,act=90,90)
1: Global sum A2DI4: PASS (exp,act=9900,9900)
1: Global sum A1DI8: PASS (exp,act=90,90)
1: Global sum A2DI8: PASS (exp,act=9900,9900)
1: Global sum A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
1: Global sum A2DR4: PASS (exp,act=9900.098633,9900.098633)
1: Global sum A1DR8: PASS (exp,act=90.0000000000020,90.0000000000020)
1: Global sum A2DR8: PASS (exp,act=9900.0000000009859,9900.0000000009859)
1: Global min I4:    PASS (exp,act=0,0)
1: Global max I4:    PASS (exp,act=1,1)
1: Global min R8:    PASS (exp,act=4.0000000000001,4.0000000000001)
1: Global max R8:    PASS (exp,act=5.0000000000001,5.0000000000001)
1: Global min A1DI4: PASS
1: Global max A1DI4: FAIL
1: Global sum device A1DI4: PASS (exp,act=90,90)
1: Global sum device A2DI4: PASS (exp,act=9900,9900)
1: Global sum device A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
0: Global sum I4:    PASS (exp,act=2,2)
0: Global sum I8:    PASS (exp,act=4,4)
0: Global sum R4:    PASS (exp,act=6.000002,6.000002)
0: Global sum R8:    PASS (exp,act=8.000000000000201,8.000000000000201)
0: Global sum real:  PASS (exp,act=10.000002000000000,10.000002000000000)
0: Global sum A1DI4: PASS (exp,act=90,90)
0: Global sum A2DI4: PASS (exp,act=9900,9900)
0: Global sum A1DI8: PASS (exp,act=90,90)
0: Global sum A2DI8: PASS (exp,act=9900,9900)
0: Global sum A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
0: Global sum A2DR4: PASS (exp,act=9900.098633,9900.098633)
0: Global sum A1DR8: PASS (exp,act=90.0000000000020,90.0000000000020)
0: Global sum A2DR8: PASS (exp,act=9900.0000000009859,9900.0000000009859)
0: Global min I4:    PASS (exp,act=0,0)
0: Global max I4:    PASS (exp,act=1,1)
0: Global min R8:    PASS (exp,act=4.0000000000001,4.0000000000001)
0: Global max R8:    PASS (exp,act=5.0000000000001,5.0000000000001)
0: Global min A1DI4: PASS
0: Global max A1DI4: FAIL
0: Global sum device A1DI4: PASS (exp,act=90,90)
0: Global sum device A2DI4: PASS (exp,act=9900,9900)
0: Global sum device A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
srun: error: nid004451: tasks 0-1: Exited with exit code 10
srun: Terminating StepId=32907910.24

Note: Global max A1DI4: FAIL on both MPI tasks (0 and 1).

All other tests are passing.
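To confirm which tests fail on which tasks without reading the whole log, the output can be grouped by task id. This is a minimal sketch, not part of the test suite: the function name `failures_by_task` and the regular expression are illustrative assumptions based only on the line format shown above (`<task>: <test name>: PASS|FAIL ...`).

```python
import re
from collections import defaultdict

# Matches lines such as "1: Global max A1DI4: FAIL" or
# "0: Global sum I4:    PASS (exp,act=2,2)". The pattern is an assumption
# inferred from the log format pasted in this issue.
LINE_RE = re.compile(r"^(\d+): (.+?):\s*(PASS|FAIL)\b")


def failures_by_task(log_text):
    """Return {test name: sorted list of task ids that reported FAIL}."""
    failed = defaultdict(set)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            task, name, status = match.groups()
            if status == "FAIL":
                failed[name].add(int(task))
    return {name: sorted(tasks) for name, tasks in failed.items()}


# A few lines copied from the log above; srun lines are ignored
# because they do not start with a task id.
sample = """\
1: Global min A1DI4: PASS
1: Global max A1DI4: FAIL
0: Global sum I4:    PASS (exp,act=2,2)
0: Global max A1DI4: FAIL
srun: error: nid004451: tasks 0-1: Exited with exit code 10
"""

print(failures_by_task(sample))
```

Run against the full log, this should report only `Global max A1DI4`, failing on tasks 0 and 1.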

@xylar xylar added the bug Something isn't working label Nov 15, 2024
@amametjanov amametjanov self-assigned this Nov 15, 2024
brian-oneill commented

For the record, I'm seeing the same error with nvidiagpu on pm-gpu.

xylar commented Dec 9, 2024

I'm pretty sure this has been fixed. @amametjanov or @brian-oneill, can you point to the PR that fixed it so we can close this?

xylar commented Dec 9, 2024

Oh, apologies, maybe it hasn't been fixed. I was still seeing it when I last tested:
E3SM-Project/polaris#243
