MPI_Gatherv/MPI_Scatterv displacements overflow in frame/collect_on_comm.c #2156

Open
benkirk opened this issue Jan 17, 2025 · 2 comments · May be fixed by #2157
benkirk commented Jan 17, 2025

Describe the bug

The functions col_on_comm() & dst_on_comm() in frame/collect_on_comm.c use MPI_CHAR as the underlying datatype in MPI_{Gather,Scatter}v operations. This means the required displacements, displace[], are expressed in bytes. For large problems and large local communicators, the displacement offsets can overflow, which manifests as an MPI communication failure, typically with a very cryptic error message.

This seems to occur more frequently with large local communicators, typical of high-core-count nodes.
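To make the failure mode concrete, here is a minimal standalone sketch (not the WRF code; the rank count and per-rank sizes are illustrative) of how a running byte offset stored in an int wraps once the cumulative buffer size passes INT_MAX:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Illustrative figures: 6 ranks, ~120M 4-byte elements each, roughly
     * the scale of the reproducer.  Only the cumulative offset matters. */
    const int  nranks         = 6;
    const long elems_per_rank = 120000000L;
    const int  typesize       = 4;

    long byte_offset = 0;                    /* true offset, kept in 64 bits */
    for (int r = 0; r < nranks; ++r) {
        /* what the current code stores; on typical two's-complement systems
         * this wraps to a negative value once byte_offset exceeds INT_MAX */
        int displace = (int)byte_offset;
        if (byte_offset > INT_MAX)
            printf("rank %d: displace wraps to %d (true offset %ld bytes)\n",
                   r, displace, byte_offset);
        byte_offset += elems_per_rank * typesize;
    }
    return 0;
}

With these figures the last rank's byte displacement already exceeds INT_MAX and is handed to MPI as a negative value.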

To Reproduce

We have boiled this down to a 6-rank example, available on NSF NCAR/Derecho at /glade/work/negins/consulting/RC-26919/high-res; a PR with a proposed fix will be submitted.

Expected behavior

  1. These routines should be modified to use a proper MPI datatype for the given *typesize, so that displacements are expressed in elements instead of bytes (see the sketch below), and
  2. These routines should perform error checking for overflow.
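A sketch of both points, assuming a hypothetical helper that receives per-rank element counts (this is not the code in the forthcoming PR, just an illustration of the approach):

#include <limits.h>
#include <mpi.h>

/* Hypothetical helper: choose a datatype whose extent equals typesize so
 * displacements count elements, and reject displacements that would still
 * overflow an int.  counts[] is assumed to hold element counts per rank. */
int build_displacements(const int *counts, int nranks, int typesize,
                        MPI_Datatype *dtype, int *displs)
{
    switch (typesize) {
    case 1:  *dtype = MPI_BYTE;   break;
    case 4:  *dtype = MPI_FLOAT;  break;   /* only the 4-byte extent matters here */
    case 8:  *dtype = MPI_DOUBLE; break;
    default: MPI_Type_contiguous(typesize, MPI_BYTE, dtype);
             MPI_Type_commit(dtype);       /* caller must MPI_Type_free() this */
             break;
    }

    long offset = 0;
    for (int r = 0; r < nranks; ++r) {
        if (offset > INT_MAX)
            return -1;                     /* element displacement would still overflow */
        displs[r] = (int)offset;           /* counted in elements, not bytes */
        offset += counts[r];
    }
    return 0;
}

Switching to element displacements buys a factor of *typesize of headroom before INT_MAX; the explicit check catches whatever cases remain.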

Additional context

Related to #1333
We think this is also the underlying issue with https://forum.mmm.ucar.edu/threads/cxil_map-write-error-with-real-exe.19321/

benkirk commented Jan 17, 2025

@negin513 I created this issue and pointed to your /glade/work/negins/consulting/RC-26919/high-res reproducer.

PR with proposed fix coming.

benkirk commented Jan 17, 2025

On Cray-EX systems under cray-mpich this issue can manifest as a cxil_map: write error emanating from MPI_Scatterv.
I was able to obtain similar stack traces with open-source OpenMPI and MPICH. In all cases the root cause was a negative value in the displace[] array being passed to the collective. I'm a little surprised none of the MPI implementations complained about that; a simple argument check (sketched after the error output below) would have flagged it.

cxil_map: write error
MPICH ERROR [Rank 0] [job id 346dad44-5f6b-40bf-be06-4964c591f68f] [Thu Jul 25 12:08:12 2024] [dec1252] - Abort(472504335) (rank 0 in comm 0): Fatal error in PMPI_Scatterv: Other MPI error, error stack:
PMPI_Scatterv(416)..........: MPI_Scatterv(sbuf=0x14e3d8082020, scnts=0xb2830c0, displs=0xb282cb0, MPI_CHAR, rbuf=0x7ffe51e54bc0, rcount=8448000, MPI_CHAR, root=0, comm=comm=0xc4000000) failed
MPIR_CRAY_Scatterv(462).....: 
MPIC_Isend(511).............: 
MPID_Isend_coll(610)........: 
MPIDI_isend_coll_unsafe(176): 
MPIDI_OFI_send_normal(368)..: OFI tagged senddata failed (ofi_send.h:368:MPIDI_OFI_send_normal:Invalid argument) 
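
For reference, a hypothetical wrapper along these lines (checked_scatterv and its argument order are my own invention, not part of WRF or the MPI API) would turn the wrapped displacement into an immediate, readable failure at the call site:

#include <assert.h>
#include <mpi.h>

/* Hypothetical wrapper: validate counts and displacements on the root before
 * calling MPI_Scatterv, so a wrapped (negative) displacement fails loudly at
 * the call site instead of deep inside the MPI library. */
void checked_scatterv(const void *sbuf, const int *scnts, const int *displs,
                      void *rbuf, int rcount, int root, MPI_Comm comm)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    if (rank == root) {
        for (int r = 0; r < nranks; ++r)
            assert(scnts[r] >= 0 && displs[r] >= 0);   /* overflow shows up as negatives */
    }

    MPI_Scatterv(sbuf, scnts, displs, MPI_CHAR,
                 rbuf, rcount, MPI_CHAR, root, comm);
}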

Here's a full stack trace from mpich-3.4.3 before the fix:

taskid: 0 hostname: casper53
 module_io_quilt_old.F        2931 T
 Ntasks in X            2 , ntasks in Y            3
Domain # 1: dx =  1000.000 m
REAL_EM V4.5.2 PREPROCESSOR
git commit a8eb846859cb39d0acfd1d3297ea9992ce66424a
 *************************************
 Parent domain
 ids,ide,jds,jde            1        2560           1        2000
 ims,ime,jms,jme           -4        1287          -4         674
 ips,ipe,jps,jpe            1        1280           1         667
 *************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
 alloc_space_field: domain            1 ,           30665188100  bytes allocated
 Yes, this special data is acceptable to use: OUTPUT FROM METGRID V4.5
 Input data is acceptable to use: met_em.d01.2010-02-19_00:00:00.nc
 metgrid input_wrf.F first_date_input = 2010-02-19_00:00:00
 metgrid input_wrf.F first_date_nml = 2010-02-19_00:00:00

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x146206fa2640 in ???
#1  0x146206fa1873 in ???
#2  0x146205e868ff in ???
#3  0x146205fc6759 in ???
#4  0x146206524e0b in MPIR_Typerep_copy
	at src/mpi/datatype/typerep/src/typerep_yaksa_pack.c:30
#5  0x1462065c8121 in MPIDI_POSIX_eager_send
	at src/mpid/ch4/shm/posix/eager/iqueue/iqueue_send.h:102
#6  0x1462065c6051 in MPIDI_POSIX_eager_send
	at ./src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:16
#7  0x1462065c6051 in progress_send
	at src/mpid/ch4/shm/posix/posix_progress.c:132
#8  0x1462065c6051 in MPIDI_POSIX_progress
	at src/mpid/ch4/shm/posix/posix_progress.c:174
#9  0x14620657b498 in progress_test
	at src/mpid/ch4/src/ch4_progress.c:93
#10  0x14620657b7e1 in MPID_Progress_wait
	at src/mpid/ch4/src/ch4_progress.c:228
#11  0x14620642ccb4 in MPIR_Waitall_state
	at src/mpi/request/waitall.c:41
#12  0x14620642d392 in MPID_Waitall
	at ./src/mpid/ch4/src/ch4_wait.h:135
#13  0x14620642d392 in MPIR_Waitall
	at src/mpi/request/waitall.c:163
#14  0x146206510ad3 in MPIC_Waitall
	at src/mpi/coll/helper_fns.c:638
#15  0x1462064adc93 in MPIR_Scatterv_allcomm_linear
	at src/mpi/coll/scatterv/scatterv_allcomm_linear.c:69
#16  0x1462063401da in MPIR_Scatterv_impl
	at src/mpi/coll/scatterv/scatterv.c:142
#17  0x1462063403e3 in MPIDI_NM_mpi_scatterv
	at ./src/mpid/ch4/netmod/include/../ofi/ofi_coll.h:202
#18  0x1462063403e3 in MPIDI_Scatterv_intra_composition_alpha
	at ./src/mpid/ch4/src/ch4_coll_impl.h:768
#19  0x1462063403e3 in MPID_Scatterv
	at ./src/mpid/ch4/src/ch4_coll.h:360
#20  0x1462063403e3 in MPIR_Scatterv
	at src/mpi/coll/scatterv/scatterv.c:190
#21  0x146206340aa7 in PMPI_Scatterv
	at src/mpi/coll/scatterv/scatterv.c:380
#22  0xe3b1c0 in dst_on_comm
	at /container/WRF/frame/collect_on_comm.c:183
#23  0xe3b1c0 in dist_on_comm0_
	at /container/WRF/frame/collect_on_comm.c:141
#24  0xad35ae in wrf_global_to_patch_generic_
	at /container/WRF/frame/module_dm.f90:7658
#25  0xadccfb in wrf_global_to_patch_real_
	at /container/WRF/frame/module_dm.f90:7489
#26  0xa5f168 in call_pkg_and_dist_generic_
	at /container/WRF/frame/module_io.f90:23332
#27  0xa6043e in call_pkg_and_dist_real_
	at /container/WRF/frame/module_io.f90:22786
#28  0xa5c6f1 in call_pkg_and_dist_
	at /container/WRF/frame/module_io.f90:22695
#29  0xa5c2eb in wrf_read_field1_
	at /container/WRF/frame/module_io.f90:21300
#30  0xa60f06 in wrf_read_field_
	at /container/WRF/frame/module_io.f90:21090
#31  0x235d54e in wrf_ext_read_field_
	at /container/WRF/share/wrf_ext_read_field.f90:145
#32  0x1becb50 in input_wrf_
	at /container/WRF/share/input_wrf.f90:1655
#33  0x1b52cd8 in __module_io_domain_MOD_input_auxinput1
	at /container/WRF/share/module_io_domain.f90:678
#34  0x407385 in med_sidata_input_
	at /container/WRF/main/real_em.f90:417
#35  0x432bc3 in real_data
	at /container/WRF/main/real_em.f90:137
#36  0x432caf in main
	at /container/WRF/main/real_em.f90:5
