Closed
Description
Background information
What version of the PMIx Reference Library are you using?
- OpenPMIx: `v4.2` branch at 8cb6f58
- PRRTE: `master` branch at openpmix/prrte@8d735f5 and `v3.0` branch at openpmix/prrte@b29abde
- Open MPI: `main` branch at open-mpi/ompi@5575c86 (with both "Fix Singletons and Singleton Spawn" open-mpi/ompi#10688 and "Fix the dpm to fwd IO to the spawning parent" open-mpi/ompi#10695 cherry-picked into the branch, since they have not been merged yet)
Testing note on this comment
Describe how PMIx was installed
Built with Open MPI `main`, with the submodule pointers adjusted manually.
Please describe the system on which you are running
- Operating system/version: RHEL 8.4
- Computer hardware: ppc64le
- Network type: Single-node shared memory
Details of the problem
Using PRRTE `master` and OpenPMIx `master`, we have been able to get singleton `MPI_Comm_spawn` working. However, if we move to OpenPMIx `v4.2`, it fails. See open-mpi/ompi#10688.
The MPI tests I'm using are located here
```
shell$ mpirun --np 1 ./simple_spawn ./simple_spawn
Spawning './simple_spawn' ... OK
shell$ ./simple_spawn ./simple_spawn
[f5n18:3788975] PMIX ERROR: PROC-ENTRY-NOT-FOUND in file server/pmix_server.c at line 3588
[f5n18:3788914] pml_ucx.c:191 Error: Failed to receive UCX worker address: Take next option (-46)
[f5n18:3788914] OPAL ERROR: Error in file dpm/dpm.c at line 480
```
I noticed that if I force the `hash` GDS component, it works correctly:
```
shell$ export PMIX_MCA_gds=hash
shell$ ./simple_spawn ./simple_spawn
Spawning './simple_spawn' ... OK
```
So this seems to affect only the singleton spawn case, and it is related to the GDS component (verbose output indicates that it is using `ds21`).
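The two singleton runs (default GDS selection versus `hash` forced) can be wrapped in a small helper so the comparison is repeatable. This is a minimal sketch assuming a POSIX shell and an `env` that supports `-u` (GNU coreutils on RHEL does); `run_both` is a hypothetical helper name, not part of Open MPI, PRRTE, or PMIx.

```shell
# run_both CMD [ARGS...] (hypothetical helper, not part of Open MPI/PMIx):
# run CMD once with the default GDS selection and once with the hash
# component forced via PMIX_MCA_gds, printing each exit status.
run_both() {
    echo "== default GDS selection =="
    # Drop any exported PMIX_MCA_gds so PMIx picks its default component.
    env -u PMIX_MCA_gds "$@"
    echo "exit status: $?"

    echo "== PMIX_MCA_gds=hash =="
    PMIX_MCA_gds=hash "$@"
    echo "exit status: $?"
}

# Usage, from the directory containing the test binary:
#   run_both ./simple_spawn ./simple_spawn
```

Unsetting `PMIX_MCA_gds` for the first run keeps a previously exported value (such as the `export` above) from masking the default selection.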