v4.2 GDS issue with singleton spawn #2705

Closed
@jjhursey

Background information

What version of the PMIx Reference Library are you using?

Testing note on this comment

Describe how PMIx was installed

Built with Open MPI main and manually adjusted the submodule pointers.

Please describe the system on which you are running

  • Operating system/version: RHEL 8.4
  • Computer hardware: ppc64le
  • Network type: Single node shared memory

Details of the problem

Using PRRTE master and OpenPMIx master, we have been able to get singleton MPI_Comm_spawn working. However, if we move to OpenPMIx v4.2, it fails. See open-mpi/ompi#10688

The MPI tests I'm using are located here

shell$ mpirun --np 1 ./simple_spawn ./simple_spawn
Spawning './simple_spawn' ... OK
shell$ ./simple_spawn ./simple_spawn
[f5n18:3788975] PMIX ERROR: PROC-ENTRY-NOT-FOUND in file server/pmix_server.c at line 3588
[f5n18:3788914] pml_ucx.c:191  Error: Failed to receive UCX worker address: Take next option (-46)
[f5n18:3788914] OPAL ERROR: Error in file dpm/dpm.c at line 480

I noticed that if I force the hash GDS component, it works correctly:

shell$ export PMIX_MCA_gds=hash
shell$ ./simple_spawn ./simple_spawn
Spawning './simple_spawn' ... OK

So this seems to affect only the singleton spawn case (the same test passes under mpirun), and it is related to the GDS component (verbose output indicates that it is using ds21).
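
For anyone trying to reproduce this, the two runs above can be wrapped in a small script so the default GDS selection and the forced hash component are compared back to back. This is only a rough sketch: `run_with_gds` and `compare_gds` are hypothetical helper names, not part of PMIx or Open MPI, and it assumes a failed spawn shows up as a non-zero exit status, which I have not confirmed for every failure mode. The only real knob used is the `PMIX_MCA_gds` environment variable from the workaround.

```shell
#!/bin/sh
# Hypothetical helpers for comparing GDS selections; not part of PMIx.

# run_with_gds NAME CMD...: run CMD in a subshell with PMIX_MCA_gds set to
# NAME, or with the variable unset when NAME is "default".
run_with_gds() {
    gds_choice=$1
    shift
    if [ "$gds_choice" = "default" ]; then
        ( unset PMIX_MCA_gds; "$@" )
    else
        ( PMIX_MCA_gds=$gds_choice; export PMIX_MCA_gds; "$@" )
    fi
}

# compare_gds CMD...: run CMD once with the default GDS selection and once
# with hash forced, printing the exit status of each run.
compare_gds() {
    for gds_name in default hash; do
        status=0
        run_with_gds "$gds_name" "$@" || status=$?
        echo "gds=$gds_name: exit status $status"
    done
}
```

Usage would be `compare_gds ./simple_spawn ./simple_spawn` for the singleton case. Running each command in a subshell keeps the caller's environment untouched, so a previously exported `PMIX_MCA_gds` does not leak into the "default" run.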
