DART calls influencing each other [MPICH 3.2] #461

Open
@stiefn

I am having problems with DART calls. Here is a minimal example in which the last DART call is non-blocking on unit 1 but blocking on unit 0, so the program never terminates:

#include <libdash.h>

int main(int argc, char* argv[]) {
  dash::init(&argc, &argv);

  std::vector<int> send1;
  std::vector<std::size_t> send1_count;
  std::vector<std::size_t> send1_displs;
  std::vector<int> recv1;
  std::vector<std::size_t> recv1_count;
  std::vector<std::size_t> recv1_displs;
  if(dash::myid() == 0) {
    send1.resize(5);
    send1_count = { 0, 20 };
    send1_displs = { 0, 0 };
    recv1 = { };
    recv1_count = { 0, 0 };
    recv1_displs = { 0, 0 };
  } else {
    send1 = { };
    send1_count = { 0, 0 };
    send1_displs = { 0, 0 };
    recv1.resize(5);
    recv1_count = { 20, 0 };
    recv1_displs = { 0, 20 };
  }
  // Call 1: unit 0 sends 20 bytes to unit 1; nothing else is transferred.
  dart_alltoallv(send1.data(), send1_count.data(), send1_displs.data(),
      DART_TYPE_BYTE, recv1.data(), recv1_count.data(), recv1_displs.data(),
      dash::Team::All().dart_id());

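  // Call 2: each unit contributes sizeof(std::size_t) bytes.
  // Note: send2 is an int, so this reads past it wherever
  // sizeof(std::size_t) > sizeof(int).
  int send2 = 5;
  std::vector<std::size_t> recv2(2);
  dart_allgather(&send2, recv2.data(), sizeof(std::size_t), DART_TYPE_BYTE,
      dash::Team::All().dart_id());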

  std::vector<int> send3;
  std::vector<std::size_t> send3_count;
  std::vector<std::size_t> send3_displs;
  std::vector<int> recv3;
  std::vector<std::size_t> recv3_count;
  std::vector<std::size_t> recv3_displs;
  if(dash::myid() == 0) {
    send3 = { 2, 2, 4, 3, 4, 1, 2, 0 };
    send3_count = { 8, 0 };
    send3_displs = { 0, 8 };
    recv3.resize(9);
    recv3_count = { 8, 1 };
    recv3_displs = { 0, 8 };
  } else {
    send3 = { 6, 6, 7, 9, 5, 9, 6, 8 };
    send3_count = { 1, 7 };
    send3_displs = { 0, 1 };
    recv3.resize(7);
    recv3_count = { 0, 7 };
    recv3_displs = { 0, 0 };
  }
  // Call 3: unit 0 keeps 8 ints; unit 1 sends 1 int to unit 0 and keeps 7.
  dart_alltoallv(send3.data(), send3_count.data(), send3_displs.data(),
      DART_TYPE_INT, recv3.data(), recv3_count.data(), recv3_displs.data(),
      dash::Team::All().dart_id());

  dash::finalize();
  return 0;
} 

Each of the calls runs perfectly fine on its own, but in combination I get this weird behaviour.
Commenting out the second DART call instead produces the following error for the third DART call:

Message from rank 1 to tag 10 truncated; 20 bytes received but buffer size is 4

This seems clearly wrong: according to the code, unit 1 sends only a single int (4 bytes) to unit 0 in the third call.
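
To make the byte arithmetic explicit, here is a rough sketch that checks the count tables from the example without using MPI or DART at all. It assumes dart_alltoallv follows the MPI_Alltoallv convention, where entry v of a unit's send counts is what it sends to unit v and entry u of its receive counts is what it expects from unit u. The names CountMatrix, counts_match and bytes_sent are made up for this illustration and are not part of DART.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[u][v]: number of elements unit u sends to (or expects from) unit v.
// Hypothetical helper type, not part of DART.
using CountMatrix = std::vector<std::vector<std::size_t>>;

// Returns true if, for every pair (u, v), the amount unit u sends to v
// equals the amount unit v expects to receive from u.
bool counts_match(const CountMatrix& send, const CountMatrix& recv) {
  const std::size_t n = send.size();
  assert(recv.size() == n);
  for (std::size_t u = 0; u < n; ++u) {
    assert(send[u].size() == n && recv[u].size() == n);
    for (std::size_t v = 0; v < n; ++v) {
      if (send[u][v] != recv[v][u]) return false;
    }
  }
  return true;
}

// Bytes that unit `from` sends to unit `to`, given the element size.
std::size_t bytes_sent(const CountMatrix& send, std::size_t from,
                       std::size_t to, std::size_t elem_size) {
  return send[from][to] * elem_size;
}

// Count tables copied from the example: row 0 is unit 0, row 1 is unit 1.
const CountMatrix send1 = {{0, 20}, {0, 0}};  // first call, DART_TYPE_BYTE
const CountMatrix recv1 = {{0, 0}, {20, 0}};
const CountMatrix send3 = {{8, 0}, {1, 7}};   // third call, DART_TYPE_INT
const CountMatrix recv3 = {{8, 1}, {0, 7}};
```

Under that assumption, both alltoallv calls have matching counts, bytes_sent(send3, 1, 0, sizeof(int)) equals sizeof(int), and the only 20-byte transfer in the whole program is the one from unit 0 to unit 1 in the first call.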

I am no MPI expert, so my code may be buggy, but it seems odd that each call runs successfully on its own and only the combination causes these problems.

MPI version: MPICH 3.2
