Description
After looking at the implementations of `dash::copy` today, I noticed a major flaw in the current code. In a nutshell, this is what `dash::copy(T*, T*, GlobOutputIter)` does:
1. `memcpy` the local portion of the transfer
2. `put` the preceding elements
3. `put` the succeeding elements
Now, this works fine for up to 3 units. However, experience from the last couple of years suggests that there may well be more than 3 units in the future, so whenever a user tries to copy a local vector into a part of a `dash::Array` spanning >= 4 units, a unicorn dies a gruesome death.
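To make the failure mode concrete, here is a self-contained toy model (all names and the memory layout are invented for illustration; this is not the actual DASH code). The scheme issues one local copy plus a single `put` per side, so a destination range spanning more than 3 units silently loses data:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

constexpr std::size_t kBlock = 4;                     // elements per unit
std::size_t unit_of(std::size_t g) { return g / kBlock; }

// Stand-in for a DART put: it only ever reaches the single unit that the
// (unit, local offset) "gptr" refers to. The real code would instead risk
// writing past that unit's allocation; either way, data goes missing.
void put(std::vector<std::vector<int>>& mem, std::size_t unit,
         std::size_t lofs, const int* src, std::size_t n) {
  n = std::min(n, kBlock - lofs);                     // bounded to one unit
  std::copy(src, src + n, mem[unit].begin() + lofs);
}

int main() {
  std::vector<std::vector<int>> mem(5, std::vector<int>(kBlock, 0));
  std::vector<int> local(18);                         // spans all 5 units
  for (std::size_t i = 0; i < local.size(); ++i) local[i] = int(i + 1);

  const std::size_t g_begin = 1;                      // global start index
  const std::size_t g_end   = g_begin + local.size(); // one past the end
  const std::size_t me      = 2;                      // we are unit 2
  const std::size_t lo = me * kBlock, hi = lo + kBlock;

  // 1. memcpy the portion owned by our own unit
  std::copy(&local[lo - g_begin], &local[hi - g_begin], mem[me].begin());
  // 2. one put for ALL preceding elements -> only reaches unit 0
  put(mem, unit_of(g_begin), g_begin % kBlock, local.data(), lo - g_begin);
  // 3. one put for ALL succeeding elements -> only reaches unit 3
  put(mem, unit_of(hi), hi % kBlock, &local[hi - g_begin], g_end - hi);

  for (std::size_t u = 0; u < mem.size(); ++u) {      // units 1 and 4 lost
    std::cout << "unit " << u << ":";
    for (int v : mem[u]) std::cout << ' ' << v;
    std::cout << '\n';
  }
}
```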
My guess is that DASH assumes DART to be aware of the contiguous global address space, i.e., that a `gptr` is just the start of an arbitrary region that can potentially span multiple units. The bad news is that DART communication operations are completely agnostic of this at the moment, so the higher levels have to make sure not to write beyond the bounds of the single unit referenced by a `gptr`. In essence, #398 addresses this problem for `dash::transform`.
We could, however, give DART the notion of a contiguous address space and let it handle multi-unit `put`s and `get`s rather easily. DART has the size of each allocation on the individual units available and could thus nicely overlap remote transfers with (node-)local `memcpy`s. It would also be slightly more efficient, since the meta-data queries would only have to be done once per transfer instead of once for every individual target unit.
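For illustration, a minimal sketch of what such a multi-unit put could look like; `put_multi`, `put_single`, and the `gptr_t` layout are all invented here and do not reflect the actual DART API, and the "global memory" is mocked as one buffer per unit so the sketch is self-contained:

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

struct gptr_t { std::size_t unit; std::size_t offset; };  // simplified gptr

std::vector<std::vector<char>> g_mem;  // one allocation per unit (mocked)
std::size_t g_me = 0;                  // we pretend to be unit 0

// Bounded single-unit put, standing in for e.g. MPI_Put on one window.
void put_single(gptr_t dst, const char* src, std::size_t n) {
  std::memcpy(g_mem[dst.unit].data() + dst.offset, src, n);
}

// Split one logical transfer into per-unit chunks: the chunk for our own
// unit is a plain memcpy, each remote chunk is one bounded put. With
// nonblocking puts, the remote transfers would overlap the local copy.
void put_multi(gptr_t dst, const char* src, std::size_t nbytes) {
  while (nbytes > 0) {
    // DART knows each unit's allocation size, so it can clamp the chunk.
    std::size_t room  = g_mem[dst.unit].size() - dst.offset;
    std::size_t chunk = nbytes < room ? nbytes : room;
    if (dst.unit == g_me)
      std::memcpy(g_mem[dst.unit].data() + dst.offset, src, chunk);
    else
      put_single(dst, src, chunk);
    src    += chunk;
    nbytes -= chunk;
    dst = {dst.unit + 1, 0};           // continue at the next unit
  }
}

int main() {
  g_mem.assign(5, std::vector<char>(4, '.'));  // 5 units, 4 bytes each
  const char msg[] = "ABCDEFGHIJKLMNOPQR";     // 18 bytes, spans 5 units
  put_multi({0, 1}, msg, sizeof msg - 1);
  for (auto& m : g_mem) { std::cout.write(m.data(), m.size()); std::cout << ' '; }
  std::cout << '\n';                           // .ABC DEFG HIJK LMNO PQR.
}
```

In a real implementation the loop would issue the remote puts nonblocking and wait on them collectively, which is where the overlap with the (node-)local `memcpy` would come from; the per-unit allocation sizes needed for the clamping are available once per transfer rather than per target unit.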
While I think this would be a worthwhile addition to DART, it would also alter DART's semantics, shifting it away from being a slim wrapper around MPI. But maybe this is how DASH already expects it to behave? The alternative would be to adapt `dash::copy` to handle these cases at the DASH level (I am not sure which other DASH features are affected).
Please comment.