[WIP] Introducing execution- and launch policies #300

Open · wants to merge 6 commits into development

Conversation

@fuchsto (Member) commented Feb 24, 2017

Addresses issues #272 #104 #216

@fuchsto (Member Author) commented Feb 24, 2017

Related to launch policies: Task concept specification (see #193)

@devreal (Member) left a comment


Thanks for getting the ball rolling. Questions inline.

/**
 * Parallel unsequenced execution policy.
 */
class parallel_unsequenced_policy { };
@devreal (Member):

What is the difference in semantics between parallel_policy and parallel_unsequenced_policy?


This is well described on cppreference.

    // ...
    InputIt in_last,
    OutputIt out_first);

dash::Future<ValueType *> copy(
    ExecutionPolicy && policy,
    // ...
@devreal (Member):

The execution policy is ignored atm, right?

@fuchsto (Member Author):

Yes, I just pinned the branch as PR to simplify discussion. It's work in progress (= [WIP] in title).

@rkowalewski left a comment

Thanks. It perfectly matches the C++17 spec.

@fuchsto (Member Author) commented Feb 25, 2017

I think I got the required concepts figured out.
This summary is mostly relevant to @devreal @fmoessbauer @fuerlinger

Execution Policies

In C++17, execution policies are added to algorithm interfaces and passed as the first argument:

template< class ExecutionPolicy, class InputIt, class UnaryFunction2 >
void for_each( ExecutionPolicy&& policy, InputIt first, InputIt last, UnaryFunction2 f );

It is therefore trivial to specialize / overload algorithm interfaces for execution policies.
From the code-mechanics perspective, we can comfortably extend the STL execution policy concept; introducing additional traits and some CRTP adapters should suffice.
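As a rough sketch of the trait-based extension (all type and trait names below are hypothetical, not the actual DASH interface):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch: DASH-specific execution policy types alongside the
// C++17 set, plus an is_execution_policy trait for overload dispatch.
namespace dash {

struct parallel_policy   { };
// PGAS-specific dimension: collective execution across all units.
struct collective_policy { };

template <typename T>
struct is_execution_policy                    : std::false_type { };
template <>
struct is_execution_policy<parallel_policy>   : std::true_type  { };
template <>
struct is_execution_policy<collective_policy> : std::true_type  { };

// Algorithm overloads can then be constrained on the trait:
template <
  typename ExecutionPolicy,
  typename = typename std::enable_if<
    is_execution_policy<
      typename std::decay<ExecutionPolicy>::type>::value>::type>
bool accepts_policy(ExecutionPolicy &&) { return true; }

} // namespace dash
```

An overload taking a type that does not satisfy the trait would simply not participate in resolution, which is the same mechanism the STL uses for its `is_execution_policy` check.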

The actual effort is in rather formal aspects. Which is nice, because science.

In principle, this boils down to additional PGAS-specific dimensions for execution policies.
Some examples:

  • collaboration, such as collective, asymmetric, leader
  • synchronization such as nowait (as known from OpenMP, would skip implicit barriers)
  • unit-level parallelization: the vanilla STL execution policies; e.g., parallel would allow units to spawn OpenMP threads

... you get the idea. Policies can be arbitrarily combined with foo_policy | bar_policy as usual.
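A minimal sketch of such combinable policy flags, assuming a bitmask representation analogous to std::launch (the flag names are made up for illustration):

```cpp
#include <cassert>

// Hypothetical sketch: policy flags combinable with operator|, analogous
// to std::launch. Flag names are illustrative only.
namespace dash {

enum class exec : unsigned {
  collective = 1u << 0,
  asymmetric = 1u << 1,
  nowait     = 1u << 2,  // skip implicit barriers, as in OpenMP
  parallel   = 1u << 3   // units may spawn threads
};

constexpr exec operator|(exec a, exec b) {
  return static_cast<exec>(
           static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

constexpr bool has_policy(exec combined, exec flag) {
  return (static_cast<unsigned>(combined)
          & static_cast<unsigned>(flag)) != 0;
}

} // namespace dash
```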

Launch Policies

STL concepts do not cover our requirements regarding launch policies.
In vanilla C++, asynchronous and deferred execution is selected via std::launch policies passed to function interfaces such as std::async; algorithms themselves are agnostic of launch policies:

template< class Function, class... Args >
std::future<typename std::result_of<Function(Args...)>::type>
    async( std::launch policy, Function&& fun, Args&&... args );

Parameter fun is the actual callable object.
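For reference, plain std::async usage, where the launched callable stays unaware of the policy:

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Standard std::async usage for comparison: the launch policy selects
// eager vs. deferred execution, while the callable (here wrapping
// std::accumulate) stays agnostic of it.
int sum_async(const std::vector<int> & v) {
  std::future<int> fut = std::async(
      std::launch::async,
      [&v] { return std::accumulate(v.begin(), v.end(), 0); });
  return fut.get();  // blocks until the task has completed
}
```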

Our algorithm variants depend on launch policies, however. For example, asynchronous copying (dash::copy_async) and blocking copying (dash::copy) are independent interfaces, and I don't think there is a practicable alternative that is correct and efficient in the general case.
Implementing blocking and deferred variants as wrappers of a single asynchronous implementation seems sensible, but in fact it hurts performance pretty badly.

I sketched out some alternatives, like:

  • Grouping algorithms by launch policy in namespaces, like dash::async::copy. Does not allow combination of policies, so it's not an option.
  • Meta-magic: dash::async could just replace callables depending on the launch policy. For example, passing dash::copy with dash::launch::async would invoke dash::copy_async instead. Nearly impossible to maintain and extremely irritating as the executed task is different from the specified callable.

So far, my initial approach based on phantom types and container proxies looks like the best option to me. Instead of namespaces or function template specializations, algorithm variants are realized as overloads selected via iterator traits.

An iterator for async operations would have dash::iterator_traits<It>::async_policy_tag specified.
Containers like dash::Array use adapters or proxies to convert iterators between policies.
This is already implemented in dash::Array<T>::async and allows usage like this:

dash::Array<X> array(size);

// Default:
auto copy_end = dash::copy(array.begin(), array.end(), dest);
// -> dash::Array<X>::iterator

// Async:
auto copy_end = dash::copy(array.async.begin(), array.async.end(), dest);
// current implementation:
// -> dash::GlobAsyncIter<T>
// will be:
// -> dash::Future<dash::Array<X>::iterator>

// Lazy:
auto copy_end = dash::copy(array.deferred.begin(), array.deferred.end(), dest);
// -> dash::Array<X>::iterator

This is very close to the STL specs and there is no way to get it wrong for developers.
No maintenance effort in algorithms at all, no coupling of concerns, much rejoice.
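A sketch of how such iterator-trait dispatch could work (tag, trait, and member names are hypothetical; C++17 assumed for std::void_t):

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical sketch: dispatching an algorithm variant on an iterator's
// launch policy tag. All names are illustrative only.
struct blocking_policy_tag { };
struct async_policy_tag    { };

// Default: iterators are blocking unless they declare an async tag.
template <typename It, typename = void>
struct launch_policy { using type = blocking_policy_tag; };

template <typename It>
struct launch_policy<It, std::void_t<typename It::async_policy_tag>> {
  using type = async_policy_tag;
};

template <typename It>
const char * copy_variant(It, It, blocking_policy_tag) { return "blocking"; }
template <typename It>
const char * copy_variant(It, It, async_policy_tag)    { return "async"; }

// Single public interface; the variant is selected via the trait:
template <typename It>
const char * copy(It first, It last) {
  return copy_variant(first, last, typename launch_policy<It>::type{});
}
```

The algorithm body never mentions a launch policy; only the dispatch layer and the iterator types know about it.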

It is also important to note that we can now define dash::async and dash::launch as in the STL and replace iterator parameters automatically (unlike replacing the callable parameter as described above, this has robust semantics):

// Developer writes:
auto fut_copy_end = dash::async(
             dash::launch::async,
             dash::copy<Foo>,
             array.begin(),
             array.end());
// --> resolves to:
auto fut_copy_end = dash::async(
             dash::launch::async,
             dash::copy<Foo>,
             array.begin().async,  // same as array.async.begin()
             array.end().async);   // same as array.async.end()

This is semantically and conceptually robust:

  • GlobIter<T, ..., dash::launch::async> satisfies GlobIter<T, ..., dash::launch::any>, and
  • the developer specified the launch policy explicitly, so the principle of least astonishment is respected; behavior and signature correspond to std::async.
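The automatic replacement could be sketched like this (the .async proxy member mirrors array.async.begin() above; all names are illustrative):

```cpp
#include <cassert>

// Hypothetical sketch: an async(...) wrapper that swaps each iterator
// argument for its async-policy counterpart before invoking the callable.
struct async_iter { };
struct sync_iter  {
  async_iter async;  // proxy converting to the async iterator variant
};

template <typename Fn, typename... Its>
auto launch_async(Fn && fn, Its &&... its)
    -> decltype(fn(its.async...)) {
  // The callable itself is unchanged; only its iterator arguments are
  // replaced, so the executed task matches the specified one.
  return fn(its.async...);
}
```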

So long story short: There is a chance we can have the cake and eat it, too.

In the current implementation in branch development, array.async is a fully specialized proxy accessor with dash::ArrayAsyncRef<T>::iterator = GlobAsyncIter<T>.
I added those as a proof of concept; actually, these specializations are not necessary for every policy, we can use traits as well.
This is preferable, as we would otherwise have to define a fully specialized type for every combination of launch policies, and adding a policy would be the opposite of fun.

Looking forward to your thoughts!

@devreal (Member) commented Feb 28, 2017

Thanks for this brain dump! I think I get the idea.

On Execution Policies
I think this is clear and it helps to get rid of some of the confusion we currently have, e.g., which unit receives the results of a dash::accumulate call. One thing to note, though:

> unit-level parallelization, the vanilla STL execution policies, e.g. parallel would allow units to spawn OpenMP threads

Keep in mind that DART will eventually have its own thread-pool independent of OpenMP. OpenMP is just too restrictive to use as a general-purpose task scheduler (unfortunately). Executing tasks in DART and OpenMP in parallel likely leads to contention and performance degradation.

On Launch Policies
Just to put this into perspective of general tasking in DASH: calling dash::async on dash::copy with dash::launch::async will convert array.begin() to array.begin().async, which issues an asynchronous copy and the returned dash::Future can be used to wait for completion (through a flush), right?

Now, how is this implemented for other DASH algorithms? Do we provide overrides for them as well? For example, combining dash::launch::async with the collective execution policy sounds like a bad idea...

My idea of tasking in DASH currently is that calling dash::async with dash::launch::async specified on any other function will result in an asynchronously executed action, i.e., a task that is scheduled through DART to be executed at some point in time. Again, the dash::Future is used to wait for completion. The DART side of this is pretty much there (although maybe not extensively tested).

There are two concepts that I would like to raise awareness of:

  1. Task dependencies
    I've been working on a way to define an ordering of tasks through data dependencies in DART by specifying arbitrary memory locations in global memory (similar to OpenMP task data dependencies). I haven't thought much about the DASH abstraction, although I think dash::async would suit this well. This would require dash::async to optionally accept dependency definitions.

  2. Fire-and-forget or Bulk Tasks
    Similar to OpenMP tasks, users may want to define tasks without explicitly waiting for them individually. Instead, tasks are defined in bulk and waited for by a single statement. This is almost necessary if you want to taskify one or multiple iterations of your algorithm. We could either introduce a launch policy for dash::async for this type of tasks (returning an empty dash::Future) or have a separate abstraction for it (such as dash::Task::create and dash::Task::complete). I'm OK with both, I just want to make sure we are covering this use-case.
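The fire-and-forget shape described in point 2 could look roughly like this (backed by std::async purely for illustration; the real version would schedule through DART, and all names are made up):

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical sketch of a dash::Task::create / dash::Task::complete pair:
// tasks are created in bulk without individual handles and synchronized by
// a single completion call.
namespace dash {
struct Task {
  // fire-and-forget: enqueue the task, return no handle to the caller
  static void create(std::function<void()> fn) {
    pending().emplace_back(
      std::async(std::launch::async, std::move(fn)));
  }

  // single synchronization point for all tasks created so far
  static void complete() {
    for (auto & f : pending()) { f.get(); }
    pending().clear();
  }

private:
  static std::vector<std::future<void>> & pending() {
    static std::vector<std::future<void>> tasks;
    return tasks;
  }
};
} // namespace dash
```

The alternative named in the text, a launch policy for dash::async returning an empty dash::Future, would expose the same semantics through the existing interface instead of a separate Task abstraction.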

We should definitely discuss this in Garching. I think we are arriving at a state where the tasking in DART is ready to be made available (though maybe not in its full feature-set). I've been holding it back until after the (ominous) next release so far.

@fuchsto (Member Author) commented Feb 28, 2017

@devreal

> Keep in mind that DART will eventually have its own thread-pool independent of OpenMP

Of course! I was just using OpenMP as a reference; we want light-weight tasks.

> My idea of tasking in DASH currently is that calling dash::async with dash::launch::async specified on any other function will result in an asynchronously executed action

That's not a contradiction: the conversion to array.begin().async is internal; the visible part of the interface is just the return type dash::Future<T>, which is also what tasks would return.
It's definitely important to support "real" task models like yours, I kept that in mind in my proposal.

But on the other hand, how would you replace the current dash::copy_async by tasks?
Asynchronous one-sided MPI will be impossible to beat in efficiency; wrapping a blocking dash::copy in a task would not help.

I assume that eventually the current async interface of DART will be wrapped in a proper task interface, so the C++ layer could prefer tasks for abstract async operations. Still, we must be able to support these "async-specialized implementation" cases like dash::copy_async, too.

We need both, and I think / hope the proposed interface could do.

@devreal (Member) commented Feb 28, 2017

> But on the other hand, how would you replace the current dash::copy_async by tasks?

Not at all; it does not make sense to implement dash::copy_async in terms of tasks, as the time is mostly spent in data transfer. I was just wondering about the other DASH algorithms :)

> We need both, and I think / hope the proposed interface could do.

Absolutely, I just wanted to make sure we are on the same page here. I will present my current status in Garching and based on that we can discuss the details of the tasking abstraction in its full beauty :)

@fuchsto (Member Author) commented Feb 28, 2017

Ah, I got a good feeling about this.

Ehhmm, when exactly is the SPPEXA name-dancing in Garching again?

@fuchsto (Member Author) commented Feb 28, 2017

Ah, here we go:

http://www.sppexa.de/sppexa-activities/annual-plenary-meeting/2017.html

That's a good time frame. Did you happen to review HPX and Kokkos (also: Alpaka) yet?
I'm about to make DASH views and Kokkos views fancy each other.

@devreal (Member) commented Feb 28, 2017

I had a look at Kokkos (see my comments in #193) and will attend a workshop on HPX here at HLRS next week. I haven't looked at Alpaka, yet. Could that be relevant as well?

@fuerlinger (Contributor):

Yes, Alpaka is relevant as well. I'm just here at the Alpaka Kickoff meeting, will report back tomorrow.

@devreal (Member) commented Feb 28, 2017

@fuerlinger Gotcha. If you can provide us with some material I'll be happy to look at it before the Garching meeting.

@devreal (Member) commented Mar 29, 2017

Can we get this into 0.3.0? If not, I would start working on a quick fix for dash::accumulate and document the deficiencies so we can close #216.

@fuchsto (Member Author) commented Apr 24, 2017

@devreal I can get this into 0.3.0 this week.

@pascalj (Member) commented May 3, 2018

@fuchsto May I hijack this PR to add the few lines needed for Alpaka's for_each and maybe a transform example? It currently does not rely on these changes, but it wouldn't be much work to base the MEPHISTO-stuff (in the MEPHISTO repo) on dash/include/dash/Execution.h

5 participants