[WIP] Introducing execution- and launch policies #300

Open · wants to merge 6 commits into development

Conversation

@fuchsto (Member) commented Feb 24, 2017

Addresses issues #272 #104 #216

@fuchsto (Member Author) commented Feb 24, 2017

Related to launch policies: Task concept specification (see #193)

@devreal (Member) left a comment


Thanks for getting the ball rolling. Questions inline.

/**
 * Parallel unsequenced execution policy.
 */
class parallel_unsequenced_policy { };
@devreal (Member):

What is the difference in semantics between parallel_policy and parallel_unsequenced_policy?


This is well described on cppreference.

    // ...
    InputIt in_last,
    OutputIt out_first);

dash::Future<ValueType *> copy(
    ExecutionPolicy && policy,
    // ...
@devreal (Member):

The execution policy is ignored atm, right?

@fuchsto (Member Author):

Yes, I just pinned the branch as PR to simplify discussion. It's work in progress (= [WIP] in title).

@rkowalewski left a comment

Thanks. It perfectly matches the C++17 spec.

@fuchsto (Member Author) commented Feb 25, 2017

I think I got the required concepts figured out.
This summary is mostly relevant to @devreal @fmoessbauer @fuerlinger

Execution Policies

In C++17, execution policies are added to algorithm interfaces and passed as the first argument:

template< class ExecutionPolicy, class InputIt, class UnaryFunction2 >
void for_each( ExecutionPolicy&& policy, InputIt first, InputIt last, UnaryFunction2 f );

It is therefore trivial to specialize / overload algorithm interfaces for execution policies.
From the code-mechanics perspective, we can comfortably extend the STL execution policy concept; introducing additional traits and some CRTP adapters should suffice.
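As a rough sketch of the trait-based extension (all type and trait names below are hypothetical, not the actual DASH interface):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch: DASH-specific execution policy types alongside the
// C++17 set, plus an is_execution_policy trait for overload dispatch.
namespace dash {

struct parallel_policy   { };
// PGAS-specific dimension: collective execution across all units.
struct collective_policy { };

template <typename T>
struct is_execution_policy                    : std::false_type { };
template <>
struct is_execution_policy<parallel_policy>   : std::true_type  { };
template <>
struct is_execution_policy<collective_policy> : std::true_type  { };

// Algorithm overloads can then be constrained on the trait:
template <
  typename ExecutionPolicy,
  typename = typename std::enable_if<
    is_execution_policy<
      typename std::decay<ExecutionPolicy>::type>::value>::type>
bool accepts_policy(ExecutionPolicy &&) { return true; }

} // namespace dash
```

An overload taking a type that does not satisfy the trait would simply not participate in resolution, which is the same mechanism the STL uses for its `is_execution_policy` check.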

The actual effort is in rather formal aspects. Which is nice, because science.

In principle, this boils down to additional PGAS-specific dimensions for execution policies.
Some examples:

  • collaboration, such as collective, asymmetric, leader
  • synchronization such as nowait (as known from OpenMP, would skip implicit barriers)
  • unit-level parallelization: the vanilla STL execution policies; e.g., parallel would allow units to spawn OpenMP threads

... you get the idea. Policies can be arbitrarily combined with foo_policy | bar_policy as usual.
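A minimal sketch of such combinable policy flags, assuming a bitmask representation analogous to std::launch (the flag names are made up for illustration):

```cpp
#include <cassert>

// Hypothetical sketch: policy flags combinable with operator|, analogous
// to std::launch. Flag names are illustrative only.
namespace dash {

enum class exec : unsigned {
  collective = 1u << 0,
  asymmetric = 1u << 1,
  nowait     = 1u << 2,  // skip implicit barriers, as in OpenMP
  parallel   = 1u << 3   // units may spawn threads
};

constexpr exec operator|(exec a, exec b) {
  return static_cast<exec>(
           static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

constexpr bool has_policy(exec combined, exec flag) {
  return (static_cast<unsigned>(combined)
          & static_cast<unsigned>(flag)) != 0;
}

} // namespace dash
```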

Launch Policies

STL concepts do not cover our requirements regarding launch policies.
In vanilla C++, asynchronous and deferred execution is selected via std::launch policies passed to function interfaces such as std::async; algorithms themselves are agnostic of launch policies:

template< class Function, class... Args >
std::future<typename std::result_of<Function(Args...)>::type>
    async( std::launch policy, Function&& fun, Args&&... args );

Parameter fun is the actual callable object.
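For reference, plain std::async usage, where the launched callable stays unaware of the policy:

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Standard std::async usage for comparison: the launch policy selects
// eager vs. deferred execution, while the callable (here wrapping
// std::accumulate) stays agnostic of it.
int sum_async(const std::vector<int> & v) {
  std::future<int> fut = std::async(
      std::launch::async,
      [&v] { return std::accumulate(v.begin(), v.end(), 0); });
  return fut.get();  // blocks until the task has completed
}
```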

Our algorithm variants depend on launch policies, however. For example, asynchronous copying (dash::copy_async) and blocking copying (dash::copy) are independent interfaces, and I don't think there is a practicable alternative that is correct and efficient in the general case.
Implementing blocking and deferred variants as wrappers of a single asynchronous implementation seems sensible, but in fact it hurts performance pretty badly.

I sketched out some alternatives, like:

  • Grouping algorithms by launch policy in namespaces, like dash::async::copy. Does not allow combination of policies, so it's not an option.
  • Meta-magic: dash::async could just replace callables depending on the launch policy. For example, passing dash::copy with dash::launch::async would invoke dash::copy_async instead. Nearly impossible to maintain and extremely irritating as the executed task is different from the specified callable.

So far, my initial approach based on phantom types and container proxies looks like the best option to me. Instead of namespaces or function template specializations, algorithm variants are realized as overloads selected via iterator traits.

An iterator for async operations would have dash::iterator_traits<It>::async_policy_tag specified.
Containers like dash::Array use adapters or proxies to convert iterators between policies.
This is already implemented in dash::Array<T>::async and allows usage like this:

dash::Array<X> array(size);

// Default:
auto copy_end = dash::copy(array.begin(), array.end(), dest);
// -> dash::Array<X>::iterator

// Async:
auto copy_end = dash::copy(array.async.begin(), array.async.end(), dest);
// current implementation:
// -> dash::GlobAsyncIter<T>
// will be:
// -> dash::Future<dash::Array<X>::iterator>

// Lazy:
auto copy_end = dash::copy(array.deferred.begin(), array.deferred.end(), dest);
// -> dash::Array<X>::iterator

This is very close to the STL specs and there is no way to get it wrong for developers.
No maintenance effort in algorithms at all, no coupling of concerns, much rejoice.
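A sketch of how such iterator-trait dispatch could work (tag, trait, and member names are hypothetical; C++17 assumed for std::void_t):

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical sketch: dispatching an algorithm variant on an iterator's
// launch policy tag. All names are illustrative only.
struct blocking_policy_tag { };
struct async_policy_tag    { };

// Default: iterators are blocking unless they declare an async tag.
template <typename It, typename = void>
struct launch_policy { using type = blocking_policy_tag; };

template <typename It>
struct launch_policy<It, std::void_t<typename It::async_policy_tag>> {
  using type = async_policy_tag;
};

template <typename It>
const char * copy_variant(It, It, blocking_policy_tag) { return "blocking"; }
template <typename It>
const char * copy_variant(It, It, async_policy_tag)    { return "async"; }

// Single public interface; the variant is selected via the trait:
template <typename It>
const char * copy(It first, It last) {
  return copy_variant(first, last, typename launch_policy<It>::type{});
}
```

The algorithm body never mentions a launch policy; only the dispatch layer and the iterator types know about it.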

It is also important to note that we can now define dash::async and dash::launch as in the STL and replace iterator parameters automatically (unlike replacing the callable parameter as described above, this has robust semantics):

// Developer writes:
auto fut_copy_end = dash::async(
             dash::launch::async,
             dash::copy<Foo>,
             array.begin(),
             array.end());
// --> resolves to:
auto fut_copy_end = dash::async(
             dash::launch::async,
             dash::copy<Foo>,
             array.begin().async,  // same as array.async.begin()
             array.end().async);   // same as array.async.end()

This is semantically and conceptually robust:

  • GlobIter<T, ..., dash::launch::async> satisfies GlobIter<T, ..., dash::launch::any>, and
  • the developer specified the launch policy explicitly, so the principle of least astonishment is respected; behavior and signature correspond to std::async.
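The automatic replacement could be sketched like this (the .async proxy member mirrors array.async.begin() above; all names are illustrative):

```cpp
#include <cassert>

// Hypothetical sketch: an async(...) wrapper that swaps each iterator
// argument for its async-policy counterpart before invoking the callable.
struct async_iter { };
struct sync_iter  {
  async_iter async;  // proxy converting to the async iterator variant
};

template <typename Fn, typename... Its>
auto launch_async(Fn && fn, Its &&... its)
    -> decltype(fn(its.async...)) {
  // The callable itself is unchanged; only its iterator arguments are
  // replaced, so the executed task matches the specified one.
  return fn(its.async...);
}
```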

So long story short: There is a chance we can have the cake and eat it, too.

In the current implementation in branch development, array.async is a fully specialized proxy accessor with dash::ArrayAsyncRef<T>::iterator = GlobAsyncIter<T>.
I added those as a proof of concept; actually, these specializations are not necessary for every policy, we can use traits as well.
This is preferable, as we would otherwise have to define a fully specialized type for every combination of launch policies, and adding a policy would be the opposite of fun.

Looking forward to your thoughts!

@devreal (Member) commented Feb 28, 2017

Thanks for this brain dump! I think I get the idea.

On Execution Policies
I think this is clear and it helps to get rid of some of the confusion we currently have, e.g., which unit receives the results of a dash::accumulate call. One thing to note, though:

> unit-level parallelization, the vanilla STL execution policies, e.g. parallel would allow units to spawn OpenMP threads

Keep in mind that DART will eventually have its own thread-pool independent of OpenMP. OpenMP is just too restrictive to use as a general-purpose task scheduler (unfortunately). Executing tasks in DART and OpenMP in parallel likely leads to contention and performance degradation.

On Launch Policies
Just to put this into perspective of general tasking in DASH: calling dash::async on dash::copy with dash::launch::async will convert array.begin() to array.begin().async, which issues an asynchronous copy and the returned dash::Future can be used to wait for completion (through a flush), right?

Now, how is this implemented for other DASH algorithms? Do we provide overrides for them as well? For example, combining dash::launch::async with the collective execution policy sounds like a bad idea...

My idea of tasking in DASH currently is that calling dash::async with dash::launch::async specified on any other function will result in an asynchronously executed action, i.e., a task that is scheduled through DART to be executed at some point in time. Again, the dash::Future is used to wait for completion. The DART side of this is pretty much there (although maybe not extensively tested).

There are two concepts that I would like to raise awareness of:

  1. Task dependencies
    I've been working on a way to define an ordering of tasks through data dependencies in DART by specifying arbitrary memory locations in global memory (similar to OpenMP task data dependencies). I haven't thought much about the DASH abstraction, although I think dash::async would suit this well. This would require dash::async to optionally accept dependency definitions.

  2. Fire-and-forget or Bulk Tasks
    Similar to OpenMP tasks, users may want to define tasks without explicitly waiting for them individually. Instead, tasks are defined in bulk and waited for by a single statement. This is almost necessary if you want to taskify one or multiple iterations of your algorithm. We could either introduce a launch policy for dash::async for this type of tasks (returning an empty dash::Future) or have a separate abstraction for it (such as dash::Task::create and dash::Task::complete). I'm OK with both, I just want to make sure we are covering this use-case.
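The fire-and-forget shape described in point 2 could look roughly like this (backed by std::async purely for illustration; the real version would schedule through DART, and all names are made up):

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical sketch of a dash::Task::create / dash::Task::complete pair:
// tasks are created in bulk without individual handles and synchronized by
// a single completion call.
namespace dash {
struct Task {
  // fire-and-forget: enqueue the task, return no handle to the caller
  static void create(std::function<void()> fn) {
    pending().emplace_back(
      std::async(std::launch::async, std::move(fn)));
  }

  // single synchronization point for all tasks created so far
  static void complete() {
    for (auto & f : pending()) { f.get(); }
    pending().clear();
  }

private:
  static std::vector<std::future<void>> & pending() {
    static std::vector<std::future<void>> tasks;
    return tasks;
  }
};
} // namespace dash
```

The alternative named in the text, a launch policy for dash::async returning an empty dash::Future, would expose the same semantics through the existing interface instead of a separate Task abstraction.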

We should definitely discuss this in Garching. I think we are arriving at a state where the tasking in DART is ready to be made available (though maybe not in its full feature-set). I've been holding it back until after the (ominous) next release so far.

@fuchsto (Member Author) commented Feb 28, 2017

@devreal

> Keep in mind that DART will eventually have its own thread-pool independent of OpenMP

Of course! I was just using OpenMP as a reference; we want light-weight tasks.

> My idea of tasking in DASH currently is that calling dash::async with dash::launch::async specified on any other function will result in an asynchronously executed action

That's not a contradiction: the conversion to array.begin().async is internal; the visible part of the interface is just the return type dash::Future<T>, which is also what tasks would return.
It's definitely important to support "real" task models like yours, I kept that in mind in my proposal.

But on the other hand, how would you replace the current dash::copy_async by tasks?
Asynchronous one-sided MPI will be impossible to beat in efficiency; wrapping a blocking dash::copy in a task would not help.

I assume that eventually the current async interface of DART will be wrapped in a proper task interface, so the C++ layer could prefer tasks for abstract async operations. Still, we must be able to support these "async-specialized implementation" cases like dash::copy_async, too.

We need both, and I think / hope the proposed interface could do.

@devreal (Member) commented Feb 28, 2017

> But on the other hand, how would you replace the current dash::copy_async by tasks?

Not at all; it does not make sense to implement dash::copy_async in terms of tasks, as the time is mostly spent in data transfer. I was just wondering about the other DASH algorithms :)

> We need both, and I think / hope the proposed interface could do.

Absolutely, I just wanted to make sure we are on the same page here. I will present my current status in Garching and based on that we can discuss the details of the tasking abstraction in its full beauty :)

@fuchsto (Member Author) commented Feb 28, 2017

Ah, I got a good feeling about this.

Ehhmm, when exactly is the SPPEXA name-dancing in Garching again?

@fuchsto (Member Author) commented Feb 28, 2017

Ah, here we go:

http://www.sppexa.de/sppexa-activities/annual-plenary-meeting/2017.html

That's a good time frame. Did you happen to review HPX and Kokkos (also: Alpaka) yet?
I'm about to make DASH views and Kokkos views fancy each other.

@devreal (Member) commented Feb 28, 2017

I had a look at Kokkos (see my comments in #193) and will attend a workshop on HPX here at HLRS next week. I haven't looked at Alpaka, yet. Could that be relevant as well?

@fuerlinger (Contributor):

Yes, Alpaka is relevant as well. I'm just here at the Alpaka Kickoff meeting, will report back tomorrow.

@devreal (Member) commented Feb 28, 2017

@fuerlinger Gotcha. If you can provide us with some material I'll be happy to look at it before the Garching meeting.

@devreal (Member) commented Mar 29, 2017

Can we get this into 0.3.0? If not, I would start working on a quick fix for dash::accumulate and document the deficiencies so we can close #216.

@fuchsto (Member Author) commented Apr 24, 2017

@devreal I can get this into 0.3.0 this week.

@pascalj (Member) commented May 3, 2018

@fuchsto May I hijack this PR to add the few lines needed for Alpaka's for_each and maybe a transform example? It currently does not rely on these changes, but it wouldn't be much work to base the MEPHISTO-stuff (in the MEPHISTO repo) on dash/include/dash/Execution.h

5 participants