Commit e264309

Add more sections
1 parent 7e4cb9d commit e264309

1 file changed

+176
-1
lines changed

doc/advanced/content/executor_design.rst

Lines changed: 176 additions & 1 deletion
@@ -119,7 +119,182 @@ functions for creating execution agents from a callable function object. The exe
bound to the executor’s context, and hence to one or more of the resources that context represents.

Why does PCL need executors?
=================================

PCL has a diverse collection of modules implementing various algorithms. Many of these algorithms
have several implementations targeting different facilities such as SIMD, OpenMP and GPU (CUDA).
Since each facility has its own API and interface, each gets a separate implementation,
some of them in the form of separate classes.
Some of the problems with the current scenario are:

1. **Divergent Implementation**

   Distinct implementations of an algorithm drift apart over time: the more popular
   implementation gets bug fixes, new features and refactoring, while the other implementations remain
   untouched.

   Example: The parallelized version of an algorithm might be more popular, so it gets bug fixes over time
   while the same bugs persist in the serial implementation of that algorithm.

2. **Non-Uniform API**

   The API of an implementation for a specific facility might change to accommodate
   new features that rely on tools or optimizations provided by that facility, or
   to adapt to the interface the facility provides.

   Example: Parallel implementations expose APIs for configuring the degree of parallelism, which are completely
   absent from sequential implementations.

3. **Inextensible Design**

   The current design doesn't support new facilities such as thread pools or multi-GPU execution, nor does it
   support nesting facilities. Adding support for them would require writing a completely new implementation
   of every algorithm.

   Example: It isn't possible to run vectorized (SIMD) code inside a parallel implementation which
   uses OpenMP.

4. **Code Duplication**

   Even though different facilities might require slightly different implementations, much of the code can
   be shared. With separate implementations, most of the code base is duplicated
   and only a small part is modified in order to adapt to the interface provided by the facility.

   Example: Most OpenMP code is quite similar to the sequential implementation, with only a few additions,
   so having separate classes for OpenMP implementations is largely redundant.

5. **Maintenance Overhead**

   Maintaining several implementations of the same algorithm is labour- and time-intensive.
   Porting changes from one implementation to another and propagating bug fixes to all the implementations
   is time-consuming. When there isn't enough time to propagate changes, the implementations diverge, as
   described in point 1.

Design Considerations
=================================

The executor design proposal for C++ aims to build a generalized and extensible framework around
executors. This is necessary because, in order to be accepted into the standard library, it has to support
a wide range of use cases and cater to as many needs as possible.

In PCL, the need for executors is limited to certain features, and the entire feature set
described in the proposal is not required. Concepts such as asynchronous operations and task-based parallelism
are not present or needed in PCL at the moment, so a design which incorporates all those features would be
unnecessary.
The main use cases in PCL are:

1. Provide a uniform API for executing existing algorithms on different facilities, giving users the freedom
   to switch between facilities with ease.

2. Reduce code duplication and avoid completely different implementations of the same algorithm.

3. Provide a simple, easy-to-use mechanism for customizing the execution context, which users can also access.

4. Expose some of the underlying features offered by the various facilities in a standardized manner.

5. Provide a mechanism to automatically choose the best facility to run an algorithm when the user does
   not explicitly specify which facility to use.

6. Be extensible enough to allow users to specify their own executors or customize the ones provided by PCL.

7. Keep overhead as low as possible; where possible, resolve everything at compile time.

8. Last but not least, be forward compatible with the upcoming executor design, so that PCL is compatible with
   executors when they become part of the standard.

Proposed Design
=================================

The currently implemented design is heavily influenced by two existing implementations that are being
developed in light of the proposal:

* `executors-impl <https://github.com/executors/executors-impl>`_

* `cudex <https://github.com/jaredhoberock/cudex>`_

The two main elements of this design are the executors themselves and executor properties.


**Executor Design**

As per the proposal, the technical definition of an executor is:
An executor should be a CopyConstructible and EqualityComparable type that provides a function named
`execute`, which eagerly submits work on a single execution agent created for it by the executor.

There are two execution functions available in any executor:

1. `execute`

   It takes a nullary callable (a callable which takes no arguments and returns void) and executes
   the callable on a single execution agent exactly once.

2. `bulk_execute`

   It takes a callable (which returns void and takes an index as its argument) and a shape, which
   corresponds to the number of invocations of the callable. This function creates as many execution agents
   as there are invocations, in bulk, and each execution agent then invokes the callable once.
   The index of the execution agent is passed as an argument to the callable, so that the callable
   knows its invocation index.

   The difference between calling `execute` repeatedly and calling `bulk_execute` once is that `bulk_execute`
   leverages the facility's API to create execution agents in bulk, which is more efficient than creating
   them one by one.

How these executors invoke the callable internally depends on the implementation of each executor,
and some aspects of the execution can be customized through properties.

The index passed by `bulk_execute` can be used to partition the work so that certain parts of the code run
only for specific indices.
Example: It can be used to split the iterations of a loop between the execution agents.
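
The two execution functions and the loop-splitting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not PCL's actual implementation: `inline_executor_sketch` and `chunked_sum` are hypothetical names, and the executor simply runs everything sequentially on the calling thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical executor, for illustration only (not a PCL class).
// It is CopyConstructible and EqualityComparable, as the definition requires.
struct inline_executor_sketch {
  using shape_type = std::size_t;
  using index_type = std::size_t;

  // Runs a nullary callable exactly once on the calling thread.
  template <typename F>
  void execute(F&& f) const {
    std::forward<F>(f)();
  }

  // Invokes f(i) once for every index i in [0, n).
  // A real executor would create the n agents through its facility's API.
  template <typename F>
  void bulk_execute(F&& f, shape_type n) const {
    for (index_type i = 0; i < n; ++i) f(i);
  }

  friend bool operator==(const inline_executor_sketch&,
                         const inline_executor_sketch&) noexcept { return true; }
  friend bool operator!=(const inline_executor_sketch&,
                         const inline_executor_sketch&) noexcept { return false; }
};

// Splits the iterations of a summation loop between `agents` execution agents.
// Each agent uses its index to select a contiguous chunk of `data`.
template <typename Executor>
long chunked_sum(const Executor& ex, const std::vector<int>& data, std::size_t agents) {
  assert(agents > 0);
  std::vector<long> partial(agents, 0);
  const std::size_t chunk = (data.size() + agents - 1) / agents;  // ceiling division
  ex.bulk_execute(
      [&](std::size_t idx) {
        const std::size_t begin = std::min(idx * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        for (std::size_t i = begin; i < end; ++i) partial[idx] += data[i];
      },
      agents);
  long total = 0;
  for (long p : partial) total += p;
  return total;
}
```

Each agent writes only to its own slot of `partial`, so the same callable would remain free of data races if the sequential loop in `bulk_execute` were replaced by a parallel one.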

**Shape and Index**

The shape and index types will vary depending on the facility, so a mechanism has been provided to customize
them. By default they are `std::size_t`.

The shape and index types can be specified as a nested type or type alias inside the executor, named
`shape_type` and `index_type` respectively. The type traits `executor_shape` and `executor_index`
give access to an executor's shape and index types.
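
One way such traits could be written is sketched below. It assumes the nested aliases are called `shape_type` and `index_type` and that the traits expose a nested `type`; the `sketch` namespace and the two example executors are hypothetical, and PCL's actual traits may differ.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// C++11/14 stand-in for std::void_t (which is available from C++17).
template <typename... Ts> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary templates: fall back to std::size_t when the executor names no type.
template <typename Executor, typename = void>
struct executor_shape { using type = std::size_t; };

template <typename Executor, typename = void>
struct executor_index { using type = std::size_t; };

// Partial specializations: use the nested alias when the executor provides one.
template <typename Executor>
struct executor_shape<Executor, void_t<typename Executor::shape_type>> {
  using type = typename Executor::shape_type;
};

template <typename Executor>
struct executor_index<Executor, void_t<typename Executor::index_type>> {
  using type = typename Executor::index_type;
};

// Hypothetical executors used only to exercise the traits.
struct default_executor {};
struct grid_executor {
  using shape_type = unsigned int;
  using index_type = int;
};

static_assert(std::is_same<executor_shape<default_executor>::type, std::size_t>::value,
              "executors without shape_type default to std::size_t");
static_assert(std::is_same<executor_index<grid_executor>::type, int>::value,
              "executors with index_type report their own type");

}  // namespace sketch
```

Generic code can then write `typename sketch::executor_shape<Executor>::type` without caring whether the executor customized the type.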

**Executor Properties**

Executor properties are objects associated with an executor. They are used to customize various aspects
of the executor related to execution and are also used to provide guarantees.
The properties currently implemented in PCL are:

* Blocking

  This specifies whether an execution function should block until all
  execution agents have finished executing. There are three mutually exclusive blocking properties:
  `blocking.always`, `blocking.never` and `blocking.possibly`. Their roles follow from their names.
  The default is `blocking.always`.

* Allocators

  This specifies the allocator to associate with an executor. A user may use this
  property to suggest a preferred allocator for allocating the storage necessary
  for execution. The default is the specialization `allocator_t<void>`, which indicates that
  the default allocator available in the system should be used.

As of now, only these two properties are available in PCL, and even they are not fully supported by
the provided executors.

Every property must define a default value, which is the value reported for the property
even if an executor doesn't explicitly support it.
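
The role of a property's default can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `sketch` namespace, the `blocking_kind` enum, the `default_value()` function and the two executors are hypothetical and are not PCL's actual property types. The free function `query` reports the executor's own value when the executor supports the property and the property's default otherwise.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

enum class blocking_kind { always, never, possibly };

// Hypothetical property type carrying its own default value.
struct blocking_t {
  using value_type = blocking_kind;
  static constexpr value_type default_value() { return blocking_kind::always; }
};

// C++11/14 stand-in for std::void_t.
template <typename... Ts> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Detects whether Executor has a member query(Property).
template <typename Executor, typename Property, typename = void>
struct supports_query : std::false_type {};

template <typename Executor, typename Property>
struct supports_query<
    Executor, Property,
    void_t<decltype(std::declval<const Executor&>().query(std::declval<Property>()))>>
    : std::true_type {};

namespace detail {
// Selected when the executor supports the property: ask the executor.
template <typename E, typename P>
constexpr typename P::value_type query_impl(const E& ex, P p, std::true_type) {
  return ex.query(p);
}
// Selected otherwise: report the property's default.
template <typename E, typename P>
constexpr typename P::value_type query_impl(const E&, P, std::false_type) {
  return P::default_value();
}
}  // namespace detail

template <typename E, typename P>
constexpr typename P::value_type query(const E& ex, P p) {
  return detail::query_impl(ex, p, supports_query<E, P>{});
}

// An executor that knows nothing about blocking.
struct plain_executor {};

// An executor that explicitly reports a non-default blocking value.
struct async_executor {
  constexpr blocking_kind query(blocking_t) const { return blocking_kind::never; }
};

}  // namespace sketch
```

With this shape, `sketch::query(sketch::plain_executor{}, sketch::blocking_t{})` yields `blocking_kind::always` even though `plain_executor` never mentions the property.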

**Property Customization Mechanism**

Properties of an executor are specified using the template parameters of the executor class template.
A user may introduce a new property to an executor by defining a property type and
specializing either the `require` or `prefer` member function and the `query` member function inside the executor.

A property supported by an executor can be strongly or weakly associated with it through a call to the
`require` or `prefer` customization point, respectively. This operation might produce a new
executor of a different type. Whether an executor supports a specific property
can be checked through a call to the `query` customization point.
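
A rough sketch of this mechanism is given below, assuming the semantics used in the C++ executors proposal, where `require` fails to compile for an unsupported property while `prefer` falls back to returning the executor unchanged. The names `sketch_executor`, `blocking_always_t`, `blocking_never_t` and `unsupported_t` are hypothetical, and PCL's actual customization points may be implemented differently.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical property tags.
struct blocking_always_t {};
struct blocking_never_t {};
struct unsupported_t {};  // a property this executor knows nothing about

// The blocking property is encoded as a template parameter of the executor.
template <typename Blocking = blocking_always_t>
struct sketch_executor {
  // Requiring a property may produce an executor of a different type.
  constexpr sketch_executor<blocking_always_t> require(blocking_always_t) const { return {}; }
  constexpr sketch_executor<blocking_never_t> require(blocking_never_t) const { return {}; }

  // Reports whether the given property is the one currently in effect.
  template <typename Property>
  static constexpr bool query(Property) {
    return std::is_same<Property, Blocking>::value;
  }
};

// Strong association: only compiles if the executor supports the property.
template <typename E, typename P>
constexpr auto require(const E& ex, P p) -> decltype(ex.require(p)) {
  return ex.require(p);
}

// Weak association: preferred overload, used when ex.require(p) is valid.
template <typename E, typename P>
constexpr auto prefer_impl(const E& ex, P p, int) -> decltype(ex.require(p)) {
  return ex.require(p);
}
// Fallback overload: the property is unsupported, return the executor as is.
template <typename E, typename P>
constexpr E prefer_impl(const E& ex, P, long) {
  return ex;
}

template <typename E, typename P>
constexpr auto prefer(const E& ex, P p) -> decltype(prefer_impl(ex, p, 0)) {
  return prefer_impl(ex, p, 0);
}

}  // namespace sketch
```

Because the property lives in the type, `sketch::require(ex, sketch::blocking_never_t{})` returns a `sketch_executor<blocking_never_t>`, and the choice carries no run-time cost.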


**Customizing Executors**

Users are free to create their own executors or customize existing ones by inheriting from the ones provided
by PCL. Users can even create their own custom properties and add support for them in executors.

As of now, only executors derived from a PCL executor will work with PCL functions that support the base
executor; using an executor that does not derive from one is not supported. Since this is an advanced
feature and is user dependent, PCL cannot guarantee that custom executors will work as
expected in PCL functions. Make sure to read and understand the code of the function in which
you are using a custom executor, and determine whether the executor will provide the expected results.
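
The derivation requirement can be pictured with the following sketch. It is only an illustration under assumptions: `base_executor`, `counting_executor` and `fill_indices` are hypothetical stand-ins for a PCL-provided executor, a user-defined executor and a PCL function, and the `std::is_base_of` check is one possible way a function could restrict the executors it accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for an executor provided by the library.
struct base_executor {
  template <typename F>
  void bulk_execute(F&& f, std::size_t n) const {
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};

// Stand-in for a library function: accepts only executors derived from base_executor.
template <typename Executor,
          typename std::enable_if<std::is_base_of<base_executor, Executor>::value, int>::type = 0>
void fill_indices(const Executor& ex, std::vector<std::size_t>& out) {
  ex.bulk_execute([&](std::size_t i) { out[i] = i; }, out.size());
}

// User-defined executor: reuses the base behaviour and counts invocations.
struct counting_executor : base_executor {
  std::size_t* counter;  // not owned; must outlive the executor

  explicit counting_executor(std::size_t& c) : counter(&c) {}

  template <typename F>
  void bulk_execute(F&& f, std::size_t n) const {
    *counter += n;
    base_executor::bulk_execute(std::forward<F>(f), n);
  }
};

// Usage: returns the number of invocations observed by the custom executor.
inline std::size_t demo() {
  std::size_t count = 0;
  std::vector<std::size_t> out(4);
  fill_indices(counting_executor{count}, out);
  assert(out[3] == 3);
  return count;
}

}  // namespace sketch
```

Since `fill_indices` deduces the concrete executor type, the derived `bulk_execute` is the one that runs, which is also why a custom executor can silently change the behaviour of the function it is passed to.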

**Implementation**

.. code-block:: cpp
