Add more sections

shrijitsingh99 · shrijitsingh99 · commit e264309c60e5 · 2020-08-29T03:22:02.000+05:30
diff --git a/doc/advanced/content/executor_design.rst b/doc/advanced/content/executor_design.rst
@@ -119,7 +119,182 @@ functions for creating execution agents from a callable function object. The exe
 bound to the executor’s context, and hence to one or more of the resources that context represents.
 
 Why does PCL need executors?
------------------------------------------------
+=================================
+
+PCL has a diverse collection of modules implementing various algorithms. Many of these algorithms
+have several implementations targeting different facilities such as SIMD, OpenMP, GPU (CUDA), etc.
+Since each facility has a different API and interface, each one gets a separate implementation,
+and some of them take the form of separate classes.
+Some of the problems with the current scenario are:
+
+1. **Divergent Implementation**
+
+   Distinct implementations of an algorithm drift apart over time, since the more popular
+   implementation gets bug fixes, new features and refactoring while the others remain
+   untouched.
+
+   Example: The parallelized version of an algorithm might be more popular, so it gets bug fixes over time while
+   those bugs persist in the serial implementation of that algorithm.
+
+2. **Non-Uniform API**
+
+   The API of an implementation for a specific facility might change to accommodate
+   new features that use the tools or optimizations provided by that facility, or
+   to adapt to the interface the facility provides.
+
+   Example: Parallel implementations expose APIs for configuring the degree of parallelism, which are completely
+   absent from sequential implementations.
+
+3. **Inextensible Design**
+
+   The current design doesn't support new facilities such as thread pools or multi-GPU execution, nor nesting
+   of facilities. Adding support for them requires writing a completely new implementation
+   of every algorithm.
+
+   Example: It isn't possible to run vectorized (SIMD) code inside a parallel implementation which
+   uses OpenMP.
+
+4. **Code Duplication**
+
+   Even though different facilities might require slightly different implementations, a lot of the code can
+   be shared. With separate implementations, most of the code is duplicated
+   and only a small part is modified in order to adapt to the interface provided by the facility.
+
+   Example: Most OpenMP code is quite similar to the sequential implementation, with only a few additions,
+   so having separate classes for OpenMP implementations is largely redundant.
+
+5. **Maintenance Overhead**
+
+   Maintaining several implementations of the same algorithm is labour- and time-intensive.
+   Porting changes from one implementation to another and propagating bug fixes to all of them
+   is time-consuming. When there is no time to propagate the changes, the implementations diverge as described
+   above.
+
+Design Considerations
+=================================
+
+The executor design proposal for C++ aims to build a generalized and extensible framework around
+executors. This is necessary because it must support a wide range of use cases and cater to
+everyone's needs in order to be accepted into the standard library.
+
+In PCL, the need for executors is limited to certain features, and there is no need for the entire feature set
+described in the proposal. Concepts such as asynchronous operations and task-based parallelism are not present
+or needed in PCL at the moment, so a design which incorporates all of those features would be unnecessary.
+The main use cases in PCL are:
+
+1. Provide a uniform API for executing existing algorithms on different facilities giving users the freedom
+   to switch between facilities with ease.
+
+2. Reduce code duplication and avoid completely different implementations of the same algorithm.
+
+3. Provide a simple and easy-to-use mechanism to customize the execution context, which users can also access.
+
+4. Expose some of the underlying features offered by the various facilities in a standardized manner.
+
+5. Provide a mechanism to automatically choose the best facility to run an algorithm in case the user does
+   not explicitly specify which facility to use.
+
+6. Be extensible enough to allow users to specify their own executors or customize the ones provided by PCL.
+
+7. Ensure as little overhead as possible; where possible, resolve everything at compile time.
+
+8. Last but not least, be forward compatible with the upcoming executor design so that PCL is compatible with
+   executors when they become a part of the standard specification.
+
+Proposed Design
+=================================
+
+The currently implemented design is heavily influenced by existing implementations that are being
+developed in light of the proposal, namely:
+
+* `executors-impl <https://github.com/executors/executors-impl>`_
+
+* `cudex <https://github.com/jaredhoberock/cudex>`_
+
+The two main elements of this design are the executors themselves and executor properties.
+
+
+**Executor Design**
+
+As per the proposal, the technical definition of an executor is:
+An executor should be a CopyConstructible and EqualityComparable type that provides a function named
+`execute` that eagerly submits work on a single execution agent created for it by the executor.
+
+There are two execution functions available in any executor:
+
+1. `execute`
+
+   It takes a nullary callable (a callable which takes no arguments and returns void) and executes
+   the callable on a single execution agent exactly once.
+
+2. `bulk_execute`
+
+   It takes a callable (which returns void and takes an index parameter as its argument) and a shape which
+   corresponds to the number of invocations of the callable. This function creates as many execution agents
+   as there are invocations, in bulk, and each execution agent then invokes the callable once.
+   The index of the execution agent is passed as the argument to the callable, so that the callable
+   knows its invocation index.
+
+The difference between calling `execute` repeatedly and calling `bulk_execute` once is that `bulk_execute`
+leverages the facility's API to create execution agents in bulk, which is more efficient than creating
+them one by one.
+
+How these executors call the callable internally is dependent on the implementation of each executor
+and some aspects of the execution can be customized through properties.
+
+The index passed to `bulk_execute` can be used to partition the work so that certain parts of the code run
+only for specific indices.
+Example: It can be used to split the iterations of a loop between the execution agents.
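+
+The loop-splitting idea above can be sketched as follows. This is a minimal illustration and not PCL's
+implementation: `inline_executor_sketch` and `double_all` are hypothetical names, and the executor simply
+runs every agent in turn on the calling thread so that the example stays self-contained.
+

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical executor used for illustration only; it is not a PCL class.
struct inline_executor_sketch {
  using shape_type = std::size_t;
  using index_type = std::size_t;

  // Runs a nullary callable exactly once on the calling thread.
  template <typename F>
  void execute(F&& f) const {
    std::forward<F>(f)();
  }

  // Invokes f(i) once for every i in [0, n); each call stands in for one execution agent.
  template <typename F>
  void bulk_execute(F&& f, shape_type n) const {
    for (index_type i = 0; i < n; ++i) {
      f(i);
    }
  }

  // Executors are expected to be EqualityComparable; this one has no state.
  friend bool operator==(const inline_executor_sketch&, const inline_executor_sketch&) noexcept {
    return true;
  }
  friend bool operator!=(const inline_executor_sketch&, const inline_executor_sketch&) noexcept {
    return false;
  }
};

// Doubles every element, splitting the loop into contiguous chunks, one chunk per agent.
template <typename Executor>
void double_all(const Executor& exec, std::vector<int>& data, std::size_t agents) {
  if (agents == 0) {
    return;  // nothing can run without at least one agent
  }
  const std::size_t chunk = (data.size() + agents - 1) / agents;  // ceiling division
  exec.bulk_execute(
      [&data, chunk](std::size_t agent) {
        const std::size_t begin = std::min(agent * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        for (std::size_t i = begin; i < end; ++i) {
          data[i] *= 2;
        }
      },
      agents);
}
```

+
+Calling `double_all(inline_executor_sketch{}, data, 2)` on five elements gives agent 0 the indices 0 to 2
+and agent 1 the indices 3 and 4. An executor backed by OpenMP or CUDA could accept the same callable and
+change only how the agents are created.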
+
+**Shape and Index**
+
+The shape and index will vary depending on the facility, so a mechanism has been provided to customize
+their types. By default they are `std::size_t`.
+
+The shape or index type can be specified through a member type or type alias inside the executor with the names
+`shape_type` and `index_type`. There are also type traits, namely `executor_shape` and `executor_index`,
+to access the type of an executor's shape or index.
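+
+One way such traits could be written is sketched below. This is an assumption-laden illustration rather
+than PCL's code: the `sketch` namespace, the `_t` aliases and the two example executors are hypothetical,
+the member names `shape_type` and `index_type` follow the C++ executors proposal, and both traits fall
+back to `std::size_t` as described above.
+

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// C++14-friendly stand-in for std::void_t.
template <typename... Ts>
struct make_void {
  using type = void;
};
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Primary template: used when the executor declares no shape_type.
template <typename Executor, typename = void>
struct executor_shape {
  using type = std::size_t;
};

// Specialization: picked when Executor::shape_type is a valid type.
template <typename Executor>
struct executor_shape<Executor, void_t<typename Executor::shape_type>> {
  using type = typename Executor::shape_type;
};

// The same detection pattern for the index type.
template <typename Executor, typename = void>
struct executor_index {
  using type = std::size_t;
};

template <typename Executor>
struct executor_index<Executor, void_t<typename Executor::index_type>> {
  using type = typename Executor::index_type;
};

template <typename Executor>
using executor_shape_t = typename executor_shape<Executor>::type;
template <typename Executor>
using executor_index_t = typename executor_index<Executor>::type;

// Hypothetical executors: one relies on the defaults, one customizes both types.
struct plain_executor {};
struct custom_executor {
  using shape_type = int;
  using index_type = long;
};

}  // namespace sketch
```

+
+Algorithms can then write `sketch::executor_shape_t<Executor>` instead of hard-coding `std::size_t`,
+so an executor whose facility uses a different shape type keeps working without changes to the algorithm.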
+
+**Executor Properties**
+
+Executor properties are objects associated with an executor. They are used to customize various aspects
+of execution and to provide guarantees.
+The properties currently implemented in PCL are:
+
+* Blocking
+
+This specifies whether or not an execution function should wait (block) until all
+execution agents are done executing. There are three mutually exclusive blocking properties:
+`blocking.always`, `blocking.never` and `blocking.possibly`. Their roles follow from their
+names. The default is `blocking.always`.
+
+* Allocators
+
+This specifies the allocator to associate with an executor. A user may use this
+property to suggest the preferred allocator to use when allocating storage necessary
+for execution. The default is the specialization `allocator_t<void>`, which indicates that
+the default allocator available in the system should be used.
+
+As of now only these two properties are supported in PCL, and even they are not fully supported by
+the provided executors.
+
+Every property must define a default value, which is the value reported
+for an executor even if that executor doesn't explicitly support the property.
+
+**Property Customization Mechanism**
+
+Properties of an executor are specified using the template parameters of the executor class template.
+A user may introduce a new property to an executor by defining a property type and
+specializing either the `require` or `prefer` member function, along with the `query` member function, inside the executor.
+
+A property which is supported by an executor can be strongly or weakly associated with it
+through a call to the `require` or `prefer` customization point, respectively. This operation might produce a new
+executor of a different type. You can also query whether an executor supports a specific property
+by calling the `query` customization point.
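+
+The interplay of `require`, `prefer` and `query` can be sketched as below. Everything here is hypothetical
+and simplified for illustration (the `sketch` namespace, `basic_executor`, the property tag types and the
+free functions are not PCL's API): the blocking property is stored as a template parameter, `require`
+returns an executor of a possibly different type, and `prefer` falls back to the unchanged executor
+when a property is not supported.
+

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical property tags.
struct blocking_always_t {};
struct blocking_never_t {};
struct unsupported_t {};  // a property this executor knows nothing about

// The blocking property is carried as a template parameter of the executor.
template <typename Blocking = blocking_always_t>
struct basic_executor {
  // Strong association: the result is an executor type carrying the requested property.
  basic_executor<blocking_always_t> require(blocking_always_t) const { return {}; }
  basic_executor<blocking_never_t> require(blocking_never_t) const { return {}; }

  // Query: reports whether this executor currently has the given property.
  static constexpr bool query(blocking_always_t) {
    return std::is_same<Blocking, blocking_always_t>::value;
  }
  static constexpr bool query(blocking_never_t) {
    return std::is_same<Blocking, blocking_never_t>::value;
  }
};

// require: only compiles if the executor supports the property.
template <typename Executor, typename Property>
auto require(const Executor& ex, Property p) -> decltype(ex.require(p)) {
  return ex.require(p);
}

// prefer: tries require first (int overload), otherwise returns the executor unchanged.
template <typename Executor, typename Property>
auto prefer_impl(const Executor& ex, Property p, int) -> decltype(ex.require(p)) {
  return ex.require(p);
}
template <typename Executor, typename Property>
Executor prefer_impl(const Executor& ex, Property, long) {
  return ex;
}
template <typename Executor, typename Property>
auto prefer(const Executor& ex, Property p) -> decltype(prefer_impl(ex, p, 0)) {
  return prefer_impl(ex, p, 0);
}

// query: forwards to the executor's own query member.
template <typename Executor, typename Property>
constexpr bool query(const Executor& ex, Property p) {
  return ex.query(p);
}

}  // namespace sketch
```

+
+With these definitions, `sketch::require(ex, sketch::blocking_never_t{})` on a default executor yields a
+`basic_executor<blocking_never_t>`, while `sketch::prefer(ex, sketch::unsupported_t{})` compiles and
+returns an executor of the original type.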
+
+
+**Customizing Executors**
+
+Users are free to create their own executors or customize existing ones by inheriting from the ones provided
+by PCL. Users can even create their own custom properties and add support for them in executors.
+
+As of now, only derived executors will work with PCL functions which support the base executor;
+using your own executor without deriving from a PCL one is not supported in PCL functions. Since this is an advanced
+feature and is user dependent, PCL cannot guarantee that custom executors will work as
+expected in PCL functions. Make sure to read and understand the code of the function in which
+you are using a custom executor, and determine whether the executor will provide the expected results.
+
+**Implementation**
+
+.. code-block:: cpp
+