From 06f7ea16aea51653b999ab8359a30831efd61303 Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Fri, 29 Apr 2022 20:06:06 +0200 Subject: [PATCH 1/9] draft for testing library --- _posts/20-04-20-testing-library.md | 331 +++++++++++++++++++++++++++++ 1 file changed, 331 insertions(+) create mode 100644 _posts/20-04-20-testing-library.md diff --git a/_posts/20-04-20-testing-library.md b/_posts/20-04-20-testing-library.md new file mode 100644 index 00000000..8b0ab9ec --- /dev/null +++ b/_posts/20-04-20-testing-library.md @@ -0,0 +1,331 @@ +--- +title: Testing library +description: Design of a testing library +author: veverak +tags: [c++, testing] +--- + + + +We have multiple testing library focused on desktop C++ aplications, but there is a lack of library designed for embedded devices. + +The traditional libraries are not designed for constrained resources and rely on functionality like a filesystem or standard output. + +I decided to design a testing library for microcontrollers. +In this article I want to show rationale, design choices, and thoughts on the prototype. + + + +{% include newsletter.html %} + +{% include toc.html %} + +## Motivation + +When developing any code, being able to test is crucial for sustainable development. + +In the case of code that is executable on systems with OS, widely used solutions are GoogleTest or Catch libraries frameworks. +What we usually expect from such a framework is: +- a tool that will organize and orchestrate the execution of the tests +- basic functions/API to check the corretness of the results in the test +- features for scaling: fixtures, parameterized tests, executing tests multiple times, metrics + +In the context of microprocessosrs, these libraries are not usable. +They rely on the file system, input/output into a terminal, dynamic memory, and do not care about tight limits for code size. + +These frameworks are usable only for testing of the embedded firmware. +These parts are independent on the hardware: algorithms, internal business logic etc.. +We, however, can't test anything that is tied to the hardware. + +For that reason, I decided to implement a custom opinionated testing framework designed for a specific use case: executing tests on the embedded hardware itself. + +The goal is to be able to test embedded code that is tied to the hardware itself: +- interrupt-based mechanics +- control algorithms that are unpractical to simulate +- code tied to peripherals + +## Requirements + +Based on my experience and opinions, I decided to specify the following requirements: + +emlabcpp integration + The code is tightly integrated into an existing C++20 library that I am developing. + That is: it can't be used without the library. + This eases development of the testing framework as I reuse functionality from the library, specifically: protocol library. + +simplicity + The library should be simple and should not try to provide entire set of functionality that catch/gtest provides. + That should not be necessary and I prefer simpler and more effective tool. + +integration into existing testing tools + Wide set of tools exist that can work with test results of catch/gtest - for example gitlab has integration of test results from these tools. + The library should be compatible from this perspective - it should be integrable into existing systems. + +small footprint + The assertion is that a big percentage of available memory of the microchip will be taken by the application code itself. 
+ That implies that the library should have small memory footprint - so it can coexist with present code. + +no dyn. memory, no exception + Both are C++ features which we may want to avoid in the firmware. + The testing library should not require them for it's functionality, to allow usage in context when they are not enabled at all. + +no platform fixation + Ideally, we would prefer this to be reusable between different embedded platforms and situations. + That imposes the limit that the library should not be tied to any specific platform. + + +## Design + +The library itself is implemented as a two part system: + +reactor + Is present in the embedded device, and controls it. + It has small footprint and has limited functionality, it can: + - register tests to itself + - store bare minimum information about firmware/tests + - execute the tests + - communicate information/exchange data between itself and controller + +controller + Controls the testing process and is presented on the device that controls the tests. + It is still developet as microcontroller compatible software, but there si weak assumption that it will be mostly used on PC. + + It can: + - communicate with and control the reactor + - load test information from reactor + - orchestrate test execution + - provide input data for tests + - provide data collected from the tests + +The separation of the design into two tools impose restriction: the tests on the embedded device can't be executed without the controller. +But that allows really small footprint of the testing firmware on the firmware size, as I can move as much of the testing logic as reasonable to the controller side. +Especially data collection can be done in a way that nothing is stored in the reactor itself. + +The communication method between the parts is not defined. +Both parts use messages for communication, but is up to the user to implement how the messages are transfered. +Each expect to be provided with an interface that implements read/write methods - it's up to the user to design how. +This makes it platform independent and gives flexibility for various scenarios. +But I do silently expect that UART will be mostly used. + +The way the controller gets input data and processes the collected data from tests is up to the user. +The interface for controller just provides an API for both. + +In the end, the perspective one can use for this is: +The testing library is just fancy remote execution library - the controller executes functions registered to reactor in the firmware and collects result. + +## Basic implementation details + +Each part is object - `testing_reactor` object and `testing_controller` object. +Both are designed to take control of their application and both expect to be provided with appropriate interface `testing_reactor_interface` and `testing_controller_interface`. +Interfaces are designed/selected by the user and define how the object interacts with it's environment. + +In case of the embedded firmware, one creates instance of the reactor, registers tests into it and passes control to the reactor. +This is done in a way that still gives user some control over the main loop:: +```cpp + emlabcpp::testing_basic_reactor rec{"test suite name"}; + my_reactor_interface rif{..}; + // register tests + + while(true){ + rec.tick(rif); + } +``` + +The reactor expects that it's `tick` method is called repeatedly and the method contains one iteration of reactors control loop. 
+It either answers the reactor in the control loop or actually executes entire test during one `tick` call - it can block for a while. + +`controller` has similar beavior and interface. With the exception that the `controller_interface` also contains customization points for additional features: +- methods to provide input data for tests on request +- `on_test(emlabcpp::testing_result)` method that is called with results of one test call +- `on_error` method that is called once error happens in the library. + +It's up to the user to implement the interface for the specific use case or to use existing integration in the library. + +## Dynamic memory + +Both the `reactor` and the `controller` contains data structure with dynamic size. +To avoid dynamic memory, I wanted to use `std::pmr`: that is, that the internal containers would use allocator and expects memory resource as an input argument. +This implements the behavior: "the central objects expect a memory resource they should use for allocation of adata". + +I think that this fits the use case quite nicely, as both types require dynamic data structures but at the same way I want them to be usable without dynamic memory itself - compromise is interface that can be provided with static buffers. + +However `std::pmr` does not feel usable, as the default construction of allocator uses a default memory instance that exists as a global object. (that can be changed only at runtime) +The default instance uses new/delete. +That means that it is easy for code that uses `std::pmr` to include in the firmware entire stack for dynamic allocation - something that I want to avoid. + +Given that I implemented custom allocator/memory_resource concept that mirrors the wanted behavior but avoids the problem with default instance. +That means that to use the objects, user has to instance a memory resource also provided by `emlabcpp` and give it to the object. + +To ease usage, there exists `emlabcpp::testing_basic_reactor` which inherits from the `reactor` and provdies it with basic memory resource that can be used by it - sane default. + +## Binary protocol + +The binary protocol is intetionally considered an implementation detail, as I want to have freedom to change it at will. + +It is implemented with a protocol library I did previously in C++. The short description is: imagine protocol buffers, but instead of external tool it is just C++ library that gets definition of protocol via templates. + +## Data exchange + +The framework provides mechanic to exchange data between controller and reactor. + +Tests can request test data from the controller as an form of input. +(It's up to the user how controller gets/provides that data) +The request is a blocking communication operation - the input is not stored on the side of reactor. + +The test can collect data - reactors has an API to send data to the controller. +The controller stores the data during test execution and it is passed to the user once test is done in test_result. + +In both cases, I use only simple key/value mechanism. +That is each data point is made of 'key' that identifies it and corresponding 'value'. + +To give some flexibility, the types are: + +key + can be either string or integer + +value + can be string, integer, bool, unsigned + +In each case, the framework is able to serialize (thanks to `emlabcpp::protocol` library) and deserialize any of the types and send them over the communication channel. 
+ +As for the strings: These are limited by size to 32 characters, as this way I can use static buffers for them and they do not have to be allocated. + + +## Examples of tests + +I tried to prepare a simple interface for the registration of tests, as may general assumption is that the tests should be easy to write. +(Note: Generally I don't mind some cost on setting up the library, but I think that adding test should be easy) +To guide the explanation let's assert we are testing wending machine: + +```cpp + emlabcpp::testing_basic_reactor rec{"test suite for wending machine"}; + + rec.register_callable("my simple test", [&]( emlabcpp::testing_record & rec){ + + int product_id = rec.get_arg("product_id"); + + rec.expect( product_id < MAX_PRODUCTS_N ); + + wending_machine::release_product(product_id); + + rec.collect( "released: ", product_id ); + + bool occupied = wending_machine::is_takeout_area_occupied(); + + rec.expect( occupied ); + }); +``` + +What happens here is that lambda function is registered as an test. +That test is identified by "my simple test" and that is used to identify it from controller. + +Once the test is executed (that is: controller tells the reactor to execute it), it is provided with `testing_record` object that serves as an API between the test and the reactor. + +The testing code should use the record to get any data from controller, collect any data during the test and mainly: to provide information whenever the test failed or succceeded. + +In the example you can see usage of all the primivites: + - `rec.get_arg("product_id")` tells the reactor to ask controller for argument with key `product_id` and retreive it as integer type + - `rec.expect( product_id < MAX_PRODUCTS_N )` is a form checking properties in the test - in any moment if `false` is passed to the `expect(bool)` method the test is marked as failed. + - `rec.collect("released: ", product_id )` collects the data `product_id` with key `released: ` and sends it to the controller. + +## Building the tests + +That is solely handled by the user, the testing framework just provides a object that expects communication API and can register test - how that is assembled into a firmware is up to the user. + +The idea is that single 'testing firmware' will be a collection of multiple tests registered into one reactor. +It's up to the user to orchestrate the build process in a way that this is sensible. + +In case of CMake, I decided to split the application itself into "application library" and "main executable". +That is, most of the logic of the firmware is in the application library and the main executable just implements main function and starts up the application library. + +The main executable of tests uses that library to prepare and setup tests. +Note that the idea is that there are multiple test binaries with different tests, I don't assume that all the tests would fit into one binary. + +This way, any test firmware is closely similar to the application executable - just with different main file. + +## Google Test + +One small win that appeared was that given the flexibility, it was easy to integrate gtest and controller together. +That is, the controller can register each test from reactor as a test in the google test library. +Tt can use the gtest facillity on PC to provide user-readable output about execution of the tests, more orchestration logic and output of the testing in form of JUnit XML files. +These can be used by tools like gitlab to provide test results in it's GUI. + +What this means? 
that it was easy to provide necessary facility for the testing firmware to be integrated into modern CI with traditional tools. +And yet the integration is not tight, any integration into gtest is just a set of few functions/classes in emlabcpp t hat can be ignored for anybody not favoring gtest. + +Test output from the project I used this framework first time can look like this: + +``` + ./cmake-build-debug/util/tester --device /dev/ttyACM0 + [==========] Running 1 test from 1 test suite. + [----------] Global test environment set-up. + [----------] 1 test from emlabcpp::testing + [ RUN ] emlabcpp::testing.basic_control_test + /home/veverak/Projects/servio/util/src/tester.cpp:32: Failure + Test produced a failure, stopping + collected: + 11576 : 0 + 11679 : 0 + 11782 : 0 + 11885 : 0 + 11988 : 0 + 12091 : 0 + 12194 : 0 + 12297 : 0 + 12400 : 0 + 12503 : 0 + 12606 : 0 + 12709 : 0 + 12812 : 0 + 12915 : 0 + 13018 : 0 + 13121 : 0 + 13224 : 0 + 13327 : 0 + 13430 : 0 + 13533 : 0 + 13636 : 0 + 13739 : 0 + 13842 : 0 + 13945 : 0 + 14048 : 0 + [ FAILED ] emlabcpp::testing.basic_control_test (2597 ms) + [----------] 1 test from emlabcpp::testing (2597 ms total) + + [----------] Global test environment tear-down + [==========] 1 test from 1 test suite ran. (2597 ms total) + [ PASSED ] 0 tests. + [ FAILED ] 1 test, listed below: + [ FAILED ] emlabcpp::testing.basic_control_test + + 1 FAILED TEST +``` + +In this example, the controller registered all tests that were in the firmware (on device that was connected to the PC and was accessible via the `/dev/ttyACM0` serial device) as google tests they were executed. + +The name of the testing suite `emlabcpp::testing`, name of the test `basic_control_test` we all collected on the fly from the testing firmware itself, we can also see values collected by the test during the execution. + +## Controller is independent + +Based on the specific project and testing needs, one can use one binary with `controller` for multiple `reactors` , that is something I intend with actuall main project that uses it. + +As the controller loads most information from the reactor and in case the gtest integration is used there is not much of the logic that can be varied. + +Sole exception is how data is provided for the tests. +But than that again can be implemented in some general way - for example that the `controller` binary would load the data from json file in some generic way. + +## Experience + +Is quite limited so far, but I am happy with the first prototype. +I am sure that I will refactor the library in the future, as there are obvious places to be improved but so far it behaves good enough. +It gives me a simple way to test and develop various ways to control smart servomotor I am working on. +(Note: yes, this is one of the cases of "oh, I need to develop a library so I can do a project"...) 
+ +What could be developed more in the future and what pains me so far is: + - it still does not report 100% of the possible errors on the side of the testing library - I have to go throu the codebase and be more strict + - it can't handle exceptions - while it should not rely on them, I think the library should respect them, that means in case test throws exception it should not stop the reactor + - data exchange can be improved - what can be exchanged as of now is quite limited, I suppose I can provide more types to send and receive + - memory resource - use internal emlabcpp mechanism that is underdeveloped, that definetly would benefit from more work + - more experience in CI - I think I am on a good track to have automatized test in CI that are flashed to real hardware somewhere in laboratory. That could show limits of the library + - `get_arg(` is an example of interface of test library that can result in an error. I don't want to move introduce an change in the API that would make it return error as I can't figure out anything that would not pollute the tests. The idea is that for the errors under category "error in processing of the test and not the test" that can't be handled the library: A. throws exception if possible B. stops at the point and spams the controller with error message. + From 5e41967881bf0a7511dcddfc597d23efd7280849 Mon Sep 17 00:00:00 2001 From: Tyler Hoffman Date: Thu, 26 May 2022 14:36:51 -0700 Subject: [PATCH 2/9] Rename post for preview to work --- ...{20-04-20-testing-library.md => 2022-04-20-testing-library.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename _posts/{20-04-20-testing-library.md => 2022-04-20-testing-library.md} (100%) diff --git a/_posts/20-04-20-testing-library.md b/_posts/2022-04-20-testing-library.md similarity index 100% rename from _posts/20-04-20-testing-library.md rename to _posts/2022-04-20-testing-library.md From 74103a0412c3e91a52125ba6f48479b60bf4fdf1 Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Sat, 28 May 2022 12:35:58 +0200 Subject: [PATCH 3/9] Added paragraph about source repository --- _posts/20-04-20-testing-library.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/_posts/20-04-20-testing-library.md b/_posts/20-04-20-testing-library.md index 8b0ab9ec..ffc78c86 100644 --- a/_posts/20-04-20-testing-library.md +++ b/_posts/20-04-20-testing-library.md @@ -329,3 +329,11 @@ What could be developed more in the future and what pains me so far is: - more experience in CI - I think I am on a good track to have automatized test in CI that are flashed to real hardware somewhere in laboratory. That could show limits of the library - `get_arg(` is an example of interface of test library that can result in an error. I don't want to move introduce an change in the API that would make it return error as I can't figure out anything that would not pollute the tests. The idea is that for the errors under category "error in processing of the test and not the test" that can't be handled the library: A. throws exception if possible B. stops at the point and spams the controller with error message. +## The code + +The testing library is part of emlabcpp - my personal library, which purpose is mostly for me to have up-to-date collection of tools for my development. +Given that I restrain myself from saying "it should be used by others" as I don't really want to care about backward compability outside of my projects. 
+ +The primary example of the testing library is an example file: [emlabcpp/tests/testing](https://github.com/koniarik/emlabcpp/blob/v1.2/tests/testing_test.cpp) + +The interface to the library itself is in: [emlabcpp/include/emlabcpp/experimental/testing](https://github.com/koniarik/emlabcpp/tree/v1.2/include/emlabcpp/experimental/testing) From 3fb8f7daa6377668e39ed99fb30c9afc2a6de67d Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Sat, 28 May 2022 15:05:06 +0200 Subject: [PATCH 4/9] Grammarlied the testing library post --- _posts/2022-04-20-testing-library.md | 209 ++++++++++++++------------- 1 file changed, 107 insertions(+), 102 deletions(-) diff --git a/_posts/2022-04-20-testing-library.md b/_posts/2022-04-20-testing-library.md index ffc78c86..4e78381b 100644 --- a/_posts/2022-04-20-testing-library.md +++ b/_posts/2022-04-20-testing-library.md @@ -7,12 +7,12 @@ tags: [c++, testing] -We have multiple testing library focused on desktop C++ aplications, but there is a lack of library designed for embedded devices. +We have multiple testing libraries focused on C++ applications for GPOS (general-purpose operating system), but there is a lack of testing libraries designed for embedded devices. The traditional libraries are not designed for constrained resources and rely on functionality like a filesystem or standard output. I decided to design a testing library for microcontrollers. -In this article I want to show rationale, design choices, and thoughts on the prototype. +In this article, I want to show the rationale, design choices, and thoughts on the prototype. @@ -24,17 +24,17 @@ In this article I want to show rationale, design choices, and thoughts on the pr When developing any code, being able to test is crucial for sustainable development. -In the case of code that is executable on systems with OS, widely used solutions are GoogleTest or Catch libraries frameworks. +In the case of executable code on systems with GPOS, widely used solutions are GoogleTest or Catch libraries. What we usually expect from such a framework is: - a tool that will organize and orchestrate the execution of the tests -- basic functions/API to check the corretness of the results in the test +- basic functions/API to check the correctness of the results in the test - features for scaling: fixtures, parameterized tests, executing tests multiple times, metrics -In the context of microprocessosrs, these libraries are not usable. -They rely on the file system, input/output into a terminal, dynamic memory, and do not care about tight limits for code size. +In the context of microprocessors, these libraries are not usable. +They rely on the file system, input/output into a terminal, and dynamic memory. They also do not care about tight limits for code size. -These frameworks are usable only for testing of the embedded firmware. -These parts are independent on the hardware: algorithms, internal business logic etc.. +These frameworks are usable only for testing parts of the embedded firmware. +These parts are independent of the hardware: algorithms, internal business logic, and others. We, however, can't test anything that is tied to the hardware. For that reason, I decided to implement a custom opinionated testing framework designed for a specific use case: executing tests on the embedded hardware itself. 
@@ -49,25 +49,25 @@ The goal is to be able to test embedded code that is tied to the hardware itself Based on my experience and opinions, I decided to specify the following requirements: emlabcpp integration - The code is tightly integrated into an existing C++20 library that I am developing. + The code is tightly integrated into my personal C++20 library. That is: it can't be used without the library. - This eases development of the testing framework as I reuse functionality from the library, specifically: protocol library. + This eases the development of the testing framework as I reuse functionality from the library, specifically: the protocol library inside emlabcpp. simplicity - The library should be simple and should not try to provide entire set of functionality that catch/gtest provides. - That should not be necessary and I prefer simpler and more effective tool. + The library should be simple and should not try to provide an entire set of functionality that Catch/Google Test offers. + That should not be necessary, and I prefer a simpler and more efficient tool. integration into existing testing tools - Wide set of tools exist that can work with test results of catch/gtest - for example gitlab has integration of test results from these tools. + A wide set of tools exist that can work with the test results of Catch/Google Test - for example, GitLab has the integration of test results from these tools. The library should be compatible from this perspective - it should be integrable into existing systems. small footprint - The assertion is that a big percentage of available memory of the microchip will be taken by the application code itself. - That implies that the library should have small memory footprint - so it can coexist with present code. + The assertion is that the application code itself will take a big percentage of the available memory of the microchip. + That implies that the library should have a small memory footprint - so it can coexist with present code. -no dyn. memory, no exception - Both are C++ features which we may want to avoid in the firmware. - The testing library should not require them for it's functionality, to allow usage in context when they are not enabled at all. +no dynamic memory, no exception + Both are C++ features that we may want to avoid in the firmware. + The testing library should not require them for its functionality to allow usage in context when they are not enabled at all. no platform fixation Ideally, we would prefer this to be reusable between different embedded platforms and situations. @@ -76,50 +76,50 @@ no platform fixation ## Design -The library itself is implemented as a two part system: +The library itself is implemented as a two-part system: reactor - Is present in the embedded device, and controls it. - It has small footprint and has limited functionality, it can: + It is present in the embedded device and controls it. + It has a small footprint and limited functionality. It can: - register tests to itself - store bare minimum information about firmware/tests - execute the tests - - communicate information/exchange data between itself and controller + - communicate information/exchange data between itself and the controller controller Controls the testing process and is presented on the device that controls the tests. - It is still developet as microcontroller compatible software, but there si weak assumption that it will be mostly used on PC. 
- + It is still developed as microcontroller-compatible software (no dynamic memory, no exceptions), but there is a weak assumption that it will be mainly used on a system with GPOS. It can: - communicate with and control the reactor - - load test information from reactor + - load test information from the reactor - orchestrate test execution - provide input data for tests - provide data collected from the tests -The separation of the design into two tools impose restriction: the tests on the embedded device can't be executed without the controller. -But that allows really small footprint of the testing firmware on the firmware size, as I can move as much of the testing logic as reasonable to the controller side. -Especially data collection can be done in a way that nothing is stored in the reactor itself. +The separation of the design into two tools imposes restrictions: the tests on the embedded device can't be executed without the controller. +But that allows a minimal memory footprint of the testing firmware on the firmware size, as I can move as much of the testing logic as reasonable to the controller side. +Especially data collection can be done so that everything is stored in the controller. The communication method between the parts is not defined. -Both parts use messages for communication, but is up to the user to implement how the messages are transfered. -Each expect to be provided with an interface that implements read/write methods - it's up to the user to design how. -This makes it platform independent and gives flexibility for various scenarios. +Both parts use messages for communication, but it is up to the user to implement how the messages are transferred. +Each expects to be provided with an interface that implements read/write methods - it's up to the user to design how. +This makes it platform-independent and gives flexibility for various scenarios. But I do silently expect that UART will be mostly used. The way the controller gets input data and processes the collected data from tests is up to the user. -The interface for controller just provides an API for both. +The interface for the controller only provides an API for both. In the end, the perspective one can use for this is: -The testing library is just fancy remote execution library - the controller executes functions registered to reactor in the firmware and collects result. +The testing library is just a fancy remote execution library - the controller executes functions registered to the reactor in the firmware and collects results. ## Basic implementation details Each part is object - `testing_reactor` object and `testing_controller` object. -Both are designed to take control of their application and both expect to be provided with appropriate interface `testing_reactor_interface` and `testing_controller_interface`. -Interfaces are designed/selected by the user and define how the object interacts with it's environment. +Both are designed to take control of their application. +Both expect to be provided with user-provided objects implementing interfaces `testing_reactor_interface` and `testing_controller_interface`. +Interfaces are implemented by the user and define how the object interacts with its environment. -In case of the embedded firmware, one creates instance of the reactor, registers tests into it and passes control to the reactor. +In the case of the embedded firmware, one creates an instance of the reactor, registers tests into it, and passes control to the reactor. 
This is done in a way that still gives user some control over the main loop:: ```cpp emlabcpp::testing_basic_reactor rec{"test suite name"}; @@ -131,52 +131,55 @@ This is done in a way that still gives user some control over the main loop:: } ``` -The reactor expects that it's `tick` method is called repeatedly and the method contains one iteration of reactors control loop. -It either answers the reactor in the control loop or actually executes entire test during one `tick` call - it can block for a while. +The reactor expects that its `tick` method is called repeatedly, and the method contains one iteration of the reactor's control loop. +It either answers the reactor in the control loop or executes the entire test during one `tick` call - it can block for a while. -`controller` has similar beavior and interface. With the exception that the `controller_interface` also contains customization points for additional features: +The `controller` has similar behavior and interface. With the exception that the `controller_interface` also contains customization points for additional features: - methods to provide input data for tests on request - `on_test(emlabcpp::testing_result)` method that is called with results of one test call -- `on_error` method that is called once error happens in the library. +- `on_error` method is called once an error happens in the library. -It's up to the user to implement the interface for the specific use case or to use existing integration in the library. +It's up to the user to implement the interface for the specific use case or use existing implementations (the library may provide some). ## Dynamic memory Both the `reactor` and the `controller` contains data structure with dynamic size. -To avoid dynamic memory, I wanted to use `std::pmr`: that is, that the internal containers would use allocator and expects memory resource as an input argument. -This implements the behavior: "the central objects expect a memory resource they should use for allocation of adata". +To avoid dynamic memory, I wanted to use `std::pmr`: the internal containers would use an allocator and expect memory resource as an input argument. +This implements the behavior: "The central objects expect a memory resource they should use for data allocation." -I think that this fits the use case quite nicely, as both types require dynamic data structures but at the same way I want them to be usable without dynamic memory itself - compromise is interface that can be provided with static buffers. +I think that this fits the use case quite nicely, as both types require dynamic data structures, but in the same way, I want them to be usable without dynamic memory itself - compromise is an interface that can be provided with static buffers. -However `std::pmr` does not feel usable, as the default construction of allocator uses a default memory instance that exists as a global object. (that can be changed only at runtime) +However, `std::pmr` does not feel usable, as the default construction of the allocator uses a default memory instance that exists as a global object. (that can be changed only at runtime) The default instance uses new/delete. -That means that it is easy for code that uses `std::pmr` to include in the firmware entire stack for dynamic allocation - something that I want to avoid. +That means that it is easy for code that uses `std::pmr` to include in the firmware the entire stack for dynamic allocation - something that I want to avoid. 
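For illustration, this is roughly what the `std::pmr` compromise looks like when the buffer is wired up explicitly - plain standard-library usage, not the emlabcpp interface. A static buffer backs a `monotonic_buffer_resource`, and passing `null_memory_resource()` as the upstream guarantees there is no silent fallback to `new`:

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Static storage reserved for the containers with dynamic size.
alignas(std::max_align_t) std::array<std::byte, 2048> test_buffer;

int main()
{
    // Allocations come from the static buffer; null_memory_resource() as the
    // upstream means running out of space fails instead of calling new.
    std::pmr::monotonic_buffer_resource resource{
        test_buffer.data(), test_buffer.size(), std::pmr::null_memory_resource()};

    std::pmr::vector<int> collected{&resource};
    collected.push_back(42);
}
```
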
-Given that I implemented custom allocator/memory_resource concept that mirrors the wanted behavior but avoids the problem with default instance. -That means that to use the objects, user has to instance a memory resource also provided by `emlabcpp` and give it to the object. +I decided to re-implement `std::pmr` in my custom library with a few changes in the API that are more fitting to the embedded library. +The key one is that memory resource with new/delete operators simply does not exists. +The user has to instance a memory resource also provided by `emlabcpp` and give it to the object. -To ease usage, there exists `emlabcpp::testing_basic_reactor` which inherits from the `reactor` and provdies it with basic memory resource that can be used by it - sane default. +As a simple alternative, there exists `emlabcpp::testing_basic_reactor,` which inherits from the `reactor` and provides it with a basic memory resource that can be used by it - sane default. ## Binary protocol -The binary protocol is intetionally considered an implementation detail, as I want to have freedom to change it at will. +The binary protocol is intentionally considered an implementation detail, as I want to have the freedom to change it at will. -It is implemented with a protocol library I did previously in C++. The short description is: imagine protocol buffers, but instead of external tool it is just C++ library that gets definition of protocol via templates. +It is implemented with a C++ protocol library I did previously. The short description is: imagine protocol buffers, but instead of an external tool, it is just a C++ library that gets the definition of protocol via templates. ## Data exchange -The framework provides mechanic to exchange data between controller and reactor. +The framework provides mechanics to exchange data between controller and reactor. + +Tests can request test data from the controller as a form of input. +(It's up to the user how the controller gets/provides that data) +The request is a blocking communication operation - the input is not stored on the side of the reactor. -Tests can request test data from the controller as an form of input. -(It's up to the user how controller gets/provides that data) -The request is a blocking communication operation - the input is not stored on the side of reactor. +The test can collect data - reactors have an API to send data to the controller. +The controller stores the data during test execution, and it is passed to the user once the test is done in test_result. -The test can collect data - reactors has an API to send data to the controller. -The controller stores the data during test execution and it is passed to the user once test is done in test_result. +In the case of input, I use a simple key/value mechanism. +In the case of the collected data, these can be organized into a tree, where each node has key/value pair. -In both cases, I use only simple key/value mechanism. -That is each data point is made of 'key' that identifies it and corresponding 'value'. +That is, each data point is made of a 'key' that identifies it and its corresponding 'value.' To give some flexibility, the types are: @@ -186,16 +189,16 @@ key value can be string, integer, bool, unsigned -In each case, the framework is able to serialize (thanks to `emlabcpp::protocol` library) and deserialize any of the types and send them over the communication channel. 
+In each case, the framework can serialize (thanks to the `emlabcpp::protocol` library) and deserialize any types and send them over the communication channel. -As for the strings: These are limited by size to 32 characters, as this way I can use static buffers for them and they do not have to be allocated. +As for the strings: These are limited by size to 32 characters, as this way, I can use static buffers for them, and they do not have to be allocated. ## Examples of tests -I tried to prepare a simple interface for the registration of tests, as may general assumption is that the tests should be easy to write. -(Note: Generally I don't mind some cost on setting up the library, but I think that adding test should be easy) -To guide the explanation let's assert we are testing wending machine: +I tried to prepare a simple interface for the registration of tests, as I believe that tests should be easy to write. +(Note: Generally, I don't mind some cost of setting up the library, but I think that adding tests should be easy) +To guide the explanation, let's assert we are testing a wending machine: ```cpp emlabcpp::testing_basic_reactor rec{"test suite for wending machine"}; @@ -216,44 +219,47 @@ To guide the explanation let's assert we are testing wending machine: }); ``` -What happens here is that lambda function is registered as an test. -That test is identified by "my simple test" and that is used to identify it from controller. +What happens here is that the lambda function is registered as a test. +That test is identified by the "my simple test" string, used to identify it from the controller. -Once the test is executed (that is: controller tells the reactor to execute it), it is provided with `testing_record` object that serves as an API between the test and the reactor. +Once the test is executed (the controller tells the reactor to execute it), it is provided with a `testing_record` object that serves as an API between the test and the reactor. -The testing code should use the record to get any data from controller, collect any data during the test and mainly: to provide information whenever the test failed or succceeded. +The testing code should use the record to get any data from the controller, collect any data during the test, and mainly: provide information whenever the test failed or succeeded. -In the example you can see usage of all the primivites: +In the example, you can see the usage of all the primitives: - `rec.get_arg("product_id")` tells the reactor to ask controller for argument with key `product_id` and retreive it as integer type - - `rec.expect( product_id < MAX_PRODUCTS_N )` is a form checking properties in the test - in any moment if `false` is passed to the `expect(bool)` method the test is marked as failed. + - `rec.expect( product_id < MAX_PRODUCTS_N )` is a form checking properties in the test - if `false` is passed to the `expect(bool)` method the test is marked as failed. - `rec.collect("released: ", product_id )` collects the data `product_id` with key `released: ` and sends it to the controller. ## Building the tests -That is solely handled by the user, the testing framework just provides a object that expects communication API and can register test - how that is assembled into a firmware is up to the user. +The user solely handles that. The testing framework just provides an object that expects communication API and can register test - how that is assembled into a firmware is up to the user. 
+ +The idea is that single 'testing firmware' will collect multiple tests registered into one reactor. +It's up to the user to orchestrate the build process in a sensible way. -The idea is that single 'testing firmware' will be a collection of multiple tests registered into one reactor. -It's up to the user to orchestrate the build process in a way that this is sensible. +In the case of CMake, I decided to split the application itself into "application library" and "main executable." +Most of the logic of the firmware is in the application library, and the main executable just implements the main function and starts up the application library. -In case of CMake, I decided to split the application itself into "application library" and "main executable". -That is, most of the logic of the firmware is in the application library and the main executable just implements main function and starts up the application library. +The main executable of tests uses that library to prepare and set up tests. +Note that the idea is that there are multiple test binaries with different tests. +I don't assume that all the tests would fit into one binary. -The main executable of tests uses that library to prepare and setup tests. -Note that the idea is that there are multiple test binaries with different tests, I don't assume that all the tests would fit into one binary. +This way, any test firmware is closely similar to the application executable - just with a different main file. -This way, any test firmware is closely similar to the application executable - just with different main file. +From the controller's perspective, it can be just a simple application that is meant to be executed on GPOS. ## Google Test -One small win that appeared was that given the flexibility, it was easy to integrate gtest and controller together. -That is, the controller can register each test from reactor as a test in the google test library. -Tt can use the gtest facillity on PC to provide user-readable output about execution of the tests, more orchestration logic and output of the testing in form of JUnit XML files. -These can be used by tools like gitlab to provide test results in it's GUI. +One small win that appeared was that, given the flexibility, it was easy to integrate Google Test and controller. +The controller can register each test from the reactor as a test in the google test library. +It can use the Google Test facility on GPOS to provide user-readable output about the execution of the tests, more orchestration logic, and output of the testing in the form of JUnit XML files. +Systems like GitLab can use this. -What this means? that it was easy to provide necessary facility for the testing firmware to be integrated into modern CI with traditional tools. -And yet the integration is not tight, any integration into gtest is just a set of few functions/classes in emlabcpp t hat can be ignored for anybody not favoring gtest. +This shows that it was easy to provide the necessary facility for the testing firmware to be integrated into modern CI with traditional tools. +And yet the integration is not tight. Any integration into Google Test is just a set of few functions/classes in emlabcpp that can be ignored by anybody not favoring Google Test. 
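As a sketch of how that glue can work, the controller can enumerate the tests reported by the reactor and register each one dynamically with `::testing::RegisterTest`. Only the Google Test calls here are real API; the reactor-facing helpers are hypothetical stand-ins for the controller logic:

```cpp
#include <gtest/gtest.h>

#include <string>
#include <vector>

// Stubs standing in for the controller logic, which would talk to the
// reactor over the chosen transport (e.g. a serial port).
std::vector<std::string> load_test_names_from_reactor()
{
    return {"basic_control_test"};
}

bool run_remote_test(const std::string&)
{
    return true;
}

// One Google Test per remote test; TestBody just triggers remote execution.
class remote_test : public ::testing::Test
{
public:
    explicit remote_test(std::string name) : name_(std::move(name)) {}

    void TestBody() override
    {
        EXPECT_TRUE(run_remote_test(name_)) << "remote test failed: " << name_;
    }

private:
    std::string name_;
};

int main(int argc, char** argv)
{
    ::testing::InitGoogleTest(&argc, argv);

    // Register every test found in the firmware as a native Google Test.
    for (const std::string& name : load_test_names_from_reactor()) {
        ::testing::RegisterTest(
            "emlabcpp::testing", name.c_str(), nullptr, nullptr, __FILE__, __LINE__,
            [name]() -> ::testing::Test* { return new remote_test(name); });
    }
    return RUN_ALL_TESTS();
}
```
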
-Test output from the project I used this framework first time can look like this: +The test output from the project I used this framework the first time can look like this: ``` ./cmake-build-debug/util/tester --device /dev/ttyACM0 @@ -301,38 +307,37 @@ Test output from the project I used this framework first time can look like this 1 FAILED TEST ``` -In this example, the controller registered all tests that were in the firmware (on device that was connected to the PC and was accessible via the `/dev/ttyACM0` serial device) as google tests they were executed. +In this example, the controller registered all tests in the firmware (on the device that was connected to the PC and was accessible via the `/dev/ttyACM0` serial device). After that, it executed all of them. -The name of the testing suite `emlabcpp::testing`, name of the test `basic_control_test` we all collected on the fly from the testing firmware itself, we can also see values collected by the test during the execution. +The name of the testing suite `emlabcpp::testing` and the name of the test `basic_control_test` were all extracted on the fly from the testing firmware itself. We can also see values collected by the test during the execution. ## Controller is independent -Based on the specific project and testing needs, one can use one binary with `controller` for multiple `reactors` , that is something I intend with actuall main project that uses it. +Based on the specific project and testing needs, one can use one binary with a `controller` for multiple `reactors.` That is something I intend with the actual main project that uses it. -As the controller loads most information from the reactor and in case the gtest integration is used there is not much of the logic that can be varied. +As the controller loads most information from the reactor and if the Google Test integration is used, there is not much logic that can be varied. -Sole exception is how data is provided for the tests. -But than that again can be implemented in some general way - for example that the `controller` binary would load the data from json file in some generic way. +The sole exception is how data is provided for the tests. +But then it can be implemented in some general way - for example, the `controller` binary would load the data from a JSON file in some generic way. ## Experience -Is quite limited so far, but I am happy with the first prototype. -I am sure that I will refactor the library in the future, as there are obvious places to be improved but so far it behaves good enough. -It gives me a simple way to test and develop various ways to control smart servomotor I am working on. +It is pretty limited, but I am happy with the prototype. +I am sure that I will refactor the library in the future, as there are prominent places to be improved but so far it behaves good enough. +It gives me a simple way to test and develop various ways to control the smart servomotor I am working on. (Note: yes, this is one of the cases of "oh, I need to develop a library so I can do a project"...) 
What could be developed more in the future and what pains me so far is: - - it still does not report 100% of the possible errors on the side of the testing library - I have to go throu the codebase and be more strict - - it can't handle exceptions - while it should not rely on them, I think the library should respect them, that means in case test throws exception it should not stop the reactor - - data exchange can be improved - what can be exchanged as of now is quite limited, I suppose I can provide more types to send and receive - - memory resource - use internal emlabcpp mechanism that is underdeveloped, that definetly would benefit from more work - - more experience in CI - I think I am on a good track to have automatized test in CI that are flashed to real hardware somewhere in laboratory. That could show limits of the library - - `get_arg(` is an example of interface of test library that can result in an error. I don't want to move introduce an change in the API that would make it return error as I can't figure out anything that would not pollute the tests. The idea is that for the errors under category "error in processing of the test and not the test" that can't be handled the library: A. throws exception if possible B. stops at the point and spams the controller with error message. + - it still does not report 100% of the possible errors on the side of the testing library - I have to go through the codebase and be more strict + - it can't handle exceptions - while it should not rely on them, I think the library should respect them. That means in case the test throws an exception. It should not stop the reactor. + - data exchange can be improved - what can be exchanged as of now is quite limited. I suppose I can provide more types to send and receive. + - memory resource - uses internal emlabcpp mechanism that is underdeveloped, that definitely would benefit from more work. + - more experience in CI - I think I am on a good track to having automatized tests in CI that are flashed to real hardware somewhere in the laboratory. That could show the limits of the library. ## The code -The testing library is part of emlabcpp - my personal library, which purpose is mostly for me to have up-to-date collection of tools for my development. -Given that I restrain myself from saying "it should be used by others" as I don't really want to care about backward compability outside of my projects. +The testing library is part of emlabcpp - my personal library, which purpose is for me to have an up-to-date collection of tools for my development. +I restrain myself from saying "it should be used by others," as I don't want to care about backward compatibility outside of my projects. 
The primary example of the testing library is an example file: [emlabcpp/tests/testing](https://github.com/koniarik/emlabcpp/blob/v1.2/tests/testing_test.cpp)

The interface to the library itself is in: [emlabcpp/include/emlabcpp/experimental/testing](https://github.com/koniarik/emlabcpp/tree/v1.2/include/emlabcpp/experimental/testing)

From 14c0b421e45841c07f6d46d6ea799ed8f6f96af7 Mon Sep 17 00:00:00 2001
From: Jan Veverak Koniarik
Date: Tue, 29 Nov 2022 20:08:56 +0100
Subject: [PATCH 5/9] Vanilla version of artcile about coroutines

---
 _posts/2024-04-20-coroutines.md | 343 ++++++++++++++++++++++++++++++++
 1 file changed, 343 insertions(+)
 create mode 100644 _posts/2024-04-20-coroutines.md

diff --git a/_posts/2024-04-20-coroutines.md b/_posts/2024-04-20-coroutines.md
new file mode 100644
index 00000000..4d4a7a6d
--- /dev/null
+++ b/_posts/2024-04-20-coroutines.md
@@ -0,0 +1,343 @@
+
+---
+title: Embedded Coroutines
+description: Yet another blog about coroutines, this time in embedded
+author: nash
+---

Over the last year, there seems to have been quite a lot of activity around coroutines in C++.
That is, we got plenty of blog posts and videos on the topic of coroutines, which are officially available since C++20.

This is another such blog post, but this time focused on the embedded environment.
I will keep the explanation of what a coroutine is short (there are plenty of good blog posts about that) and focus on the properties relevant for embedded, and on possible abstractions we can build with coroutines that might be interesting there.
(Given the high chance of new blog posts appearing while this article was being written, it might lose some originality at some point.)

The text is structured into two sections:
 - a basic intro to coroutines
 - embedded-related specifics

## Introduction

Coroutines are nothing new in the programming world; we are talking about a feature that has existed for a long time in various languages.
(Lua has had them since around 2003, Python has them, and C has various libraries that bring in coroutines.)

C++ got language support for coroutines in C++20, but that is only low-level language support.
The standard does not yet define any useful constructs built on top of coroutines, and direct usage is troublesome.

### Basis

We should all be aware of what a `function` is. It is a segment of code that can be executed.
When we `call` a function, the code of the function is executed, and at a certain point the `function` finishes its execution and the flow returns to the place from which the function was called.

For the sake of this blog post, we will define coroutines as follows:

A coroutine is a segment of code that can be executed. When we call a coroutine, the code of the coroutine is executed.
The coroutine might suspend itself and return control to its parent.
When suspended, the coroutine can be resumed again to continue execution of the code.
Eventually, the coroutine finishes its execution and can't be resumed again.

That is, a `coroutine` is just a `function` that can interrupt itself and be resumed by the caller.

### What does it look like?

The simplest coroutine might look like this:

```cpp
coroutine_type fun(int i)
{
    int j = i*3;
    std::cout << i << std::endl;
    co_await awaiter_type{};
    std::cout << i*2 << std::endl;
    co_await awaiter_type{};
    std::cout << j << std::endl;
}
```

`fun` is a coroutine that interrupts itself on the `co_await` calls and can be resumed to continue the execution.
The exact interface of how that happens depends on the coroutine, but in simple cases, we can assume something like this:

```cpp
coroutine_type ct = fun(2);
// `2` is printed now
ct.step();
// `4` was printed now
ct.step();
// `5` was printed now
```

In normal use cases, the coroutine causes an allocation, as all variables that exist between the suspension of the coroutine and its resumption (`j`) have to be stored somewhere.
To do that, the coroutine allocates memory for its `coroutine frame`, which contains all the necessary information:
 - the `promise_type` (explained later)
 - the arguments of the coroutine (their copy)
 - a compiler-defined holder of all variables of the coroutine that survive between the suspension points
(Note: the dynamic memory can be avoided, as will be explained later.)

The `coroutine_type` is the type returned by the coroutine and usually represents a `handle` that points to the allocated frame (by owning the `std::coroutine_handle`, which is a handle given by the compiler for the frame).
To allow data exchange with the coroutine, the coroutine frame contains an instance of `promise_type` that is accessible from the `coroutine_type` and from the coroutine itself.
(The compiler selects `coroutine_type::promise_type` as the promise; this type can be an alias, a nested structure, or some other valid type.)

An `awaiter` is an entity used for the suspension process. It is a type that is passed to the `co_await` expression and that the compiler uses to handle the suspension.
When the coroutine is suspended by `co_await`, as a last step the compiler calls `void awaiter::await_suspend(std::coroutine_handle)`, which gets access to the promise via the `coroutine_handle`, and after that the coroutine is suspended.
Once the parent of the coroutine resumes it, the `U awaiter::await_resume()` of the awaiter used for the suspension is called.
The `U` returned by this method is the return value of the `co_await` expression.

The purpose of the `awaiter` is to serve as a customization point: it makes it possible to extract data from the coroutine (the awaiter can get data in its constructor and pass it to the promise) and also to give data to the coroutine (by extracting it from the `promise_type` in the `await_resume` method).

In this context, it is a good idea to point out a few properties of the mechanism:
 - A suspended coroutine can be destroyed. This destruction is safe: the destructors of the promise_type, the arguments, the stored variables, and so on are all run properly.
 - The `coroutine_type` does not need to have `step` semantics; it has access to the `std::coroutine_handle`, which provides the interface to resume the coroutine. The `coroutine_type` might as well be implemented in a way that one method keeps resuming the coroutine until it finishes.
 - Coroutines can be nested. One can make a `coroutine_type` that is also a valid `awaiter`, which gives the possibility of recursive coroutines, in which one `step` of the top coroutine does one step of the inner coroutine.

(TODO: add pictures, lots of them)

### Some coroutines

Given that the feature has existed for some time and has some background from other languages, we can already talk about interesting types of coroutines to work with.
For this, I would like to point out https://github.com/lewissbaker/cppcoro as one of the interesting coroutine libraries.

The first common concept is the `generator`.
A generator is a coroutine that produces one value at each suspension point, which is provided to the caller.
+This can be used to generate simple sequences like: + +```cpp +generator sequence() +{ + i = 0; + while(true){ + co_yield i; + i += 1; + } +} +``` + +Here, `co_yield` is just another expression in coroutine API, you can imagine it as statement with different semantics than `co_await`: `co_await` should wait for awaiter, `co_yield` throws to parent a value. +Implementation wise, `co_yield x` causes call of `promise.yield_value(x)` which constructs awaiter which is awaited. + +Generators can have same API as containers, which gives as abillity to create this nice infinite loop: + +```cpp +for(int i : sequence()){ + std::cout << i << std::endl; +} +``` +(The idea usually is to either build generators that are not inifnite, or to eventually stop using the infinite ones) + +Another concept of coroutines that can be interesting are: `io coroutines` +That is, we can use coroutine to represent process that requires IO operations that are processed during the suspenions. + +For example, assume that we have library for network communication that provides us coroutine to do the io with: + +```cpp +network_coroutine send_data(tcp_connection& con) +{ + co_await con.make_data_send_awaiter("How are you?"); + std::string response = co_await con.make_receive_data_awaiter(); + if(response == "good") + { + co_await con.make_data_send_awaiter("Good, me too!"); + } + else + { + co_await con.make_data_send_awaiter("Oh, I see"); + } +} +``` + +It is good to resumarize some properties of this coroutine: + - It does not have to block, once the `co_await` call the parent will get control of the program flow and can just send the data provided by `awaiter` asynchronously. + - After that, the parent is free to ask another coroutine for data or to kills this coroutine entirely. This makes sense mostly in cases the request to get data would fail. + +### More + +This was fast and simple explanation of coroutines, the purpose of this article is not to give detailed explanation of croutines I can suggest blogpost such as this one: https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html + +## Embedded coroutines + +Let's get back to embedded and try to talk about how are coroutines for embedded development. +That is, what are possible problems with coroutines in embedded, why are they relevant, and some suggestions about ways we can use them. + +### Relevance + +Common task for emebedded is to do a plenty of IO communication via peripherals and in the meantime to take care about multiple process in pararel on single-core system. + +That is, we have to handle: + - complex computations (Kalman filter with matrix computations) + - IO (send/receive data over UART/I2C/...) + - longterm processes (PID regulator regulating temperature of the room) + +To handle all those processes at single point in time, we have two major approaches today: + 1) implement them as iterative process and schedule them + 2) use threads for the processes + +Let's take communication over UART as an example: + 1) iterative process means usage of structure to store the state of the communication, and providing a function that based on the current state advances the state. 
(Commonly with usage of enum representing actual status) + 2) implement the communication in a thread which my release it's time if it waits for some operation + + +Approach 1) has multiple potential issues, it might take a lot of code to implement complex exchange of data in this way (a lot of state variables) and the approach is problematic, as there is a chance that one of the steps might take longer than expected and we can't prevent that. + +Approach 2) has another set of issues. Each thread requires it's own stack space (which might not scale), and we got all the problems of pararelism - exchanging data between can suffer to various potential concurrency issues. + +From this perspective, coroutines are third way of approaching these processes, that is not better than 1) or 2), but also not worse. +Coroutines bring in a new (and not yet finetuned) mechanism that makes it possible to write exchange of data over UART in way that does not require as complex code as 1), does not suffer to so many concurrency issues as 2) (and does not need its own stack, just frame). + +However, coroutines still suffer to the issue that one step of computation might take longer than expected, and we can assume that any errors in coroutines might be harder to inspect. + +What is interesting, is that coroutines have good potential to have more effecient context switches than threads, which was shown in this article: https://www.researchgate.net/publication/339221041_C20_Coroutines_on_Microcontrollers_-What_We_Learned + +Note that in the article does not compare corutines in a lot of situations, but it still shows something. + +### Problems + +There are multiple potential issues with coroutines and using them correctly, but I think there are few that are really noteworthy for the embedded community. + +#### Dynamic memory + +First thing that I assume most of you noticed at the begining is the fact the coroutine requiers dynamic memory. +This is not favorable in embedded in many cases, but there are ways to avoid that. + +Coroutines have `allocator` support, we can provide coroutine an allocator that can be used to get memory for the coroutine and hence avoid the dyn. allocation. (Approach that I can suggest) +This is done by implementing custom `operator new` and `operator delete` on the `promise_type` which allocates the `entire frame`. + +Alternative is to rely on `halo` optimization, if the coroutine is implemented correctly and the parent function executes entire coroutine in its context. The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutines parent. +This can be be enforced by deleting appropiate `operator new` and `operator delete` overloads of the `promise_type`, but it seems clumsy to me. + +And to kinda ruin the pretty thing here, there is one catch. As of now, the compiler can't tell you how much memory the coroutine needs, as it is known only during the link time - the size of the coroutine frame can't be compiler constant. (TODO: link to the source for this) +This means that you effectively can't prepare static buffers for the coroutines. + +What I would suggest (that is what I do), is to use coroutines for the long term process and just build them during initialization of the device. +Or just live with the dynamic memory. 
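For reference, this is roughly what the `operator new`/`operator delete` route on the `promise_type` can look like — the `frame_pool` and `pool_coroutine` names, the pool size and the bump-allocation policy are all assumptions made up for this sketch:

```cpp
#include <coroutine>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical pool for coroutine frames. Because the frame size is not a
// compile-time constant, the pool has to be sized generously by hand.
struct frame_pool
{
    alignas(std::max_align_t) std::byte storage[1024];
    std::size_t used = 0;

    void* allocate(std::size_t n)
    {
        n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (used + n > sizeof(storage))
            std::abort();          // no heap fallback in this sketch
        void* res = storage + used;
        used += n;
        return res;
    }
};

inline frame_pool g_coro_pool;     // e.g. reserved for the long-term coroutines built at init

struct pool_coroutine
{
    struct promise_type
    {
        // The entire frame of any coroutine returning `pool_coroutine` comes from the pool.
        static void* operator new(std::size_t n) { return g_coro_pool.allocate(n); }
        static void  operator delete(void*) noexcept { /* bump allocator: frames live forever */ }

        pool_coroutine get_return_object() { return {}; }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::abort(); }
    };
};

pool_coroutine long_term_process()
{
    for (;;)
        co_await std::suspend_always{};   // the frame for this coroutine lives in g_coro_pool
}
```

The allocator variant mentioned above works through the same hook: an `operator new(std::size_t, ...)` overload on the `promise_type` can pick the allocator up from the coroutine's arguments, so the pool is chosen per call instead of globally. Either way the pool still has to be sized generously by hand, since the frame size is only known at link time.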
+ +(Note: if I could dream a bit here, I would really want an explicit way of forcing the coroutine to live on stack, which would have clearly defined behavior of when and how it should happen, and with proper compiler errors if one fails to do so) + +#### Frame size is big + +One more invisible caveat that appeared is that GCC is not yet smart enough with the frame size. +That is, the section of the frame for storing variables that live during the suspension calls is quite big. + +The intuitive way to work with the section would be: At each suspension point, we only want to remember variables that survive `that` suspension point and do not care about the rest. + +The issue is that currently GCC is storing all variables for all suspension points there. +This might became problematic if you have a lot of `co_await` statements with various types/interfaces. +(For example: if each `awaiter` stores a buffer of data, there might be separate space for each buffer in the frame) + +That might improve in the future, friend of mine works on a patch for gcc: https://gcc.gnu.org/bugzilla/attachment.cgi?id=53290&action=diff that uses unions of struct, with a separate struct for each suspension point - we pay much less memory. + +#### Inspectability + +For all that we do on embedded, one of the biggest limitation is our limited ability to inspect what went wrong. Either due to limited resources on the devices or due to the real-time nature of the things. + +At this point in time, `gdb` has support or threads, and mechanisms that use manullay written state machine for asynchronous processing are inspectable by default. +But there is no explicit support for corutines. + +By it's nature, all varibales in the coroutine that survive suspension points form a state. +And yet that state is kinda problematic for us, as other parts of the code can't access it while it is suspended. +(For inspection/logging/data recording...) + +In this scenario, I am afraid that it might tike while until our debug tools are smart enough to give as easy live with coroutines in the system. Mostly: what if something wents wrong in the coroutine? how do you inspect the state the coroutine is in? + +### Suggestions + +We talked about what coroutines are, what are potential problems, now question is: What can we use them for? + +For that I am afraid it is hard to give good answer. +The functionality is simply quite new and I suspect that it will take a while (years) until we gain enough experience to be sure in which ways it is really beneficial. + +That said, based on my (little) experience with usage of coroutines in embedded, I have some suggestions and patterns that might be interesting. +Mostly to peek the interest of community in these, as I believe the potential is here. + +#### IO Coroutine + +I think that the IO example above is the most obvious way we can benefit from coroutines in embedded. +That is, we can use coroutines as an abstraction to interact with peripherals that do IO operations. + +To describe this, I will use a personal experience with `i2c_coroutine`. +We can design it as follows: + +When `i2c_coroutine` is used as the type of the coroutine, the developer can interact with the `i2c` peripheral by `suspending` the coroutine with an `awaiter` that initiates operation on the coroutine, the coroutine is resumed once that operation finishes. +That is, we use the suspension point to allow interaction with the peripheral. 
+ +For example: + +```cpp +i2c_coroutine interact_with_device(uint8_t addr) +{ + std::array some_data; + co_await i2c_transmit_awaiter(addr, some_data); + + std::span data = co_await i2c_receive_awaiter(addr, 42); + // do something with data +} +``` + +This has the benefit that while the `i2c operation` is processed by the peripheral, the main loop can be busy something else and the execution is not stuck in the `interact_with_device` with busy waiting. +And we do not have to pay for that by having complexity (manual state machine doing the interaction) or having threads (which bring in other problems). + +Given the bus nature of `i2c`, it is also quite easy to achieve sharing of the `i2c bus` with multiple `device drivers` for various devices on the bus itself. +We can just implement `i2c_coroutine round_robin_run(std::span coros)` coroutine that uses round roubin to share the access to the peripheral between multiple devices. + +This can obviously transfer to any interaction with our peripherals, we can model that interaction as a suspension of the coroutine, and let the suspension process initiate the operation with peripheral, or simply give us request for some operation. + +Note that this has one clear benefit: The coroutine can be completely oblivious to whenever the method of transporting data over peripheral is: + - blocking + - using interrupts + - using DMA + +We can also delete any suspended coroutine once we come to conclusion that it is beneficial for us. +In some cases it might make sense to use this to restart the communication process with the device. +(Note that all variables in the coroutine would be destroyed, that is, it is good idea to use RAII to revert any effects that coroutine had on it's environment (such as pulling GPIO pin high/low)) + +I believe there are two variations to how exactly this might be implemented (and I am using both for experimentation): + - the coroutine throws back a token that represent operation, which is propagated to the peripheral by the APP (for example: `std::variant`) + - the awaiter directly interacts with the API of the peripheral + + Given my architectural opinions and experience I tend to prefer the token based approach, as it feels like lighter coupling between the peripheral and the coroutine. + + However, I've found out that direct interaction has one strong benefit. + It is much easier to point the peripheral back to the `awaiter`, this way the peripheral might interact with the awaiter during the suspension - much stronger API. +TODO: check that I can actually do that - that it is legal + +#### Computing coroutines + +One of the biggest advantages of threads over manual implementation of operation as steps is that some computations are painfull to be split into reasonable steps, let's take matrix multiplication as an example: + +```cpp +void multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ + for(int i = 0; i < A.rows(); i++){ + for(int j = 0; j < B.cols(); j++){ + result[i][j] = 0; + for(int k = 0; k < A.cols(); k++){ + result[i][j] += A[i][k] + B[k][j]; + } + } + } +} +``` + +As much as it is feasible to convert this into statefull object that contains the state of multiplication and hence can do just part of the multiplication at once... 
The coroutines bring in more elegant solution: + +```cpp +simple_coroutine multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ + for(int i = 0; i < A.rows(); i++){ + for(int j = 0; j < B.cols(); j++){ + result[i][j] = 0; + for(int k = 0; k < A.cols(); k++){ + result[i][j] += A[i][k] + B[k][j]; + } + co_await simple_awaiter(); + } + } +} +``` + +This is of course not perfect, and give the problems with coroutine and dyn. memory, it might be more reasonable to write the `matrix_multiplicator` object. +But problems with dyn. memory of coroutines can be resolved in time, new patterns to handle them can emerge and suddenly something like this might be viable. + +## Conclusion + +Coroutines are a new feature of C++20 and there is potential for them to be of interest for us. +However it seems that there is already plenty to be concerned about. +(Unless, of course, I am greatly mistaken in my observations) \ No newline at end of file From a7640e97b6e927e8121c930d26c10d5e76963a60 Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Mon, 30 Jan 2023 18:18:32 +0100 Subject: [PATCH 6/9] updated coroutines --- _posts/2024-04-20-coroutines.md | 85 +++++++++++++++++---------------- 1 file changed, 44 insertions(+), 41 deletions(-) diff --git a/_posts/2024-04-20-coroutines.md b/_posts/2024-04-20-coroutines.md index 4d4a7a6d..93137000 100644 --- a/_posts/2024-04-20-coroutines.md +++ b/_posts/2024-04-20-coroutines.md @@ -2,39 +2,45 @@ --- title: Embedded Coroutines description: Yet another blog about coroutines, this time in embedded -author: nash +author: Veverak +tags: [c++, coroutines] --- + + In the last year, it seems to me that there was quite a high activity about coroutines in C++. That is, we got plenty of blogposts/videos on the topic of coroutines that are officialy available since C++20. This is another such a blogpost, but this time focused more on embedded environment. -And I will try to simplify the part about explanation of what coroutine is (there is plenty of good blogposts about that), and focus more on relevant properties for embedded and provide my input about possible abstractions we can build with coroutines that might be interesting for embedded. +I will try to simplify the part about explanation of what coroutine is (there is plenty of good blogposts about that), and focus more on relevant properties for embedded and provide my input about possible abstractions we can build with coroutines that might be interesting for embedded. (Given that there is high chance of few new blog posts appearing as this article was written, it might loose some originality at certain moment) + + The text is structured into two sections: - - basic intro to coroutines + - basic introduction to coroutines - embedded-related specifics ## Introduction Coroutines are not something new to programming world, we are talking about a feature that exists for a longterm in various languages. -(Lua had them since like 2003, Python has them, C has various libraries that bring in coroutines) +(Lua had them since like 2003, Python mentions them since 2005 , C has various libraries that bring in coroutines) -C++ got language support for the coroutines in version C++20, but that is only a low level language support. -The standard does not yet define any usefull constructs with coroutines, and direct usage is troublesome. +C++ got low level language support for the coroutines in version C++20. 
+The standard does not yet define any usefull constructs with coroutines, which makes usage more troublesome. -### Basis +### Basics We all should be aware af what `function` is. It is a segment of code that can be executed. -When we `call` a function, the code of the function is executed and at certain point the `function` will finish its execution and flow returns to the place from which function was valled. +When we `call` a function, the code of the function is executed and at certain point the `function` will finish its execution and flow returns to the place from which function was called. For the sake of this blogpost, we will define coroutines as following: -Coroutine is a segment of code that can be executed. When we call a coroutine, the code of the coroutine is executed. +Coroutine is a segment of code that can be executed. +When we call a coroutine, the process will start execution of coorutines code. The coroutine might suspend itself and return control to it's parent. When suspended, the coroutine can be resumed again to continue execution of the code. -Eventually, he coroutine will finish it's execution and can't be resumed again. +Eventually, the coroutine will finish it's execution and can't be resumed again. That is, `coroutine` is just `function` that can interrupt itself and be resumed by the caller. @@ -43,7 +49,7 @@ That is, `coroutine` is just `function` that can interrupt itself and be resumed The simplest coroutine might look like this: ```cpp -coroutine_type fun(int i) +coroutine_type coro(int i) { int j = i*3; std::cout << i << std::endl; @@ -54,45 +60,43 @@ coroutine_type fun(int i) } ``` -`fun` is a coroutine that interrupts itself on the `co_await` calls and can be resumed to continue the execution. -The exact interface of how that happens depends on the coroutine, but in simple cases, we can assume something like this: +`coro` is a coroutine that interrupts itself on the `co_await` calls and can be resumed to continue the execution. +The exact interface of how that works depends on the `coroutine_type`, but in simple cases, we can assume something like this: ```cpp -coroutine_type ct = fun(2); +coroutine_type ct = coro(2); // `2` is printed now ct.step(); // `4 was printed now ct.step(); -// `5` was printed now +// `6` was printed now ``` -In normal use cases, the coroutine causes allocation, as all variables that existis between the suspension of the coroutine and it's resume (`j`) have to be stored somewhere. +In normal use cases, the call of coroutine causes allocation, as all variables that existis between the suspension of the coroutine and it's resume (`j`) have to be stored somewhere. To do that the coroutines allocates memory for it's `coroutine frame` which contains all necessary information for it: - `promise_type` (explained later) - - arguments of the coroutine (their copy) + - copy arguments of the coroutine - compiler defined holder of all variables of coroutine that survives between the suspension points (Note: The dynamic memory can be avoided, will be explained later) The `coroutine_type` is the type of the coroutine, and usually represents `handle` that points to the allocated frame. (By owning the `std::coroutine_handle` which is a handle given by compiler for the frame) -To allow data exchange between the coroutine, the coroutine frame contains instance of `promise_type`, that is accessible from the `coroutine_type` and from the coroutine itself. 
+To allow data exchange between the coroutine, the coroutine frame contains instance of `promise_type`, that is accessible from the `coroutine_type` and from the code of the coroutine itself. (Compiler will select `coroutine_type::promise_type` as the promise, this type can be alias, nested structure, or some other valid type) `awaiter` is an entity that is used for the suspension process. It is a type that should be passed to the `co_await` call and that is used by the compiler to handle the suspension. -When the coroutine is suspended by the `co_await`, as a last step, the compiler will call `void awaiter::await_suspend(std::coroutine_handle)` which gets access to the promise via `coroutine_handle` and after that the coroutine is suspended. -Once the parent of the coroutine resumes it, the `U awaiter::await_resume()` of the awaiter used for suspension is called. +When the coroutine is suspended by the `co_await`, the compiler will call `void awaiter::await_suspend(std::coroutine_handle)` which gets access to the promise via `coroutine_handle` and after that the coroutine is suspended. +Once the parent of the coroutine resumes it, the `U awaiter::await_resume()` of the awaiter used is called. The `U` returned by this method is return value of the `co_await` statement. The purpose of the `awaiter` is to serve as customization point that makes it possible to extract data from coroutine (awaiter can get data in it's constructor and pass those to the promise) and also give data to the coroutine by extracting it from promise_type in the await_resume method In this context, it is good idea to point out few properties of the mechanism: - - Suspended coroutine can be destroyed. This destructiong is safe: all destructors of promise_type, arguments, stored variables... are destroyed properly + - Suspended coroutine can be destroyed. This destructiong is safe: all destructors of promise_type, arguments, stored variables... are called properly - The `coroutine_type` does not need to have `step` semantics, the `coroutine_type` has access to `std::coroutine_handle` which provides the interface to resume the coroutine. The `coroutine_handle` might as well be implemented in a way that one method keeps resuming the coroutine until it finishes. - Coroutines can be nested. One can combine `coroutine_type` to be also valid `awaiter`, this gives possibility to have recursive coroutines, in which one `step` of the top coroutine does one step of the inner coroutine. -** TODO: add pictures, lots of them) - ### Some coroutines Given that the features exists for some time and have some background from other languages, we can already talk about interesting types of coroutines to work with. @@ -113,7 +117,7 @@ generator sequence() } ``` -Here, `co_yield` is just another expression in coroutine API, you can imagine it as statement with different semantics than `co_await`: `co_await` should wait for awaiter, `co_yield` throws to parent a value. +Here, `co_yield` is just another expression in coroutine API, you can imagine it as statement with different semantics than `co_await`: `co_await` should wait for awaiter, `co_yield` throws a value to the parent. Implementation wise, `co_yield x` causes call of `promise.yield_value(x)` which constructs awaiter which is awaited. 
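For the curious, a minimal `generator` that is just enough to drive `co_yield` and the range-`for` loop below could be sketched like this (a deliberately simplified assumption of mine — the `sequence()` above would then presumably be `generator<int>`, and real implementations such as cppcoro's handle exceptions, iterator categories and copy/move of the handle properly):

```cpp
#include <coroutine>
#include <exception>
#include <iterator>
#include <utility>

template <typename T>
struct generator
{
    struct promise_type
    {
        T current{};

        generator get_return_object()
        {
            return generator{ std::coroutine_handle<promise_type>::from_promise(*this) };
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T v) noexcept   // this is what `co_yield v` calls
        {
            current = std::move(v);
            return {};                                  // suspend after storing the value
        }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    struct iterator
    {
        std::coroutine_handle<promise_type> handle;

        iterator& operator++() { handle.resume(); return *this; }
        const T&  operator*() const { return handle.promise().current; }
        bool operator!=(std::default_sentinel_t) const { return !handle.done(); }
    };

    explicit generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~generator()
    {
        if (handle)
            handle.destroy();
    }

    iterator begin() { handle.resume(); return iterator{ handle }; }   // run to the first co_yield
    std::default_sentinel_t end() { return {}; }

    std::coroutine_handle<promise_type> handle;
};
```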
Generators can have same API as containers, which gives as abillity to create this nice infinite loop: @@ -123,7 +127,7 @@ for(int i : sequence()){ std::cout << i << std::endl; } ``` -(The idea usually is to either build generators that are not inifnite, or to eventually stop using the infinite ones) +(The idea usually is to either build generators that are not infinite, or to eventually stop using the infinite ones) Another concept of coroutines that can be interesting are: `io coroutines` That is, we can use coroutine to represent process that requires IO operations that are processed during the suspenions. @@ -152,11 +156,11 @@ It is good to resumarize some properties of this coroutine: ### More -This was fast and simple explanation of coroutines, the purpose of this article is not to give detailed explanation of croutines I can suggest blogpost such as this one: https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html +This was fast and simple explanation of coroutines, the purpose of this article is not to give detailed explanation. I can suggest blogpost such as this one: https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html ## Embedded coroutines -Let's get back to embedded and try to talk about how are coroutines for embedded development. +Let's get back to embedded and try to talk about how are coroutines relevant for embedded development. That is, what are possible problems with coroutines in embedded, why are they relevant, and some suggestions about ways we can use them. ### Relevance @@ -164,18 +168,17 @@ That is, what are possible problems with coroutines in embedded, why are they re Common task for emebedded is to do a plenty of IO communication via peripherals and in the meantime to take care about multiple process in pararel on single-core system. That is, we have to handle: - - complex computations (Kalman filter with matrix computations) + - complex computations (Matrix computations) - IO (send/receive data over UART/I2C/...) - longterm processes (PID regulator regulating temperature of the room) -To handle all those processes at single point in time, we have two major approaches today: - 1) implement them as iterative process and schedule them +To handle all those processes at single point in time, we have two major approaches: + 1) implement it as an iterative process and schedule it 2) use threads for the processes Let's take communication over UART as an example: 1) iterative process means usage of structure to store the state of the communication, and providing a function that based on the current state advances the state. (Commonly with usage of enum representing actual status) - 2) implement the communication in a thread which my release it's time if it waits for some operation - + 2) implement the communication in a thread which might release it's time if it waits for some operation Approach 1) has multiple potential issues, it might take a lot of code to implement complex exchange of data in this way (a lot of state variables) and the approach is problematic, as there is a chance that one of the steps might take longer than expected and we can't prevent that. 
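For comparison, a sketch of what approach 1) tends to look like for even a tiny request/response exchange — `uart_try_send` and `uart_try_receive` are assumed stand-ins for a non-blocking HAL, not a real API:

```cpp
#include <cstdint>

// Assumed non-blocking HAL stubs, standing in for a real UART driver.
inline bool uart_try_send(std::uint8_t) { return true; }
inline bool uart_try_receive(std::uint8_t& out) { out = 0x01; return true; }

// Approach 1): the exchange flattened into an explicit state machine.
struct uart_exchange
{
    enum class state_e : std::uint8_t { send_request, wait_response, done };

    state_e      state    = state_e::send_request;
    std::uint8_t response = 0;

    // Called repeatedly from the main loop; each call advances at most one step.
    void tick()
    {
        switch (state)
        {
        case state_e::send_request:
            if (uart_try_send(0x42))           // request byte queued?
                state = state_e::wait_response;
            break;
        case state_e::wait_response:
            if (uart_try_receive(response))    // response byte arrived?
                state = state_e::done;
            break;
        case state_e::done:
            break;
        }
    }
};
```

Every extra step of the protocol means another enum value and another member variable — exactly the bookkeeping that the coroutine version keeps in its frame for us.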
@@ -188,7 +191,7 @@ However, coroutines still suffer to the issue that one step of computation might What is interesting, is that coroutines have good potential to have more effecient context switches than threads, which was shown in this article: https://www.researchgate.net/publication/339221041_C20_Coroutines_on_Microcontrollers_-What_We_Learned -Note that in the article does not compare corutines in a lot of situations, but it still shows something. +Note that in the article does not compare corutines in a variety of situations, but it still shows something. ### Problems @@ -202,7 +205,8 @@ This is not favorable in embedded in many cases, but there are ways to avoid tha Coroutines have `allocator` support, we can provide coroutine an allocator that can be used to get memory for the coroutine and hence avoid the dyn. allocation. (Approach that I can suggest) This is done by implementing custom `operator new` and `operator delete` on the `promise_type` which allocates the `entire frame`. -Alternative is to rely on `halo` optimization, if the coroutine is implemented correctly and the parent function executes entire coroutine in its context. The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutines parent. +Alternative is to rely on `halo` optimization (Heap Allocation Elision Optimization), if the coroutine is implemented correctly and the parent function executes entire coroutine in its context. +The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutines parent. This can be be enforced by deleting appropiate `operator new` and `operator delete` overloads of the `promise_type`, but it seems clumsy to me. And to kinda ruin the pretty thing here, there is one catch. As of now, the compiler can't tell you how much memory the coroutine needs, as it is known only during the link time - the size of the coroutine frame can't be compiler constant. (TODO: link to the source for this) @@ -237,7 +241,7 @@ By it's nature, all varibales in the coroutine that survive suspension points fo And yet that state is kinda problematic for us, as other parts of the code can't access it while it is suspended. (For inspection/logging/data recording...) -In this scenario, I am afraid that it might tike while until our debug tools are smart enough to give as easy live with coroutines in the system. Mostly: what if something wents wrong in the coroutine? how do you inspect the state the coroutine is in? +In this scenario, I am afraid that it might take while until our debug tools are smart enough to give as easy live with coroutines in the system. Mostly: what if something wents wrong in the coroutine? how do you inspect the state the coroutine is in? ### Suggestions @@ -257,7 +261,7 @@ That is, we can use coroutines as an abstraction to interact with peripherals th To describe this, I will use a personal experience with `i2c_coroutine`. We can design it as follows: -When `i2c_coroutine` is used as the type of the coroutine, the developer can interact with the `i2c` peripheral by `suspending` the coroutine with an `awaiter` that initiates operation on the coroutine, the coroutine is resumed once that operation finishes. +When `i2c_coroutine` is used as the type of the coroutine, the developer can interact with the `i2c` peripheral by `suspending` the coroutine with an `awaiter` that initiates operation on the `i2c` peripheral, the coroutine is resumed once that operation finishes. 
That is, we use the suspension point to allow interaction with the peripheral. For example: @@ -273,11 +277,11 @@ i2c_coroutine interact_with_device(uint8_t addr) } ``` -This has the benefit that while the `i2c operation` is processed by the peripheral, the main loop can be busy something else and the execution is not stuck in the `interact_with_device` with busy waiting. +This has the benefit that while the `i2c operation` is processed by the peripheral, the main loop can be busy doing something else, and the execution is not stuck in the `interact_with_device` with busy waiting. And we do not have to pay for that by having complexity (manual state machine doing the interaction) or having threads (which bring in other problems). Given the bus nature of `i2c`, it is also quite easy to achieve sharing of the `i2c bus` with multiple `device drivers` for various devices on the bus itself. -We can just implement `i2c_coroutine round_robin_run(std::span coros)` coroutine that uses round roubin to share the access to the peripheral between multiple devices. +We can just implement `i2c_coroutine round_robin_run(std::span coros)` coroutine that uses round roubin to share the access to the peripheral between multiple coroutines (devices). This can obviously transfer to any interaction with our peripherals, we can model that interaction as a suspension of the coroutine, and let the suspension process initiate the operation with peripheral, or simply give us request for some operation. @@ -298,11 +302,10 @@ I believe there are two variations to how exactly this might be implemented (and However, I've found out that direct interaction has one strong benefit. It is much easier to point the peripheral back to the `awaiter`, this way the peripheral might interact with the awaiter during the suspension - much stronger API. -TODO: check that I can actually do that - that it is legal #### Computing coroutines -One of the biggest advantages of threads over manual implementation of operation as steps is that some computations are painfull to be split into reasonable steps, let's take matrix multiplication as an example: +One of the biggest advantages of threads over manual decomposition is that some computations are painfull to be split into reasonable steps, let's take matrix multiplication as an example: ```cpp void multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ @@ -340,4 +343,4 @@ But problems with dyn. memory of coroutines can be resolved in time, new pattern Coroutines are a new feature of C++20 and there is potential for them to be of interest for us. However it seems that there is already plenty to be concerned about. -(Unless, of course, I am greatly mistaken in my observations) \ No newline at end of file +(Unless, of course, I am greatly mistaken in my observations) From 59b3fe830e8afcd868afaa5c0ba44c2b1c13e7e8 Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Mon, 30 Jan 2023 19:30:47 +0100 Subject: [PATCH 7/9] cleaned up coroutines article with grammarly --- _posts/2024-04-20-coroutines.md | 194 ++++++++++++++++---------------- 1 file changed, 98 insertions(+), 96 deletions(-) diff --git a/_posts/2024-04-20-coroutines.md b/_posts/2024-04-20-coroutines.md index 93137000..07e0e962 100644 --- a/_posts/2024-04-20-coroutines.md +++ b/_posts/2024-04-20-coroutines.md @@ -9,11 +9,12 @@ tags: [c++, coroutines] In the last year, it seems to me that there was quite a high activity about coroutines in C++. 
-That is, we got plenty of blogposts/videos on the topic of coroutines that are officialy available since C++20. +That is, we got plenty of blog posts/videos on the topic of coroutines that are officially available since C++20. -This is another such a blogpost, but this time focused more on embedded environment. -I will try to simplify the part about explanation of what coroutine is (there is plenty of good blogposts about that), and focus more on relevant properties for embedded and provide my input about possible abstractions we can build with coroutines that might be interesting for embedded. -(Given that there is high chance of few new blog posts appearing as this article was written, it might loose some originality at certain moment) + +This is another such blog post, but this time focused more on the embedded environment. +I will try to simplify the part about an explanation of what coroutine is (there is plenty of good blog posts about that), focus more on relevant properties for embedded and provide my input about possible abstractions we can build with coroutines that might be interesting for embedded. +(Given that there is a high chance of few new blog posts appearing as this article was written, it might lose some originality at a certain moment) @@ -23,26 +24,26 @@ The text is structured into two sections: ## Introduction -Coroutines are not something new to programming world, we are talking about a feature that exists for a longterm in various languages. -(Lua had them since like 2003, Python mentions them since 2005 , C has various libraries that bring in coroutines) +Coroutines are not something new to the programming world, we are talking about a feature that exists for a long term in various languages. +(Lua had them since 2003, Python mentions them since 2005, and C has various libraries that bring in coroutines) -C++ got low level language support for the coroutines in version C++20. -The standard does not yet define any usefull constructs with coroutines, which makes usage more troublesome. +C++ got low-level language support for the coroutines in version C++20. +The standard does not yet define any useful constructs with coroutines, which makes usage more troublesome. ### Basics -We all should be aware af what `function` is. It is a segment of code that can be executed. -When we `call` a function, the code of the function is executed and at certain point the `function` will finish its execution and flow returns to the place from which function was called. +We all should be aware of what `function` is. It is a segment of code that can be executed. +When we `call` a function, the code of the function is executed and at a certain point the `function` will finish its execution and flow returns to the place from which the function was called. -For the sake of this blogpost, we will define coroutines as following: +For the sake of this blog post, we will define coroutines as follows: -Coroutine is a segment of code that can be executed. -When we call a coroutine, the process will start execution of coorutines code. -The coroutine might suspend itself and return control to it's parent. -When suspended, the coroutine can be resumed again to continue execution of the code. -Eventually, the coroutine will finish it's execution and can't be resumed again. +A coroutine is a segment of code that can be executed. +When we call a coroutine, the process will start the execution of the coroutine code. +The coroutine might suspend itself and return control to its parent. 
+When suspended, the coroutine can be resumed again to continue the execution of the code. +Eventually, the coroutine will finish its execution and can't be resumed again. -That is, `coroutine` is just `function` that can interrupt itself and be resumed by the caller. +That is, `coroutine` is just a `function` that can interrupt itself and be resumed by the caller. ### How does it look like @@ -74,36 +75,36 @@ ct.step(); ``` -In normal use cases, the call of coroutine causes allocation, as all variables that existis between the suspension of the coroutine and it's resume (`j`) have to be stored somewhere. -To do that the coroutines allocates memory for it's `coroutine frame` which contains all necessary information for it: +In normal use cases, the call of coroutine causes allocation, as all variables that exist between the suspension of the coroutine and its resume (`j`) have to be stored somewhere. +To do that the coroutine allocates memory for its `coroutine frame` which contains all the necessary information for it: - `promise_type` (explained later) - copy arguments of the coroutine - compiler defined holder of all variables of coroutine that survives between the suspension points (Note: The dynamic memory can be avoided, will be explained later) -The `coroutine_type` is the type of the coroutine, and usually represents `handle` that points to the allocated frame. (By owning the `std::coroutine_handle` which is a handle given by compiler for the frame) -To allow data exchange between the coroutine, the coroutine frame contains instance of `promise_type`, that is accessible from the `coroutine_type` and from the code of the coroutine itself. -(Compiler will select `coroutine_type::promise_type` as the promise, this type can be alias, nested structure, or some other valid type) +The `coroutine_type` is the type of the coroutine, and usually represents the `handle` that points to the allocated frame. (By owning the `std::coroutine_handle` which is a handle given by the compiler for the frame) +To allow data exchange between the coroutine, the coroutine frame contains an instance of `promise_type`, that is accessible from the `coroutine_type` and from the code of the coroutine itself. +(Compiler will select `coroutine_type::promise_type` as the promise, this type can be an alias, nested structure, or some other valid type) `awaiter` is an entity that is used for the suspension process. It is a type that should be passed to the `co_await` call and that is used by the compiler to handle the suspension. -When the coroutine is suspended by the `co_await`, the compiler will call `void awaiter::await_suspend(std::coroutine_handle)` which gets access to the promise via `coroutine_handle` and after that the coroutine is suspended. +When the coroutine is suspended by the `co_await`, the compiler will call `void awaiter::await_suspend(std::coroutine_handle)` which gets access to the promise via `coroutine_handle` and after that, the coroutine is suspended. Once the parent of the coroutine resumes it, the `U awaiter::await_resume()` of the awaiter used is called. -The `U` returned by this method is return value of the `co_await` statement. +The `U` returned by this method is the return value of the `co_await` statement. 
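To make that data flow concrete, here is a small, self-contained sketch in which the awaiter hands a value to the promise in `await_suspend` and reads a reply back in `await_resume`; every name in it (`exchange_coroutine`, `exchange_awaiter`, `ask`) is an assumption made up for illustration:

```cpp
#include <coroutine>
#include <cstdint>
#include <exception>
#include <iostream>
#include <optional>

// Assumed coroutine type whose promise buffers one outgoing request and one reply.
struct exchange_coroutine
{
    struct promise_type
    {
        std::optional<std::uint32_t> request;  // written by the coroutine, read by the parent
        std::optional<std::uint32_t> reply;    // written by the parent, read by the coroutine

        exchange_coroutine get_return_object()
        {
            return exchange_coroutine{ std::coroutine_handle<promise_type>::from_promise(*this) };
        }
        std::suspend_never  initial_suspend() noexcept { return {}; }  // run to the first co_await
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit exchange_coroutine(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~exchange_coroutine() { if (handle) handle.destroy(); }

    std::coroutine_handle<promise_type> handle;  // (copy/move handling omitted for brevity)
};

// Awaiter: moves data out of the coroutine on suspension, back in on resumption.
struct exchange_awaiter
{
    std::uint32_t value;
    std::coroutine_handle<exchange_coroutine::promise_type> self{};

    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<exchange_coroutine::promise_type> h) noexcept
    {
        self = h;
        h.promise().request = value;              // data flows out of the coroutine here
    }
    std::uint32_t await_resume() const noexcept
    {
        return self.promise().reply.value_or(0);  // data flows back into the coroutine here
    }
};

exchange_coroutine ask(std::uint32_t x)
{
    std::uint32_t answer = co_await exchange_awaiter{ x };
    std::cout << "got " << answer << "\n";
}

int main()
{
    exchange_coroutine c = ask(42);
    std::cout << "request: " << *c.handle.promise().request << "\n";  // parent reads 42
    c.handle.promise().reply = 7;                                     // parent provides the reply
    c.handle.resume();                                                // coroutine prints "got 7"
}
```

This is, in miniature, the same shape the `i2c` awaiters later in the text take: the request leaves the coroutine at the suspension point and the result comes back at resumption.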
-The purpose of the `awaiter` is to serve as customization point that makes it possible to extract data from coroutine (awaiter can get data in it's constructor and pass those to the promise) and also give data to the coroutine by extracting it from promise_type in the await_resume method +The purpose of the `awaiter` is to serve as a customization point that makes it possible to extract data from the coroutine (awaiter can get data in its constructor and pass those to the promise) and also give data to the coroutine by extracting it from promise_type in the await_resume method. -In this context, it is good idea to point out few properties of the mechanism: - - Suspended coroutine can be destroyed. This destructiong is safe: all destructors of promise_type, arguments, stored variables... are called properly +In this context, it is a good idea to point out a few properties of the mechanism: + - Suspended coroutine can be destroyed. This destruction is safe: all destructors of promise_type, arguments, and stored variables... are called properly - The `coroutine_type` does not need to have `step` semantics, the `coroutine_type` has access to `std::coroutine_handle` which provides the interface to resume the coroutine. The `coroutine_handle` might as well be implemented in a way that one method keeps resuming the coroutine until it finishes. - - Coroutines can be nested. One can combine `coroutine_type` to be also valid `awaiter`, this gives possibility to have recursive coroutines, in which one `step` of the top coroutine does one step of the inner coroutine. + - Coroutines can be nested. One can combine `coroutine_type` to be also valid `awaiter`, this gives a possibility to have recursive coroutines, in which one `step` of the top coroutine does one step of the inner coroutine. ### Some coroutines -Given that the features exists for some time and have some background from other languages, we can already talk about interesting types of coroutines to work with. -For this I would like to point out https://github.com/lewissbaker/cppcoro as one of the interesting libraries with coroutines. +Given that the features exist for some time and have some background from other languages, we can already talk about interesting types of coroutines to work with. +For this, I would like to point out https://github.com/lewissbaker/cppcoro as one of the interesting libraries with coroutines. -First common concept are `generators`. -Generator is a coroutine that spits one value at each suspension point which is provided to the caller. +The first common concept is the `generator`. +The generator is a coroutine that spits one value at each suspension point which is provided to the caller. This can be used to generate simple sequences like: ```cpp @@ -116,23 +117,23 @@ generator sequence() } } ``` +Here, `co_yield` is just another expression in coroutine API, you can imagine it as a statement with different semantics than `co_await`: `co_await` should wait for awaiter, `co_yield` throws a value to the parent. +Implementation-wise, `co_yield x` causes a call of `promise.yield_value(x)` which constructs awaiter which is awaited. -Here, `co_yield` is just another expression in coroutine API, you can imagine it as statement with different semantics than `co_await`: `co_await` should wait for awaiter, `co_yield` throws a value to the parent. -Implementation wise, `co_yield x` causes call of `promise.yield_value(x)` which constructs awaiter which is awaited. 
- -Generators can have same API as containers, which gives as abillity to create this nice infinite loop: +Generators can have the same API as containers, which gives as the ability to create this nice infinite loop: ```cpp for(int i : sequence()){ std::cout << i << std::endl; } ``` -(The idea usually is to either build generators that are not infinite, or to eventually stop using the infinite ones) -Another concept of coroutines that can be interesting are: `io coroutines` -That is, we can use coroutine to represent process that requires IO operations that are processed during the suspenions. +(The idea usually is to either build generators that are not infinite or to eventually stop using the infinite ones) -For example, assume that we have library for network communication that provides us coroutine to do the io with: +Another concept of coroutines that can be interesting is: `io coroutines` +That is, we can use a coroutine to represent a process that requires IO operations that are processed during the suspension points. + +For example, assume that we have a library for network communication that provides us coroutine to do the io with: ```cpp network_coroutine send_data(tcp_connection& con) @@ -150,72 +151,72 @@ network_coroutine send_data(tcp_connection& con) } ``` -It is good to resumarize some properties of this coroutine: +It is good to summarize some properties of this coroutine: - It does not have to block, once the `co_await` call the parent will get control of the program flow and can just send the data provided by `awaiter` asynchronously. - - After that, the parent is free to ask another coroutine for data or to kills this coroutine entirely. This makes sense mostly in cases the request to get data would fail. + - After that, the parent is free to ask another coroutine for data or to destroy this coroutine entirely. This makes sense mostly in cases the request to get data would fail. ### More -This was fast and simple explanation of coroutines, the purpose of this article is not to give detailed explanation. I can suggest blogpost such as this one: https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html +This was a fast and simple explanation of coroutines, the purpose of this article is not to give a detailed explanation. I can suggest a blog post such as this one: https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html ## Embedded coroutines Let's get back to embedded and try to talk about how are coroutines relevant for embedded development. -That is, what are possible problems with coroutines in embedded, why are they relevant, and some suggestions about ways we can use them. +That means possible problems with coroutines in embedded development, the relevance of coroutines, and some hints for proper usage. ### Relevance -Common task for emebedded is to do a plenty of IO communication via peripherals and in the meantime to take care about multiple process in pararel on single-core system. +A common task for embedded is to do plenty of IO communication via peripherals and in the meantime to take care of multiple processes in parallel on a single-core system. That is, we have to handle: - complex computations (Matrix computations) - IO (send/receive data over UART/I2C/...) 
- - longterm processes (PID regulator regulating temperature of the room) + - long-term processes (PID regulator regulating the temperature of the room) -To handle all those processes at single point in time, we have two major approaches: +To handle all those processes at a single point in time, we have two major approaches: 1) implement it as an iterative process and schedule it 2) use threads for the processes Let's take communication over UART as an example: - 1) iterative process means usage of structure to store the state of the communication, and providing a function that based on the current state advances the state. (Commonly with usage of enum representing actual status) - 2) implement the communication in a thread which might release it's time if it waits for some operation + 1) iterative process means the usage of structure to store the state of the communication, and provide a function that based on the current state advances the state. (Commonly with the usage of an enum representing actual status) + 2) implement the communication in a thread which might release its time if it waits for some operation -Approach 1) has multiple potential issues, it might take a lot of code to implement complex exchange of data in this way (a lot of state variables) and the approach is problematic, as there is a chance that one of the steps might take longer than expected and we can't prevent that. +Approach 1) has multiple potential issues, it might take a lot of code to implement a complex exchange of data in this way (a lot of state variables) and the approach is problematic, as there is a chance that one of the steps might take longer than expected and we can't prevent that. -Approach 2) has another set of issues. Each thread requires it's own stack space (which might not scale), and we got all the problems of pararelism - exchanging data between can suffer to various potential concurrency issues. +Approach 2) has another set of issues. Each thread requires its own stack space (which might not scale), and we got all the problems of parallelism - exchanging data between can suffer various potential concurrency issues. -From this perspective, coroutines are third way of approaching these processes, that is not better than 1) or 2), but also not worse. -Coroutines bring in a new (and not yet finetuned) mechanism that makes it possible to write exchange of data over UART in way that does not require as complex code as 1), does not suffer to so many concurrency issues as 2) (and does not need its own stack, just frame). +From this perspective, coroutines are a third way of approaching these processes, which is not better than 1) or 2), but also not worse. +Coroutines bring in a new (and not yet finetuned) mechanism that makes it possible to write an exchange of data over UART in a way that does not require as complex code as 1), does not suffer so many concurrency issues as 2) (and does not need its own stack, just frame). -However, coroutines still suffer to the issue that one step of computation might take longer than expected, and we can assume that any errors in coroutines might be harder to inspect. +However, coroutines still suffer from the issue that one step of computation might take longer than expected, and we can assume that any errors in coroutines might be harder to inspect. 
-What is interesting, is that coroutines have good potential to have more effecient context switches than threads, which was shown in this article: https://www.researchgate.net/publication/339221041_C20_Coroutines_on_Microcontrollers_-What_We_Learned +What is interesting, is that coroutines have good potential to have more efficient context switches than threads, which was shown in this article: https://www.researchgate.net/publication/339221041_C20_Coroutines_on_Microcontrollers_-What_We_Learned -Note that in the article does not compare corutines in a variety of situations, but it still shows something. +Note that the article does not compare coroutines in a variety of situations, but it still shows something. ### Problems -There are multiple potential issues with coroutines and using them correctly, but I think there are few that are really noteworthy for the embedded community. +There are multiple potential issues with coroutines and using them correctly, but I think there are a few that are noteworthy for the embedded community. #### Dynamic memory -First thing that I assume most of you noticed at the begining is the fact the coroutine requiers dynamic memory. -This is not favorable in embedded in many cases, but there are ways to avoid that. +The first thing that I assume most of you noticed at the beginning is the fact the coroutine requires dynamic memory. +This is not favourable in embedded in many cases, but there are ways to avoid that. -Coroutines have `allocator` support, we can provide coroutine an allocator that can be used to get memory for the coroutine and hence avoid the dyn. allocation. (Approach that I can suggest) +Coroutines have `allocator` support, we can provide the coroutine with an allocator that can be used to get memory for the coroutine and hence avoid the dynamic allocation. (Approach that I can suggest) This is done by implementing custom `operator new` and `operator delete` on the `promise_type` which allocates the `entire frame`. -Alternative is to rely on `halo` optimization (Heap Allocation Elision Optimization), if the coroutine is implemented correctly and the parent function executes entire coroutine in its context. -The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutines parent. -This can be be enforced by deleting appropiate `operator new` and `operator delete` overloads of the `promise_type`, but it seems clumsy to me. +An alternative is to rely on `halo` optimization (Heap Allocation Elision Optimization) if the coroutine is implemented correctly and the parent function executes the entire coroutine in its context. +The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutine's parent. +This can be enforced by deleting appropriate `operator new` and `operator delete` overloads of the `promise_type`, but it seems clumsy to me. And to kinda ruin the pretty thing here, there is one catch. As of now, the compiler can't tell you how much memory the coroutine needs, as it is known only during the link time - the size of the coroutine frame can't be compiler constant. (TODO: link to the source for this) This means that you effectively can't prepare static buffers for the coroutines. -What I would suggest (that is what I do), is to use coroutines for the long term process and just build them during initialization of the device. 
+What I would suggest (that is what I do), is to use coroutines for the long-term process and just build them during the initialization of the device. Or just live with the dynamic memory. -(Note: if I could dream a bit here, I would really want an explicit way of forcing the coroutine to live on stack, which would have clearly defined behavior of when and how it should happen, and with proper compiler errors if one fails to do so) +(Note: if I could dream a bit here, I would want an explicit way of forcing the coroutine to live on the stack, which would have clearly defined behaviour of when and how it should happen, and with proper compiler errors if one fails to do so) #### Frame size is big @@ -224,41 +225,42 @@ That is, the section of the frame for storing variables that live during the sus The intuitive way to work with the section would be: At each suspension point, we only want to remember variables that survive `that` suspension point and do not care about the rest. -The issue is that currently GCC is storing all variables for all suspension points there. -This might became problematic if you have a lot of `co_await` statements with various types/interfaces. +The issue is that currently, GCC is storing all variables for all suspension points there. +This might become problematic if you have a lot of `co_await` statements with various types/interfaces. (For example: if each `awaiter` stores a buffer of data, there might be separate space for each buffer in the frame) -That might improve in the future, friend of mine works on a patch for gcc: https://gcc.gnu.org/bugzilla/attachment.cgi?id=53290&action=diff that uses unions of struct, with a separate struct for each suspension point - we pay much less memory. +That might improve in the future, a friend of mine works on a patch for gcc: https://gcc.gnu.org/bugzilla/attachment.cgi?id=53290&action=diff that uses unions of struct, with a separate struct for each suspension point - we pay much less memory. #### Inspectability -For all that we do on embedded, one of the biggest limitation is our limited ability to inspect what went wrong. Either due to limited resources on the devices or due to the real-time nature of the things. +For all that we do on embedded, one of the biggest limitations is our limited ability to inspect what went wrong. Either due to limited resources on the devices or due to the real-time nature of the things. -At this point in time, `gdb` has support or threads, and mechanisms that use manullay written state machine for asynchronous processing are inspectable by default. -But there is no explicit support for corutines. +At this point, `gdb` has support or threads, and mechanisms that use manually written state machines for asynchronous processing are inspectable by default. +But there is no explicit support for coroutines. -By it's nature, all varibales in the coroutine that survive suspension points form a state. -And yet that state is kinda problematic for us, as other parts of the code can't access it while it is suspended. +By its nature, all variables in the coroutine that survive suspension points form a state. +And yet that state is kinda problematic for us, as other parts of the code can't access it while it is suspended. (For inspection/logging/data recording...) -In this scenario, I am afraid that it might take while until our debug tools are smart enough to give as easy live with coroutines in the system. Mostly: what if something wents wrong in the coroutine? 
+In this scenario, I am afraid that it might take a while until our debug tools are smart enough to give an easy life with coroutines in the system. Mostly: what if something went wrong in the coroutine? How do you inspect the state the coroutine is in?
 
 ### Suggestions
 
-We talked about what coroutines are, what are potential problems, now question is: What can we use them for?
+We talked about what coroutines are and what the potential problems are, so now the question is: What can we use them for?
 
-For that I am afraid it is hard to give good answer.
-The functionality is simply quite new and I suspect that it will take a while (years) until we gain enough experience to be sure in which ways it is really beneficial.
+For that, I am afraid it is hard to give a good answer.
+The functionality is simply quite new and I suspect that it will take a while (years) until we gain enough experience to be sure in which ways it is beneficial.
 
-That said, based on my (little) experience with usage of coroutines in embedded, I have some suggestions and patterns that might be interesting.
-Mostly to peek the interest of community in these, as I believe the potential is here.
+That said, based on my (little) experience with the usage of coroutines in embedded, I have some suggestions and patterns that might be interesting.
+Mostly to pique the interest of the community in these, as I believe the potential is here.
 
 #### IO Coroutine
 
-I think that the IO example above is the most obvious way we can benefit from coroutines in embedded.
+
+I think that the IO example above is the most obvious way we can benefit from coroutines in embedded.
 That is, we can use coroutines as an abstraction to interact with peripherals that do IO operations.
 
-To describe this, I will use a personal experience with `i2c_coroutine`.
+To describe this, I will use a personal experience with `i2c_coroutine`. We can design it as follows:
 
 When `i2c_coroutine` is used as the type of the coroutine, the developer can interact with the `i2c` peripheral by `suspending` the coroutine with an `awaiter` that initiates operation on the `i2c` peripheral, the coroutine is resumed once that operation finishes.
 
@@ -281,31 +283,31 @@ This has the benefit that while the `i2c operation` is processed by the peripher
 And we do not have to pay for that by having complexity (manual state machine doing the interaction) or having threads (which bring in other problems).
 
 Given the bus nature of `i2c`, it is also quite easy to achieve sharing of the `i2c bus` with multiple `device drivers` for various devices on the bus itself.
-We can just implement `i2c_coroutine round_robin_run(std::span coros)` coroutine that uses round roubin to share the access to the peripheral between multiple coroutines (devices).
+We can just implement the `i2c_coroutine round_robin_run(std::span<i2c_coroutine> coros)` coroutine that uses round robin to share access to the peripheral between multiple coroutines (devices).
 
-This can obviously transfer to any interaction with our peripherals, we can model that interaction as a suspension of the coroutine, and let the suspension process initiate the operation with peripheral, or simply give us request for some operation.
+This can transfer to any interaction with our peripherals, we can model that interaction as a suspension of the coroutine, and let the suspension process initiate the operation with the peripheral, or simply give us a request for some operation.
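
A minimal sketch may make the suspension-based interaction more concrete. The names here (`i2c_driver`, `start_write`, `i2c_write_awaiter`) are hypothetical and this is not the post's actual `i2c_coroutine` implementation; resuming the coroutine straight from the completion interrupt is also a simplification:

```cpp
#include <coroutine>
#include <cstdint>
#include <span>

// hypothetical peripheral API: starts a transfer and fires `on_done` from the completion interrupt
struct i2c_driver
{
    void start_write( std::uint8_t addr,
                      std::span< const std::uint8_t > data,
                      void ( *on_done )( void* ),
                      void* ctx );
};

// awaiter that suspends the coroutine until the transfer finishes
struct i2c_write_awaiter
{
    i2c_driver&                     driver;
    std::uint8_t                    addr;
    std::span< const std::uint8_t > data;

    bool await_ready() const noexcept { return false; }  // always start a new transfer

    void await_suspend( std::coroutine_handle<> h )
    {
        // initiate the operation; the completion callback resumes the coroutine
        driver.start_write( addr, data,
                            []( void* ctx ) { std::coroutine_handle<>::from_address( ctx ).resume(); },
                            h.address() );
    }

    void await_resume() const noexcept {}  // nothing to hand back in this sketch
};
```

A device driver written as such a coroutine would then just `co_await i2c_write_awaiter{ bus, 0x40, payload };` and continue once the transfer has finished.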
Note that this has one clear benefit: The coroutine can be completely oblivious to whenever the method of transporting data over peripheral is: - blocking - using interrupts - using DMA -We can also delete any suspended coroutine once we come to conclusion that it is beneficial for us. -In some cases it might make sense to use this to restart the communication process with the device. -(Note that all variables in the coroutine would be destroyed, that is, it is good idea to use RAII to revert any effects that coroutine had on it's environment (such as pulling GPIO pin high/low)) +We can also delete any suspended coroutine once we conclude that it is beneficial for us. +In some cases, it might make sense to use this to restart the communication process with the device. +(Note that all variables in the coroutine would be destroyed, that is, it is a good idea to use RAII to revert any effects that the coroutine had on its environment (such as pulling GPIO pin high/low)) I believe there are two variations to how exactly this might be implemented (and I am using both for experimentation): - - the coroutine throws back a token that represent operation, which is propagated to the peripheral by the APP (for example: `std::variant`) + - the coroutine throws back a token that represents operation, which is propagated to the peripheral by the APP (for example: `std::variant`) - the awaiter directly interacts with the API of the peripheral - Given my architectural opinions and experience I tend to prefer the token based approach, as it feels like lighter coupling between the peripheral and the coroutine. +Given my architectural opinions and experience, I tend to prefer the token-based approach, as it feels like a lighter coupling between the peripheral and the coroutine. - However, I've found out that direct interaction has one strong benefit. - It is much easier to point the peripheral back to the `awaiter`, this way the peripheral might interact with the awaiter during the suspension - much stronger API. +However, I've found out that direct interaction has one strong benefit. + It is much easier to point the peripheral back to the `awaiter`, this way the peripheral might interact with the awaiter during the suspension - a much stronger API. #### Computing coroutines -One of the biggest advantages of threads over manual decomposition is that some computations are painfull to be split into reasonable steps, let's take matrix multiplication as an example: +One of the biggest advantages of threads over manual decomposition is that some computations are painful to be split into reasonable steps, let's take matrix multiplication as an example: ```cpp void multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ @@ -320,7 +322,7 @@ void multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ } ``` -As much as it is feasible to convert this into statefull object that contains the state of multiplication and hence can do just part of the multiplication at once... The coroutines bring in more elegant solution: +As much as it is feasible to convert this into a stateful object that contains the state of multiplication and hence can do just part of the multiplication at once... 
The coroutines bring in a more elegant solution: ```cpp simple_coroutine multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& result){ @@ -336,11 +338,11 @@ simple_coroutine multiply_matrix(const matrix_t& A, const matrix_t& B, matrix_t& } ``` -This is of course not perfect, and give the problems with coroutine and dyn. memory, it might be more reasonable to write the `matrix_multiplicator` object. -But problems with dyn. memory of coroutines can be resolved in time, new patterns to handle them can emerge and suddenly something like this might be viable. +This is of course not perfect and given the problems with coroutine and dynamic memory, it might be more reasonable to write the `matrix_multiplicator` object. +But problems with dynamic memory of coroutines can be resolved in time, new patterns to handle them can emerge and suddenly something like this might be viable. ## Conclusion -Coroutines are a new feature of C++20 and there is potential for them to be of interest for us. -However it seems that there is already plenty to be concerned about. +Coroutines are a new feature of C++20 and there is potential for them to be of interest to us. +However, it seems that there is already plenty to be concerned about. (Unless, of course, I am greatly mistaken in my observations) From daa3b3482803e71a984d4786a3363a44360ef54e Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Mon, 6 Feb 2023 18:32:30 +0100 Subject: [PATCH 8/9] removed the beta version of testing lib --- _posts/2022-04-20-testing-library.md | 344 --------------------------- 1 file changed, 344 deletions(-) delete mode 100644 _posts/2022-04-20-testing-library.md diff --git a/_posts/2022-04-20-testing-library.md b/_posts/2022-04-20-testing-library.md deleted file mode 100644 index 4e78381b..00000000 --- a/_posts/2022-04-20-testing-library.md +++ /dev/null @@ -1,344 +0,0 @@ ---- -title: Testing library -description: Design of a testing library -author: veverak -tags: [c++, testing] ---- - - - -We have multiple testing libraries focused on C++ applications for GPOS (general-purpose operating system), but there is a lack of testing libraries designed for embedded devices. - -The traditional libraries are not designed for constrained resources and rely on functionality like a filesystem or standard output. - -I decided to design a testing library for microcontrollers. -In this article, I want to show the rationale, design choices, and thoughts on the prototype. - - - -{% include newsletter.html %} - -{% include toc.html %} - -## Motivation - -When developing any code, being able to test is crucial for sustainable development. - -In the case of executable code on systems with GPOS, widely used solutions are GoogleTest or Catch libraries. -What we usually expect from such a framework is: -- a tool that will organize and orchestrate the execution of the tests -- basic functions/API to check the correctness of the results in the test -- features for scaling: fixtures, parameterized tests, executing tests multiple times, metrics - -In the context of microprocessors, these libraries are not usable. -They rely on the file system, input/output into a terminal, and dynamic memory. They also do not care about tight limits for code size. - -These frameworks are usable only for testing parts of the embedded firmware. -These parts are independent of the hardware: algorithms, internal business logic, and others. -We, however, can't test anything that is tied to the hardware. 
- -For that reason, I decided to implement a custom opinionated testing framework designed for a specific use case: executing tests on the embedded hardware itself. - -The goal is to be able to test embedded code that is tied to the hardware itself: -- interrupt-based mechanics -- control algorithms that are unpractical to simulate -- code tied to peripherals - -## Requirements - -Based on my experience and opinions, I decided to specify the following requirements: - -emlabcpp integration - The code is tightly integrated into my personal C++20 library. - That is: it can't be used without the library. - This eases the development of the testing framework as I reuse functionality from the library, specifically: the protocol library inside emlabcpp. - -simplicity - The library should be simple and should not try to provide an entire set of functionality that Catch/Google Test offers. - That should not be necessary, and I prefer a simpler and more efficient tool. - -integration into existing testing tools - A wide set of tools exist that can work with the test results of Catch/Google Test - for example, GitLab has the integration of test results from these tools. - The library should be compatible from this perspective - it should be integrable into existing systems. - -small footprint - The assertion is that the application code itself will take a big percentage of the available memory of the microchip. - That implies that the library should have a small memory footprint - so it can coexist with present code. - -no dynamic memory, no exception - Both are C++ features that we may want to avoid in the firmware. - The testing library should not require them for its functionality to allow usage in context when they are not enabled at all. - -no platform fixation - Ideally, we would prefer this to be reusable between different embedded platforms and situations. - That imposes the limit that the library should not be tied to any specific platform. - - -## Design - -The library itself is implemented as a two-part system: - -reactor - It is present in the embedded device and controls it. - It has a small footprint and limited functionality. It can: - - register tests to itself - - store bare minimum information about firmware/tests - - execute the tests - - communicate information/exchange data between itself and the controller - -controller - Controls the testing process and is presented on the device that controls the tests. - It is still developed as microcontroller-compatible software (no dynamic memory, no exceptions), but there is a weak assumption that it will be mainly used on a system with GPOS. - It can: - - communicate with and control the reactor - - load test information from the reactor - - orchestrate test execution - - provide input data for tests - - provide data collected from the tests - -The separation of the design into two tools imposes restrictions: the tests on the embedded device can't be executed without the controller. -But that allows a minimal memory footprint of the testing firmware on the firmware size, as I can move as much of the testing logic as reasonable to the controller side. -Especially data collection can be done so that everything is stored in the controller. - -The communication method between the parts is not defined. -Both parts use messages for communication, but it is up to the user to implement how the messages are transferred. -Each expects to be provided with an interface that implements read/write methods - it's up to the user to design how. 
-This makes it platform-independent and gives flexibility for various scenarios. -But I do silently expect that UART will be mostly used. - -The way the controller gets input data and processes the collected data from tests is up to the user. -The interface for the controller only provides an API for both. - -In the end, the perspective one can use for this is: -The testing library is just a fancy remote execution library - the controller executes functions registered to the reactor in the firmware and collects results. - -## Basic implementation details - -Each part is object - `testing_reactor` object and `testing_controller` object. -Both are designed to take control of their application. -Both expect to be provided with user-provided objects implementing interfaces `testing_reactor_interface` and `testing_controller_interface`. -Interfaces are implemented by the user and define how the object interacts with its environment. - -In the case of the embedded firmware, one creates an instance of the reactor, registers tests into it, and passes control to the reactor. -This is done in a way that still gives user some control over the main loop:: -```cpp - emlabcpp::testing_basic_reactor rec{"test suite name"}; - my_reactor_interface rif{..}; - // register tests - - while(true){ - rec.tick(rif); - } -``` - -The reactor expects that its `tick` method is called repeatedly, and the method contains one iteration of the reactor's control loop. -It either answers the reactor in the control loop or executes the entire test during one `tick` call - it can block for a while. - -The `controller` has similar behavior and interface. With the exception that the `controller_interface` also contains customization points for additional features: -- methods to provide input data for tests on request -- `on_test(emlabcpp::testing_result)` method that is called with results of one test call -- `on_error` method is called once an error happens in the library. - -It's up to the user to implement the interface for the specific use case or use existing implementations (the library may provide some). - -## Dynamic memory - -Both the `reactor` and the `controller` contains data structure with dynamic size. -To avoid dynamic memory, I wanted to use `std::pmr`: the internal containers would use an allocator and expect memory resource as an input argument. -This implements the behavior: "The central objects expect a memory resource they should use for data allocation." - -I think that this fits the use case quite nicely, as both types require dynamic data structures, but in the same way, I want them to be usable without dynamic memory itself - compromise is an interface that can be provided with static buffers. - -However, `std::pmr` does not feel usable, as the default construction of the allocator uses a default memory instance that exists as a global object. (that can be changed only at runtime) -The default instance uses new/delete. -That means that it is easy for code that uses `std::pmr` to include in the firmware the entire stack for dynamic allocation - something that I want to avoid. - -I decided to re-implement `std::pmr` in my custom library with a few changes in the API that are more fitting to the embedded library. -The key one is that memory resource with new/delete operators simply does not exists. -The user has to instance a memory resource also provided by `emlabcpp` and give it to the object. 
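
For contrast, this is roughly what the rejected `std::pmr` route looks like with a static buffer, and how easily the hidden default resource (backed by new/delete) can sneak in. A standalone sketch of the standard-library facility, not code from the post or from emlabcpp:

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

int main()
{
    // static storage handed to the memory resource, no heap involved
    static std::array< std::byte, 1024 > buffer;

    // upstream is null_memory_resource(), so running out of `buffer` fails loudly
    // instead of silently falling back to new/delete
    std::pmr::monotonic_buffer_resource pool{ buffer.data(), buffer.size(),
                                              std::pmr::null_memory_resource() };

    std::pmr::vector< int > values{ &pool };  // OK: allocates from `buffer`
    values.push_back( 42 );

    std::pmr::vector< int > oops;  // pitfall: default-constructed, so it uses the global default
    oops.push_back( 42 );          // resource, which is new/delete unless changed at runtime
}
```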
- -As a simple alternative, there exists `emlabcpp::testing_basic_reactor,` which inherits from the `reactor` and provides it with a basic memory resource that can be used by it - sane default. - -## Binary protocol - -The binary protocol is intentionally considered an implementation detail, as I want to have the freedom to change it at will. - -It is implemented with a C++ protocol library I did previously. The short description is: imagine protocol buffers, but instead of an external tool, it is just a C++ library that gets the definition of protocol via templates. - -## Data exchange - -The framework provides mechanics to exchange data between controller and reactor. - -Tests can request test data from the controller as a form of input. -(It's up to the user how the controller gets/provides that data) -The request is a blocking communication operation - the input is not stored on the side of the reactor. - -The test can collect data - reactors have an API to send data to the controller. -The controller stores the data during test execution, and it is passed to the user once the test is done in test_result. - -In the case of input, I use a simple key/value mechanism. -In the case of the collected data, these can be organized into a tree, where each node has key/value pair. - -That is, each data point is made of a 'key' that identifies it and its corresponding 'value.' - -To give some flexibility, the types are: - -key - can be either string or integer - -value - can be string, integer, bool, unsigned - -In each case, the framework can serialize (thanks to the `emlabcpp::protocol` library) and deserialize any types and send them over the communication channel. - -As for the strings: These are limited by size to 32 characters, as this way, I can use static buffers for them, and they do not have to be allocated. - - -## Examples of tests - -I tried to prepare a simple interface for the registration of tests, as I believe that tests should be easy to write. -(Note: Generally, I don't mind some cost of setting up the library, but I think that adding tests should be easy) -To guide the explanation, let's assert we are testing a wending machine: - -```cpp - emlabcpp::testing_basic_reactor rec{"test suite for wending machine"}; - - rec.register_callable("my simple test", [&]( emlabcpp::testing_record & rec){ - - int product_id = rec.get_arg("product_id"); - - rec.expect( product_id < MAX_PRODUCTS_N ); - - wending_machine::release_product(product_id); - - rec.collect( "released: ", product_id ); - - bool occupied = wending_machine::is_takeout_area_occupied(); - - rec.expect( occupied ); - }); -``` - -What happens here is that the lambda function is registered as a test. -That test is identified by the "my simple test" string, used to identify it from the controller. - -Once the test is executed (the controller tells the reactor to execute it), it is provided with a `testing_record` object that serves as an API between the test and the reactor. - -The testing code should use the record to get any data from the controller, collect any data during the test, and mainly: provide information whenever the test failed or succeeded. - -In the example, you can see the usage of all the primitives: - - `rec.get_arg("product_id")` tells the reactor to ask controller for argument with key `product_id` and retreive it as integer type - - `rec.expect( product_id < MAX_PRODUCTS_N )` is a form checking properties in the test - if `false` is passed to the `expect(bool)` method the test is marked as failed. 
- - `rec.collect("released: ", product_id )` collects the data `product_id` with key `released: ` and sends it to the controller. - -## Building the tests - -The user solely handles that. The testing framework just provides an object that expects communication API and can register test - how that is assembled into a firmware is up to the user. - -The idea is that single 'testing firmware' will collect multiple tests registered into one reactor. -It's up to the user to orchestrate the build process in a sensible way. - -In the case of CMake, I decided to split the application itself into "application library" and "main executable." -Most of the logic of the firmware is in the application library, and the main executable just implements the main function and starts up the application library. - -The main executable of tests uses that library to prepare and set up tests. -Note that the idea is that there are multiple test binaries with different tests. -I don't assume that all the tests would fit into one binary. - -This way, any test firmware is closely similar to the application executable - just with a different main file. - -From the controller's perspective, it can be just a simple application that is meant to be executed on GPOS. - -## Google Test - -One small win that appeared was that, given the flexibility, it was easy to integrate Google Test and controller. -The controller can register each test from the reactor as a test in the google test library. -It can use the Google Test facility on GPOS to provide user-readable output about the execution of the tests, more orchestration logic, and output of the testing in the form of JUnit XML files. -Systems like GitLab can use this. - -This shows that it was easy to provide the necessary facility for the testing firmware to be integrated into modern CI with traditional tools. -And yet the integration is not tight. Any integration into Google Test is just a set of few functions/classes in emlabcpp that can be ignored by anybody not favoring Google Test. - -The test output from the project I used this framework the first time can look like this: - -``` - ./cmake-build-debug/util/tester --device /dev/ttyACM0 - [==========] Running 1 test from 1 test suite. - [----------] Global test environment set-up. - [----------] 1 test from emlabcpp::testing - [ RUN ] emlabcpp::testing.basic_control_test - /home/veverak/Projects/servio/util/src/tester.cpp:32: Failure - Test produced a failure, stopping - collected: - 11576 : 0 - 11679 : 0 - 11782 : 0 - 11885 : 0 - 11988 : 0 - 12091 : 0 - 12194 : 0 - 12297 : 0 - 12400 : 0 - 12503 : 0 - 12606 : 0 - 12709 : 0 - 12812 : 0 - 12915 : 0 - 13018 : 0 - 13121 : 0 - 13224 : 0 - 13327 : 0 - 13430 : 0 - 13533 : 0 - 13636 : 0 - 13739 : 0 - 13842 : 0 - 13945 : 0 - 14048 : 0 - [ FAILED ] emlabcpp::testing.basic_control_test (2597 ms) - [----------] 1 test from emlabcpp::testing (2597 ms total) - - [----------] Global test environment tear-down - [==========] 1 test from 1 test suite ran. (2597 ms total) - [ PASSED ] 0 tests. - [ FAILED ] 1 test, listed below: - [ FAILED ] emlabcpp::testing.basic_control_test - - 1 FAILED TEST -``` - -In this example, the controller registered all tests in the firmware (on the device that was connected to the PC and was accessible via the `/dev/ttyACM0` serial device). After that, it executed all of them. - -The name of the testing suite `emlabcpp::testing` and the name of the test `basic_control_test` were all extracted on the fly from the testing firmware itself. 
We can also see values collected by the test during the execution. - -## Controller is independent - -Based on the specific project and testing needs, one can use one binary with a `controller` for multiple `reactors.` That is something I intend with the actual main project that uses it. - -As the controller loads most information from the reactor and if the Google Test integration is used, there is not much logic that can be varied. - -The sole exception is how data is provided for the tests. -But then it can be implemented in some general way - for example, the `controller` binary would load the data from a JSON file in some generic way. - -## Experience - -It is pretty limited, but I am happy with the prototype. -I am sure that I will refactor the library in the future, as there are prominent places to be improved but so far it behaves good enough. -It gives me a simple way to test and develop various ways to control the smart servomotor I am working on. -(Note: yes, this is one of the cases of "oh, I need to develop a library so I can do a project"...) - -What could be developed more in the future and what pains me so far is: - - it still does not report 100% of the possible errors on the side of the testing library - I have to go through the codebase and be more strict - - it can't handle exceptions - while it should not rely on them, I think the library should respect them. That means in case the test throws an exception. It should not stop the reactor. - - data exchange can be improved - what can be exchanged as of now is quite limited. I suppose I can provide more types to send and receive. - - memory resource - uses internal emlabcpp mechanism that is underdeveloped, that definitely would benefit from more work. - - more experience in CI - I think I am on a good track to having automatized tests in CI that are flashed to real hardware somewhere in the laboratory. That could show the limits of the library. - -## The code - -The testing library is part of emlabcpp - my personal library, which purpose is for me to have an up-to-date collection of tools for my development. -I restrain myself from saying "it should be used by others," as I don't want to care about backward compatibility outside of my projects. - -The primary example of the testing library is an example file: [emlabcpp/tests/testing](https://github.com/koniarik/emlabcpp/blob/v1.2/tests/testing_test.cpp) - -The interface to the library itself is in: [emlabcpp/include/emlabcpp/experimental/testing](https://github.com/koniarik/emlabcpp/tree/v1.2/include/emlabcpp/experimental/testing) From 4db2ff3482ce7dcd7c2ee99df6ea1ff71f428c85 Mon Sep 17 00:00:00 2001 From: Jan Veverak Koniarik Date: Tue, 15 Aug 2023 18:04:13 +0200 Subject: [PATCH 9/9] fixes --- _posts/2024-04-20-coroutines.md | 38 ++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) mode change 100644 => 100755 _posts/2024-04-20-coroutines.md diff --git a/_posts/2024-04-20-coroutines.md b/_posts/2024-04-20-coroutines.md old mode 100644 new mode 100755 index 07e0e962..0b16f854 --- a/_posts/2024-04-20-coroutines.md +++ b/_posts/2024-04-20-coroutines.md @@ -8,7 +8,7 @@ tags: [c++, coroutines] -In the last year, it seems to me that there was quite a high activity about coroutines in C++. +In the last few years, it seems to me that there was quite a high activity about coroutines in C++. That is, we got plenty of blog posts/videos on the topic of coroutines that are officially available since C++20. 
@@ -84,7 +84,7 @@ To do that the coroutine allocates memory for its `coroutine frame` which contai
 The `coroutine_type` is the type of the coroutine, and usually represents the `handle` that points to the allocated frame. (By owning the `std::coroutine_handle` which is a handle given by the compiler for the frame)
 
 To allow data exchange between the coroutine, the coroutine frame contains an instance of `promise_type`, that is accessible from the `coroutine_type` and from the code of the coroutine itself.
-(Compiler will select `coroutine_type::promise_type` as the promise, this type can be an alias, nested structure, or some other valid type)
+(Compiler will select `std::coroutine_traits<coroutine_type>::promise_type` as the promise, this type can be an alias, nested structure, or some other valid type)
 
 `awaiter` is an entity that is used for the suspension process. It is a type that should be passed to the `co_await` call and that is used by the compiler to handle the suspension.
 When the coroutine is suspended by the `co_await`, the compiler will call `void awaiter::await_suspend(std::coroutine_handle)` which gets access to the promise via `coroutine_handle` and after that, the coroutine is suspended.
@@ -98,6 +98,8 @@ In this context, it is a good idea to point out a few properties of the mechanis
 - The `coroutine_type` does not need to have `step` semantics, the `coroutine_type` has access to `std::coroutine_handle` which provides the interface to resume the coroutine. The `coroutine_handle` might as well be implemented in a way that one method keeps resuming the coroutine until it finishes.
 - Coroutines can be nested. One can combine `coroutine_type` to be also valid `awaiter`, this gives a possibility to have recursive coroutines, in which one `step` of the top coroutine does one step of the inner coroutine.
 
+A more detailed description and a good source of truth is cppreference: https://en.cppreference.com/w/cpp/language/coroutines
+
 ### Some coroutines
 
 Given that the features exist for some time and have some background from other languages, we can already talk about interesting types of coroutines to work with.
@@ -186,7 +188,7 @@ Approach 1) has multiple potential issues, it might take a lot of code to implem
 Approach 2) has another set of issues. Each thread requires its own stack space (which might not scale), and we got all the problems of parallelism - exchanging data between can suffer various potential concurrency issues.
 
 From this perspective, coroutines are a third way of approaching these processes, which is not better than 1) or 2), but also not worse.
-Coroutines bring in a new (and not yet finetuned) mechanism that makes it possible to write an exchange of data over UART in a way that does not require as complex code as 1), does not suffer so many concurrency issues as 2) (and does not need its own stack, just frame).
+Coroutines bring in a new (and not yet finetuned) mechanism that makes it possible to write an exchange of data over UART in a way that does not require as complex code as 1), does not suffer so many concurrency issues as 2) (and does not need its own stack, just frame). Note that with coroutines you still have to deal with concurrent access to resources, but in a more predictable way.
 
 However, coroutines still suffer from the issue that one step of computation might take longer than expected, and we can assume that any errors in coroutines might be harder to inspect.
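
Since the hunks above walk through the `promise_type`/`coroutine_handle` machinery and the idea of driving a coroutine step by step, a complete minimal example may help to tie the pieces together. This is a generic sketch (names like `step_coroutine` are made up; it is not code from the post):

```cpp
#include <coroutine>
#include <cstdio>

// minimal coroutine type: owns the handle and lets the caller resume one step at a time
struct step_coroutine
{
    struct promise_type
    {
        step_coroutine get_return_object()
        {
            return step_coroutine{ std::coroutine_handle< promise_type >::from_promise( *this ) };
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void                return_void() {}
        void                unhandled_exception() {}  // a real type would terminate or flag an error
    };

    explicit step_coroutine( std::coroutine_handle< promise_type > h ) : handle( h ) {}
    step_coroutine( step_coroutine&& other ) noexcept : handle( other.handle ) { other.handle = {}; }
    step_coroutine( const step_coroutine& ) = delete;
    ~step_coroutine() { if ( handle ) handle.destroy(); }

    bool done() const { return handle.done(); }
    void step() { handle.resume(); }

    std::coroutine_handle< promise_type > handle;
};

step_coroutine blink()
{
    for ( int i = 0; i < 3; ++i )
    {
        std::printf( "toggle %d\n", i );
        co_await std::suspend_always{};  // one `step` ends here, the next one continues the loop
    }
}

int main()
{
    step_coroutine coro = blink();
    while ( !coro.done() )
        coro.step();  // the parent drives the coroutine, e.g. from the firmware's main loop
}
```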
@@ -205,10 +207,12 @@ This is not favourable in embedded in many cases, but there are ways to avoid th
 
 Coroutines have `allocator` support, we can provide the coroutine with an allocator that can be used to get memory for the coroutine and hence avoid the dynamic allocation. (Approach that I can suggest)
 This is done by implementing custom `operator new` and `operator delete` on the `promise_type` which allocates the `entire frame`.
+Note that you can reuse any allocator implementation for this, or any general mechanism that gives you a way to get and release memory.
 
 An alternative is to rely on `halo` optimization (Heap Allocation Elision Optimization) if the coroutine is implemented correctly and the parent function executes the entire coroutine in its context.
 The compiler can optimize away the dynamic allocation and just store the frame on the stack of the coroutine's parent.
-This can be enforced by deleting appropriate `operator new` and `operator delete` overloads of the `promise_type`, but it seems clumsy to me.
+This can be enforced by deleting appropriate `operator new` and `operator delete` overloads of the `promise_type`.
+The issue with this approach is that you can't control the `halo` optimization, the compiler either decides to do it or does not do it, and if it does not happen it might be tricky to find out why and force the compiler to cooperate.
 
 And to kinda ruin the pretty thing here, there is one catch.
 As of now, the compiler can't tell you how much memory the coroutine needs, as it is known only during the link time - the size of the coroutine frame can't be compiler constant. (TODO: link to the source for this)
 This means that you effectively can't prepare static buffers for the coroutines.
@@ -283,7 +287,31 @@ This has the benefit that while the `i2c operation` is processed by the peripher
 And we do not have to pay for that by having complexity (manual state machine doing the interaction) or having threads (which bring in other problems).
 
 Given the bus nature of `i2c`, it is also quite easy to achieve sharing of the `i2c bus` with multiple `device drivers` for various devices on the bus itself.
-We can just implement the `i2c_coroutine round_robin_run(std::span<i2c_coroutine> coros)` coroutine that uses round robin to share access to the peripheral between multiple coroutines (devices).
+We can just implement the `i2c_coroutine round_robin_run(std::span<i2c_coroutine> coros)` coroutine that uses round robin to share access to the peripheral between multiple coroutines (devices). An example of such a `round_robin_run`:
+
+```cpp
+// returns true if all coroutines finished
+bool are_all_done(std::span<i2c_coroutine>);
+
+i2c_coroutine round_robin_run(std::span<i2c_coroutine> coros)
+{
+        std::size_t i = 0;
+        while(true){
+                i2c_coroutine& coro = coros[i];
+                i = (i+1)%coros.size();
+
+                if(coro.done()){
+                        if(i == 0 && are_all_done(coros)){
+                                co_return;
+                        }
+                        continue;
+                }
+
+                // one step of the child coroutine; tick() is expected to return an awaitable
+                co_await coro.tick();
+        }
+}
+
+```
 
 This can transfer to any interaction with our peripherals, we can model that interaction as a suspension of the coroutine, and let the suspension process initiate the operation with the peripheral, or simply give us a request for some operation.
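
As a final illustration of the "give us a request for some operation" variant: the awaiter can store a request token into the promise, and the application loop can route that token to the peripheral. Everything below is hypothetical (it is deliberately named `i2c_token_coroutine` so it is not confused with the post's `i2c_coroutine` from emlabcpp), and the peripheral itself is only hinted at in a comment:

```cpp
#include <coroutine>
#include <cstdint>
#include <optional>
#include <span>
#include <variant>

// request tokens the coroutine hands back to the application
struct i2c_write_request { std::uint8_t addr; std::span< const std::uint8_t > data; };
struct i2c_read_request  { std::uint8_t addr; std::span< std::uint8_t > buffer; };
using i2c_request = std::variant< i2c_write_request, i2c_read_request >;

struct i2c_token_coroutine
{
    struct promise_type
    {
        std::optional< i2c_request > pending;

        i2c_token_coroutine get_return_object()
        {
            return i2c_token_coroutine{ std::coroutine_handle< promise_type >::from_promise( *this ) };
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void                return_void() {}
        void                unhandled_exception() {}
    };

    // awaiter that parks the request in the promise and suspends the coroutine
    struct request_awaiter
    {
        i2c_request req;
        bool await_ready() const noexcept { return false; }
        void await_suspend( std::coroutine_handle< promise_type > h ) { h.promise().pending = req; }
        void await_resume() const noexcept {}
    };

    explicit i2c_token_coroutine( std::coroutine_handle< promise_type > h ) : handle( h ) {}
    i2c_token_coroutine( i2c_token_coroutine&& o ) noexcept : handle( o.handle ) { o.handle = {}; }
    i2c_token_coroutine( const i2c_token_coroutine& ) = delete;
    ~i2c_token_coroutine() { if ( handle ) handle.destroy(); }

    bool done() const { return handle.done(); }
    void resume() { handle.resume(); }
    std::optional< i2c_request > take_request()
    {
        auto req = handle.promise().pending;
        handle.promise().pending.reset();
        return req;
    }

    std::coroutine_handle< promise_type > handle;
};

// device driver: it only produces requests, it never touches the bus itself
i2c_token_coroutine sensor_driver()
{
    const std::uint8_t wake_cmd[] = { 0xAB };
    std::uint8_t       reply[ 2 ] = {};

    co_await i2c_token_coroutine::request_awaiter{ i2c_write_request{ 0x40, wake_cmd } };
    co_await i2c_token_coroutine::request_awaiter{ i2c_read_request{ 0x40, reply } };
    // ... interpret `reply` ...
}

int main()
{
    i2c_token_coroutine coro = sensor_driver();
    while ( !coro.done() )
    {
        coro.resume();  // run the driver up to its next request (or to the end)
        if ( auto req = coro.take_request() )
        {
            // here the application would start the transfer on the real peripheral and
            // resume the driver again only after the completion interrupt has fired
        }
    }
}
```

The token keeps the coupling between the driver coroutine and the peripheral loose, which is the trade-off discussed earlier against letting the awaiter talk to the peripheral directly.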