Refactor CropBox and PassThrough with FunctorFilter #4892

tin1254 · 2021-08-11T20:39:54Z

Description

Explore approaches to integrate FunctorFilter to the existing filters
Improve FunctorFilter API
Refactor CropBox and PassThrough with FunctorFilter
Reduce boilerplate code
Implement benchmark for comparing new and old implementations
Improve runtime performance

TODO

Apply the finalized code design

* Add separate loop for not dense point cloud * Change passing FunctionObject to setFunctionObject by value to move

* Implement experimental::CropBox * Copy set up the same CropBox unit tests

* Implement experimental::PassThrough * Copy set up the same PassThrough unit tests

mvieth · 2021-08-12T20:26:00Z

You chose the approach to let PassThrough inherit from a FunctorFilter (same for CropBox, I am going to use PassThrough as an example). An alternative could be to create a FunctorFilter object inside the applyFilter function of PassThrough, e.g. with a lambda as a functor. An advantage I see would be that the internals are better concealed that way. Assuming API and behaviour stay the same, we could even consider adapting the existing PassThrough instead of creating a new one in experimental.
Thoughts on this?

mvieth · 2021-08-12T20:29:39Z

filters/include/pcl/filters/experimental/passthrough.h

+    const std::uint8_t* pt_data = reinterpret_cast<const std::uint8_t*>(&cloud.at(idx));
+    memcpy(&field_value, pt_data + *field_offset_, sizeof(float));
+    if (!std::isfinite(field_value))
+      return *negative_;


Why do you not return false here?

returning negative_ can make sure nan points are always removed in FunctorFilter
(L99 will be always false with any value of negative_)

pcl/filters/include/pcl/filters/experimental/functor_filter.h

Lines 97 to 106 in fad821e

for (const auto index : *indices_) {

// function object returns true for points that should be selected

if (negative_ != functionObject_(*input_, index)) {

indices.push_back(index);

}

else if (extract_removed_indices_) {

removed_indices_->push_back(index);

}

}

}

in the original PassThrough filtering nan points is not affected by negative_

pcl/filters/include/pcl/filters/impl/passthrough.hpp

Lines 55 to 71 in fad821e

if (filter_field_name_.empty ())

{

// Only filter for non-finite entries then

for (const auto ii : *indices_) // ii = input index

{

// Non-finite entries are always passed to removed indices

if (!std::isfinite ((*input_)[ii].x) ||

!std::isfinite ((*input_)[ii].y) ||

!std::isfinite ((*input_)[ii].z))

{

if (extract_removed_indices_)

(*removed_indices_)[rii++] = ii;

continue;

}

indices[oii++] = ii;

}

}

tin1254 · 2021-08-12T22:02:49Z

You chose the approach to let PassThrough inherit from a FunctorFilter (same for CropBox, I am going to use PassThrough as an example). An alternative could be to create a FunctorFilter object inside the applyFilter function of PassThrough, e.g. with a lambda as a functor. An advantage I see would be that the internals are better concealed that way. Assuming API and behaviour stay the same, we could even consider adapting the existing PassThrough instead of creating a new one in experimental.
Thoughts on this?

Agree, I have also thought it, but my concern is it would have two FilterIndices allocated at the same time (Both PassThrough and FunctorFilter inherit from FilterIndices)

If it is a neglectable overhead, we could also do it in that way

kunaltyagi

Tests look good, though a bit long

filters/include/pcl/filters/experimental/passthrough.h

test/filters/test_experimental_filters.cpp

kunaltyagi · 2021-08-16T00:54:27Z

filters/include/pcl/filters/experimental/crop_box.h

+    filter.setInputCloud(this->getInputCloud());
+    filter.applyFilter(indices);
+    if (this->extract_removed_indices_)
+      *removed_indices_ = *filter.getRemovedIndices(); // copy


This copy is quite a tragedy, else it looks to be shorter than the previous version

as removed_indices_ is protected, we don't have a choice except adding a new API to return non const ptr of it

Why can't we do remove_indices_ = filter.getRemovedIndices();?

Because getRemovedIndices() returns IndicesConstPtr only (and the other overload return with copying)

An option I would see is to add a function applyFilter(Indices& indices, Indices& removed_indices) to the functor filter. Indices would be placed in the given removed_indices instead of the member removed_indices_. Then that function could be called like filter.applyFilter(indices, *removed_indices). Just an idea, haven't thought it through completely.

Or we could make a change all the way up in Filter. 3 possibilities for better integration:

add ctor for removed_indices

add additional output in filter

add additional function with output arg as IndicesPtr

Which one seems the best way forward?

I couldn't come up with any use case after applying to changes to Filter, so I would prefer to limit this to functor filter only.

Maybe either:

@mvieth suggestion

add additional function with output arg as IndicesPtr in FunctorFilter

kunaltyagi · 2021-08-21T03:00:04Z

filters/include/pcl/filters/experimental/crop_box.h

+             (pt.array() <= max_pt_.array()).template head<3>().all();
+    };
+
+    auto filter = advanced::FunctorFilter<PointT, decltype(lambda)>(


Could this be static?

I found it can't easily be static, because lambda is passed only by ctor and it avoids later initialization by ctor. It makes transformation can't be updated, as assigning a new FunctorFilter is not allowed (this is the only way I know to update a static variable)

kunaltyagi · 2021-08-22T02:15:34Z

filters/include/pcl/filters/experimental/crop_box.h

+    const auto lambda = [&](const PointCloud& cloud, index_t idx) {
+      const Eigen::Vector4f pt = pt_transform * cloud.at(idx).getVector4fMap();
+      return (pt.array() >= min_pt_.array()).template head<3>().all() &&
+             (pt.array() <= max_pt_.array()).template head<3>().all();
+    };


Do you think a check for pt_transform being identity would help in increasing the speed a tiny bit?

Possibly, but I think that would be rare in real-world scenarios. More ideas: 1. make pt_transform a 3x4 matrix because the last element of pt is not used anyway. 2. if pt_transform is translation only or rotation only, compute only what is necessary instead of full matrix-vector-multiplication (these ideas can be explored later though)

Possibly, but I think that would be rare in real-world scenarios.

I usually transform my point cloud first so multiple downstream logic can run independently on it. So all crop-box/filter operations happen with no transform. Having no TF might not be so rare

Having no TF might not be so rare

I agree this is not so rare. From my experience working with lidar, I use CropBox to just remove the points far from the origin without any TFs, just to reduce the runtime for downstream tasks.

I added a check for identity transformation for now

make pt_transform a 3x4 matrix because the last element of pt is not used anyway.

I just benchmark it, and surprisingly found 4x3` is a bit slower. I guess it is due to the following conditions check can be run with SSE

return (pt.array() >= min_pt_.array()).template head<3>().all() && (pt.array() <= max_pt_.array()).template head<3>().all();

pt will be Vector3f with 3x4 matrix which may not trigger SSE

Benchmark

3x4

--------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- 320p_CropBox 1.50 ms 1.50 ms 461 320p_FunctorCropBox 1.35 ms 1.35 ms 580 480p_CropBox 3.03 ms 3.03 ms 230 480p_FunctorCropBox 2.72 ms 2.72 ms 253 720p_CropBox 7.93 ms 7.90 ms 67 720p_FunctorCropBox 8.25 ms 8.24 ms 118 1080p_CropBox 20.8 ms 20.8 ms 49 1080p_FunctorCropBox 18.5 ms 18.5 ms 37 1440p_CropBox 38.6 ms 38.5 ms 27 1440p_FunctorCropBox 33.2 ms 33.2 ms 26

4x4

--------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------- 320p_CropBox 1.57 ms 1.57 ms 445 320p_FunctorCropBox 1.28 ms 1.28 ms 674 480p_CropBox 3.03 ms 3.03 ms 229 480p_FunctorCropBox 2.46 ms 2.46 ms 273 720p_CropBox 7.88 ms 7.88 ms 67 720p_FunctorCropBox 7.55 ms 7.54 ms 134 1080p_CropBox 21.1 ms 21.1 ms 48 1080p_FunctorCropBox 17.0 ms 17.0 ms 40 1440p_CropBox 38.3 ms 38.2 ms 27 1440p_FunctorCropBox 30.6 ms 30.6 ms 28

kunaltyagi · 2021-08-22T02:15:39Z

filters/include/pcl/filters/experimental/crop_box.h

+    filter.setInputCloud(this->getInputCloud());
+    filter.applyFilter(indices);
+    if (this->extract_removed_indices_)
+      *removed_indices_ = *filter.getRemovedIndices(); // copy


Which one seems the best way forward?

filters/include/pcl/filters/experimental/passthrough.h

BaltashovIlia · 2021-09-15T13:27:54Z

Hi,

I just tested this MR. The improvement in quality of code architecture is huge, but I haven't seen any significant filtering speedup on average. Both on benchmarks and on real data.

CPU: Ryzen 7 2700x
PCL was build with SSE instructions enabled and AVX instructions disabled since Zen+ doesn’t have real AVX instructions.

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
320p_PassThrough              0.508 ms        0.508 ms         1369
320p_FunctorPassThrough       0.443 ms        0.443 ms         1583
480p_PassThrough               1.48 ms         1.48 ms          469
480p_FunctorPassThrough        1.53 ms         1.53 ms          455
720p_PassThrough               4.73 ms         4.73 ms          135
720p_FunctorPassThrough        4.95 ms         4.95 ms          134
1080p_PassThrough              9.28 ms         9.28 ms           68
1080p_FunctorPassThrough       9.90 ms         9.90 ms           67
1440p_PassThrough              17.0 ms         17.0 ms           38
1440p_FunctorPassThrough       17.7 ms         17.7 ms           36

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.10 ms         1.10 ms          640
320p_FunctorCropBox       0.967 ms        0.967 ms          830
480p_CropBox               2.54 ms         2.54 ms          272
480p_FunctorCropBox        2.23 ms         2.23 ms          312
720p_CropBox               6.11 ms         6.11 ms           83
720p_FunctorCropBox        7.04 ms         7.04 ms          100
1080p_CropBox              16.5 ms         16.5 ms           64
1080p_FunctorCropBox       14.5 ms         14.5 ms           46
1440p_CropBox              29.9 ms         29.9 ms           35
1440p_FunctorCropBox       26.1 ms         26.1 ms           33

tin1254 · 2021-09-15T14:51:45Z

@BaltashovIlia Thanks for trying out this work!

The main goal of using FunctorFilter in the old filter is to simplify the code structure and allowing easy parallelization support in the further (since the main loop of all upgraded filters will be the one FunctorFilter). Performance is not our main purpose but I investigated a lot of time to improve the performance simultaneously, by mainly simplifying the code, so I would expect some improvements...

I just rerun the benchmark of the new CropBox and it is still consistent with the ones before
(PassThrough is still WIP though, but I think there may be still some room for improvements after seeing your result)

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.44 ms         1.44 ms          477
320p_FunctorCropBox        1.20 ms         1.20 ms          705
480p_CropBox               2.97 ms         2.97 ms          238
480p_FunctorCropBox        2.45 ms         2.45 ms          288
720p_CropBox               7.75 ms         7.74 ms           70
720p_FunctorCropBox        7.43 ms         7.43 ms          137
1080p_CropBox              20.6 ms         20.6 ms           49
1080p_FunctorCropBox       16.9 ms         16.9 ms           41
1440p_CropBox              38.4 ms         38.3 ms           27
1440p_FunctorCropBox       30.7 ms         30.7 ms           28

Compiled with default settings
System: pop-os/ubuntu 20.04
CPU: intel 4980HQ

This PR is still WIP and experimental, so feel free to give us some suggestions or opinions on our idea

BaltashovIlia · 2021-09-16T12:29:12Z

On Ryzen 5800x + GCC 9.3.0 + pcl default compilation settings:

Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
320p_PassThrough              0.396 ms        0.396 ms         1770
320p_FunctorPassThrough       0.382 ms        0.382 ms         1834
480p_PassThrough              0.849 ms        0.849 ms          821
480p_FunctorPassThrough       0.763 ms        0.763 ms          915
720p_PassThrough               2.55 ms         2.55 ms          273
720p_FunctorPassThrough        2.30 ms         2.30 ms          304
1080p_PassThrough              5.92 ms         5.92 ms          103
1080p_FunctorPassThrough       5.36 ms         5.36 ms          124
1440p_PassThrough              11.1 ms         11.1 ms           57
1440p_FunctorPassThrough       10.0 ms         10.0 ms           63

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox              0.730 ms        0.730 ms          962
320p_FunctorCropBox       0.720 ms        0.720 ms         1190
480p_CropBox               1.52 ms         1.52 ms          454
480p_FunctorCropBox        1.45 ms         1.45 ms          484
720p_CropBox               3.68 ms         3.68 ms          136
720p_FunctorCropBox        4.34 ms         4.34 ms          161
1080p_CropBox              10.5 ms         10.5 ms           95
1080p_FunctorCropBox       9.94 ms         9.94 ms           69
1440p_CropBox              19.1 ms         19.1 ms           54
1440p_FunctorCropBox       18.1 ms         18.1 ms           49

tin1254 · 2021-09-16T13:44:58Z

Glad to see at least it isn't slower than the original one on the AMD side 😌

Thanks for benchmarking it on your setup! @BaltashovIlia

tin1254 added 3 commits August 11, 2021 02:32

Improve FunctorFilter

f0eca90

* Add separate loop for not dense point cloud * Change passing FunctionObject to setFunctionObject by value to move

Implement CropBox with FunctorFilter

a750e08

* Implement experimental::CropBox * Copy set up the same CropBox unit tests

Implement PassThrough with FunctorFilter

8800b44

* Implement experimental::PassThrough * Copy set up the same PassThrough unit tests

mvieth added the module: filters label Aug 12, 2021

mvieth reviewed Aug 12, 2021

View reviewed changes

kunaltyagi reviewed Aug 14, 2021

View reviewed changes

filters/include/pcl/filters/experimental/passthrough.h Show resolved Hide resolved

test/filters/test_experimental_filters.cpp Outdated Show resolved Hide resolved

[WIP] Implement CropBox prototype

f30572a

tin1254 force-pushed the refactor_crop_box branch from 7f98d23 to f30572a Compare August 14, 2021 21:28

kunaltyagi reviewed Aug 16, 2021

View reviewed changes

tin1254 force-pushed the refactor_crop_box branch from 89c66f8 to 294593a Compare August 17, 2021 23:17

kunaltyagi reviewed Aug 21, 2021

View reviewed changes

tin1254 force-pushed the refactor_crop_box branch 4 times, most recently from e34169b to 955c885 Compare August 22, 2021 00:29

kunaltyagi reviewed Aug 22, 2021

View reviewed changes

tin1254 added 6 commits August 23, 2021 00:06

Improve CropBox unit test

d87e134

Add reasoning behind return negative

3a82760

Implement benchmark for functor PassThrough

3e34a63

Implement benchmark for functor CropBox

bcfbf45

Declare functor filter as static

cd3efee

Increases benchmark point cloud size

994a389

tin1254 force-pushed the refactor_crop_box branch from 955c885 to 994a389 Compare August 23, 2021 00:03

Improve passthrough functor comment

3af4b14

tin1254 changed the title ~~Refactor CropBox and PassThrough~~ Refactor CropBox and PassThrough with FunctorFilter Aug 23, 2021

tin1254 added 3 commits September 5, 2021 22:31

Check filters unit test input argument

5f1b5d7

Improve performance by checking transformation

c06e169

Remove static from FuntorFilter

2e3502d

Improve performance by switching to Matrix4f

6e1e225

	for (const auto index : *indices_) {
	// function object returns true for points that should be selected
	if (negative_ != functionObject_(*input_, index)) {
	indices.push_back(index);
	}
	else if (extract_removed_indices_) {
	removed_indices_->push_back(index);
	}
	}
	}

	if (filter_field_name_.empty ())
	{
	// Only filter for non-finite entries then
	for (const auto ii : *indices_) // ii = input index
	{
	// Non-finite entries are always passed to removed indices
	if (!std::isfinite ((*input_)[ii].x) \|\|
	!std::isfinite ((*input_)[ii].y) \|\|
	!std::isfinite ((*input_)[ii].z))
	{
	if (extract_removed_indices_)
	(*removed_indices_)[rii++] = ii;
	continue;
	}
	indices[oii++] = ii;
	}
	}

Uh oh!

Refactor CropBox and PassThrough with FunctorFilter #4892

Are you sure you want to change the base?

Refactor CropBox and PassThrough with FunctorFilter #4892

Uh oh!

Conversation

tin1254 commented Aug 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related

TODO

Uh oh!

mvieth commented Aug 12, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tin1254 commented Aug 12, 2021

Uh oh!

kunaltyagi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tin1254 Sep 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kunaltyagi Aug 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tin1254 Sep 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Benchmark

3x4

4x4

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

BaltashovIlia commented Sep 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tin1254 commented Sep 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BaltashovIlia commented Sep 16, 2021

Uh oh!

tin1254 commented Sep 16, 2021

Uh oh!

Uh oh!

tin1254 commented Aug 11, 2021 •

edited

Loading

tin1254 Sep 5, 2021 •

edited

Loading

kunaltyagi Aug 23, 2021 •

edited

Loading

tin1254 Sep 5, 2021 •

edited

Loading

BaltashovIlia commented Sep 15, 2021 •

edited

Loading

tin1254 commented Sep 15, 2021 •

edited

Loading