Skip to content

Refactor CropBox and PassThrough with FunctorFilter #4892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

tin1254
Copy link
Contributor

@tin1254 tin1254 commented Aug 11, 2021

Description

  • Explore approaches to integrate FunctorFilter to the existing filters
  • Improve FunctorFilter API
  • Refactor CropBox and PassThrough with FunctorFilter
  • Reduce boilerplate code
  • Implement benchmark for comparing new and old implementations
  • Improve runtime performance

Related

TODO

  • Apply the finalized code design

* Add separate loop for not dense point cloud

* Change passing FunctionObject to setFunctionObject by value to move
* Implement experimental::CropBox

* Copy set up the same CropBox unit tests
* Implement experimental::PassThrough

* Copy set up the same PassThrough unit tests
@mvieth
Copy link
Member

mvieth commented Aug 12, 2021

You chose the approach to let PassThrough inherit from a FunctorFilter (same for CropBox, I am going to use PassThrough as an example). An alternative could be to create a FunctorFilter object inside the applyFilter function of PassThrough, e.g. with a lambda as a functor. An advantage I see would be that the internals are better concealed that way. Assuming API and behaviour stay the same, we could even consider adapting the existing PassThrough instead of creating a new one in experimental.
Thoughts on this?

const std::uint8_t* pt_data = reinterpret_cast<const std::uint8_t*>(&cloud.at(idx));
memcpy(&field_value, pt_data + *field_offset_, sizeof(float));
if (!std::isfinite(field_value))
return *negative_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you not return false here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning negative_ can make sure nan points are always removed in FunctorFilter
(L99 will be always false with any value of negative_)

for (const auto index : *indices_) {
// function object returns true for points that should be selected
if (negative_ != functionObject_(*input_, index)) {
indices.push_back(index);
}
else if (extract_removed_indices_) {
removed_indices_->push_back(index);
}
}
}

in the original PassThrough filtering nan points is not affected by negative_

if (filter_field_name_.empty ())
{
// Only filter for non-finite entries then
for (const auto ii : *indices_) // ii = input index
{
// Non-finite entries are always passed to removed indices
if (!std::isfinite ((*input_)[ii].x) ||
!std::isfinite ((*input_)[ii].y) ||
!std::isfinite ((*input_)[ii].z))
{
if (extract_removed_indices_)
(*removed_indices_)[rii++] = ii;
continue;
}
indices[oii++] = ii;
}
}

@tin1254
Copy link
Contributor Author

tin1254 commented Aug 12, 2021

You chose the approach to let PassThrough inherit from a FunctorFilter (same for CropBox, I am going to use PassThrough as an example). An alternative could be to create a FunctorFilter object inside the applyFilter function of PassThrough, e.g. with a lambda as a functor. An advantage I see would be that the internals are better concealed that way. Assuming API and behaviour stay the same, we could even consider adapting the existing PassThrough instead of creating a new one in experimental.
Thoughts on this?

Agree, I have also thought it, but my concern is it would have two FilterIndices allocated at the same time (Both PassThrough and FunctorFilter inherit from FilterIndices)

If it is a neglectable overhead, we could also do it in that way

Copy link
Member

@kunaltyagi kunaltyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good, though a bit long

@tin1254 tin1254 force-pushed the refactor_crop_box branch from 7f98d23 to f30572a Compare August 14, 2021 21:28
filter.setInputCloud(this->getInputCloud());
filter.applyFilter(indices);
if (this->extract_removed_indices_)
*removed_indices_ = *filter.getRemovedIndices(); // copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copy is quite a tragedy, else it looks to be shorter than the previous version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as removed_indices_ is protected, we don't have a choice except adding a new API to return non const ptr of it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we do remove_indices_ = filter.getRemovedIndices();?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because getRemovedIndices() returns IndicesConstPtr only (and the other overload return with copying)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An option I would see is to add a function applyFilter(Indices& indices, Indices& removed_indices) to the functor filter. Indices would be placed in the given removed_indices instead of the member removed_indices_. Then that function could be called like filter.applyFilter(indices, *removed_indices). Just an idea, haven't thought it through completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could make a change all the way up in Filter. 3 possibilities for better integration:

  • add ctor for removed_indices
  • add additional output in filter
  • add additional function with output arg as IndicesPtr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one seems the best way forward?

Copy link
Contributor Author

@tin1254 tin1254 Sep 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't come up with any use case after applying to changes to Filter, so I would prefer to limit this to functor filter only.

Maybe either:

  • @mvieth suggestion
  • add additional function with output arg as IndicesPtr in FunctorFilter

@tin1254 tin1254 force-pushed the refactor_crop_box branch from 89c66f8 to 294593a Compare August 17, 2021 23:17
(pt.array() <= max_pt_.array()).template head<3>().all();
};

auto filter = advanced::FunctorFilter<PointT, decltype(lambda)>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it can't easily be static, because lambda is passed only by ctor and it avoids later initialization by ctor. It makes transformation can't be updated, as assigning a new FunctorFilter is not allowed (this is the only way I know to update a static variable)

@tin1254 tin1254 force-pushed the refactor_crop_box branch 4 times, most recently from e34169b to 955c885 Compare August 22, 2021 00:29
Comment on lines 148 to 151
const auto lambda = [&](const PointCloud& cloud, index_t idx) {
const Eigen::Vector4f pt = pt_transform * cloud.at(idx).getVector4fMap();
return (pt.array() >= min_pt_.array()).template head<3>().all() &&
(pt.array() <= max_pt_.array()).template head<3>().all();
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think a check for pt_transform being identity would help in increasing the speed a tiny bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but I think that would be rare in real-world scenarios. More ideas: 1. make pt_transform a 3x4 matrix because the last element of pt is not used anyway. 2. if pt_transform is translation only or rotation only, compute only what is necessary instead of full matrix-vector-multiplication (these ideas can be explored later though)

Copy link
Member

@kunaltyagi kunaltyagi Aug 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but I think that would be rare in real-world scenarios.

I usually transform my point cloud first so multiple downstream logic can run independently on it. So all crop-box/filter operations happen with no transform. Having no TF might not be so rare

Copy link
Contributor Author

@tin1254 tin1254 Sep 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having no TF might not be so rare

I agree this is not so rare. From my experience working with lidar, I use CropBox to just remove the points far from the origin without any TFs, just to reduce the runtime for downstream tasks.

I added a check for identity transformation for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. make pt_transform a 3x4 matrix because the last element of pt is not used anyway.

I just benchmark it, and surprisingly found 4x3` is a bit slower. I guess it is due to the following conditions check can be run with SSE

return (pt.array() >= min_pt_.array()).template head<3>().all() &&
       (pt.array() <= max_pt_.array()).template head<3>().all();

pt will be Vector3f with 3x4 matrix which may not trigger SSE

Benchmark

3x4

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.50 ms         1.50 ms          461
320p_FunctorCropBox        1.35 ms         1.35 ms          580
480p_CropBox               3.03 ms         3.03 ms          230
480p_FunctorCropBox        2.72 ms         2.72 ms          253
720p_CropBox               7.93 ms         7.90 ms           67
720p_FunctorCropBox        8.25 ms         8.24 ms          118
1080p_CropBox              20.8 ms         20.8 ms           49
1080p_FunctorCropBox       18.5 ms         18.5 ms           37
1440p_CropBox              38.6 ms         38.5 ms           27
1440p_FunctorCropBox       33.2 ms         33.2 ms           26

4x4

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.57 ms         1.57 ms          445
320p_FunctorCropBox        1.28 ms         1.28 ms          674
480p_CropBox               3.03 ms         3.03 ms          229
480p_FunctorCropBox        2.46 ms         2.46 ms          273
720p_CropBox               7.88 ms         7.88 ms           67
720p_FunctorCropBox        7.55 ms         7.54 ms          134
1080p_CropBox              21.1 ms         21.1 ms           48
1080p_FunctorCropBox       17.0 ms         17.0 ms           40
1440p_CropBox              38.3 ms         38.2 ms           27
1440p_FunctorCropBox       30.6 ms         30.6 ms           28

filter.setInputCloud(this->getInputCloud());
filter.applyFilter(indices);
if (this->extract_removed_indices_)
*removed_indices_ = *filter.getRemovedIndices(); // copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one seems the best way forward?

@tin1254 tin1254 force-pushed the refactor_crop_box branch from 955c885 to 994a389 Compare August 23, 2021 00:03
@tin1254 tin1254 changed the title Refactor CropBox and PassThrough Refactor CropBox and PassThrough with FunctorFilter Aug 23, 2021
@BaltashovIlia
Copy link

BaltashovIlia commented Sep 15, 2021

Hi,

I just tested this MR. The improvement in quality of code architecture is huge, but I haven't seen any significant filtering speedup on average. Both on benchmarks and on real data.

CPU: Ryzen 7 2700x
PCL was build with SSE instructions enabled and AVX instructions disabled since Zen+ doesn’t have real AVX instructions.

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
320p_PassThrough              0.508 ms        0.508 ms         1369
320p_FunctorPassThrough       0.443 ms        0.443 ms         1583
480p_PassThrough               1.48 ms         1.48 ms          469
480p_FunctorPassThrough        1.53 ms         1.53 ms          455
720p_PassThrough               4.73 ms         4.73 ms          135
720p_FunctorPassThrough        4.95 ms         4.95 ms          134
1080p_PassThrough              9.28 ms         9.28 ms           68
1080p_FunctorPassThrough       9.90 ms         9.90 ms           67
1440p_PassThrough              17.0 ms         17.0 ms           38
1440p_FunctorPassThrough       17.7 ms         17.7 ms           36
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.10 ms         1.10 ms          640
320p_FunctorCropBox       0.967 ms        0.967 ms          830
480p_CropBox               2.54 ms         2.54 ms          272
480p_FunctorCropBox        2.23 ms         2.23 ms          312
720p_CropBox               6.11 ms         6.11 ms           83
720p_FunctorCropBox        7.04 ms         7.04 ms          100
1080p_CropBox              16.5 ms         16.5 ms           64
1080p_FunctorCropBox       14.5 ms         14.5 ms           46
1440p_CropBox              29.9 ms         29.9 ms           35
1440p_FunctorCropBox       26.1 ms         26.1 ms           33

@tin1254
Copy link
Contributor Author

tin1254 commented Sep 15, 2021

@BaltashovIlia Thanks for trying out this work!

The main goal of using FunctorFilter in the old filter is to simplify the code structure and allowing easy parallelization support in the further (since the main loop of all upgraded filters will be the one FunctorFilter). Performance is not our main purpose but I investigated a lot of time to improve the performance simultaneously, by mainly simplifying the code, so I would expect some improvements...

I just rerun the benchmark of the new CropBox and it is still consistent with the ones before
(PassThrough is still WIP though, but I think there may be still some room for improvements after seeing your result)

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox               1.44 ms         1.44 ms          477
320p_FunctorCropBox        1.20 ms         1.20 ms          705
480p_CropBox               2.97 ms         2.97 ms          238
480p_FunctorCropBox        2.45 ms         2.45 ms          288
720p_CropBox               7.75 ms         7.74 ms           70
720p_FunctorCropBox        7.43 ms         7.43 ms          137
1080p_CropBox              20.6 ms         20.6 ms           49
1080p_FunctorCropBox       16.9 ms         16.9 ms           41
1440p_CropBox              38.4 ms         38.3 ms           27
1440p_FunctorCropBox       30.7 ms         30.7 ms           28

Compiled with default settings
System: pop-os/ubuntu 20.04
CPU: intel 4980HQ

This PR is still WIP and experimental, so feel free to give us some suggestions or opinions on our idea

@BaltashovIlia
Copy link

On Ryzen 5800x + GCC 9.3.0 + pcl default compilation settings:

Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
320p_PassThrough              0.396 ms        0.396 ms         1770
320p_FunctorPassThrough       0.382 ms        0.382 ms         1834
480p_PassThrough              0.849 ms        0.849 ms          821
480p_FunctorPassThrough       0.763 ms        0.763 ms          915
720p_PassThrough               2.55 ms         2.55 ms          273
720p_FunctorPassThrough        2.30 ms         2.30 ms          304
1080p_PassThrough              5.92 ms         5.92 ms          103
1080p_FunctorPassThrough       5.36 ms         5.36 ms          124
1440p_PassThrough              11.1 ms         11.1 ms           57
1440p_FunctorPassThrough       10.0 ms         10.0 ms           63
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
320p_CropBox              0.730 ms        0.730 ms          962
320p_FunctorCropBox       0.720 ms        0.720 ms         1190
480p_CropBox               1.52 ms         1.52 ms          454
480p_FunctorCropBox        1.45 ms         1.45 ms          484
720p_CropBox               3.68 ms         3.68 ms          136
720p_FunctorCropBox        4.34 ms         4.34 ms          161
1080p_CropBox              10.5 ms         10.5 ms           95
1080p_FunctorCropBox       9.94 ms         9.94 ms           69
1440p_CropBox              19.1 ms         19.1 ms           54
1440p_FunctorCropBox       18.1 ms         18.1 ms           49

@tin1254
Copy link
Contributor Author

tin1254 commented Sep 16, 2021

Glad to see at least it isn't slower than the original one on the AMD side 😌

Thanks for benchmarking it on your setup! @BaltashovIlia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants