Add OpenMP Parallelization to FarthestPointSampling #6287

Open · wants to merge 8 commits into master

Conversation

alexnavtt (Contributor):

FarthestPointSampling is a relatively recent addition to the filters module and does not yet have a parallel implementation. The inner loop, which recalculates the maximum of the per-point minimum distances to the selected set, involves a large number of (mostly) independent operations, making it a good candidate for parallelization. The outer loop is inherently serial due to the nature of the filter and so cannot be parallelized.
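
For context, the algorithm has roughly this structure (a minimal serial sketch with hypothetical names such as cloud, selected, and seed_index; this is not the actual PCL source):

// min_dist[i]: distance from point i to its nearest already-selected point.
std::vector<float> min_dist(cloud.size(), std::numeric_limits<float>::max());
std::size_t current = seed_index;                 // first point, chosen randomly
for (std::size_t s = 0; s < sample_count; ++s)    // outer loop: inherently serial
{
  selected.push_back(current);
  std::size_t next = 0;
  for (std::size_t i = 0; i < cloud.size(); ++i)  // inner loop: parallelizable
  {
    min_dist[i] = std::min(min_dist[i], distance(cloud[i], cloud[current]));
    if (min_dist[i] > min_dist[next])             // track the new farthest point
      next = i;
  }
  current = next;                                 // the next sample depends on this result
}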

This PR adds OpenMP-based parallelization to that inner loop, and introduces an API that lets the user select the number of threads, with the default being the maximum CPU thread count. I also had another lock-free version in which each thread independently calculated its own maximum and all the results were reduced at the end, but it performed about the same while having more complicated (i.e. bug-prone) control flow.

The following code was used to benchmark performance:

#include <chrono>
#include <iomanip>  // for std::setw
#include <iostream>
#include <pcl/conversions.h>
#include <pcl/io/vtk_lib_io.h>
#include <pcl/filters/farthest_point_sampling.h>

int main() {
    pcl::PolygonMesh mesh;
    pcl::io::loadPolygonFile("/home/alex/Downloads/Espeon.stl", mesh);

    auto vertices = pcl::PointCloud<pcl::PointXYZ>{}.makeShared();
    pcl::fromPCLPointCloud2(mesh.cloud, *vertices);
    std::cout << "Loaded pointcloud with " << vertices->size() << " points\n";

    pcl::FarthestPointSampling<pcl::PointXYZ> farthest_point_sampler;
    farthest_point_sampler.setInputCloud(vertices);
    
    for (unsigned int nr_threads : {1, 2, 5, 10, 0}) {
        farthest_point_sampler.setNumberOfThreads(nr_threads);
        std::cout << farthest_point_sampler.getNumberOfThreads() << " thread" << (farthest_point_sampler.getNumberOfThreads() > 1 ? "s" : "") << ":\n";
        for (std::size_t sample_count = 10; sample_count <= 10000; sample_count *= 10) {
            farthest_point_sampler.setSample(sample_count);
            std::size_t total_milliseconds = 0;
            for (std::size_t trial_idx = 0; trial_idx < 10; trial_idx++) {
                pcl::Indices indices;
                farthest_point_sampler.setSeed(0);
                auto start = std::chrono::steady_clock::now();
                farthest_point_sampler.filter(indices);
                auto stop = std::chrono::steady_clock::now();
                total_milliseconds += std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
            }

            std::cout << std::setw(8) << sample_count << ": " << total_milliseconds/10 << " ms\n"; // average over the 10 trials
        }
    }
}

which gave this output, where the number on the left is the number of samples drawn at each trial, and the number on the right is the average execution time over 10 trials:

Loaded pointcloud with 638087 points
1 thread:
      10: 2 ms
     100: 23 ms
    1000: 246 ms
   10000: 2133 ms
2 threads:
      10: 1 ms
     100: 11 ms
    1000: 111 ms
   10000: 1124 ms
5 threads:
      10: 0 ms
     100: 5 ms
    1000: 58 ms
   10000: 598 ms
10 threads:
      10: 0 ms
     100: 3 ms
    1000: 35 ms
   10000: 353 ms
20 threads:
      10: 0 ms
     100: 2 ms
    1000: 28 ms
   10000: 337 ms

@larshg added this to the pcl-1.16.0 milestone (Jun 3, 2025)
@mvieth added the changelog: enhancement and module: filters labels (Jun 3, 2025)
for (std::size_t i = 0; i < size; ++i)
{
  if (distances_to_selected_points[i] == -1.0)
    continue;
  distances_to_selected_points[i] = std::min(distances_to_selected_points[i], geometry::distance((*input_)[toCloudIndex(i)], max_index_point));
  if (distances_to_selected_points[i] > distances_to_selected_points[next_max_index])
    next_max_index = i;
  if (distances_to_selected_points[i] > distances_to_selected_points[max_index]) {
mvieth (Member):

This does not seem thread-safe: max_index could be changed simultaneously by another thread, which could cause this condition to evaluate to false even though it should be true. You mentioned you have another version where each thread computes its own maximum first. That seems like a good approach, could you show that code? Alternatively, we could use such an approach: https://stackoverflow.com/a/39993717/6540043 (and another example of the same idea: https://stackoverflow.com/a/23957676/6540043 )
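
For reference, a minimal sketch of that per-thread-maximum pattern (variable names borrowed from the snippet above; the nowait/critical merge strategy follows the linked answers and is not the PR's final code):

std::size_t next_max_index = 0;
#pragma omp parallel num_threads(nr_threads_)
{
  // Each thread tracks its own running maximum, so the hot loop needs no
  // synchronization at all.
  std::size_t local_max_index = 0;
  #pragma omp for nowait
  for (std::size_t i = 0; i < size; ++i)
  {
    if (distances_to_selected_points[i] == -1.0)
      continue;
    distances_to_selected_points[i] = std::min(distances_to_selected_points[i],
        geometry::distance((*input_)[toCloudIndex(i)], max_index_point));
    if (distances_to_selected_points[i] > distances_to_selected_points[local_max_index])
      local_max_index = i;
  }
  // Merge the per-thread winners: one critical section per thread rather than
  // one per point.
  #pragma omp critical
  {
    if (distances_to_selected_points[local_max_index] >
        distances_to_selected_points[next_max_index])
      next_max_index = local_max_index;
  }
}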

alexnavtt (Contributor, Author) commented Jun 4, 2025:

Good catch, thanks. I implemented the method from those links; it ended up performing better than either of the two earlier versions, probably because it is now almost completely lock-free. I had some weird timing issues at high thread counts, but eventually discovered they were caused by hyper-threading on my laptop.

alexnavtt (Contributor, Author):

Updated benchmark results with the new implementation:

Loaded pointcloud with 638087 points
1 thread:
      10: 1 ms
     100: 16 ms
    1000: 167 ms
   10000: 1656 ms
2 threads:
      10: 0 ms
     100: 8 ms
    1000: 85 ms
   10000: 881 ms
5 threads:
      10: 0 ms
     100: 4 ms
    1000: 49 ms
   10000: 507 ms
10 threads:
      10: 0 ms
     100: 3 ms
    1000: 33 ms
   10000: 327 ms
14 threads:
      10: 0 ms
     100: 2 ms
    1000: 25 ms
   10000: 254 ms

alexnavtt (Contributor, Author):

Alright, I think I've addressed everything

mvieth previously approved these changes (Jun 5, 2025)

mvieth (Member) left a comment:

Looks good to me, thanks!

alexnavtt (Contributor, Author):

Hi, bumping on this in case there's anything else that needs to be done before the merge

mvieth (Member) commented Jun 10, 2025:

> Hi, bumping on this in case there's anything else that needs to be done before the merge

I am happy with the changes; not sure if @larshg has any further requests. However, I should mention that we cannot immediately merge this pull request: adding the nr_threads_ field changes the ABI of the FarthestPointSampling class, and we are aiming to keep the next PCL release (1.15.1) ABI- and API-compatible with the last PCL release (1.15.0). So the earliest we can merge this PR is directly after the 1.15.1 release, which is probably in a few weeks (the exact date is not fixed yet).
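
For readers unfamiliar with the issue: adding a data member changes the size of the class, so binaries compiled against the old headers would allocate and access objects from the new library incorrectly. A toy illustration (not PCL code):

struct Before { float a_; };              // old layout
struct After  { float a_; int extra_; };  // new layout: sizeof differs, so code
                                          // built against Before misreads After objects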

alexnavtt (Contributor, Author):

Alright, sounds good. Thanks for the update.

setNumberOfThreads (unsigned int nr_threads)
{
#ifdef _OPENMP
  nr_threads_ = nr_threads == 0 ? omp_get_num_procs() : nr_threads;
larshg (Contributor):

Suggested change:
- nr_threads_ = nr_threads == 0 ? omp_get_num_procs() : nr_threads;
+ nr_threads_ = nr_threads != 0 ? nr_threads : omp_get_num_procs();

To keep it more consistent with the existing code.

larshg (Contributor):

Missed this one 😄

larshg (Contributor) commented Jun 11, 2025:

Maybe also change nr_threads_ to num_threads_ (used in most other classes)
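
Putting both review suggestions together, the setter might end up looking something like this (a sketch only; the non-OpenMP fallback to a single thread is an assumption based on similar PCL classes, not the merged code):

void
setNumberOfThreads (unsigned int nr_threads = 0)
{
#ifdef _OPENMP
  num_threads_ = nr_threads != 0 ? nr_threads : omp_get_num_procs();
#else
  // Without OpenMP the filter runs single-threaded regardless of the request.
  num_threads_ = 1;
#endif
}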
