Add OpenMP Parallelization to FarthestPointSampling #6287

alexnavtt · 2025-06-03T04:12:03Z

FarthestPointSampling is a relatively recent addition to the filters module and does not yet have a parallel implementation. The inner loop, which updates each point's minimum distance to the already-selected points and tracks the maximum of those minima, performs a large number of (mostly) independent operations, making it a good candidate for parallelization. The outer loop is inherently serial, because each selection depends on the previous ones, and so cannot be parallelized.

This PR adds OpenMP-based parallelization to that inner loop and introduces an API that lets the user select the number of threads, with the default being the maximum CPU thread count. I also had a lock-free version in which each thread independently calculated its own maximum and all the results were reduced at the end, but it performed about the same and had more complicated (i.e. bug-prone) control flow.
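To make the loop structure described above concrete, here is a minimal serial sketch of farthest point sampling. It is not the PCL implementation: `Point`, `squaredDistance` and `farthestPointSample` are hypothetical names chosen for illustration, and squared distances are used only because they preserve the ordering.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Point { float x, y, z; };

// Squared Euclidean distance; it orders points the same way as the true distance.
inline float squaredDistance(const Point& a, const Point& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Hypothetical helper (not the PCL API): returns the indices of up to
// `sample_count` points chosen by farthest point sampling from `first_index`.
std::vector<std::size_t> farthestPointSample(const std::vector<Point>& points,
                                             std::size_t sample_count,
                                             std::size_t first_index) {
    std::vector<std::size_t> selected;
    if (first_index >= points.size() || sample_count == 0) return selected;
    if (sample_count > points.size()) sample_count = points.size();

    // min_dist[i]: squared distance from point i to its nearest selected point.
    std::vector<float> min_dist(points.size(), std::numeric_limits<float>::max());
    std::size_t current = first_index;

    // Outer loop: serial, because each pick depends on all previous picks.
    for (std::size_t s = 0; s < sample_count; ++s) {
        selected.push_back(current);
        min_dist[current] = -1.0f;  // mark as selected so it is never picked again

        // Inner loop: each iteration writes only min_dist[i]; the running
        // maximum (best, next) is the only state shared between iterations.
        std::size_t next = current;
        float best = -1.0f;
        for (std::size_t i = 0; i < points.size(); ++i) {
            if (min_dist[i] < 0.0f) continue;  // already selected
            const float d = squaredDistance(points[i], points[current]);
            if (d < min_dist[i]) min_dist[i] = d;
            if (min_dist[i] > best) { best = min_dist[i]; next = i; }
        }
        current = next;
    }
    return selected;
}
```

In this sketch the update of `min_dist[i]` is independent across `i`, while `best` and `next` form the shared "maximum of minima" that a parallel version has to handle carefully.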

The following code was used to benchmark performance:

#include <chrono>
#include <iomanip>   // std::setw
#include <iostream>
#include <pcl/conversions.h>
#include <pcl/io/vtk_lib_io.h>
#include <pcl/filters/farthest_point_sampling.h>

int main() {
    pcl::PolygonMesh mesh;
    pcl::io::loadPolygonFile("/home/alex/Downloads/Espeon.stl", mesh);

    auto vertices = pcl::PointCloud<pcl::PointXYZ>{}.makeShared();
    pcl::fromPCLPointCloud2(mesh.cloud, *vertices);
    std::cout << "Loaded pointcloud with " << vertices->size() << " points\n";

    pcl::FarthestPointSampling<pcl::PointXYZ> farthest_point_sampler;
    farthest_point_sampler.setInputCloud(vertices);
    
    for (unsigned int nr_threads : {1, 2, 5, 10, 0}) {
        farthest_point_sampler.setNumberOfThreads(nr_threads);
        std::cout << farthest_point_sampler.getNumberOfThreads() << " thread" << (farthest_point_sampler.getNumberOfThreads() > 1 ? "s" : "") << ":\n";
        for (std::size_t sample_count = 10; sample_count <= 10000; sample_count *= 10) {
            farthest_point_sampler.setSample(sample_count);
            std::size_t total_milliseconds = 0;
            for (std::size_t trial_idx = 0; trial_idx < 10; trial_idx++) {
                pcl::Indices indices;
                farthest_point_sampler.setSeed(0);
                auto start = std::chrono::steady_clock::now();
                farthest_point_sampler.filter(indices);
                auto stop = std::chrono::steady_clock::now();
                total_milliseconds += std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
            }

            // Average over the 10 trials run above.
            std::cout << std::setw(8) << sample_count << ": " << total_milliseconds / 10 << " ms\n";
        }
    }
}

which gave this output, where the number on the left is the number of samples drawn at each trial, and the number on the right is the average execution time over 10 trials:

Loaded pointcloud with 638087 points
1 thread:
      10: 2 ms
     100: 23 ms
    1000: 246 ms
   10000: 2133 ms
2 threads:
      10: 1 ms
     100: 11 ms
    1000: 111 ms
   10000: 1124 ms
5 threads:
      10: 0 ms
     100: 5 ms
    1000: 58 ms
   10000: 598 ms
10 threads:
      10: 0 ms
     100: 3 ms
    1000: 35 ms
   10000: 353 ms
20 threads:
      10: 0 ms
     100: 2 ms
    1000: 28 ms
   10000: 337 ms
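As a rough way to read the table above, the following sketch computes the speedup and parallel efficiency for the 10000-sample rows. The timings are copied from the output; `Timing`, `Scaling`, `computeScaling` and `kTimings10000` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

struct Timing { unsigned threads; double milliseconds; };
struct Scaling { unsigned threads; double speedup; double efficiency; };

// speedup = T(1 thread) / T(n threads); efficiency = speedup / n.
// Assumes the first entry is the single-threaded baseline and threads > 0.
std::vector<Scaling> computeScaling(const std::vector<Timing>& timings) {
    std::vector<Scaling> result;
    if (timings.empty()) return result;
    const double baseline = timings.front().milliseconds;
    for (const Timing& t : timings) {
        const double speedup = baseline / t.milliseconds;
        result.push_back({t.threads, speedup, speedup / t.threads});
    }
    return result;
}

// Timings for 10000 samples, copied from the benchmark output above.
const std::vector<Timing> kTimings10000 = {
    {1, 2133.0}, {2, 1124.0}, {5, 598.0}, {10, 353.0}, {20, 337.0}};
```

With these numbers the speedup is about 1.9 at 2 threads and about 6.0 at 10 threads, but only about 6.3 at 20 threads, so the efficiency drops sharply beyond 10 threads.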

mvieth · 2025-06-03T08:56:53Z

filters/include/pcl/filters/impl/farthest_point_sampling.hpp

    for (std::size_t i = 0; i < size; ++i)
    {
      if (distances_to_selected_points[i] == -1.0)
        continue;
      distances_to_selected_points[i] = std::min(distances_to_selected_points[i], geometry::distance((*input_)[toCloudIndex(i)], max_index_point));
-      if (distances_to_selected_points[i] > distances_to_selected_points[next_max_index])
-        next_max_index = i;
+      if (distances_to_selected_points[i] > distances_to_selected_points[max_index]) {


This does not seem thread-safe: max_index could be changed simultaneously by another thread, which could cause this condition to evaluate to false even though it should be true. You mentioned you have another version where each thread computes its own maximum first. That seems like a good approach; could you show that code? Alternatively, we could use an approach like this one: https://stackoverflow.com/a/39993717/6540043 (and another example of the same idea: https://stackoverflow.com/a/23957676/6540043 )

Good catch, thanks. I implemented the method from those links, and it ended up performing better than either of my two earlier versions, probably because it is now almost completely lock-free. I had some weird timing issues at high thread counts, but eventually discovered they were caused by hyper-threading on my laptop.

Loaded pointcloud with 638087 points
1 thread:
      10: 1 ms
     100: 16 ms
    1000: 167 ms
   10000: 1656 ms
2 threads:
      10: 0 ms
     100: 8 ms
    1000: 85 ms
   10000: 881 ms
5 threads:
      10: 0 ms
     100: 4 ms
    1000: 49 ms
   10000: 507 ms
10 threads:
      10: 0 ms
     100: 3 ms
    1000: 33 ms
   10000: 327 ms
14 threads:
      10: 0 ms
     100: 2 ms
    1000: 25 ms
   10000: 254 ms
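For readers following the race-condition discussion, here is a minimal sketch of the general pattern mentioned in this thread: each thread keeps a private running maximum, and the partial results are merged in a short critical section. It is not the code from the PR; `parallelArgMax` is a hypothetical helper, and the tie-break on the lowest index is an assumption added to keep the result deterministic.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first largest value, or 0 for an empty input.
// Hypothetical helper: without OpenMP the pragmas are ignored and this
// becomes a plain serial loop with the same result.
std::size_t parallelArgMax(const std::vector<float>& values, int num_threads) {
    std::size_t best_index = 0;
    if (values.empty()) return best_index;
    if (num_threads < 1) num_threads = 1;  // num_threads() needs a positive value
    float best_value = values[0];
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(values.size());

    #pragma omp parallel num_threads(num_threads)
    {
        // Thread-private state: no synchronization inside the hot loop.
        std::size_t local_index = 0;
        float local_value = values[0];

        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < size; ++i) {
            if (values[i] > local_value) {
                local_value = values[i];
                local_index = static_cast<std::size_t>(i);
            }
        }

        // One short critical section per thread merges the partial maxima.
        #pragma omp critical
        {
            if (local_value > best_value ||
                (local_value == best_value && local_index < best_index)) {
                best_value = local_value;
                best_index = local_index;
            }
        }
    }
    return best_index;
}
```

The shared `best_value` and `best_index` are only read and written inside the critical section, so no thread can observe a half-updated maximum, and each thread enters that section once rather than once per point.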

alexnavtt · 2025-06-04T04:11:51Z

Alright, I think I've addressed everything

mvieth

Looks good to me, thanks!

alexnavtt · 2025-06-09T13:26:33Z

Hi, bumping on this in case there's anything else that needs to be done before the merge

mvieth · 2025-06-10T08:21:12Z

I am happy with the changes, not sure if @larshg has any further requests. However, I should mention that we cannot immediately merge this pull request: adding the nr_threads_ field changes the ABI of the FarthestPointSampling class, and we are aiming to keep the next PCL release (1.15.1) ABI and API compatible with the last PCL release (1.15.0). So the earliest we can merge this PR is directly after the 1.15.1 release, which is probably a few weeks away (the exact date is not fixed yet).
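As a generic illustration of why a new data member is an ABI change, the sketch below compares two hypothetical structs. `SamplerBefore` and `SamplerAfter` are stand-ins invented for this example and do not reflect the real layout of FarthestPointSampling.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout before the change.
struct SamplerBefore {
    std::uint64_t sample_size;
    std::uint64_t seed;
};

// Hypothetical layout after adding one field.
struct SamplerAfter {
    std::uint64_t sample_size;
    std::uint64_t seed;
    unsigned int num_threads;  // newly added member
};

// The object grows, so code compiled against the old header would allocate
// too few bytes for an object that code built with the new header expects.
static_assert(sizeof(SamplerAfter) > sizeof(SamplerBefore),
              "adding a member changed the object size");

constexpr std::size_t sizeGrowth() {
    return sizeof(SamplerAfter) - sizeof(SamplerBefore);
}
```

Binaries built against the old layout and a library built with the new one would then disagree about the object's size, which is the kind of incompatibility an ABI-stable patch release has to avoid.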

alexnavtt · 2025-06-10T13:39:15Z

Alright sounds good. Thanks for the update

larshg · 2025-06-11T08:00:57Z

filters/include/pcl/filters/farthest_point_sampling.h

+        setNumberOfThreads (unsigned int nr_threads)
+        {
+          #ifdef _OPENMP
+          nr_threads_ = nr_threads == 0 ? omp_get_num_procs() : nr_threads;


Suggested change

-          nr_threads_ = nr_threads == 0 ? omp_get_num_procs() : nr_threads;
+          nr_threads_ = nr_threads != 0 ? nr_threads : omp_get_num_procs();

To keep it more consistent with the existing code.

Missed this one 😄

larshg · 2025-06-11T08:03:07Z

Maybe also change nr_threads_ to num_threads_ (used in most other classes)
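The convention in the suggestion above (0 means "choose automatically", any other value is used as given) can be sketched as a small free function. `resolveThreadCount` is a hypothetical name, and the behaviour without OpenMP is an assumption, since the non-OpenMP branch of the setter is not shown in this conversation.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper mirroring the suggested expression:
// a non-zero request is honoured, 0 selects a default.
inline unsigned int resolveThreadCount(unsigned int num_threads) {
#ifdef _OPENMP
    return num_threads != 0 ? num_threads
                            : static_cast<unsigned int>(omp_get_num_procs());
#else
    // Assumption for this sketch only: without OpenMP, fall back to one
    // thread when no explicit count is requested.
    return num_threads != 0 ? num_threads : 1u;
#endif
}
```

Writing the condition as `num_threads != 0 ? num_threads : default` puts the user's request first, which matches the ordering proposed in the review.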

alexnavtt added 4 commits June 2, 2025 21:54

changed behavior to index into user indices instead of total pointcloud

2b05c57

made index mapping lambda take auto parameter

2295d0b

Added parallel implementation of farthest_point_sampling

8bad843

Added default(none) to the omp parallel clause

7f6cff9

larshg reviewed Jun 3, 2025

filters/include/pcl/filters/farthest_point_sampling.h (outdated)

larshg reviewed Jun 3, 2025

filters/include/pcl/filters/impl/farthest_point_sampling.hpp (outdated)

larshg reviewed Jun 3, 2025

filters/include/pcl/filters/farthest_point_sampling.h (outdated)

larshg reviewed Jun 3, 2025

filters/include/pcl/filters/farthest_point_sampling.h (outdated)

larshg added this to the pcl-1.16.0 milestone Jun 3, 2025

mvieth added the "changelog: enhancement" and "module: filters" labels Jun 3, 2025

mvieth reviewed Jun 3, 2025

alexnavtt added 2 commits June 3, 2025 21:08

Fixed first set of review comments

7f2a4c6

Fixed race condition and added separate reduction section

030910f

mvieth previously approved these changes Jun 5, 2025

larshg reviewed Jun 11, 2025

Renamed nr_threads_ to num_threads_

c92b5d0

alexnavtt dismissed mvieth’s stale review via c92b5d0 June 12, 2025 01:36

Switched num_threads_ == 0 to num_threads != 0

193f78c
