Env-based API for CUB part 3/3 #4877
🟨 CI finished in 2h 01m: Pass: 80%/187 | Total: 3d 01h | Avg: 23m 31s | Max: 1h 27m | Hits: 85%/251419
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| +/- | libcu++ |
| +/- | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 187)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
11 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
Force-pushed from 3661201 to 603ec97
🟨 CI finished in 2h 04m: Pass: 93%/187 | Total: 3d 10h | Avg: 26m 26s | Max: 1h 30m | Hits: 85%/280760
Force-pushed from 1f09d4e to 7ff6bf6
🟩 CI finished in 1h 40m: Pass: 100%/187 | Total: 2d 04h | Avg: 17m 00s | Max: 1h 23m | Hits: 74%/294458
🟩 CI finished in 2h 00m: Pass: 100%/187 | Total: 3d 18h | Avg: 29m 01s | Max: 1h 20m | Hits: 75%/294458
I reviewed everything except catch2_test_env_launch_helper.h,
which seems to duplicate code from the existing catch2_test_launch_helper.h.
I have not assessed whether that incurs any new technical debt.
// RFA is only supported for float and double accumulators
constexpr bool is_float_or_double = _CUDA_VSTD::is_same_v<accum_t, float> || _CUDA_VSTD::is_same_v<accum_t, double>;
constexpr bool is_sum = _CUDA_VSTD::is_same_v<ReductionOpT, ::cuda::std::plus<>>;
Important: Use this to also detect plus<T>
- constexpr bool is_sum = _CUDA_VSTD::is_same_v<ReductionOpT, ::cuda::std::plus<>>;
+ constexpr bool is_sum = detail::reduce::is_plus<ReductionOpT>;
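To illustrate why the exact `is_same_v` comparison misses `plus<T>`, here is a minimal host-only sketch. It uses `std::plus` in place of `::cuda::std::plus`, and the names `is_plus_sketch`, `is_plus_sketch_v`, and `is_transparent_plus_v` are hypothetical stand-ins invented for this example. It is not the actual implementation of `detail::reduce::is_plus`, whose definition is not shown in this thread.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical stand-in for the trait named in the review comment.
// Primary template: an arbitrary operator is not a "plus".
template <class Op>
struct is_plus_sketch : std::false_type {};

// Partial specialization: matches std::plus<T> for any T. This also covers
// the transparent functor std::plus<>, which is std::plus<void>.
template <class T>
struct is_plus_sketch<std::plus<T>> : std::true_type {};

template <class Op>
inline constexpr bool is_plus_sketch_v = is_plus_sketch<Op>::value;

// The exact comparison from the reviewed line, rewritten against std::plus.
template <class Op>
inline constexpr bool is_transparent_plus_v = std::is_same_v<Op, std::plus<>>;

// The exact comparison only recognizes the transparent functor...
static_assert(is_transparent_plus_v<std::plus<>>);
static_assert(!is_transparent_plus_v<std::plus<float>>);

// ...while the partial specialization recognizes both forms.
static_assert(is_plus_sketch_v<std::plus<>>);
static_assert(is_plus_sketch_v<std::plus<float>>);
static_assert(!is_plus_sketch_v<std::multiplies<float>>);
```

With the exact comparison, a caller passing `plus<float>` would presumably not be recognized as a sum and so would not satisfy the `is_sum` condition, even though the operation is the same addition.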
Description
closes #2126 and works towards #3855
This PR adds one env-based single-phase overload to device reduction that can be used to:
This PR is based on top of #4876, so disregard the "Add require, determinism, and tuning" commit for now.
Checklist