
Introduce BlockSize #3716


Open: wants to merge 33 commits into main


@schnellerhase (Contributor) commented Apr 27, 2025

In performance-critical parts, some block sizes are optimized for by compiling explicit versions in which the block size is provided as a compile-time constant. At the same time, general runtime block sizes are supported through an argument to these functions.

This causes

  1. code duplication: one path for runtime block sizes and one for compile-time block sizes, and
  2. duplicate input of the block size: once as a template argument and once as a function argument (their agreement is only checked with an assert, which is compiled out in release builds to avoid a performance cost).

Introduces a BlockSize concept that is satisfied by either a runtime int or a compile-time std::integral_constant<int, bs>. This allows code paths to be generated explicitly for specific sizes while keeping a single shared implementation for both cases.

  • form packing optimizes for block sizes 1, 2 and 3, whereas vector assembly optimizes only for 1 and 3: is this mismatch intentional?
  • matrix operation routines

@jhale (Member) commented Apr 27, 2025

Looks very nice. Could we review the basic approach before you spend lots more time on it?

@schnellerhase (Contributor Author)

Sure thing. It should be good to go as is and can be extended further once approved. One neat byproduct these changes would enable is operations on MatrixCSR with block sizes not known at compile time, which we are currently missing.

@schnellerhase marked this pull request as ready for review April 27, 2025 18:50
@chrisrichardson self-requested a review April 28, 2025 15:26
@chrisrichardson (Contributor) left a comment

Looking good.

@garth-wells (Member)

Looks really neat.

  • Should the name be more generic? It's basically a runtime or templated integer. I can think of applications beyond block size, e.g. geometric dimension, where it could be useful.
  • Should it support different integer types?
  • Could tests be added to check that, when it's a compile-time integer, it really is a compile-time integer?

@schnellerhase (Contributor Author)

Points 1 and 2 should be no problem. How about ConstexprType as the name for the general concept?

Regarding 3: the interface for retrieving the value (here block_size) must be able to produce both a runtime value and a compile-time value, so it cannot be marked constexpr. Testing whether the compile-time variant is inlined is also not straightforward, since that always remains a compiler decision. The best way to check its effect, I assume, would be to benchmark those cases.

@garth-wells (Member)

I don't like relying on the compiler to inline values that we know at compile time. We have avoided this in the past, preferring to be explicit rather than relying on the compiler and then not knowing what it does.

@schnellerhase (Contributor Author)

It would be best if the block_size/value function were constexpr in the compile-time case. I will see whether I can recover that behaviour.

@schnellerhase (Contributor Author) commented Apr 30, 2025

I think I have a fix: value(ConstexprType<T, V>) is now constexpr when is_compile_v<T, V> == true, and not otherwise. The test case shows that we can now assert at compile time. (BlockSize is not yet adapted.)

@jhale (Member) commented Jul 18, 2025

I have reviewed the PR and the discussion above, and all comments have been addressed.

@jhale added this pull request to the merge queue Jul 18, 2025
@garth-wells removed this pull request from the merge queue due to a manual request Jul 18, 2025
@garth-wells (Member)

I’d like to discuss this one more. I’m not quite convinced that it’s a simplification.

```cpp
    for (int k = 0; k < _bs; ++k)
      coeffs[pos_c + k] = v[pos_v + k];
  }
  int bs = block_size(_bs);
```
Member

What's the point of passing BlockSize auto _bs if the implementation then uses int bs?

@schnellerhase (Contributor Author) Jul 21, 2025

It allows invocations with either a compile-time block size (of type std::integral_constant) or a runtime one (of type int). In both cases, however, we then extract an integral block size bs, which is used for the computations.

Member

My point is that int bs isn't constexpr.

```cpp
const auto [dmap1, _bs1, cells1] = dofmap1;

int bs0 = block_size(_bs0);
int bs1 = block_size(_bs1);
```
Member

This isn't constexpr, so we lose the compile-time 'const-ness'.

Contributor Author

We lose the constexpr int bs = ... form. However, for a compile-time value the block_size callback has the signature constexpr int block_size(), so the compiler sees int bs initialized from a constexpr int.

Member

The block size gets assigned to a runtime int; it is neither clear nor guaranteed that the compiler can exploit the fact that the block size is known at compile time.

Contributor Author

The type of the unpacked value is now auto, so it is inferred from the return type of the block_size function, which is int in the runtime case and constexpr int for a compile-time value.
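A short sketch of the C++ deduction rules at stake in this exchange, assuming a hypothetical `block_size` that is a `constexpr` function returning `int` (the signature in the PR may differ) and a hypothetical `demo` function. In standard C++, `constexpr` applies to the function, not to its return type, so `auto bs = block_size(_bs);` deduces plain `int` for both argument kinds; the compile-time value remains reachable through the argument's type, e.g. `T::value` for `std::integral_constant`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical accessor: constexpr, declared to return int. Works for int
// and, through the constexpr conversion operator, for integral_constant.
template <typename T>
constexpr int block_size(T bs)
{
  return bs;
}

// Hypothetical function body mirroring `auto bs0 = block_size(_bs0);`.
template <typename T>
int demo(T _bs)
{
  // constexpr is not part of block_size's return type, so auto deduces
  // plain int here for both a runtime and a compile-time argument.
  auto bs = block_size(_bs);
  static_assert(std::is_same_v<decltype(bs), int>);

  if constexpr (!std::is_same_v<T, int>)
  {
    // The compile-time value is still available through the type T itself.
    static_assert(T::value > 0, "block size must be positive");
  }
  return bs;
}
```

Keeping loop bounds in terms of `_bs` itself (or `decltype(_bs)::value`) is one way to keep the constant in the type rather than in a runtime `int`.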


4 participants