Add support for FP16 to openBLAS and shgemm on RISCV #5290

Srangrang · 2025-06-04T12:46:00Z

- Add HFLOAT16 and BUILD_HFLOAT16 macro definitions to distinguish FP16 from BFLOAT16 and BUILD_BFLOAT16
- Add shgemm for RISCV_ZVL128B and RISCV_ZVL256B
- Using FP16 on RISCV requires the zfh and zvfh instruction set extensions
- Enable FP16 support in Makefile.rule
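As a rough illustration of the zfh/zvfh requirement, the sketch below shows one way a C source file could gate an FP16 path at compile time. It also shows the lane arithmetic that plausibly explains the two kernel shapes: with 16-bit elements, one vector register at LMUL=1 holds VLEN/16 values, i.e. 8 on ZVL128B and 16 on ZVL256B, which matches the M dimension of shgemm_kernel_8x8 and shgemm_kernel_16x8. That link is an assumption, not something stated in the PR. The sketch also assumes the toolchain predefines `__riscv_zfh` and `__riscv_zvfh` when those extensions are enabled in `-march` (the RISC-V C API convention). `HAVE_NATIVE_FP16` and both function names are hypothetical, and the PR itself does its gating in Makefile.system rather than in C.

```c
/* Hypothetical compile-time gate for a native FP16 path.
 * BUILD_HFLOAT16 mirrors the build macro added in this PR; __riscv_zfh and
 * __riscv_zvfh are assumed to be predefined by the compiler when -march
 * enables the Zfh (scalar FP16) and Zvfh (vector FP16) extensions. */
#if defined(BUILD_HFLOAT16) && defined(__riscv_zfh) && defined(__riscv_zvfh)
#define HAVE_NATIVE_FP16 1
#else
#define HAVE_NATIVE_FP16 0
#endif

/* Returns 1 if the RVV FP16 kernels would be compiled in, 0 otherwise
 * (in which case a non-FP16 fallback would be needed). */
int native_fp16_enabled(void) {
    return HAVE_NATIVE_FP16;
}

/* Number of 16-bit elements one vector register holds at LMUL=1:
 * 8 for VLEN=128 and 16 for VLEN=256 (assumed to drive the 8x8 vs
 * 16x8 kernel tile heights). */
int fp16_lanes_per_vreg(int vlen_bits) {
    return vlen_bits / 16;
}
```

On an x86 host, or on RISC-V without Zfh/Zvfh in `-march`, `native_fp16_enabled()` returns 0.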

Related to issue #5279
Co-authored-by Ao Dong

…r RISCV64_ZVL256B

Added HFLOAT16 support for RISCV64. Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16. The instruction sets used are ZVFH and ZFH, which need to be supported by RVV 1.0. Related to issue OpenMathLib#5279.
Co-authored-by Linjin Li <[email protected]>


martin-frbg · 2025-06-12T21:34:34Z

I think we'll need some adjustments in interface/gemm.c to disable the "small matrix" pathway, adjust the function name used in error messages, and also keep hfloat16 out of the optimum thread number computations, now that it is separate from bfloat16. Please see attached (and add this or similar to the PR if you agree)
gemm.c.txt
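A minimal sketch of the preprocessor ordering this implies, assuming per-type string macros like an `ERROR_NAME` used when reporting argument errors: `HFLOAT16` is tested before `BFLOAT16`, so an FP16 build reports SHGEMM and opts out of both the small-matrix path and the optimum thread number computation. `IS_FP16_BUILD` and the three functions are hypothetical; the attached gemm.c.txt is the actual proposal.

```c
#include <string.h>

/* Standalone demo only: pick the FP16 variant when neither type macro
 * is passed on the command line (a real build would use -DHFLOAT16 or
 * -DBFLOAT16). */
#if !defined(HFLOAT16) && !defined(BFLOAT16)
#define HFLOAT16 1
#endif

/* HFLOAT16 is tested first so an FP16 build can never fall through to the
 * BFLOAT16 branch and report itself as SBGEMM. */
#if defined(HFLOAT16)
#define ERROR_NAME "SHGEMM "
#define IS_FP16_BUILD 1
#elif defined(BFLOAT16)
#define ERROR_NAME "SBGEMM "
#define IS_FP16_BUILD 0
#endif

/* Checks the routine name an argument error would be reported under. */
int gemm_error_name_is(const char *name) {
    return strcmp(ERROR_NAME, name) == 0;
}

/* FP16 builds skip the "small matrix" shortcut... */
int skip_small_matrix_path(void) { return IS_FP16_BUILD; }

/* ...and are kept out of the optimum thread number computation. */
int skip_thread_number_tuning(void) { return IS_FP16_BUILD; }
```

Compiling the same file with `-DBFLOAT16` would select the SBGEMM branch instead.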

Also, it probably makes sense to separate the two types to such an extent that one can be built without the other. I have further modified your version of gensymbol(.pl) to take another parameter for BUILD_HFLOAT16, and adjusted exports/Makefile accordingly:
Makefile.txt
gensymbol.pl.txt
gensymbol.txt

martin-frbg · 2025-06-12T21:38:07Z

In a similar vein, the affected files in the benchmark folder probably need adjusting as well; please check:
Makefile.txt

gemm.c.txt

martin-frbg · 2025-06-12T21:50:10Z

Lastly, I found that I cannot build the "generic" replacements on a non-RISCV host unless I remove the ifneq ($(SHGEMM_UNROLL_M), $(SHGEMM_UNROLL_N)) conditional around the compiler commands for SHGEMMINCOPYOBJ and SHGEMMITCOPYOBJ in kernel/Makefile.L3. I also wonder whether it would make sense to provide hfloat16 conversion functions in kernel/generic/gemm_kernel2x2.c, similar to what is present there for bfloat16?
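For a non-RISCV "generic" build, an FP16 conversion helper in the spirit of the existing bfloat16 one might look like the sketch below: a portable bit-level binary16-to-float decoder (no `_Float16` compiler support needed) feeding a naive column-major reference loop that accumulates in float. The names `hf16_to_f32` and `shgemm_ref`, the `uint16_t` storage type and the float output matrix are assumptions made by analogy with sbgemm, not taken from the PR or from gemm_kernel2x2.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical generic helper: converts an IEEE 754 binary16 (FP16) bit
 * pattern to float by rebuilding the binary32 bit pattern. */
static float hf16_to_f32(uint16_t h) {
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t expo = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;
    float f;

    if (expo == 0x1Fu) {                /* infinity or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (expo != 0) {             /* normal: rebias 15 -> 127 */
        bits = sign | ((expo + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {             /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalise the mantissa */
        expo = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            expo--;
        }
        bits = sign | (expo << 23) | ((mant & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical reference shgemm: C = alpha * A * B + beta * C with
 * column-major A (m x k) and B (k x n) in FP16 and C (m x n) in float.
 * Products are accumulated in float; C is not read when beta == 0,
 * following the usual BLAS convention. */
void shgemm_ref(int m, int n, int k, float alpha,
                const uint16_t *a, int lda,
                const uint16_t *b, int ldb,
                float beta, float *c, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += hf16_to_f32(a[i + (size_t)p * lda]) *
                       hf16_to_f32(b[p + (size_t)j * ldb]);
            }
            float *cij = &c[i + (size_t)j * ldc];
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

A loop like this could also serve as a correctness oracle when comparing the RVV kernels against a host build.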

Unfortunately, I have not had the time to try a CMake build yet.

martin-frbg · 2025-06-14T21:27:34Z

- cmake/utils.cmake: utils.cmake.txt
- cmake/kernel.cmake: kernel.cmake.txt
- kernel/CMakeLists: CMakeLists.txt
- interface/CMakeLists: CMakeLists.txt
- top-level CMakeLists (added status messages, updated gensymbol calls, updated benchmark build to include B/HFLOAT16): CMakeLists.txt

codspeed-hq · 2025-06-14T22:04:34Z

CodSpeed Performance Report

Merging #5290 will improve performance by 10.42%

Comparing Srangrang:develop (fb89820) with develop (02267d8)

Summary

⚡ 1 improvement
✅ 61 untouched benchmarks

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`test_dgemv[1000-s]`	7.7 ms	7 ms	+10.42%

Srangrang · 2025-06-15T05:52:57Z

> I think we'll need some adjustments in interface/gemm.c to disable the "small matrix" pathway, adjust the function name used in error messages, and also keep hfloat16 out of the optimum thread number computations, now that it is separate from bfloat16. Please see attached (and add this or similar to the PR if you agree) gemm.c.txt

> Also, it probably makes sense to separate the two types to such an extent that one can be built without the other. I have further modified your version of gensymbol(.pl) to take another parameter for BUILD_HFLOAT16, and adjusted exports/Makefile accordingly: Makefile.txt gensymbol.pl.txt gensymbol.txt

Thank you for your comment, @martin-frbg. I agree with your modifications and will incorporate your changes into this PR.

Srangrang · 2025-06-15T06:16:14Z

> In a similar vein, the affected files in the benchmark folder probably need adjusting as well; please check Makefile.txt
>
> gemm.c.txt

I have reviewed the contents of the benchmark folder and noticed that references to BF16 appear to be defined as HALF. However, the modifications in your Makefile.txt change this to BFLOAT16. I believe this change reduces ambiguity and more clearly differentiates FP16 from BF16. Could we standardize on HFLOAT16 for FP16 and BFLOAT16 for BF16 throughout OpenBLAS (for example, by renaming GOTO_HF16_TARGETS to GOTO_HFLOAT16_TARGETS in this file)?

Additionally, I have identified other instances where HALF is used to define BF16, such as in the driver/level3/Makefile. Should we also adjust these to use the BFLOAT16 definition?

martin-frbg · 2025-06-15T06:41:52Z

You are right - any occurrences of HALF must be leftovers from the initial BFLOAT16 code contribution, which used "half precision" and the "h" prefix throughout.
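The ambiguity is easy to demonstrate: the same 16-bit pattern means a different number depending on whether it is read as IEEE binary16 (FP16, HFLOAT16) or as bfloat16 (BF16, BFLOAT16). The sketch below uses one parametric decoder for both, since the two formats differ only in where the exponent/mantissa boundary sits (5+10 bits vs 8+7 bits). `decode_half16` and its wrappers are illustrative names, not OpenBLAS functions.

```c
#include <math.h>    /* NAN, INFINITY (macros only, no libm call) */
#include <stdint.h>

/* Multiplies x by 2^e using only doublings/halvings, which are exact for
 * the values produced below, so no libm function is needed. */
static float scale_pow2(float x, int e) {
    for (; e > 0; e--) x *= 2.0f;
    for (; e < 0; e++) x *= 0.5f;
    return x;
}

/* Decodes a 16-bit IEEE-style value with 1 sign bit, `ebits` exponent bits
 * and 15 - ebits mantissa bits (bias 2^(ebits-1) - 1).
 * ebits = 5 gives binary16 (FP16), ebits = 8 gives bfloat16 (BF16). */
float decode_half16(uint16_t v, int ebits) {
    const int mbits = 15 - ebits;
    const int bias = (1 << (ebits - 1)) - 1;
    const int emax = (1 << ebits) - 1;
    const int sign = (v >> 15) & 1;
    const int expo = (v >> mbits) & emax;
    const int mant = v & ((1 << mbits) - 1);
    float x;

    if (expo == emax) {
        x = mant ? NAN : INFINITY;            /* NaN or infinity */
    } else if (expo == 0) {
        /* zero or subnormal: mant * 2^(1 - bias - mbits) */
        x = scale_pow2((float)mant, 1 - bias - mbits);
    } else {
        /* normal: (2^mbits + mant) * 2^(expo - bias - mbits) */
        x = scale_pow2((float)((1 << mbits) + mant), expo - bias - mbits);
    }
    return sign ? -x : x;
}

float fp16_to_float(uint16_t v) { return decode_half16(v, 5); }
float bf16_to_float(uint16_t v) { return decode_half16(v, 8); }
```

Read as FP16, the pattern `0x3C00` is 1.0; read as BF16, the same bits are 0.0078125 (BF16 encodes 1.0 as `0x3F80`).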

Srangrang and others added 6 commits May 24, 2025 23:55

- 2996c25 add shgemm for RISCV_ZVL128B
- 0a96779 Add FP16 support for RISCV
- fa2b08b Merge pull request #1 from gkdddd/riscv_shgemm
  Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 fo…
- 4e1a381 fix: resolve the compilation failure without zfh instruction
  - modify the macro conditions in Makefile.system
  - Delete development test code
  Related to issue#5279
- fb89820 Merge branch 'develop' of https://github.com/Srangrang/OpenBLAS into …
  …develop

martin-frbg added this to the 0.3.31 milestone Jun 11, 2025
