-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add support for FP16 to openBLAS and shgemm on RISCV #5290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…r RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue OpenMathLib#5279 Co-authored-by Linjin Li <[email protected]>
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 fo…
- modify the macro conditions in Makefile.system - Delete development test code Related to issue#5279
I think we'll need some adjustments in interface/gemm.c to disable the "small matrix" pathway, adjust the function name used in error messages, and also keep hfloat16 out of the optimum thread number computations, now that it is separate from bfloat16. Please see attached (and add this or similar to the PR if you agree) Also it probably makes sense to seperate the two types to such an extent that one can be built without the other - I have further modified your version of gensymbol(.pl) to take another parameter for the BUILD_HFLOAT16, and adjusted exports/Makefile accordingly: |
In similar vein, the affected files in the benchmark folder probably need adjusting as well, please check |
Lastly, I found that I cannot build the "generic" replacements on a non-RISCV host unless I remove the Unfortunately, I have not had the time to try a CMake build yet. |
cmake/utils.cmake toplevel CMakeLists (added status messages, updated gensymbol calls, updated benchmark build to include B/HFLOAT16) |
CodSpeed Performance ReportMerging #5290 will improve performances by 10.42%Comparing Summary
Benchmarks breakdown
|
Thank you for your comment, @martin-frbg . I agree with your modification and will incorporate your changes into this PR. |
I have reviewed the contents of the benchmark folder and noticed that references to BF16 appear to be defined as Additionally, I have identified other instances where |
You are right - any occurences of HALF must be leftovers from the initial BFLOAT16 code contribution that was using "half precision" and the "h" prefix throughout. |
-add HFLOAT16 and BUILD_HFLOAT16 macro define to distinguish BFLOAT16 and BUILD_BFLOAT16
-add shgemm for RISCV_ZVL128B and RISCV_ZVL256B
-using fp16 on RISCV requires zfh and zvfh instruction sets
-enable fp16 support in Makefile.rule
Related to issue #5279
Co-authored-by Ao Dong