Update inputamaxv.txt #24

KR-Chandrashekara · 2024-11-28T12:06:55Z

No description provided.

* Bug Fixes in LPGEMM for AVX512(SkyLake) machine

- The B matrix in the bf16bf16f32obf16/f32 API is re-ordered. On machines that don't support BF16 instructions, the BF16 input is un-reordered and converted to FP32 so that the FP32 kernels can be used.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 copies the matrix to the re-ordered buffer array, but un-reordering to FP32 requires the matrix size to be a multiple of 16 along n and a multiple of 2 along k.
- The entry condition to the above has been modified for the AVX512 configuration.
- In the bf16 API, the tiny path entry check has been modified to prevent a seg fault when AOCL_ENABLE_INSTRUCTIONS=AVX2 is set on BF16-supporting machines.
- Modified existing store instructions in FP32 AVX512 kernels to support execution on machines that have AVX512 support but not BF16/VNNI (SkyLake).
- Added BF16 beta and store types in FP32 avx512_256 kernels.

AMD Internal: [SWLCSG-3552]

* Bug Fixes in LPGEMM for AVX512(SkyLake) machine

- The B matrix in the bf16bf16f32obf16/f32 API is re-ordered. On machines that don't support BF16 instructions, the BF16 input is un-reordered and converted to FP32 so that the FP32 kernels can be used.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 copies the matrix to the re-ordered buffer array, but un-reordering to FP32 requires the matrix size to be a multiple of 16 along n and a multiple of 2 along k.
- The entry condition to the above has been modified for the AVX512 configuration.
- In the bf16 API, the tiny path entry check has been modified to prevent a seg fault when AOCL_ENABLE_INSTRUCTIONS=AVX2 is set on BF16-supporting machines.
- Modified existing store instructions in FP32 AVX512 kernels to support execution on machines that have AVX512 support but not BF16/VNNI (SkyLake).
- Added BF16 beta and store types, along with BIAS and ZP, in FP32 avx512_256 kernels.

AMD Internal: [SWLCSG-3552]

* Bug Fixes in LPGEMM for AVX512(SkyLake) machine

- Support added in FP32 512_256 kernels for beta, BIAS, zero-point and BF16 store types, for bf16bf16f32obf16 API execution in AVX2 mode.
- The B matrix in the bf16bf16f32obf16/f32 API is re-ordered. On machines that don't support BF16 instructions, the BF16 input is un-reordered and converted to FP32 type so that the FP32 kernels can be used.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 copies the matrix to the re-ordered buffer array, but un-reordering to FP32 requires the matrix size to be a multiple of 16 along n and a multiple of 2 along k. The entry condition here has been modified for the AVX512 configuration.
- Fix for seg fault with AOCL_ENABLE_INSTRUCTIONS=AVX2 mode in configurations that support the BF16/VNNI ISA:
  - The BF16 tiny path entry check has been modified to take arch_id into account, to prevent improper entry into the tiny kernel.
  - The stores in the BF16->FP32 col-major path for m = 1 were updated to use the correct storage pattern.
  - The BF16 beta load macro was modified to account for data in unaligned memory.
- Modified existing store instructions in FP32 AVX512 kernels to support execution on machines that have AVX512 support but not BF16/VNNI (SkyLake).

AMD Internal: [SWLCSG-3552]

---------

Co-authored-by: VarshaV <[email protected]>
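For readers unfamiliar with the BF16 fallback described in the commit message, the following is a minimal scalar sketch of the ideas involved, not the LPGEMM implementation. It relies on the fact that a BF16 value is the upper 16 bits of an IEEE-754 binary32 value. All function names here (`bf16_to_f32`, `load_bf16_unaligned`, `can_unreorder_to_f32`) are hypothetical and do not come from the library. The dimension check only restates the "multiple of 16 along n, multiple of 2 along k" requirement quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one BF16 value (held in a uint16_t) to FP32.
 * BF16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of
 * a binary32 value, so widening is a 16-bit left shift of the bit pattern. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* bit copy, avoids aliasing problems */
    return f;
}

/* Hypothetical helper: read a BF16 element from an address that may not be
 * 2-byte aligned. memcpy makes no alignment assumption about the source. */
static uint16_t load_bf16_unaligned(const void *p)
{
    uint16_t h;
    memcpy(&h, p, sizeof h);
    return h;
}

/* Hypothetical guard, assuming the requirement stated in the commit message:
 * the un-reorder-to-FP32 step expects n to be a multiple of 16 and k to be
 * a multiple of 2. Returns 1 when both hold, 0 otherwise. */
static int can_unreorder_to_f32(size_t n, size_t k)
{
    return (n % 16 == 0) && (k % 2 == 0);
}
```

With these definitions, `bf16_to_f32(0x3F80)` yields `1.0f`, and `can_unreorder_to_f32(1, 1)` returns 0, which corresponds to the n = 1, k = 1 case the commit treats specially.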

Update inputamaxv.txt

ddb3c56
