Rebase with AOCL5.1 #32

Status: Open. Wants to merge 231 commits into base: master.

Commits (231)
28b0982
Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
devinamatthews Nov 10, 2021
7bc8ab4
Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
Meghana-vankadari Nov 11, 2021
7bde468
Added support for addons.
fgvanzee Nov 13, 2021
78cd1b0
Added 'Example Code' section to README.md.
fgvanzee Nov 16, 2021
cbc88fe
Marked some markdown shell code blocks as 'bash'.
fgvanzee Nov 16, 2021
74c0c62
Reverted cbc88fe.
fgvanzee Nov 16, 2021
26e4b6b
Added support for AMD's Zen3 microarchitecture.
dzambare Nov 17, 2021
9be97c1
Support all four dts in test/test_her[2][k].c (#578)
madanm3 Nov 17, 2021
b727645
Merge branch 'dev'
fgvanzee Nov 19, 2021
a4bc03b
Brief mention/link to Addons.md in README.md.
fgvanzee Nov 19, 2021
12c66a4
Minor updates to README.md, docs/Addons.md.
fgvanzee Nov 19, 2021
e229e04
Added recu-sed.sh script to 'build' directory.
fgvanzee Dec 1, 2021
cf7d616
Enable user-customized packm ukernel/variant. (#549)
devinamatthews Dec 2, 2021
961d9d5
Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
kvaragan Dec 7, 2021
54fa28b
Move edge cases to gemm ukr; more user-custom mods. (#583)
devinamatthews Dec 24, 2021
08174a2
Evict <arm_sve.h> Requirement for SVE GEMM
xrq-phys Jan 1, 2022
466b68a
Add unique tag to branch labels for Apple ARM64.
devinamatthews Jan 2, 2022
864bfab
CREDITS file update.
fgvanzee Jan 4, 2022
3f2440b
Added m, n dims to gemmd/gemmlike ukernel calls.
fgvanzee Jan 6, 2022
268ce1f
Relax alignment constraints
devinamatthews Jan 10, 2022
81f93be
Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused)
devinamatthews Jan 10, 2022
0ab20c0
the Apple local label thing is required by Clang in general
jeffhammond Jan 13, 2022
0be9282
Updated zen3 macro constant names.
fgvanzee Jan 26, 2022
35195bb
Add armclang detection to configure.
devinamatthews Jan 31, 2022
b5df181
Armv8a, ArmSVE: Simplify Gen-C
xrq-phys Feb 2, 2022
9cc897f
Fix SVE Compil.
xrq-phys Feb 3, 2022
72089bb
ArmSVE Use Predicate in M-Direction
xrq-phys Feb 5, 2022
2f3872e
ArmSVE Adopts Label Wrapper
xrq-phys Feb 7, 2022
2674291
Update CC_VENDOR logic
devinamatthews Feb 13, 2022
5a4d3f5
Use -flat_namespace option to link on macOS
devinamatthews Feb 13, 2022
2506159
Don't use `-Wl,-flat-namespace`.
devinamatthews Feb 14, 2022
ee9ff98
Move edge cases to gemmtrsm ukrs; doc updates.
fgvanzee Feb 15, 2022
c9700f3
Renamed SIMD-related macro constants for clarity.
fgvanzee Feb 15, 2022
4d83523
Add armsve to arm64 Metaconfig (#614)
xrq-phys Feb 22, 2022
d514658
ArmSVE Ensure Non-zero Block Size (#615)
xrq-phys Feb 22, 2022
84732bf
Revamp how tools are handled/checked by configure.
fgvanzee Feb 28, 2022
71851a0
Fixed level-3 performance bug in haswell ukernels.
fgvanzee Mar 8, 2022
cad1041
POWER10: edge cases in microkernel (#620)
ivan23kor Mar 10, 2022
7c07b47
Avoid gemmsup barriers when not packing A or B. (#622)
fgvanzee Mar 11, 2022
f1dbb0e
Trival whitespace change; commit log addendum.
fgvanzee Mar 11, 2022
d681000
Update Multithreading.md
devinamatthews Mar 14, 2022
0db2bd5
Added BLAS/CBLAS APIs for gemm3m. (#590)
BhaskarNallani Mar 24, 2022
1ec020b
AMD kernel updates; frame-specific AMD updates. (#597)
dzambare Mar 29, 2022
cf06364
Fixed typo in BLAS gemm3m call to _check().
fgvanzee Mar 29, 2022
bee7678
CREDITS file update.
fgvanzee Mar 31, 2022
99bb900
ReleaseNotes.md update in advance of next version.
fgvanzee Apr 1, 2022
14c86f6
Version file update (0.9.0)
fgvanzee Apr 1, 2022
88cab83
CHANGELOG update (0.9.0)
fgvanzee Apr 1, 2022
69fa915
Fixed broken "tagged releases" link in README.md.
fgvanzee Apr 1, 2022
b3e674d
README.md update to link to releases page.
fgvanzee Apr 4, 2022
ae10d94
Simplify and rewrite reference packm kernels. (#610)
devinamatthews Apr 7, 2022
9fea633
Partial addition of 'const' to all interfaces above the (micro)kernel…
devinamatthews Apr 13, 2022
6431c9e
Added missing 'const' to zen bli_gemm_small.c.
fgvanzee Apr 14, 2022
1c73340
Fix version check for znver3, which needs gcc >= 10.3 (#628)
jedbrown Apr 28, 2022
64a9b06
Fixed misspelling of 'xpbys' in gemm macrokernel.
fgvanzee May 10, 2022
4603324
Init/finalize via bli_pthread_switch_t API (#634).
fgvanzee May 19, 2022
5677289
Added SMU citation to README.md intro.
fgvanzee Jun 1, 2022
d93df02
Removed unused dt arg in bli_gks_query_ind_cntx().
fgvanzee Jun 15, 2022
d429b6b
Support clang targetting MinGW (#639)
isuruf Jun 28, 2022
667f201
Fixed type bug in bli_cntx_set_ukr_prefs().
fgvanzee Jul 7, 2022
7cba7ce
Minor cleanups, comment updates to bli_gks.c.
fgvanzee Jul 8, 2022
ffde54c
Minor changes to .gitignore and LICENSE files. (#642)
jdiamondGitHub Jul 11, 2022
98d4678
Change complex_return='intel' for ifx. (#637)
bartoldeman Jul 11, 2022
9b1beec
Use BLIS_ENABLE_COMPLEX_RETURN_INTEL in blastest files (#636)
bartoldeman Jul 12, 2022
cc260fd
Allow uniform max problem sizes in test/3/runme.sh.
fgvanzee Jul 13, 2022
17b0caa
Fixed out-of-bounds read in haswell gemmsup kernels.
fgvanzee Jul 14, 2022
af3a41e
Add autodetection for POWER7, POWER9 & POWER10 (#647)
Flamefire Jul 21, 2022
6826c1c
Add `#line` directives to flattened `blis.h`. (#643)
devinamatthews Jul 25, 2022
4dde947
Fixed out-of-bounds bug in sup s6x16m haswell kernel.
fgvanzee Jul 26, 2022
56de31b
Disable modification of KC in the gemmsup kernels. (#648)
devinamatthews Jul 27, 2022
5b29893
Removed buggy cruft from power10 subconfig.
fgvanzee Jul 28, 2022
a48e29d
CREDITS file update.
fgvanzee Jul 28, 2022
bbaf29a
Very minor variable updates to common.mk.
fgvanzee Aug 4, 2022
775148b
Updated ARMv8a kernels to fix 2 prefetching issues. (#649)
jdiamondGitHub Aug 5, 2022
9e5594a
Temporarily disabled #line directives from 6826c1c.
fgvanzee Aug 11, 2022
dfa5413
Arm64 dgemmsup with extended MR&NR (#655)
xrq-phys Aug 30, 2022
a87eae2
Added '-q' quiet mode option to testsuite. (#657)
fgvanzee Sep 6, 2022
4afe0cf
Defined invscalv, invscalm, invscald operations. (#661)
fgvanzee Sep 8, 2022
6e5431e
Fix line number issue in flattened blis.h. (#660)
devinamatthews Sep 10, 2022
cb74202
Fixed incorrect sizeof(type) in edge case macros. (#662)
fgvanzee Sep 13, 2022
fd885cf
Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
fgvanzee Sep 13, 2022
05a811e
Initialize rntm_t nt/ways fields with 1 (not -1). (#663)
fgvanzee Sep 14, 2022
63177dc
Fixed gemmlike sandbox bug introduced in 7c07b47.
fgvanzee Sep 15, 2022
e86076b
Test the 'gemmlike' sandbox via AppVeyor. (#664)
fgvanzee Sep 15, 2022
fb91337
Fixed a harmless pc_nt bug in 05a811e.
fgvanzee Sep 16, 2022
89df7b8
De-templatized _sup_var1n2m.c; unified _sup_packm_a/b(). (#659)
devinamatthews Sep 18, 2022
a1a5a9b
Implemented support for fat multithreading. (#665)
fgvanzee Sep 21, 2022
036a4f9
Refactored some rntm_t management code. (#666)
fgvanzee Sep 22, 2022
ee81efc
Parameterized test/3 drivers via command line args. (#667)
fgvanzee Sep 23, 2022
b861c71
Add consistent NaN/Inf handling in sumsqv. (#668)
devinamatthews Sep 23, 2022
42d0e66
Add AddressSanitizer (-fsanitize=address) option. (#669)
devinamatthews Sep 29, 2022
63470b4
Fix some bugs in bli_pool.c (#670)
devinamatthews Sep 29, 2022
76a23bd
Reinstate sanity check in bli_pool_finalize. (#671)
devinamatthews Oct 3, 2022
9453e0f
CREDITS file update.
fgvanzee Oct 4, 2022
23f5b8d
Shuffled checked properties in bli_l3_check.c. (#676)
fgvanzee Oct 18, 2022
88105db
Added Discord documentation (#677)
fgvanzee Oct 21, 2022
2dd692b
Fix auto-detection of firestorm (Apple M1).
devinamatthews Oct 26, 2022
c803b03
Add check to disable armsve on Apple M1.
devinamatthews Oct 26, 2022
aeb5f0c
Omnibus PR - Oct 2023 (#678)
devinamatthews Oct 27, 2022
29f79f0
Fixed performance bug caused by redundant packing. (#680)
devinamatthews Oct 31, 2022
5eea6ad
Add mention of Wilkinson Prize to README.md. (#683)
fgvanzee Nov 2, 2022
edcc2f9
Support --nosup, --sup configure options. (#684)
fgvanzee Nov 3, 2022
872898d
Fixed trmm[3]/trsm performance bug in cf7d616. (#685)
fgvanzee Nov 3, 2022
6774bf0
Fix typo in configure --help text. (#686)
leekillough Nov 3, 2022
8d813f7
Some decluttering of the top-level directory.
fgvanzee Nov 4, 2022
713d078
Delete mpi_test garbage. (#689)
fgvanzee Nov 4, 2022
dc6e5f3
Enhance emacs formatting of C files to remove trailing whitespace and…
leekillough Nov 3, 2022
e1ea25d
Fixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)
fgvanzee Nov 11, 2022
2b05948
blis support for hpx (#682)
ct-clmsn Nov 13, 2022
f0337b7
Trival whitespace/comment tweaks.
fgvanzee Nov 14, 2022
db10dd8
Fixed _gemm_small() prototype; disabled gemm_small.
fgvanzee Nov 30, 2022
4833ba2
Fixed perf of mt sup with packing, and mt gemmlike. (#696)
fgvanzee Dec 13, 2022
3accacf
Skip 1m optimization when forcing hemm_l/symm_l. (#697)
fgvanzee Dec 16, 2022
7d23dc2
Fix a race condition which manifested as incorrect results (rarely). …
devinamatthews Dec 26, 2022
538150c
Applied race condition fix to sup thread decorator.
fgvanzee Dec 26, 2022
f956b79
Switch to l3 sup decorator in gemmlike sandbox. (#704)
fgvanzee Jan 1, 2023
b6735ca
Refactor structure awareness in packm_blk_var1.c. (#707)
devinamatthews Jan 6, 2023
2e1ba9d
Tile-level partitioning in jr/ir loops (ex-trsm). (#695)
fgvanzee Jan 11, 2023
d220f9c
Fix k = 0 edge case in power10 microkernels (#706)
nisanthmp Jan 11, 2023
cdb22b8
Disable power10 kernels other than sgemm, dgemm. (#705)
nisanthmp Jan 11, 2023
38d88d5
Define new global scalar (obj_t) constants. (#703)
devinamatthews Jan 11, 2023
b895ec9
Fixing type-mismatch errors in power10 sandbox (#701)
nisanthmp Jan 11, 2023
9a366b1
Implement cntx_t pointer caching in gks. (#709)
fgvanzee Jan 12, 2023
16d2e9e
Defined lt, lte, gt, gte + misc. other updates. (#712)
fgvanzee Jan 14, 2023
5793a77
Fixed mis-mapped instruction for VEXTRACTF64X2. (#713)
HarshDave12 Jan 17, 2023
c334ec2
Merge tlb- and slab/rr-specific gemm macrokernels. (#711)
devinamatthews Jan 18, 2023
ecbcf40
Use here-document for 'configure --help' output. (#714)
leekillough Jan 19, 2023
dc5d00a
Typecast printf() args to avoid compiler warnings. (#716)
leekillough Jan 27, 2023
e730c68
Define `BLIS_VERSION_STRING` in `blis.h`. (#720)
fgvanzee Feb 6, 2023
e3d352f
Added runtime selection of 'power' config family. (#718)
nisanthmp Feb 8, 2023
b1d3fc7
Redirect grep stderr to /dev/null. (#723)
fgvanzee Feb 10, 2023
0b421ef
Added an 'arm64' entry to `.travis.yml`. (#726)
fgvanzee Feb 18, 2023
059f151
Updated hpx namespace for make_count_shape. (#725)
ct-clmsn Feb 18, 2023
0ba6e9e
Refined emacs handling of indentation. (#717)
leekillough Feb 18, 2023
4e18cd3
Restored ArmSVE general storage case. (#708)
xrq-phys Feb 18, 2023
93c63d1
Use 'const' pointers in kernel APIs. (#722)
fgvanzee Feb 20, 2023
fab18dc
Use 'void*' datatypes in kernel APIs. (#727)
fgvanzee Feb 22, 2023
60f3634
Fixed bugs in scal2v ref kernel when alpha == 1. (#728)
fgvanzee Feb 23, 2023
72c37eb
Updated configure to pass all shellcheck checks. (#729)
leekillough Mar 23, 2023
5f84130
Omit -fPIC if shared library build is disabled. (#732)
fgvanzee Mar 25, 2023
04090df
Fixed compile errors with `BLIS_DISABLE_BLAS_DEFS`. (#730)
fgvanzee Mar 27, 2023
9d778e0
Move -fPIC insertion to subconfigs' make_defs.mk. (#738)
fgvanzee Mar 29, 2023
17cd260
Added mm_algorithm pptx files (bp and pb).
fgvanzee Mar 30, 2023
38fc523
Added mm_algorithm pdf files (bp and pb).
fgvanzee Mar 30, 2023
3f1432a
Add output.testsuite to .gitignore (#736)
leekillough Apr 3, 2023
aea8e1d
Optionally disable thread-local storage. (#735)
fgvanzee Apr 3, 2023
259f684
CREDITS file update.
fgvanzee Apr 7, 2023
593d017
CREDITS file update.
fgvanzee Apr 8, 2023
6b38c5a
Add RISC-V target (#693)
angsch Apr 11, 2023
8215b02
Apply #738 to make_defs.mk of RISC-V subconfigs. (#740)
leekillough Apr 12, 2023
6fd9aab
Fix bug in detecting Fortran compiler vendor (#745)
devinamatthews May 5, 2023
ef9d3e6
Added missing #include <io.h> for Windows. (#747)
h-vetinari May 6, 2023
0873c0f
Consolidate INSERT_ macro sets via variadic macros. (#744)
devinamatthews May 7, 2023
138de3b
add nvhpc compiler support (#719)
ajaypanyala May 7, 2023
89b7863
Fix 1m enablement for herk/her2k/syrk/syr2k. (#743)
devinamatthews May 8, 2023
d639554
Pad thrcomm_t fields to avoid false sharing.
fgvanzee Jun 7, 2023
6b894c3
Rewrote/fixed broken tree barrier implementation.
fgvanzee Jun 12, 2023
a0b04e3
Rewrote regen-symbols.sh (gen-libblis-symbols.sh). (#751)
fgvanzee Jun 26, 2023
c91b41d
Auto-detect the RISC-V ABI of the compiler and use -mabi= during RISC…
leekillough Jul 26, 2023
22ad8c1
Small fixes to support hpx in the testsuite (#759)
ct-clmsn Jul 27, 2023
2db31e0
Exclude -lrt on Android with Bionic libraries. (#755)
leekillough Jul 27, 2023
915daaa
Fix typos in docs + example code comments. (#753)
jip Jul 27, 2023
dbc7981
CREDITS file update.
fgvanzee Jul 28, 2023
3cf17b4
Small fixes/improvements to docs/Multithreading.md. (#764)
fgvanzee Aug 7, 2023
634e532
Set thrcomm timpl_t id inside init functions. (#766)
fgvanzee Aug 10, 2023
fa6a9b2
Fixed error when using common.mk from testsuite. (#768)
fgvanzee Aug 19, 2023
6dcf766
Revamped bli_init() to use TLS where feasible. (#767)
fgvanzee Aug 27, 2023
c6546c1
Fixed broken link in Multithreading.md. (#774)
jmather-sesi Sep 20, 2023
a4a6329
Fixes to HPC runtime code path. (#773)
srinivasyadav18 Sep 26, 2023
6f41220
Added 'altra', 'altramax' subconfigs. (#775)
fgvanzee Sep 26, 2023
37ca4fd
Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778)
fgvanzee Sep 28, 2023
c2099ed
Fixed brokenness when sba is disabled. (#777)
fgvanzee Oct 2, 2023
1e264a4
Update zen3 subconfig to support NVHPC compilers. (#779)
abagusetty Oct 2, 2023
8fff1e3
Fixed bug in sup threshold registration. (#782)
fgvanzee Oct 12, 2023
7a87e57
Fixed HPX barrier synchronization (#783)
srinivasyadav18 Oct 14, 2023
05388dd
Added 'sifive_x280' subconfig, kernel set. (#737)
Aaron-Hutchinson Nov 3, 2023
f7ce54a
CREDITS file update.
fgvanzee Nov 3, 2023
2d94392
Allow users to defines [sd]complex using std::complex (#784)
devinamatthews Nov 21, 2023
141a6c9
Install helper headers to INCDIR prefix. (#787)
fgvanzee Nov 21, 2023
1236dda
Fixed random segfault in test/3 drivers. (#788)
fgvanzee Dec 3, 2023
a72e456
Include bli_config.h before bli_system.h in cblas.h. (#789)
fgvanzee Dec 7, 2023
c382d8b
Fix errors and typos in docs/BLIS*API.md (#791)
jip Jan 14, 2024
1a8c818
Add cpu part codes for various manufacturers and use in the code (#794)
j-bm Feb 15, 2024
664cc6b
Update BLIS_*_INITIALIZER macros for C++ compatibility. (#802)
devinamatthews Mar 26, 2024
a316d2c
Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and `BLIS_OBJECT_…
devinamatthews Mar 28, 2024
a49238e
Refactor the control tree and other infrastructure (#710)
devinamatthews Apr 24, 2024
fd1a7e3
Allow test/3 drivers to use default ind_t method. (#804)
fgvanzee Apr 25, 2024
cad5149
Use "-i auto" by default in test/3 drivers.
fgvanzee Apr 30, 2024
5ab286f
Added a script to help create new rc branches.
fgvanzee May 6, 2024
c2af113
Version file update (1.0)
fgvanzee May 6, 2024
a876918
CHANGELOG update (1.0)
fgvanzee May 6, 2024
06dddf1
ReleaseNotes.md update.
fgvanzee May 6, 2024
01e151a
Updated RELEASING file; fixes to ReleaseNotes.md.
fgvanzee May 6, 2024
6d0ab74
Updates to README.md section on downloading.
fgvanzee May 6, 2024
729c57c
Fix SyntaxWarning messages from python 3.12 (#809)
AngryLoki Jun 4, 2024
5cbec65
Update CREDITS
devinamatthews Jun 4, 2024
4158930
Add ScaLAPACK compatibility mode. (#813)
devinamatthews Jun 19, 2024
31ecf82
Fix a bug in the piledriver microkernels. (#814)
devinamatthews Jun 20, 2024
8820f8f
Fixed typo in 4158930; variable renames. (#815)
fgvanzee Jun 26, 2024
a822cb2
Fixed out-of-bounds read bug in sup haswell ukr. (#824)
fgvanzee Aug 8, 2024
8d9be87
Flatten cblas.h immediately after blis.h. (#819)
fgvanzee Aug 8, 2024
b36bc95
Fix some aspects of the control tre/plugin infrastructure (#827)
devinamatthews Oct 10, 2024
827c50b
Implemented `--omit-symbols=LIST` `configure` option. (#823)
fgvanzee Oct 16, 2024
50b7117
Use intrinsics for all sifive_x280 kernels (#822)
myeh01 Nov 29, 2024
12f2efa
Add complex return detection for nvfortran (#765)
jeffhammond Nov 29, 2024
5cb70d8
Add documentation for plugins (#820)
devinamatthews Jan 14, 2025
fa6ddb1
Create a new type to represent IDs for all kernels, blocksizes, etc. …
devinamatthews Jan 17, 2025
fb7ba1d
Update release instructions. (#837)
devinamatthews Jan 17, 2025
4bc4a1c
ReleaseNotes.md update.
devinamatthews Jan 17, 2025
534d52b
Clarified OMP_NUM_THREADS (#835)
zerothi Jan 20, 2025
967d29d
CREDITS file update.
devinamatthews Jan 20, 2025
38063dd
Optionally ignore extra dirs in `gen-make-frag.sh`. (#833)
fgvanzee Jan 20, 2025
769d73f
Add the sifive_rvv configuration (#832)
myeh01 Jan 20, 2025
d161545
Run full "make check" for SDE tests. (#818)
devinamatthews Jan 20, 2025
a3cfbef
Fix errors and typos in some docs (#843)
jip Jan 21, 2025
d17c063
Add guard in `examples/*/Makefile` to check that `common.mk` was actu…
devinamatthews Jan 22, 2025
7e8a589
Blacklist KNL with GCC 15+ (#844)
loveshack Jan 24, 2025
a6f2ce9
Alias *gemmt_ as *gemmtr_ to fix lapack 3.12.1 compatibility. (#849)
cdluminate Feb 5, 2025
5ad37a8
Increase the max size for stack buffers. (#851)
devinamatthews Feb 5, 2025
028be42
Fix problem with clang-14.0.0 and reference `gemm` ukr. (#854)
devinamatthews Feb 7, 2025
40a52dc
Add CircleCI (#855)
devinamatthews Feb 8, 2025
14047f6
Do not use symbol aliases on macOS. (#856)
devinamatthews Feb 19, 2025
3c71737
Update README.md
devinamatthews Feb 19, 2025
a014a08
Add new level-0 macro layer. (#830)
devinamatthews Feb 27, 2025
97084c7
Fix problem in `bli_obj_imag_part`. (#861)
devinamatthews Mar 2, 2025
37e52a6
Fix check for SVE instructions which caused problems on Windows. (#859)
devinamatthews Mar 2, 2025
50054a6
Adjust CI testing (#860)
devinamatthews Mar 2, 2025
53d21cb
Fix for plugins without explicit optimized kernels.
devinamatthews Apr 2, 2025
5d9e110
Examples: replace all 4.1f printm format by 4.3f (#865)
hominhquan Apr 7, 2025
ec5b572
Fix to prevent is_win flag setting with clang on macOS (#867)
yoshoku May 1, 2025
5097c59
Update CREDITS
devinamatthews May 1, 2025
7 changes: 7 additions & 0 deletions .appveyor.yml
@@ -23,6 +23,12 @@ environment:
      CC: clang
      THREADING: openmp

    - LIB_TYPE: static
      CONFIG: auto
      CC: clang
      THREADING: openmp
      SANDBOX: yes

install:
- set "PATH=C:\msys64\mingw64\bin;C:\msys64\bin;%PATH%"
- if [%CC%]==[clang] set "PATH=C:\Program Files\LLVM\bin;%PATH%"
@@ -34,6 +40,7 @@ build_script:
- if [%LIB_TYPE%]==[shared] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --enable-shared --disable-static"
- if [%LIB_TYPE%]==[static] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --disable-shared --enable-static"
- if not [%CBLAS%]==[no] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% --enable-cblas"
- if [%SANDBOX%]==[yes] set "CONFIGURE_OPTS=%CONFIGURE_OPTS% -s gemmlike"
- set RANLIB=echo
- set LIBPTHREAD=
- set "PATH=%PATH%;C:\blis\lib"
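The `build_script` conditionals above accumulate `configure` flags from the matrix variables, and this diff adds `-s gemmlike` when `SANDBOX` is `yes`. A minimal POSIX-shell sketch of the same accumulation follows. The helper name `build_configure_opts` is hypothetical (it is not part of BLIS or AppVeyor), and it starts from an empty option string because the initial value of `CONFIGURE_OPTS` is not visible in this hunk.

```shell
#!/bin/sh
# Hypothetical helper mirroring the AppVeyor build_script conditionals.
# Arguments: LIB_TYPE CBLAS SANDBOX (same meanings as the matrix variables).
build_configure_opts() {
    lib_type=$1
    cblas=$2
    sandbox=$3
    opts=""
    if [ "$lib_type" = "shared" ]; then
        opts="$opts --enable-shared --disable-static"
    fi
    if [ "$lib_type" = "static" ]; then
        opts="$opts --disable-shared --enable-static"
    fi
    # CBLAS is enabled unless the variable is explicitly set to "no".
    if [ "$cblas" != "no" ]; then
        opts="$opts --enable-cblas"
    fi
    # Added by this diff: SANDBOX=yes selects the gemmlike sandbox.
    if [ "$sandbox" = "yes" ]; then
        opts="$opts -s gemmlike"
    fi
    # Drop the single leading space left by the first append.
    printf '%s\n' "${opts# }"
}

# The new matrix entry: static library, CBLAS unset, sandbox enabled.
build_configure_opts static "" yes
```

Because the new matrix entry leaves `CBLAS` unset, the sandbox build still gets `--enable-cblas`, so the gemmlike sandbox is exercised together with the CBLAS layer.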
264 changes: 264 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,264 @@
version: 2.1

branches:
  only:
    - master
    - dev
    - amd

executors:
  linux: # Docker using the Base Convenience Image
    docker:
      - image: cimg/base:2024.10
  linuxnew: # Docker using the Base Convenience Image
    docker:
      - image: cimg/base:current-22.04
  macos: &macos-executor # macos executor running Xcode
    macos:
      xcode: 14.2.0
  linuxvm: # executor type
    machine:
      image: ubuntu-2204:current

workflows:
  build:
    jobs:
      # Default:
      # - build:
      #     os: linux
      #     CC: gcc
      #     OOT: 0
      #     TEST: FAST
      #     SDE: 0
      #     LEVEL0: 0
      #     THR: none
      #     CONF: auto
      #     BLD: ''
      #     LDFLAGS: ''
      #     TESTSUITE_WRAPPER: ''
      #     PACKAGES: ''

      # full testsuite (all tests + mixed datatype (gemm_nn only) + salt + OOT)
      - build:
          OOT: 1
          TEST: ALL
          THR: openmp,pthreads
          CONF: x86_64

      # SDE testing for x86_64
      # Also test LEVEL0 here because g++ uses tons of memory for test_taxpbys.cxx
      - build:
          # linuxvm must be used because it provides 8G RAM and SDE fails with 4G RAM
          os: linuxvm
          SDE: 1
          LEVEL0: 1
          CONF: x86_64

      # test generic kernels
      - build:
          CONF: generic_broadcast

      # clang build
      - build:
          CC: clang
          THR: openmp,pthreads
          CXX: clang++
          PACKAGES: clang libomp-dev

      # macOS with system compiler (clang)
      - build:
          os: macos
          THR: pthreads
          CC: clang
          CXX: clang++

      # cortexa15 build and fast testsuite (qemu)
      - build:
          CC: arm-linux-gnueabihf-gcc
          CXX: arm-linux-gnueabihf-g++
          CONF: cortexa15
          PACKAGES: 'gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf libc6-dev-armhf-cross qemu-system-arm qemu-user'
          TESTSUITE_WRAPPER: 'qemu-arm -cpu cortex-a15 -L /usr/arm-linux-gnueabihf/'

      # cortexa57 build and fast testsuite (qemu)
      - build:
          CC: aarch64-linux-gnu-gcc
          CXX: aarch64-linux-gnu-g++
          CONF: cortexa57
          PACKAGES: 'gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user'
          TESTSUITE_WRAPPER: 'qemu-aarch64 -L /usr/aarch64-linux-gnu/'

      # Apple M1 (firestorm) build and fast testsuite (qemu)
      - build:
          CC: aarch64-linux-gnu-gcc
          CXX: aarch64-linux-gnu-g++
          CONF: firestorm
          PACKAGES: 'gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user'
          TESTSUITE_WRAPPER: 'qemu-aarch64 -L /usr/aarch64-linux-gnu/'

      # armsve build and fast testsuite (qemu)
      - build:
          CC: aarch64-linux-gnu-gcc-10
          CXX: aarch64-linux-gnu-g++-10
          CONF: armsve
          PACKAGES: 'gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user'
          TESTSUITE_WRAPPER: 'qemu-aarch64 -cpu max,sve=true,sve512=true -L /usr/aarch64-linux-gnu/'

      # arm64 build and fast testsuite (qemu)
      # NOTE: This entry omits the -cpu flag so that while both NEON and SVE kernels
      # are compiled, only NEON kernels will be tested. (h/t to RuQing Xu)
      - build:
          CC: aarch64-linux-gnu-gcc-10
          CXX: aarch64-linux-gnu-g++-10
          CONF: arm64
          PACKAGES: 'gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user'
          TESTSUITE_WRAPPER: 'qemu-aarch64 -L /usr/aarch64-linux-gnu/'

      # The RISC-V targets require the qemu version available in jammy or newer.
      # When CI is upgraded, the packages should be activated and do_script.sh
      # cleaned up.
      # PACKAGES="qemu-user qemu-user-binfmt"
      - build:
          CONF: rv64iv
          BLD: --disable-shared
          LDFLAGS: -static
      - build:
          CONF: rv32iv
          BLD: --disable-shared
          LDFLAGS: -static
      - build:
          CONF: sifive_x280
          BLD: --disable-shared
          LDFLAGS: -static

jobs:
  build:
    parameters:
      os:
        type: executor
        default: linux
      CC:
        type: string
        default: gcc
      CXX:
        type: string
        default: g++
      OOT:
        type: integer
        default: 0
      TEST:
        type: string
        default: FAST
      SDE:
        type: integer
        default: 0
      LEVEL0:
        type: integer
        default: 0
      THR:
        type: string
        default: none
      CONF:
        type: string
        default: auto
      BLD:
        type: string
        default: ''
      LDFLAGS:
        type: string
        default: ''
      TESTSUITE_WRAPPER:
        type: string
        default: ''
      PACKAGES:
        type: string
        default: ''
    executor: << parameters.os >>
    steps:
      - checkout

      - when:
          condition:
            not:
              equal: [ *macos-executor, << parameters.os >> ]
          steps:
            - run:
                name: Installing Dependencies
                command:
                  sudo apt-get update && sudo NEEDRESTART_MODE=a apt-get install -y make python3 << parameters.PACKAGES >>

      - run:
          name: Configuring, Building, Testing
          command: |
            export DIST_PATH=.
            export CC="<< parameters.CC >>"
            export CXX="<< parameters.CXX >>"
            export OOT="<< parameters.OOT >>"
            export CONF="<< parameters.CONF >>"
            export TEST="<< parameters.TEST >>"
            export BLD="<< parameters.BLD >>"
            export LDFLAGS="<< parameters.LDFLAGS >>"
            export SDE="<< parameters.SDE >>"
            export LEVEL0="<< parameters.LEVEL0 >>"
            export THR="<< parameters.THR >>"
            export TESTSUITE_WRAPPER="<< parameters.TESTSUITE_WRAPPER >>"

            pwd
            if [ $OOT -eq 1 ]; then export DIST_PATH=`pwd`; mkdir ../oot; cd ../oot; chmod -R a-w $DIST_PATH; fi
            pwd

            if [ "$CONF" = "rv64iv" ]; then
              $DIST_PATH/ci/do_riscv.sh "$CONF";
              export CC=$DIST_PATH/../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc;
              export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv64-unknown-linux-gnu-g++;
              export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
            fi
            if [ "$CONF" = "rv32iv" ]; then
              $DIST_PATH/ci/do_riscv.sh "$CONF";
              export CC=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-gcc;
              export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
              export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
            fi
            if [ "$CONF" = "sifive_x280" ]; then
              $DIST_PATH/ci/do_riscv.sh "$CONF";
              export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
              export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
              export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
            fi

            if [ "$CONF" = "generic_broadcast" ]; then
              export CONF=generic
              export CFLAGS="-DBLIS_BBM_s=2 -DBLIS_BBM_d=2 -DBLIS_BBM_c=2 -DBLIS_BBM_z=2 -DBLIS_BBN_s=4 -DBLIS_BBN_d=4 -DBLIS_BBN_c=4 -DBLIS_BBN_z=4"
            fi

            echo "Configuration:"
            echo "CC = $CC"
            echo "CXX = $CXX"
            echo "OOT = $OOT"
            echo "CONF = $CONF"
            echo "THR = $THR"
            echo "TEST = $TEST"
            echo "BLD = $BLD"
            echo "SDE = $SDE"
            echo "LEVEL0 = $LEVEL0"
            echo "DIST_PATH = $DIST_PATH"
            echo "CFLAGS = $CFLAGS"
            echo "LDFLAGS = $LDFLAGS"
            echo "TESTSUITE_WRAPPER = $TESTSUITE_WRAPPER"

            $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
            pwd
            ls -l
            $CC --version
            $CC -v

            make V=1 -j2
            make install

            if [ "$BLD" = "" ] && [ "$TESTSUITE_WRAPPER" = "" ] ; then $DIST_PATH/ci/cxx/cxx-test.sh $DIST_PATH $(ls -1 include); fi
            # Qemu SVE is failing sgemmt in some cases. Skip as this issue is not observed
            # on real chip (A64fx).
            if [ "$CONF" = "armsve" ]; then sed -i 's/.*\<gemmt\>.*/0/' $DIST_PATH/testsuite/input.operations.fast; fi
            if [ "$TEST" != "0" ]; then $DIST_PATH/ci/do_testsuite.sh; fi
            if [ "$SDE" = "1" ]; then $DIST_PATH/ci/do_sde.sh; fi
            if [ "$LEVEL0" = "1" ]; then $DIST_PATH/ci/do_level0.sh; fi
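The last four conditionals of the `Configuring, Building, Testing` step decide which test scripts run after `make install`. The sketch below restates that selection as a small POSIX-shell function so the effect of each job parameter is easy to check. The name `planned_steps` and the short step labels are hypothetical and stand in for `ci/cxx/cxx-test.sh`, `ci/do_testsuite.sh`, `ci/do_sde.sh`, and `ci/do_level0.sh`; nothing here is part of the BLIS CI scripts themselves.

```shell
#!/bin/sh
# Hypothetical helper: list the post-build test steps that the job would run.
# Arguments: TEST SDE LEVEL0 BLD TESTSUITE_WRAPPER (as in the job parameters).
planned_steps() {
    test_mode=$1
    sde=$2
    level0=$3
    bld=$4
    wrapper=$5
    steps=""
    # The C++ header test runs only for default builds without an emulator wrapper.
    if [ "$bld" = "" ] && [ "$wrapper" = "" ]; then
        steps="$steps cxx-test"
    fi
    # The testsuite runs unless TEST is "0".
    if [ "$test_mode" != "0" ]; then
        steps="$steps testsuite"
    fi
    if [ "$sde" = "1" ]; then
        steps="$steps sde"
    fi
    if [ "$level0" = "1" ]; then
        steps="$steps level0"
    fi
    # Drop the single leading space left by the first append.
    printf '%s\n' "${steps# }"
}

# The SDE workflow entry: TEST defaults to FAST, SDE=1, LEVEL0=1, no BLD or wrapper.
planned_steps FAST 1 1 "" ""
```

Under these assumptions, the RISC-V entries (which set `BLD: --disable-shared`) and the qemu entries (which set `TESTSUITE_WRAPPER`) skip the C++ header test and run only the testsuite.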
35 changes: 29 additions & 6 deletions .dir-locals.el
@@ -1,9 +1,32 @@
;; First (minimal) attempt at configuring Emacs CC mode for the BLIS
;; layout requirements.
((c-mode . ((c-file-style . "stroustrup")
;; Emacs formatting for the BLIS layout requirements.

(
;; Recognize *.mk files as Makefile fragments
(auto-mode-alist . (("\\.mk\\'" . makefile-mode)) )

;; Makefiles require tabs and are almost always width 8
(makefile-mode . (
(indent-tabs-mode . t)
(tab-width . 8)
)
)

;; C code formatting roughly according to docs/CodingConventions.md
(c-mode . (
(c-file-style . "bsd")
(c-basic-offset . 4)
(comment-start . "// ")
(comment-end . "")
(indent-tabs-mode . t)
(tab-width . 4)
(parens-require-spaces . nil))))
(parens-require-spaces . nil)
)
)

;; Default formatting for all source files not overriden above
(prog-mode . (
(indent-tabs-mode . nil)
(tab-width . 4)
(require-final-newline . t)
(eval add-hook `before-save-hook `delete-trailing-whitespace)
)
)
)
5 changes: 5 additions & 0 deletions .gitignore
@@ -31,6 +31,7 @@

config.mk
bli_config.h
bli_addon.h

# -- monolithic headers --

@@ -43,6 +44,7 @@ include/*/*.h
# -- misc. --

# BLIS testsuite output file
output.testsuite
output.testsuite.*

# BLAS test output files
@@ -52,3 +54,6 @@ out.*
GPATH
GRTAGS
GTAGS

# Mac DS.store files
.DS_Store