forked from llvm/llvm-project
Xcvmem #287
Open
realqhc wants to merge 576 commits into realqhc:main from ChunyuLiao:xcvmem
795a445 to 26f3a72
Signed-off-by: Jerry Zhang Jian <[email protected]>
…)" This reverts commit 42944e4.
In -convert-vector-to-arm-sme, the permutation_map is explicitly checked for transpose when converting xfer ops. For 2-D vector types, however, the only non-identity permutation map is a transpose, so this check can be simplified.
…3791) Found while investigating llvm#93709
…llvm#93795) The current way of lowering `llvm.clear_cache` is a bit unusual. As suggested by Matt Arsenault we are better off using an ISD node. This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall by default named `__clear_cache` and the default legalisation is a libcall. This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE` needed by RISC-V on some platforms.
…t` (llvm#90754) The "reduction" clause is not allowed on the "target" construct.
…m#93772) If present, the optional second argument of the ieee_exceptions intrinsic module procedure ieee_support_flag may be either a scalar or an array. Change the signature of the routine that implements this function so that it is processed as a transformational function, not an elemental function, which accounts for this argument variant.
The `target reduction` combination is no longer accepted. Disable the test to avoid build failures, until a better fix is ready.
…container-size-empty (llvm#93724) Verify that size/length methods are called with no arguments. Closes llvm#88203
Last used in e35fbf5.
Better design to put semantics on the ops, and in this case the ntt/intt op can lower in multiple ways depending on the polynomial ring modulus (it can need an nth root of unity for cyclic polymul -> ntt, or a 2nth root for negacyclic polymul -> ntt) --------- Co-authored-by: Jeremy Kun <[email protected]>
…93816) Reduce code bloat by checking test requirements in a common test fixture
Re-enable test disabled in 1bf1f93 with a fix.
…#93606) Back to back `linalg.transpose` can be rewritten to a single transpose
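Why two transposes collapse into one can be seen by composing their permutations. The following is a minimal sketch, not the MLIR pattern itself: `applyPermutation` and `composePermutations` are hypothetical helpers, and the convention that output dimension `i` takes input dimension `perm[i]` is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed convention: output dimension i takes input dimension perm[i].
std::vector<std::size_t> applyPermutation(const std::vector<std::size_t>& shape,
                                          const std::vector<std::size_t>& perm) {
  assert(shape.size() == perm.size());
  std::vector<std::size_t> out(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    out[i] = shape[perm[i]];
  return out;
}

// Permutation equivalent to applying `first` and then `second`.
// After both steps, output dimension i holds input dimension first[second[i]].
std::vector<std::size_t> composePermutations(const std::vector<std::size_t>& first,
                                             const std::vector<std::size_t>& second) {
  assert(first.size() == second.size());
  std::vector<std::size_t> composed(second.size());
  for (std::size_t i = 0; i < second.size(); ++i)
    composed[i] = first[second[i]];
  return composed;
}
```

Under this convention, applying `composePermutations(first, second)` once gives the same shape as applying `first` and then `second`, which is the property a single-transpose rewrite relies on.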
…ring (llvm#93592) Handle lowering of a non-optional inquired argument in custom lowering. Also fix an issue in the lowering of an associated optional argument where a box was emboxed again, which led to a weird result.
And sort out some unused headers
macOS automatically loads dependent modules when creating a target, resulting in more modules than the test expects.
…en not poison Since llvm#93182 we can now call computeKnownBits inside getValidMaximumShiftAmount to determine the bounds of the shift amount, ensuring that it wasn't poison. This means that if we did freeze the shift amount, isGuaranteedNotToBeUndefOrPoison would then fail, as we can't call computeKnownBits through FREEZE for potentially poison values. I'm still reducing a decent test case but wanted to get the buildbot fix in ASAP.
…llvm#88712) This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within scf.for/scf.forall loop. To support this two new methods are added to the `TilingInterface` - `getIterationDomainTileFromOperandTile` - `getTiledImplementationFromOperandTile`. Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer. Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time. Signed-off-by: Abhishek Varma <[email protected]> --------- Signed-off-by: Abhishek Varma <[email protected]> Signed-off-by: Abhishek Varma <[email protected]> Co-authored-by: cxy <[email protected]>
…vm#92783) In `atomic::wait`, when we call the platform wait `ulock_wait`, we are using `UL_COMPARE_AND_WAIT`. We should use `UL_COMPARE_AND_WAIT64` instead, as the address we are waiting on is a 64-bit integer. Fixes llvm#85107. This is rather hard to test directly because in `atomic::wait`, before calling into the platform wait, our C++ code has some polling logic that checks whether the value has changed. Thus in this patch, the test uses the internal function.
Also drop errant header include from `Linalg` dialect into `Dialect/SCF/Transforms/TileUsingInterface.cpp`
We can convert this to a select based on the `(icmp eq X, C)`, then constant fold the addition, with the true arm being `(add C, (sext/zext 1))` and the false arm being `(add X, 0)`, e.g. `(select (icmp eq X, C), (add C, (sext/zext 1)), (add X, 0))`. This is essentially a specialization of the only case that seems to actually show up from llvm#89020. Closes llvm#93840
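The equivalence behind this fold can be checked on plain 32-bit integers. The sketch below assumes the pattern being folded is `(add X, (sext/zext (icmp eq X, C)))`, which is inferred from the select form in the message rather than stated in it; the function names are hypothetical and this is not the InstCombine code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed original form, zext case: add X, (zext (icmp eq X, C)).
// Zero-extending a true i1 gives 1.
std::uint32_t addZextOriginal(std::uint32_t x, std::uint32_t c) {
  return x + static_cast<std::uint32_t>(x == c);
}

// Folded form: select (icmp eq X, C), (add C, 1), X.
std::uint32_t addZextFolded(std::uint32_t x, std::uint32_t c) {
  return x == c ? c + 1u : x;
}

// Assumed original form, sext case: add X, (sext (icmp eq X, C)).
// Sign-extending a true i1 gives all ones, i.e. -1.
std::uint32_t addSextOriginal(std::uint32_t x, std::uint32_t c) {
  const std::uint32_t ext = (x == c) ? 0xFFFFFFFFu : 0u;
  return x + ext;  // unsigned arithmetic wraps, matching a plain `add`
}

// Folded form: select (icmp eq X, C), (add C, -1), X.
std::uint32_t addSextFolded(std::uint32_t x, std::uint32_t c) {
  return x == c ? c - 1u : x;
}
```

In the true arm `X` is known to equal `C`, so the addition has two constant operands and folds away; in the false arm the extended compare is 0 and the add is a no-op.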
…llvm#94146) This reverts commit de37c06 to de37c06 It still breaks EXPENSIVE_CHECKS build. Sorry.
We were missing the PoisonOnly argument (so Depth + 1 was being used instead and the default Depth = 0 argument then being silently used) Fixes llvm#94145 and serves as the test case for 9e22c7a
…eship coverage (llvm#94120) Three unrelated, small improvements:
* `test_macros.h` was incorrectly saying `__has_include("<version>")` instead of `__has_include(<version>)`.
  + This caused `<ciso646>` to always be included (noticed because MSVC's STL emitted a deprecation warning).
  + I searched all of LLVM and found no other occurrences.
* `thread.condition.condvarany/wait_for_pred.pass.cpp` forgot to test anything.
  + I followed what `wait_for.pass.cpp` is testing.
* Uncomment spaceship test coverage.
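The difference between the two `__has_include` spellings can be sketched as follows. The quoted form asks for a file whose name is literally `<version>`, angle brackets included, so it is not expected to match anything. The `SKETCH_*` macros and the two helper functions are hypothetical names used only for this illustration; they are not part of `test_macros.h`.

```cpp
#include <cassert>

#if defined(__has_include)
// Angle-bracket form: looks up the standard header named `version`.
#  if __has_include(<version>)
#    include <version>
#    define SKETCH_ANGLE_FORM_MATCHED 1
#  endif
// Quoted form: looks up a file literally named `<version>`.
#  if __has_include("<version>")
#    define SKETCH_QUOTED_FORM_MATCHED 1
#  endif
#endif

#ifndef SKETCH_ANGLE_FORM_MATCHED
#  define SKETCH_ANGLE_FORM_MATCHED 0
#endif
#ifndef SKETCH_QUOTED_FORM_MATCHED
#  define SKETCH_QUOTED_FORM_MATCHED 0
#endif

// True when the toolchain ships a <version> header and supports __has_include.
constexpr bool angleFormFindsHeader() { return SKETCH_ANGLE_FORM_MATCHED != 0; }

// Expected to be false unless a file literally named "<version>" is on the include path.
constexpr bool quotedFormFindsHeader() { return SKETCH_QUOTED_FORM_MATCHED != 0; }
```

A check written with the quoted form therefore falls through to its fallback branch even on toolchains that do provide `<version>`.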
I noticed that these tests had empty `main` functions. Dropping them and renaming the tests to `MEOW.compile.pass.cpp` will slightly improve test throughput.
There is no reason to give any of the functions C linkage. This makes all of the libc++ functions have C++ linkage, removing the need for `_LIBCPP_HIDE_FROM_ABI_C`.
The type of *Iter here is "const IndexedMemProfRecord &" as defined in RecordLookupTrait. Assigning *Iter to a variable of type "const IndexedMemProfRecord &" avoids a copy, reducing the cycle and instruction counts by 1.8% and 0.2%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords. Note that RecordLookupTrait has an internal copy of IndexedMemProfRecord, so we don't have to worry about a dangling reference to a temporary.
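The effect of binding the result to a const reference instead of a value can be sketched with a copy-counting type. `Record` and `Lookup` below are hypothetical stand-ins for `IndexedMemProfRecord` and `RecordLookupTrait`; the only property assumed is the one the message states, that the lookup object owns the record and hands out a const reference to it.

```cpp
#include <cassert>

// Stand-in record type that counts how often it is copy-constructed.
struct Record {
  static int copies;
  int value = 0;
  Record() = default;
  explicit Record(int v) : value(v) {}
  Record(const Record& other) : value(other.value) { ++copies; }
  Record& operator=(const Record&) = default;
};
int Record::copies = 0;

// Stand-in lookup object that keeps an internal copy of the record,
// so a reference to it stays valid as long as the lookup object lives.
class Lookup {
  Record stored_;

 public:
  explicit Lookup(int v) : stored_(v) {}
  const Record& current() const { return stored_; }
};

// Binding to a value invokes the copy constructor once.
int copiesWhenBindingByValue(const Lookup& lookup) {
  const int before = Record::copies;
  Record byValue = lookup.current();
  (void)byValue;
  return Record::copies - before;
}

// Binding to a const reference makes no copy.
int copiesWhenBindingByReference(const Lookup& lookup) {
  const int before = Record::copies;
  const Record& byRef = lookup.current();
  (void)byRef;
  return Record::copies - before;
}
```

The reference is safe here only because the referenced object is owned by the lookup object; binding a reference to a temporary that is destroyed at the end of the expression would dangle.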
This avoids the pitfall where we set the uwtable to none:
```
func.setUWTableKind(llvm::UWTableKind::None)
```
`Attribute::getAsString()` would see an unknown attribute and fail an assertion. In this patch, we assert that we do not see a None uwtable kind. This also skips the check of `UWTableKind::Async`. It is dominated by the check of `UWTableKind::Default`, which has the same enum value (nfc).
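Why the `Async` check is dominated by the `Default` check can be sketched with a local enum in which two enumerators share one value. `SketchUWTableKind`, its numeric values, `uwtableToString`, and the returned strings are all hypothetical and chosen for illustration; only the facts that `Async` and `Default` compare equal and that `None` is asserted against come from the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in enum: Async and Default share one value.
enum class SketchUWTableKind { None = 0, Sync = 1, Async = 2, Default = 2 };

static_assert(SketchUWTableKind::Async == SketchUWTableKind::Default,
              "the two enumerators are indistinguishable at run time");

// Illustrative printer: asserts that None never reaches it, then
// distinguishes only Default (which also covers Async) from Sync.
std::string uwtableToString(SketchUWTableKind kind) {
  assert(kind != SketchUWTableKind::None && "None must not reach the printer");
  if (kind == SketchUWTableKind::Default)
    return "uwtable";
  // A separate `kind == Async` test here could never be true:
  // any Async value was already handled by the Default test above.
  return "uwtable(sync)";
}
```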
Each `LoopBlock` stored in `LoopWorkList` consists of a basic block and its loop data. When iterating over `LoopWorkList`, if the estimated weight of a loop is not stored in `EstimatedLoopWeight`, `getLoopExitBlocks()` is called to get all exit blocks of the loop. The estimated weight of a loop is calculated by iterating over the edges leading from the basic block to all exit blocks of the loop. If at least one edge has an unknown estimated weight, the estimated weight of the loop is unknown and is not stored in `EstimatedLoopWeight`. `LoopWorkList` can contain different blocks in the same loop, so work is wasted calling `getLoopExitBlocks()` for the same loop multiple times. Since computing the exit blocks of a loop is expensive and the loop structure is not mutated in Branch Probability Analysis, we can cache the result and improve compile time. With this change, the overall compile time for a file containing a very large loop drops by around 82%.
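The caching idea can be sketched with a small memoizing wrapper. This is not the Branch Probability Analysis code: `ExitBlockCache`, the integer loop and block IDs, and `computeExitBlocks` (a stand-in for the expensive `getLoopExitBlocks()` call) are hypothetical, and the sketch relies on the stated assumption that the loop structure does not change while the cache is alive.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Memoizes exit blocks per loop so the expensive computation runs once per loop.
struct ExitBlockCache {
  int computeCalls = 0;                    // how often the expensive path ran
  std::map<int, std::vector<int>> cache;   // loop ID -> exit block IDs

  // Stand-in for the expensive exit-block computation.
  std::vector<int> computeExitBlocks(int loopId) {
    ++computeCalls;
    return {loopId * 10 + 1, loopId * 10 + 2};
  }

  // Returns the cached result, computing it only on the first request.
  const std::vector<int>& getExitBlocks(int loopId) {
    auto it = cache.find(loopId);
    if (it == cache.end())
      it = cache.emplace(loopId, computeExitBlocks(loopId)).first;
    return it->second;
  }
};

// Walks a worklist of loop IDs (several entries may name the same loop)
// and reports how many times the expensive computation was needed.
int countComputations(const std::vector<int>& worklistLoopIds) {
  ExitBlockCache cache;
  for (int loopId : worklistLoopIds)
    (void)cache.getExitBlocks(loopId);
  return cache.computeCalls;
}
```

For a worklist in which many blocks belong to one large loop, the number of expensive computations drops from one per worklist entry to one per distinct loop.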
All post-increment load/store and register-register load/store instructions. Spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst
No description provided.