Releases: pytorch-labs/helion
v0.0.10
What's Changed
- [Benchmark] Add initial TritonBench integration and vector_add benchmark example by @yf225 in #247
- Add static_range by @joydddd in #235
- Cleanup/improve docstrings by @jansel in #250
- [Benchmark] Add embedding benchmark by @yf225 in #248
- [Benchmark] Add vector_exp benchmark by @yf225 in #249
- Add rms_norm example and test by @yf225 in #252
- [Benchmark] Add rms_norm benchmark by @yf225 in #253
- Strip extra newlines from *.expected files by @jansel in #255
- Fix issue with BLOCK_SIZE0.to(torch.int32) by @jansel in #254
- Add hl.wait & AllGather Matmul example (via hl_ext helper). by @joydddd in #189
- Add sum example and test by @yf225 in #256
- [Benchmark] Add sum to TritonBench integration by @yf225 in #257
- Rename benchmark folder by @yf225 in #258
- Add hl.signal by @joydddd in #233
- Add hl.wait for simultaneous waiting for multiple gmem barriers by @joydddd in #243
- Swap to using pyright by @oulgen in #259
- Fix pyright errors in type_propagation.py by @yf225 in #266
- [BE] Add spellchecker by @oulgen in #265
- Remove pyre-ignore/pyre-fixme calls by @jansel in #274
- Improve typing for helion.kernel by @jansel in #270
- Add jagged_mean example by @yf225 in #263
- [Benchmark] Add jagged_mean tritonbench integration by @yf225 in #264
- Add fp8_gemm example and test by @yf225 in #267
- [Benchmark] Add fp8_gemm to TritonBench integration by @yf225 in #268
- Fix some pyright errors by @jansel in #276
- Remove unused exception types by @jansel in #271
- Fix docstring see also lists by @jansel in #272
- [benchmarks] Change tritonbench api by @xuzhao9 in #260
- Initial version of documentation by @jansel in #273
- Deploy docs to github pages by @jansel in #277
- Fix lint error on main by @jansel in #281
- Add a link to the documentation by @jansel in #282
- [Benchmark] Fix tritonbench integration due to upstream changes by @yf225 in #278
- [Benchmark] Allow using 'python benchmarks/run.py' to run all kernels by @yf225 in #280
- Add implicit broadcasting tests by @jansel in #285
- Add additional tl.range choices to persistent loop by @jansel in #287
- Update autotuning example in docs by @jansel in #288
- Add host side dead code elimination by @oulgen in #289
- [Benchmark] Add attention tritonbench integration by @yf225 in #284
- Add helion.exc.CannotModifyHostVariableOnDevice and helion.exc.CannotReadDeviceVariableOnHost by @jansel in #290
- Fix unstable CI by @jansel in #299
- Make to_triton_code config arg optional by @jansel in #291
- Add helion.exc.DeviceTensorSubscriptAssignmentNotAllowed by @jansel in #292
- Remove default configs from examples by @jansel in #295
- Fix bug with tensor descriptor and small block size by @jansel in #296
- Relax typing for CombineFunction by @jansel in #297
- Add examples/segment_reduction.py by @jansel in #300
- Add error for using a host tensor directly by @jansel in #306
- Improve Tensor.item() handling by @jansel in #307
- Fix type_info null errors by @oulgen in #294
- Improve DCE by marking math functions as pure by @oulgen in #312
- [Benchmark] Add softmax tritonbench integration by @yf225 in #286
- Make imports relative by @jansel in #310
- Generalize l2_grouping to support 3+ dimensions by @jansel in #313
- Remove make_precompiler generated wrapper by @jansel in #314
- Enforce ANN/PGH lints by @jansel in #315
- Support dynamic fill value to hl.full by @jansel in #316
- Use tensor device reference in persistent kernels by @jansel in #317
- Add tl._experimental_make_tensor_descriptor support by @oulgen in #322
- Fix variable scoping in nested loops for multi-pass kernels by @yf225 in #324
- Add HELION_DEV_LOW_VRAM env var for low GPU memory machines by @yf225 in #325
- Add cross_entropy example and unit test by @yf225 in #320
- [Benchmark] Add cross_entropy to tritonbench integration by @yf225 in #321
- Add literal index into tuple by @joydddd in #327
- Improve naming for generated helper functions by @jansel in #323
- Add hl.inline_asm_elementwise by @jansel in #328
- Implement static tuple unrolling and hl.static_range by @jansel in #329
- Add fp8_attention example and unit test by @yf225 in #318
- [Benchmark] Add fp8_attention to tritonbench integration by @yf225 in #319
Full Changelog: v0.0.9...v0.0.10
v0.0.9
What's Changed
- Add tl.range warp_specialize to autotuner by @jansel in #230
- Switch from TensorDescriptor to tl.make_tensor_descriptor by @jansel in #232
- Enable test fixed by #195 by @joydddd in #236
- Implement persistent kernels by @jansel in #238
- Add hl.associative_scan by @jansel in #239
- Fix failing tests on main by @jansel in #244
- Add hl.reduce by @jansel in #240
- Switch from expecttest/assertExpectedInline to assertExpectedJournal by @jansel in #241
Full Changelog: v0.0.8...v0.0.9
v0.0.8
What's Changed
- Improve loop end bound optimization for nested tiling by @jansel in #192
- Set default dot_precision to TRITON_F32_DEFAULT by @jansel in #197
- Use _disable_flatten_get_tile helper in tile_id by @jansel in #200
- Throw type errors immediately by @jansel in #202
- Fix typo in LiteralType.merge by @jansel in #201
- Add support for global statements in type propagation by @jansel in #203
- Remove ErrorReporting class and simplify warning handling by @jansel in #204
- Add InvalidDeviceForLoop exception type by @jansel in #205
- Fix bug with renamed variable flowing into phi() node by @jansel in #206
- Move hl.grid tests to their own file by @jansel in #208
- Remove NDGridTileStrategy by @jansel in #209
- Simplify codegen for hl.grid by @jansel in #210
- Add support for hl.grid(begin, end, step) by @jansel in #211
- Support range() loops (alias for hl.grid) by @jansel in #212
- Move yz_grid disabling logic to ConfigSpec by @jansel in #213
- Relax chebyshev kernel test tolerance by @jansel in #214
- [RFC] Add static loop unrolling by @oulgen in #216
- Add support for torch.arange by @jansel in #215
- Fix a performance issue with Helion-emitted Flash Attention by @manman-ren in #181
- Fix issue with phi nodes and aliasing by @jansel in #220
- Fix duplicate argument handling in inductor lowering by @jansel in #222
- x[i] returns scalar when i=scalar by @joydddd in #223
- Fix config flatten spec for tile.id by @joydddd in #224
- Fix failing tests on main by @jansel in #231
- Refactor examples to use run_example helper by @jansel in #225
- Add tl.range loop_unroll_factor to autotuner by @jansel in #226
- Add tl.range num_stages to autotuner by @jansel in #227
- Add tl.range disallow_acc_multi_buffer to autotuner by @jansel in #228
- Add tl.range flatten to autotuner by @jansel in #229
New Contributors
- @manman-ren made their first contribution in #181
Full Changelog: v0.0.7...v0.0.8
v0.0.7
What's Changed
- Fix bug with computations based on hl.register_block_size by @jansel in #157
- Generalize workaround for unbacked size hints by @jansel in #159
- Don't hardcode cuda in test files by @jansel in #160
- Move register_block_size/register_reduction_dim to tunable_ops.py by @jansel in #161
- Unskip some previously failing tests by @jansel in #162
- Use workflow matrix to deduplicate code by @oulgen in #168
- Rename TileIndexProxy to hl.Tile by @jansel in #171
- Fix block size variable handling and atomic operations with symints by @jansel in #177
- Codegen `if tl.sum(one_elem_tensor):` instead of `if one_elem_tensor` by @yf225 in #158
- Fix visitCall in deviceIR. Always visit argument nodes by @joydddd in #180
- Relax bounds on test_mask_dot by @oulgen in #182
- Add lowering for Constant assignment by @joydddd in #187
- Expose tile.id by @joydddd in #188
- Do not precompile set configs by @oulgen in #183
- Add option to ban/disallow autotuning by @oulgen in #184
- Recommend PyTorch nightly build in readme by @jansel in #193
- Fix issue with ConfigSpec mutation in codegen by @jansel in #195
- enable_python_dispatcher() in propagate_types by @laithsakka in #191
New Contributors
- @laithsakka made their first contribution in #191
Full Changelog: v0.0.6...v0.0.7
v0.0.6
What's Changed
- Fix ast read writes by @oulgen in #148
- Update pre-commit by @oulgen in #149
- Try enable test_moe_matmul_ogs on CI by @yf225 in #147
- [Ready for review] Add support for print(prefix_str, *tensors) by @yf225 in #140
- Support hl.tile_{begin,end,block_size} by @jansel in #150
- Rename TileStrategy.get_block_index to CompileEnvironment.get_block_id by @jansel in #151
- Fix bug in merging sequence types by @jansel in #152
- Increase atol for test_matmul_split_k by @jansel in #155
- Fix bug in test_matmul_split_k by @jansel in #156
- Add hl.register_tunable by @jansel in #154
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's Changed
- Rename linter/check_main.py -> scripts/lint_examples_main.py by @jansel in #124
- Improve error message for unpacking a tile by @jansel in #125
- Improve error message for overpacked tiles by @jansel in #126
- [BC breaking] Simplify block size configs by @jansel in #127
- Refactor reduction loop config spec by @jansel in #128
- Move BlockIdSequence to its own file by @jansel in #129
- Do not print output code during autotuning by @jansel in #130
- Make helion.exc.TensorOperationInWrapper not fire on non-torch ops by @jansel in #131
- Add HELION_FORCE_AUTOTUNE=1 and update readme by @jansel in #132
- Correct units for time printouts by @jansel in #133
- Rename block_size_idx to block_id by @jansel in #134
- Rename block_indices to block_ids by @jansel in #135
- Add Pyre Pre-Commit Hook by @lolpack in #136
- Update .pre-commit-config.yaml by @oulgen in #137
- [Ready for review] Add hl.register_reduction_dim(); add support for matmul+layernorm example by @yf225 in #80
- Fix bug with errors on unreachable if branch by @jansel in #138
- [Error Message] Update block config size length mismatch by @drisspg in #139
- Increase atol/rtol for test_error_in_non_taken_branch by @jansel in #142
- Fix some typos by @jansel in #141
- More fair comparison by @drisspg in #146
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's Changed
- Beef up pre-commit checks by @oulgen in #106
- Run pre-commit as part of lint action by @oulgen in #108
- Add jagged_dense_add_2d example in generalize tensor indexing by @jansel in #105
- Update README.md with Helion logo by @oulgen in #100
- Optimization pass to remove unneeded masking by @jansel in #109
- Improve mask optimization to cover control flow and inductor ops by @jansel in #111
- Expand README.md by @jansel in #112
- Fix ImportError: cannot import name 'Never' from 'typing' by @jansel in #114
- Remove 'first_non_grid_index' for hl.grid index by @jansel in #113
- Pass to remove unnecessary hl.tile_index calls by @jansel in #115
- Replace torch.fx.GraphModule with torch.fx.Graph by @jansel in #116
- MoE matmul example by @yf225 in #110
- Add main() to moe_matmul_ogs by @yf225 in #118
- Add pre-commit hook to make sure examples have a main function by @oulgen in #119
- Add reduction example: Long sum by @joydddd in #92
- Make loop reordering work with register_block_size by @jansel in #117
- Temporarily disable unit test for moe_matmul_ogs example by @yf225 in #120
- Skip test_moe_matmul_ogs on older cards by @jansel in #121
- Make l2_grouping work with register_block_size by @jansel in #122
- Re-enable unit test for moe_matmul_ogs example; skip in fbcode by @yf225 in #123
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- Minor fix to test file name by @yf225 in #1
- Add CI workflow by @yf225 in #2
- Allow direct running of add.py example by @yf225 in #6
- [CI] Use A10G (g5.4xlarge) machine type by @yf225 in #4
- Use site-package for torch in pyre_configuration by @jansel in #8
- Add use_default_config setting by @jansel in #9
- Add LICENSE/CONTRIBUTING.md/CODE_OF_CONDUCT.md by @jansel in #11
- Support persistent reductions by @jansel in #10
- Fix handling of block_ptr + reductions by @jansel in #12
- Support inductor lowerings that require multiple buffers by @jansel in #13
- Adjust rtol/atol for test_sum_keepdims by @yf225 in #14
- Support Python 3.10; Run lint in CI by @yf225 in #7
- Support looped reductions by @jansel in #15
- Compile in a subprocess to kill hangs by @jansel in #16
- Refactor autotuning logging by @jansel in #17
- Support view ops by @jansel in #18
- Support indirect loads by @jansel in #19
- Improve README.md by @jansel in #20
- Support if/else control flow by @jansel in #21
- Add hl.constexpr specialization by @jansel in #22
- Fix license file for PEP 621 by @oulgen in #23
- Use search-strategy: all for all site packages in pyre config by @stroxler in #25
- Add decorator check by @oulgen in #24
- Trigger CI on pull requests made by ghstack by @oulgen in #27
- Add hl.register_block_size and explicit tile sizes by @jansel in #30
- Update lint github workflow by @jansel in #31
- Add ../pytorch-nightly to Pyre optional_search_path by @yf225 in #36
- Fix TensorDescriptor handling in _find_device by @yf225 in #35
- Add HELION_USE_DEFAULT_CONFIG env var to force use default config by @yf225 in #37
- Add more pytorchbot utils by @oulgen in #43
- Add the core properties to Config object by @drisspg in #49
- Switch build system to Hatchling which has much better Language Server support by @drisspg in #55
- Add attention example and fix some bugs by @jansel in #56
- Fix bug where non-tensor variables are not exposed to inner loops by @jansel in #58
- Add hl.grid(...) support by @yf225 in #59
- Fix more unit tests by @oulgen in #64
- Fix test_matmul_tensor_descriptor unit test by @yf225 in #65
- Prototyping an hl.atomic op by @drisspg in #63
- Add hl.specialize and improve reduction handling by @jansel in #72
- [test] Touch test/__init__.py to support more testing workflows by @danzimm in #73
- [reland without ghstack] handle PTXASError by @jansel in #79
- Support data-dependent loop bounds by @jansel in #81
- Add support for hl.tile(begin, end) and hl.tile(begin, end, block_size) by @jansel in #82
- Support user-defined minimum in hl.register_block_size by @jansel in #83
- Don't re-wrap exceptions in exc.TorchOpTracingError by @jansel in #84
- Add hl.tile_index() by @jansel in #89
- Add filecheck dependency by @jansel in #95
- Add env HELION_PRINT_OUTPUT_CODE=1 by @jansel in #93
- Add extra_mask arg to hl.load and hl.store by @jansel in #94
- Bump project version by @oulgen in #101
- Swap to using hatch vcs by @oulgen in #103
- Add publish to pypi workflow by @oulgen in #104
New Contributors
- @stroxler made their first contribution in #25
- @drisspg made their first contribution in #49
- @danzimm made their first contribution in #73
Full Changelog: https://github.com/pytorch-labs/helion/commits/v0.0.3
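Two entries above give environment variables together with a value: HELION_FORCE_AUTOTUNE=1 (v0.0.5, #132) and HELION_PRINT_OUTPUT_CODE=1 (v0.0.3, #93). The sketch below shows one way to apply them to a single child process instead of exporting them shell-wide. The helper names `helion_env` and `run_script` are hypothetical and not part of Helion, and the script path passed to `run_script` is whatever example you want to run. The other variables named in these notes (HELION_DEV_LOW_VRAM, HELION_USE_DEFAULT_CONFIG) are left out because the notes do not state their values.

```python
import os
import subprocess
import sys


def helion_env(force_autotune=False, print_output_code=False, base=None):
    """Return a copy of an environment with the requested Helion flags set.

    Hypothetical helper: only the variable names and the value "1" come
    from the release notes; everything else is illustrative.
    """
    env = dict(os.environ if base is None else base)
    if force_autotune:
        env["HELION_FORCE_AUTOTUNE"] = "1"     # "1" as written in the v0.0.5 entry
    if print_output_code:
        env["HELION_PRINT_OUTPUT_CODE"] = "1"  # "1" as written in the v0.0.3 entry
    return env


def run_script(script_path, **flags):
    """Run a Python script in a child process with the chosen flags applied."""
    return subprocess.run(
        [sys.executable, script_path],
        env=helion_env(**flags),
        check=False,  # let the caller inspect the return code
    )
```

A call such as `run_script("examples/add.py", force_autotune=True)` would then launch that script with only the autotune flag set, leaving the parent process's environment unchanged; the path here is an assumption based on the add.py example mentioned in the v0.0.3 notes.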