Releases · pytorch-labs/helion

17 Jul 18:14

oulgen

v0.0.10

7d01817

v0.0.10 Latest

What's Changed

[Benchmark] Add initial TritonBench integration and vector_add benchmark example by @yf225 in #247
Add static_range by @joydddd in #235
Cleanup/improve docstrings by @jansel in #250
[Benchmark] Add embedding benchmark by @yf225 in #248
[Benchmark] Add vector_exp benchmark by @yf225 in #249
Add rms_norm example and test by @yf225 in #252
[Benchmark] Add rms_norm benchmark by @yf225 in #253
Strip extra newlines from *.expected files by @jansel in #255
Fix issue with BLOCK_SIZE0.to(torch.int32) by @jansel in #254
Add hl.wait & AllGather Matmul example (via hl_ext helper) by @joydddd in #189
Add sum example and test by @yf225 in #256
[Benchmark] Add sum to TritonBench integration by @yf225 in #257
Rename benchmark folder by @yf225 in #258
Add hl.signal by @joydddd in #233
Add hl.wait for simultaneous waiting on multiple gmem barriers by @joydddd in #243
Swap to using pyright by @oulgen in #259
Fix pyright errors in type_propagation.py by @yf225 in #266
[BE] Add spellchecker by @oulgen in #265
Remove pyre-ignore/pyre-fixme calls by @jansel in #274
Improve typing for helion.kernel by @jansel in #270
Add jagged_mean example by @yf225 in #263
[Benchmark] Add jagged_mean tritonbench integration by @yf225 in #264
Add fp8_gemm example and test by @yf225 in #267
[Benchmark] Add fp8_gemm to TritonBench integration by @yf225 in #268
Fix some pyright errors by @jansel in #276
Remove unused exception types by @jansel in #271
Fix docstring see also lists by @jansel in #272
[benchmarks] Change tritonbench api by @xuzhao9 in #260
Initial version of documentation by @jansel in #273
Deploy docs to github pages by @jansel in #277
Fix lint error on main by @jansel in #281
Add a link to the documentation by @jansel in #282
[Benchmark] Fix tritonbench integration due to upstream changes by @yf225 in #278
[Benchmark] Allow using 'python benchmarks/run.py' to run all kernels by @yf225 in #280
Add implicit broadcasting tests by @jansel in #285
Add additional tl.range choices to persistent loop by @jansel in #287
Update autotuning example in docs by @jansel in #288
Add host side dead code elimination by @oulgen in #289
[Benchmark] Add attention tritonbench integration by @yf225 in #284
Add helion.exc.CannotModifyHostVariableOnDevice and helion.exc.CannotReadDeviceVariableOnHost by @jansel in #290
Fix unstable CI by @jansel in #299
Make to_triton_code config arg optional by @jansel in #291
Add helion.exc.DeviceTensorSubscriptAssignmentNotAllowed by @jansel in #292
Remove default configs from examples by @jansel in #295
Fix bug with tensor descriptor and small block size by @jansel in #296
Relax typing for CombineFunction by @jansel in #297
Add examples/segment_reduction.py by @jansel in #300
Add error for using a host tensor directly by @jansel in #306
Improve Tensor.item() handling by @jansel in #307
Fix type_info null errors by @oulgen in #294
Improve DCE by marking math functions as pure by @oulgen in #312
[Benchmark] Add softmax tritonbench integration by @yf225 in #286
Make imports relative by @jansel in #310
Generalize l2_grouping to support 3+ dimensions by @jansel in #313
Remove make_precompiler generated wrapper by @jansel in #314
Enforce ANN/PGH lints by @jansel in #315
Support dynamic fill value to hl.full by @jansel in #316
Use tensor device reference in persistent kernels by @jansel in #317
Add tl._experimental_make_tensor_descriptor support by @oulgen in #322
Fix variable scoping in nested loops for multi-pass kernels by @yf225 in #324
Add HELION_DEV_LOW_VRAM env var for low GPU memory machines by @yf225 in #325
Add cross_entropy example and unit test by @yf225 in #320
[Benchmark] Add cross_entropy to tritonbench integration by @yf225 in #321
Add literal index into tuple by @joydddd in #327
Improve naming for generated helper functions by @jansel in #323
Add hl.inline_asm_elementwise by @jansel in #328
Implement static tuple unrolling and hl.static_range by @jansel in #329
Add fp8_attention example and unit test by @yf225 in #318
[Benchmark] Add fp8_attention to tritonbench integration by @yf225 in #319

New Contributors

@xuzhao9 made their first contribution in #260

Full Changelog: v0.0.9...v0.0.10

Contributors

xuzhao9, jansel, and 3 other contributors

08 Jul 19:27

jansel

v0.0.9

902741b

What's Changed

Add tl.range warp_specialize to autotuner by @jansel in #230
Switch from TensorDescriptor to tl.make_tensor_descriptor by @jansel in #232
Enable test fixed by #195 by @joydddd in #236
Implement persistent kernels by @jansel in #238
Add hl.associative_scan by @jansel in #239
Fix failing tests on main by @jansel in #244
Add hl.reduce by @jansel in #240
Switch from expecttest/assertExpectedInline to assertExpectedJournal by @jansel in #241

Full Changelog: v0.0.8...v0.0.9

Contributors

jansel and joydddd

01 Jul 15:16

jansel

v0.0.8

43faf72

What's Changed

Improve loop end bound optimization for nested tiling by @jansel in #192
Set default dot_precision to TRITON_F32_DEFAULT by @jansel in #197
Use _disable_flatten_get_tile helper in tile_id by @jansel in #200
Throw type errors immediately by @jansel in #202
Fix typo in LiteralType.merge by @jansel in #201
Add support for global statements in type propagation by @jansel in #203
Remove ErrorReporting class and simplify warning handling by @jansel in #204
Add InvalidDeviceForLoop exception type by @jansel in #205
Fix bug with renamed variable flowing into phi() node by @jansel in #206
Move hl.grid tests to their own file by @jansel in #208
Remove NDGridTileStrategy by @jansel in #209
Simplify codegen for hl.grid by @jansel in #210
Add support for hl.grid(begin, end, step) by @jansel in #211
Support range() loops (alias for hl.grid) by @jansel in #212
Move yz_grid disabling logic to ConfigSpec by @jansel in #213
Relax chebyshev kernel test tolerance by @jansel in #214
[RFC] Add static loop unrolling by @oulgen in #216
Add support for torch.arange by @jansel in #215
Fix a performance issue with Helion-emitted Flash Attention by @manman-ren in #181
Fix issue with phi nodes and aliasing by @jansel in #220
Fix duplicate argument handling in inductor lowering by @jansel in #222
x[i] returns scalar when i=scalar by @joydddd in #223
Fix config flatten spec for tile.id by @joydddd in #224
Fix failing tests on main by @jansel in #231
Refactor examples to use run_example helper by @jansel in #225
Add tl.range loop_unroll_factor to autotuner by @jansel in #226
Add tl.range num_stages to autotuner by @jansel in #227
Add tl.range disallow_acc_multi_buffer to autotuner by @jansel in #228
Add tl.range flatten to autotuner by @jansel in #229

New Contributors

@manman-ren made their first contribution in #181

Full Changelog: v0.0.7...v0.0.8

Contributors

jansel, oulgen, and 2 other contributors

18 Jun 18:06

oulgen

v0.0.7

248ece6

What's Changed

Fix bug with computations based on hl.register_block_size by @jansel in #157
Generalize workaround for unbacked size hints by @jansel in #159
Don't hardcode cuda in test files by @jansel in #160
Move register_block_size/register_reduction_dim to tunable_ops.py by @jansel in #161
Unskip some previously failing tests by @jansel in #162
Use workflow matrix to deduplicate code by @oulgen in #168
Rename TileIndexProxy to hl.Tile by @jansel in #171
Fix block size variable handling and atomic operations with symints by @jansel in #177
Codegen if tl.sum(one_elem_tensor): instead of if one_elem_tensor by @yf225 in #158
Fix visitCall in deviceIR. Always visit argument nodes by @joydddd in #180
Relax bounds on test_mask_dot by @oulgen in #182
Add lowering for Constant assignment by @joydddd in #187
Expose tile.id by @joydddd in #188
Do not precompile set configs by @oulgen in #183
Add option to ban/disallow autotuning by @oulgen in #184
Recommend PyTorch nightly build in readme by @jansel in #193
Fix issue with ConfigSpec mutation in codegen by @jansel in #195
enable_python_dispatcher() in propagate_types by @laithsakka in #191

New Contributors

@laithsakka made their first contribution in #191

Full Changelog: v0.0.6...v0.0.7

Contributors

jansel, oulgen, and 3 other contributors

12 Jun 20:42

oulgen

v0.0.6

b9e93c0

What's Changed

Fix ast read writes by @oulgen in #148
Update pre-commit by @oulgen in #149
Try enable test_moe_matmul_ogs on CI by @yf225 in #147
[Ready for review] Add support for print(prefix_str, *tensors) by @yf225 in #140
Support hl.tile_{begin,end,block_size} by @jansel in #150
Rename TileStrategy.get_block_index to CompileEnvironment.get_block_id by @jansel in #151
Fix bug in merging sequence types by @jansel in #152
Increase atol for test_matmul_split_k by @jansel in #155
Fix bug in test_matmul_split_k by @jansel in #156
Add hl.register_tunable by @jansel in #154

Full Changelog: v0.0.5...v0.0.6

Contributors

jansel, oulgen, and yf225

09 Jun 15:53

oulgen

v0.0.5

9a9f3e7

What's Changed

Rename linter/check_main.py -> scripts/lint_examples_main.py by @jansel in #124
Improve error message for unpacking a tile by @jansel in #125
Improve error message for overpacked tiles by @jansel in #126
[BC breaking] Simplify block size configs by @jansel in #127
Refactor reduction loop config spec by @jansel in #128
Move BlockIdSequence to its own file by @jansel in #129
Do not print output code during autotuning by @jansel in #130
Make helion.exc.TensorOperationInWrapper not fire on non-torch ops by @jansel in #131
Add HELION_FORCE_AUTOTUNE=1 and update readme by @jansel in #132
Correct units for time printouts by @jansel in #133
Rename block_size_idx to block_id by @jansel in #134
Rename block_indices to block_ids by @jansel in #135
Add Pyre Pre-Commit Hook by @lolpack in #136
Update .pre-commit-config.yaml by @oulgen in #137
[Ready for review] Add hl.register_reduction_dim(); add support for matmul+layernorm example by @yf225 in #80
Fix bug with errors on unreachable if branch by @jansel in #138
[Error Message] Update block config size length mismatch by @drisspg in #139
Increase atol/rtol for test_error_in_non_taken_branch by @jansel in #142
Fix some typos by @jansel in #141
More fair comparison by @drisspg in #146

New Contributors

@lolpack made their first contribution in #136

Full Changelog: v0.0.4...v0.0.5

Contributors

jansel, lolpack, and 3 other contributors

02 Jun 17:00

oulgen

v0.0.4

80510e0

What's Changed

Beef up pre-commit checks by @oulgen in #106
Run pre-commit as part of lint action by @oulgen in #108
Add jagged_dense_add_2d example and generalize tensor indexing by @jansel in #105
Update README.md with Helion logo by @oulgen in #100
Optimization pass to remove unneeded masking by @jansel in #109
Improve mask optimization to cover control flow and inductor ops by @jansel in #111
Expand README.md by @jansel in #112
Fix ImportError: cannot import name 'Never' from 'typing' by @jansel in #114
Remove 'first_non_grid_index' for hl.grid index by @jansel in #113
Pass to remove unnecessary hl.tile_index calls by @jansel in #115
Replace torch.fx.GraphModule with torch.fx.Graph by @jansel in #116
MoE matmul example by @yf225 in #110
Add main() to moe_matmul_ogs by @yf225 in #118
Add pre-commit hook to make sure examples have a main function by @oulgen in #119
Add reduction example: Long sum by @joydddd in #92
Make loop reordering work with register_block_size by @jansel in #117
Temporarily disable unit test for moe_matmul_ogs example by @yf225 in #120
Skip test_moe_matmul_ogs on older cards by @jansel in #121
Make l2_grouping work with register_block_size by @jansel in #122
Re-enable unit test for moe_matmul_ogs example; skip in fbcode by @yf225 in #123

New Contributors

@joydddd made their first contribution in #92

Full Changelog: v0.0.3...v0.0.4

Contributors

jansel, oulgen, and 2 other contributors

30 May 20:27

oulgen

v0.0.3

2f6f528

What's Changed

Minor fix to test file name by @yf225 in #1
Add CI workflow by @yf225 in #2
Allow direct running of add.py example by @yf225 in #6
[CI] Use A10G (g5.4xlarge) machine type by @yf225 in #4
Use site-package for torch in pyre_configuration by @jansel in #8
Add use_default_config setting by @jansel in #9
Add LICENSE/CONTRIBUTING.md/CODE_OF_CONDUCT.md by @jansel in #11
Support persistent reductions by @jansel in #10
Fix handling of block_ptr + reductions by @jansel in #12
Support inductor lowerings that require multiple buffers by @jansel in #13
Adjust rtol/atol for test_sum_keepdims by @yf225 in #14
Support Python 3.10; Run lint in CI by @yf225 in #7
Support looped reductions by @jansel in #15
Compile in a subprocess to kill hangs by @jansel in #16
Refactor autotuning logging by @jansel in #17
Support view ops by @jansel in #18
Support indirect loads by @jansel in #19
Improve README.md by @jansel in #20
Support if/else control flow by @jansel in #21
Add hl.constexpr specialization by @jansel in #22
Fix license file for PEP 621 by @oulgen in #23
Use search-strategy: all for all site packages in pyre config by @stroxler in #25
Add decorator check by @oulgen in #24
Trigger CI on pull requests made by ghstack by @oulgen in #27
Add hl.register_block_size and explicit tile sizes by @jansel in #30
Update lint github workflow by @jansel in #31
Add ../pytorch-nightly to Pyre optional_search_path by @yf225 in #36
Fix TensorDescriptor handling in _find_device by @yf225 in #35
Add HELION_USE_DEFAULT_CONFIG env var to force use default config by @yf225 in #37
Add more pytorchbot utils by @oulgen in #43
Add the core properties to Config object by @drisspg in #49
Switch build system to Hatchling which has much better Language Server support by @drisspg in #55
Add attention example and fix some bugs by @jansel in #56
Fix bug where non-tensor variables are not exposed to inner loops by @jansel in #58
Add hl.grid(...) support by @yf225 in #59
Fix more unit tests by @oulgen in #64
Fix test_matmul_tensor_descriptor unit test by @yf225 in #65
Prototyping an hl.atomic op by @drisspg in #63
Add hl.specialize and improve reduction handling by @jansel in #72
[test] Touch test/__init__.py to support more testing workflows by @danzimm in #73
[reland without ghstack] handle PTXASError by @jansel in #79
Support data-dependent loop bounds by @jansel in #81
Add support for hl.tile(begin, end) and hl.tile(begin, end, block_size) by @jansel in #82
Support user-defined minimum in hl.register_block_size by @jansel in #83
Don't re-wrap exceptions in exc.TorchOpTracingError by @jansel in #84
Add hl.tile_index() by @jansel in #89
Add filecheck dependency by @jansel in #95
Add env HELION_PRINT_OUTPUT_CODE=1 by @jansel in #93
Add extra_mask arg to hl.load and hl.store by @jansel in #94
Bump project version by @oulgen in #101
Swap to using hatch vcs by @oulgen in #103
Add publish to pypi workflow by @oulgen in #104

New Contributors

@stroxler made their first contribution in #25
@drisspg made their first contribution in #49
@danzimm made their first contribution in #73

Full Changelog: https://github.com/pytorch-labs/helion/commits/v0.0.3

Contributors

danzimm, jansel, and 4 other contributors
