
add options for xccl work #16


Closed
wants to merge 1,661 commits into from
Changes from all commits (1,661 commits)
064a7db
[invoke_subgraph] turn on supports_input_mutation by default (#157177)
ydwu4 Jun 27, 2025
aeffb68
[schema_upgrader] add C++ upgrader for json based upgrading (#156761)
ydwu4 Jun 28, 2025
67f8270
[ROCm] test_hip_device_count safely runs on 1 GPU systems (#156398)
BLOrange-AMD Jun 28, 2025
0629dfb
Fix FSDP offload pin_memory bug (#157147)
Edenzzzz Jun 28, 2025
996206e
cublaslt/hipblaslt persistent workspace (#156495)
jeffdaily Jun 28, 2025
772d590
[CUTLASS] [CUDA] SM100 GroupMM (#156203)
AaronWang04 Jun 28, 2025
90b973a
[BE] parse CMake version from `cmake -E capabilities` instead of `cma…
XuehaiPan Jun 28, 2025
2380115
[BE] use `pathlib.Path` instead of `os.path.*` in `setup.py` (#156742)
XuehaiPan Jun 28, 2025
836bb19
[hop] support torch.func.functional_call in hop subgraph (#155886)
ydwu4 Jun 27, 2025
6cc490d
simplify max(1,x) to x when x known >=1 (#157189)
laithsakka Jun 28, 2025
bccb847
[ROCm] Allow use of rocSOLVER for Cholesky inversion. (#157154)
naromero77amd Jun 29, 2025
2796f31
[DCP] OSS Zero Overhead Checkpointing Implementation (#156207)
Saiteja64 Jun 29, 2025
1913c91
Fixes issue #156414: Fixes bug in implementation of _combine_histogra…
Akabbaj Jun 29, 2025
f829311
[BE][13/16] fix typos in torch/ (torch/ao/) (#156603)
XuehaiPan Jun 29, 2025
347ace4
Inductor logging + analysis of torch.profile (#149697)
exclamaforte Jun 29, 2025
9d67738
[async compile] make it more obvious that we support backwards (#157204)
bobrenjc93 Jun 28, 2025
e959dd0
[TSAN][live speech translation] Fix A data race in caffe2 (#156378)
Polyomino Jun 29, 2025
b147b6c
Increase tolerance for test_corrcoef_cuda_int32 (#157206)
jansel Jun 28, 2025
aec569d
[Triton] [Inductor] Add tt.descriptor_store to get_tma_stores (#157212)
njriasan Jun 29, 2025
721d258
[dynamo][callbacks] temporarily disable TRITON_AUTOTUNING (#157186)
xmfan Jun 29, 2025
29f76ec
Revert "[BE] use `pathlib.Path` instead of `os.path.*` in `setup.py` …
pytorchmergebot Jun 29, 2025
41f6ace
Update pr_time_benchmarks expected results (#157214)
jansel Jun 29, 2025
018e982
[nativert] hook up memory planning to execution frame (#157053)
dolpm Jun 30, 2025
11f7e2f
[caffe][executorch] rename to avoid shadow in irange (#157107)
JakeStevens Jun 30, 2025
c27f83d
Remove old ASAN Docker images (#157197)
cyyever Jun 30, 2025
12cb06e
[inductor] Increase tolerance for test_comprehensive_nn_functional_li…
jansel Jun 26, 2025
86ced14
increment pending_callbacks_counter before initation the pt2 compile …
burak-turk Jun 30, 2025
771be85
[AOTI] Print out error msg when nvcc compiler fails (#157203)
desertfire Jun 29, 2025
a1282b1
[MPS] Add boilerplate sparse code support (#157238)
Isalia20 Jun 30, 2025
f79689b
updated matplotlib version in docs requirements (#155931)
morrison-turnansky Jun 30, 2025
ccb67f3
Enable the AMP precision with freezing for CPU nightly test (#152298)
LifengWang Jun 30, 2025
b1a54fa
[xla hash update] update the pinned xla hash (#156584)
pytorchupdatebot Jun 30, 2025
ffaed8c
Update slow tests (#155448)
pytorchupdatebot Jun 30, 2025
fab53df
Fixes for CPython int/float tests (#155978)
guilhermeleobas Jun 27, 2025
da1f337
Revert "Fixes for CPython int/float tests (#155978)"
pytorchmergebot Jun 30, 2025
3b4b5f8
[SDPA] Fix `alloc_with_matching_layout` stride sorting (#157145)
eqy Jun 30, 2025
e3afbb0
[inductor] Add typing to _inductor/ir.py (#149958)
rec Jun 30, 2025
39b71d1
[Inductor] add pedantic to limit inductor code follow standard. (#156…
xuhancn Jun 30, 2025
b54eac2
Upgrade to DLPack 1.0. (#145000)
ysiraichi Jun 28, 2025
c038719
Revert "Inductor logging + analysis of torch.profile (#149697)"
pytorchmergebot Jun 30, 2025
ed5d6d2
python definitely_contiguous-> is_contiguous_or_false (#156515)
laithsakka Jun 25, 2025
117db56
HF loads dcp - don't do a full deserialize on every file (#155942)
ankitageorge Jun 23, 2025
f8cc4c0
[inductor] Update triton_key import to support latest Triton (#157242)
jansel Jun 30, 2025
2815eea
[dtensor] relax device_mesh argument constraint in local_map (#157049)
wanchaol Jun 30, 2025
5e18bc3
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm…
Tiwari-Avanish Jun 30, 2025
efbf07e
Revert "[dynamo] Fix issue with tensors passed as view() shapes (#156…
pytorchmergebot Jun 30, 2025
d5e6f42
Revert "Use std::string_view in torchgen (#157050)"
pytorchmergebot Jun 30, 2025
c7b6c98
[tp] improve parallelize_module API to support more cases (#157182)
wanchaol Jun 30, 2025
f16053f
Switch to standard pep517 sdist generation (#152098)
zklaus Jun 30, 2025
2349151
Fixes for CPython int/float tests (#155978)
guilhermeleobas Jun 30, 2025
3684be0
[dynamo] Fix source for lru_cache method (#157292)
anijain2305 Jun 30, 2025
d3efd73
Revert "[cutlass backend][BE][ez] Make matmul layouts be row x column…
pytorchmergebot Jun 30, 2025
f096820
[precompile] Detect source code changes for save/load. (#156432)
zhxchen17 Jun 30, 2025
a9352bd
Script for consolidation of sharded safetensor files (#154743)
ankitageorge Jun 30, 2025
42b48ee
[dynamo][fsdp] Consistent behavior of int attributes (#157262)
anijain2305 Jun 30, 2025
3dda80e
Overload `mul_overflows` for `size_t` (#155736)
aaron-ang Jun 30, 2025
fee2377
Reapply D77381084 / #156964: Rename torch::standalone to headeronly (…
swolchok Jun 30, 2025
7709ff5
[remove untyped defs] batch 1 (#157011)
bobrenjc93 Jun 30, 2025
4ebd269
[Testing] Remove duplicate MPSInductor tests (#157328)
malfet Jun 30, 2025
b60569e
HF - consolidate shards of safetensors files to full tensors in finis…
ankitageorge Jun 30, 2025
f860992
Add a custom profiler configuration option (#151656)
fwenguang Jul 1, 2025
6dc2b22
[ROCm][SymmetricMemory] Performance improvements for two-shot allredu…
pragupta Jul 1, 2025
c174f3a
[ONNX] Delete deprecated tutorial page link (#157310)
titaiwangms Jul 1, 2025
3ed4384
[dynamo] temporarily disabling generation of weblinks for torch v2.8 …
Sidharth123-cpu Jun 30, 2025
f40efde
[CI] Add prebuild command option, set prebuild command option for CI …
clee2000 Jun 27, 2025
b5ce77c
[ROCm] Initial AITER Integration for mha_bwd asm kernels (#152630)
alugorey Jul 1, 2025
b40981c
Fix incorrect stride handling in adaptive_avg_pool3d (#157326)
jansel Jun 30, 2025
4d5d627
Remove super spammy log (#157157)
drisspg Jun 27, 2025
c811f41
[BE] Remove unused variable from Pooling.metal (#157332)
malfet Jul 1, 2025
04bd7e6
[ROCm] Remove use of `warpsize` on host-side compilation (#156979)
ethanwee1 Jul 1, 2025
8f0998a
Check F2C BLAS for OpenBLAS and other vendors (#143846)
isuruf Jul 1, 2025
7546996
[BE] always use `uv pip` if possible in `pip_init.py` for `lintrunner…
XuehaiPan Jun 28, 2025
c202a73
Revert "Fixes for CPython int/float tests (#155978)"
pytorchmergebot Jul 1, 2025
0596323
Better fix for `__index__` SymInt issue (#157201)
jansel Jul 1, 2025
210632f
[ROCm] support experimental CU carveout (#149466)
jeffdaily Jul 1, 2025
a767e50
remove allow-untyped-defs from torch/fx/experimental/migrate_gradual_…
bobrenjc93 Jul 1, 2025
0bce390
Revert "[dynamo] Add fx_graph_runnable test coverage (#157021)"
pytorchmergebot Jul 1, 2025
13bf265
Revert "HF loads dcp - don't do a full deserialize on every file (#15…
pytorchmergebot Jul 1, 2025
534c454
Revert "[xla hash update] update the pinned xla hash (#156584)"
pytorchmergebot Jul 1, 2025
1586521
Revert "Compute contiguity symbolically to avoid dde, and introduce c…
pytorchmergebot Jul 1, 2025
023887f
Revert "Switch to standard pep517 sdist generation (#152098)"
pytorchmergebot Jul 1, 2025
c78fce9
[dynamo] show frame information when recompilation is triggered on fa…
zhxchen17 Jun 30, 2025
b146e1a
[BE] remove duplicates in generated `torch._VF.__all__` (#157365)
XuehaiPan Jul 1, 2025
0f9c1b3
[dynamo] Ensure global state guard is preserved across serialization.…
zhxchen17 Jun 30, 2025
47f10d0
Inductor logging + analysis of torch.profile (#149697)
exclamaforte Jul 1, 2025
3bc6bdc
[BE] add type annotations and run `mypy` on `setup.py` (#156741)
XuehaiPan Jul 1, 2025
720c2c4
[Inductor UT][XPU] Reduce the runtime of the test case test_comprehen…
etaf Jul 1, 2025
1c8844d
[MPS] Switch Cholesky decomp to column wise (#157014)
malfet Jul 1, 2025
e1aee86
Fused RMSNorm implementation (#153666)
AaronWang04 Jul 1, 2025
02608e5
[ROCm] Add more shards for inductor dashboard, more frequent runs (#1…
jataylo Jul 1, 2025
3a5677a
Revert "ci: Add ability to test images for build-triton-wheel (#156894)"
pytorchmergebot Jul 1, 2025
6401d1d
Revert "Fused RMSNorm implementation (#153666)"
pytorchmergebot Jul 1, 2025
01b0f09
Fix full_like decomposition to preserve strides (#144765)
isuruf Jul 1, 2025
ffac0de
[export] Remove stack trace from input/output (#157302)
angelayi Jul 1, 2025
6bc2638
[SymmMem] Add NVSHMEM_CHECK macro (#157174)
kwen2501 Jul 1, 2025
4500a4a
remove allow-untyped-defs from torch/backends/mps/__init__.py (#157227)
bobrenjc93 Jul 1, 2025
019e30e
[BE] Decorate LargeTensorTest with serialTests (#157382)
malfet Jul 1, 2025
e5f6ffd
[BE] Replace `checkcall("chmod")` with `os.chmod` (#157373)
malfet Jul 1, 2025
22edb45
[invoke_subgraph][partitioner] Add meta val on run_and_save_rng ops (…
anijain2305 Jul 1, 2025
d0a9629
[do not revert] Compute contiguity symbolically to avoid dde, and int…
laithsakka Jul 1, 2025
3df6360
[BE][Easy][setup] use `super().method(...)` in command subclasses in …
XuehaiPan Jul 1, 2025
6ef70ed
Revert "Inductor logging + analysis of torch.profile (#149697)"
pytorchmergebot Jul 1, 2025
563fd95
[inductor][user triton] sanitize triple-quoted docstrings in kernel d…
davidberard98 Jul 1, 2025
c6a27ba
Revert "[do not revert] Compute contiguity symbolically to avoid dde,…
pytorchmergebot Jul 1, 2025
ab6cb34
Revert "[inductor][user triton] sanitize triple-quoted docstrings in …
pytorchmergebot Jul 1, 2025
617e3f6
[FP8] Fix Benchmarking for certain Priors (#155722)
oniononion36 Jul 2, 2025
7767675
[dynamo] Add fx_graph_runnable test coverage (#157021)
skarjala Jul 1, 2025
fa1c20a
Fix test consolidate hf safetensors (#157386)
ankitageorge Jul 1, 2025
bb47631
[dynamo][guards] Stash root guard manager pointer in the LeafGuard (#…
anijain2305 Jul 1, 2025
0a63053
Don't store flamegraph to tmp folder (#157374)
malfet Jul 1, 2025
5a2db51
allow to use bf16 as fp32 internal precision for mkldnn conv (#126050)
zhuhaozhe Jul 1, 2025
4c8eb65
allow to use bf16 as fp32 internal precision for mkldnn conv backward…
zhuhaozhe Jul 1, 2025
f8c0a4b
[inductor] enable bf32 test for mkldnn conv (#127293)
zhuhaozhe Jul 1, 2025
0364db7
[PT] support custom all_gather and reduce_scatter comms (#155189)
xunnanxu Jul 2, 2025
8c0df6f
Revert "[dynamo][fsdp] Consistent behavior of int attributes (#157262)"
pytorchmergebot Jul 2, 2025
3173616
[nativert] start to move generated static dispatch kernels (#157403)
dolpm Jul 2, 2025
ab2294d
[dynamo] fix _torchdynamo_orig_callable naming issues (#156901)
williamwen42 Jul 1, 2025
34c8033
Fix a div_mod bug in generic_math.h (#157383)
desertfire Jul 1, 2025
bdb7819
[dynamo, nested graph breaks] remove recursive cell/freevar in instru…
williamwen42 Jul 1, 2025
d5a8917
Revert "[dynamo] Add fx_graph_runnable test coverage (#157021)"
pytorchmergebot Jul 2, 2025
c553c55
Revert "Fix full_like decomposition to preserve strides (#144765)"
pytorchmergebot Jul 2, 2025
82eefae
[inductor][user triton] sanitize triple-quoted docstrings in kernel d…
davidberard98 Jul 1, 2025
b096341
[BE] use `pathlib.Path` instead of `os.path.*` in `setup.py` (#156742)
XuehaiPan Jul 2, 2025
9d175bc
Fixes for CPython int/float tests (#155978)
guilhermeleobas Jul 2, 2025
9f5276d
Fix typo: 'Intializes' → 'Initializes' in _distributed_c10d.pyi docst…
abhitorch81 Jul 2, 2025
0edc1b9
[Inductor] Disable decompose_k for AMD (#157283)
PaulZhang12 Jul 2, 2025
54701a0
Add is_hidden_event method to KinetoEvent Python interface (#155214)
wdziurdz Jul 2, 2025
bd6b5fd
[Precompile] [easy] Serialize requires_grad for tensors when serializ…
jamesjwu Jul 1, 2025
156bc24
Back out "Include c++ stack traces when we hit constraint violation (…
ppanchalia Jul 2, 2025
d5d14ee
[nativert] create persistent value helper (#157286)
dolpm Jul 2, 2025
0105cd8
[ONNX] Fix conversion of attention - 4D (#157130)
xadupre Jul 2, 2025
0e9d803
[build] remove cmake cache and reconfigure again if it is invalid (#1…
XuehaiPan Jul 2, 2025
eaf32ff
fixed a tiny typo in torch.compiler.md (#157462)
Dhia-naouali Jul 2, 2025
5e636d6
[BE] `@serialTest` decorator must be called (#157388)
malfet Jul 1, 2025
32983ea
[nativert] continue to move generated static dispatch kernels (#157460)
dolpm Jul 2, 2025
36dd598
layernorm tests: Tweak test thresholds for comparing tensors (#156699)
ahmadsharif1 Jul 2, 2025
06f39a7
Add Release 2.8 CUDA matrix. Update Release schedule for 2.7.1 and 2.…
atalman Jul 2, 2025
94716db
[BE][DCE] eliminate remnants of global gemm cache (#157327)
nmacchioni Jul 2, 2025
3f569f9
[BE] Remove extra semicolon (#157486)
malfet Jul 2, 2025
e0ab1b5
[ez][BE] Remove max jobs override for CI build jobs (#157473)
clee2000 Jul 2, 2025
1728535
[inductor] more size_hint_or_throw usage (#157394)
ColinPeppler Jul 1, 2025
e20784f
[dynamo] Support BUILTIN_MATCH serialization. (#157016)
zhxchen17 Jul 2, 2025
6f60cfe
[ez] Add super().setUp() in test_ops::TestFakeTensor (#157475)
clee2000 Jul 2, 2025
c09cf29
[ez][BE] Tag deletion script to delete any old ciflow + autorevert ta…
clee2000 Jul 2, 2025
af9c92b
[CI] Remove redundant accuracy benchmarks for cpp_wrapper (#155966)
benjaminglass1 Jul 2, 2025
4b4c2a7
Support complex numbers in DTensor redistribute (#157329)
wconstab Jul 1, 2025
60e66d1
[CI] Keep-going on main (#157470)
clee2000 Jul 2, 2025
fd4f704
[ez][CI] Print set output in CI (#157477)
clee2000 Jul 2, 2025
48560ee
[dynamo] Fix bug in dict(mapping_proxy) (#157467)
anijain2305 Jul 2, 2025
c0e155a
[cutlass backend] Use alignment of D for EVT / Float8 (#157402)
henrylhtsang Jul 1, 2025
541584d
[BE][8/16] fix typos in torch/ (torch/csrc/jit/) (#156318)
XuehaiPan Jul 2, 2025
d5cdc36
[BE][10/16] fix typos in torch/ (torch/csrc/jit/) (#156320)
XuehaiPan Jul 2, 2025
db259bd
[BE][12/16] fix typos in torch/ (#156602)
XuehaiPan Jul 2, 2025
11c07c8
[BE][14/16] fix typos in torch/ (torch/fx/) (#156604)
XuehaiPan Jul 2, 2025
d40aaa4
[BE][16/16] fix typos in torch/ (torch/utils/) (#156606)
XuehaiPan Jul 2, 2025
7cfd054
[attempt 2] Compute contiguity symbolically to avoid dde, and introdu…
laithsakka Jul 2, 2025
e124a0d
[BE] Unskip special ops (#157464)
malfet Jul 2, 2025
9620994
[MPS] Add `shifted_chebyshev_polynomial_[tuvw]` (#157488)
malfet Jul 2, 2025
7597988
[fake tensor] fix issue of no attribute tags (#156689)
Valentine233 Jul 3, 2025
5cc4e85
Add device_id to XPU device properties (#156481)
guangyey Jun 29, 2025
662c1cf
[c10d][PGNCCL] Add waitcounter for watchdog and heartbeat monitoring …
fduwjj Jul 2, 2025
493f42a
[symm_mem] Create a one side get api for symm mem (#157294)
fduwjj Jun 30, 2025
b642a5c
[cutlass backend] Add dynamo timed (#157410)
henrylhtsang Jul 2, 2025
404008e
[build] modernize build-backend: `setuptools.build_meta:__legacy__` -…
XuehaiPan Jul 2, 2025
5d5a5b3
Fix GITHUB_OUTPUT syntax in create_release.yml workflow (#157466)
zklaus Jul 2, 2025
dc524ef
Move logging into inner method for reorder pass (#156879)
wconstab Jul 2, 2025
382598e
Fix unsafe collective reorder past wait (#157489)
wconstab Jul 2, 2025
4ce6e6e
XCCL changes for DDP (#155497)
newtdms Jul 3, 2025
2bb33e7
Fixed triton kernel in ET due to Triton version change. (#157484)
shengfukevin Jul 3, 2025
8c2e450
[PT][FSDP] fail `set_allocate_memory_from_process_group` if used toge…
xunnanxu Jul 3, 2025
5dfd8a9
Remove is_jit_trace option (#157387)
tugsbayasgalan Jul 3, 2025
c329a8f
Fix CPU bitwise shifts for out-of-limit values in VSX-vec (#157463)
Flamefire Jul 3, 2025
8408522
Remove +PTX from CUDA 12.8 builds (#157516)
atalman Jul 3, 2025
b221be9
Fix typo: 'intial_query_grad' → 'initial_query_grad' in test_transfor…
abhitorch81 Jul 3, 2025
a0e0abd
Fix typo: 'intialized' → 'initialized' in test_modules.py (#157226)
abhitorch81 Jul 3, 2025
b6276a4
Revert "[MPS] Add `shifted_chebyshev_polynomial_[tuvw]` (#157488)"
pytorchmergebot Jul 3, 2025
c9174a2
Revert "[BE] Unskip special ops (#157464)"
pytorchmergebot Jul 3, 2025
f17f658
[profiler] add more CUDA API for kernel launcher (#156016)
namgyu-youn Jul 3, 2025
ec816d7
[MPS] Add `shifted_chebyshev_polynomial_[tuvw]` (#157488)
malfet Jul 3, 2025
e472daa
[dynamo] Add fx_graph_runnable test coverage (#157021)
skarjala Jul 2, 2025
2e64e45
Revert "[build] modernize build-backend: `setuptools.build_meta:__leg…
pytorchmergebot Jul 3, 2025
8981793
[cutlass backend] fix CutlassTensor post-renaming (#157408)
henrylhtsang Jul 2, 2025
5cfe437
[dtensor] Rework partial propagation in pointwise op and support mul …
wanchaol Jul 3, 2025
660dbea
[cutlass backend] modify presets ahead of cutlass 4 upgrade (#157522)
henrylhtsang Jul 3, 2025
e3fe001
Add einops x torch.compile testing in PyTorch CI (#157416)
zou3519 Jul 3, 2025
794b95d
Enable Half dtype for logcumsumexp_backward (#157512)
manuelcandales Jul 3, 2025
d56f11a
[MPS] Implement logcumsumexp metal kernel (#156858)
manuelcandales Jul 3, 2025
3fd84a8
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format` (#144554)
XuehaiPan Jul 3, 2025
19ae5af
Fix typo: 'recieve' → 'receive' in comments (#157544)
abhitorch81 Jul 3, 2025
7b392ba
all_gather_bucketing fx pass (#157396)
IvanKobzarev Jul 3, 2025
7081b82
[BE] Accelerator agnostic timer.py (#157131)
msaroufim Jul 3, 2025
dd3e717
Add async checkpointing impl to experimental checkpointer and add a b…
teja-rao Jul 3, 2025
a6fab82
[BE]: Fix NVSHMEM builds, add missing 12.9 dependency and update to l…
Skylion007 Jul 3, 2025
b359571
Fix is_unaligned usage of statically_known_true (#157400)
laithsakka Jul 3, 2025
ad86c05
efficient zero_mask implementation for vec128_*_neon (#155766)
swolchok Jul 3, 2025
f7130c0
[nativert] Move Executor to PyTorch core (#157514)
henryoier Jul 3, 2025
6c42afe
Introduce sync_cross_rank_decision (#156287)
Microve Jul 3, 2025
e7167db
[Set] Support sets in VariableBuilder (#153150)
guilhermeleobas Jul 3, 2025
f651e28
[FrozenSet] Fixes for FrozenSet (#152991)
guilhermeleobas Jul 3, 2025
2b82c61
[Generator] Implement generator.__contains__ (#154539)
guilhermeleobas Jul 3, 2025
22abe6d
[Dynamo] [SetSubclass] Add support for user defined sets (#153553)
guilhermeleobas Jul 3, 2025
11c7105
[Dynamo] [Set] Implement some binop operators for dict/set/frozenset/…
guilhermeleobas Jul 3, 2025
f9544f1
[Dynamo] [Set] Raise TypeError if object is unhashable (#154064)
guilhermeleobas Jul 3, 2025
c51da57
[Dynamo] [Set] Raise TypeError in set.union(...) and "__or__" (#154065)
guilhermeleobas Jul 3, 2025
308b88b
[Dynamo] [Set] Add comparison for set subclass (#154066)
guilhermeleobas Jul 3, 2025
0e7f02f
[Dynamo] [FrozensetSubclass] Add support for user defined frozensets …
guilhermeleobas Jul 3, 2025
dfcda61
Ensure Dynamo can trace through explicit dunder method call (#154366)
guilhermeleobas Jul 3, 2025
c9a5bf0
[FP8] FP8 for SwishLayerNorm (#157574)
oniononion36 Jul 4, 2025
f0b3886
Add dynamo_timed to bytecode hook (#157587)
zou3519 Jul 3, 2025
ef97bd4
[torch] Add MTIA to the list of devices supporting foreach/fused kern…
mustafaquraish Jul 4, 2025
8f9a191
[SymmMem] Fix CI name mismatch; remove TORCH_SYMMMEM requirement (#15…
kwen2501 Jul 3, 2025
4ed1b03
Add missing graph and memory related symbols to cuda_to_hip_mappings …
gwkimatem Jul 4, 2025
99c1a6b
[SymmMem] Find NVSHMEM from system installation (#157513)
kwen2501 Jul 4, 2025
f2e712c
Revert "Fix is_unaligned usage of statically_known_true (#157400)"
pytorchmergebot Jul 4, 2025
386bc9e
[audio hash update] update the pinned audio hash (#156905)
pytorchupdatebot Jul 4, 2025
d58ed04
[async-compile] add progressive compile mode (#157305)
bobrenjc93 Jul 3, 2025
fdc5b42
_broadcast_shapes gso generalizations (#157008)
laithsakka Jul 2, 2025
64f2ec7
[inductor] Fix fractional_max_pool2d 3D input causing assertion error…
jansel Jul 2, 2025
52e4e41
[dynamo] do not issue lru_cache warning for functions in the top-leve…
williamwen42 Jul 4, 2025
f41d017
Add device check in `mse_loss` (#155089)
zeshengzong Jul 4, 2025
a46ea8a
Fix typo: 'initalized' → 'initialized' in alias analysis test (#157628)
abhitorch81 Jul 4, 2025
336f1e2
[AOTI] Fix AOT inductor CMake build dependency order (#157557)
XuehaiPan Jul 4, 2025
7be862a
[dynamo] Relax DUPLICATED_INPUT to be serializable. (#157492)
zhxchen17 Jul 3, 2025
7275f28
Fix cuda 12.9 aarch64 GPU builds. Update CUDA_STABLE variable. (#157…
atalman Jul 4, 2025
9968edd
Fix #153942 (#153943)
rec Jul 4, 2025
524e827
[build] modernize build-backend: `setuptools.build_meta:__legacy__` -…
XuehaiPan Jul 4, 2025
bcc98bb
Update _linux-test to support B200 runner (#157341)
huydhn Jul 4, 2025
8a8fac1
[SymmMem] Move code to where it is used (#157611)
kwen2501 Jul 4, 2025
43f7216
Fix typo: 'paramters' → 'parameters' in ATen tunable README (#157575)
abhitorch81 Jul 5, 2025
e0fd48b
Fix typo: 'occurances' → 'occurrences' in mobile model test (#157629)
abhitorch81 Jul 5, 2025
44f5b93
fix: correct sentence punctuation in cuDNN note (#157623)
princeaden1 Jul 5, 2025
f7127b9
[Refactor] Remove unused variables (#157654)
yewentao256 Jul 5, 2025
63e87d6
[Refactor] Add maybe unused flag to remove warning (#157655)
yewentao256 Jul 5, 2025
a952956
Add isnan exit condition to special ops (#157464)
malfet Jul 4, 2025
5ea832e
[pc] migrate progression futures from list to deque (#157614)
bobrenjc93 Jul 5, 2025
db00e16
[pc] introduce ProgressiveCompilationState and clear callback (#157619)
bobrenjc93 Jul 5, 2025
2471cc3
[pc] verify max autotune is in generated source code (#157650)
bobrenjc93 Jul 5, 2025
71a650a
Fix typo: 'Intializing' → 'Initializing' in test_parametrization.py (…
abhitorch81 Jul 5, 2025
548c9d8
Fix typo: 'paramter' → 'parameter' in quantization model report test …
abhitorch81 Jul 5, 2025
9be5860
[dynamo] Fix dynamic shapes handling in after_aot repro generation (#…
skarjala Jul 3, 2025
ee9ac36
Fixing misspelling in documentation (#157565)
dsashidh Jul 5, 2025
3e56a9c
More testing of Python arithmetic operators between tensors and scala…
rec Jul 5, 2025
7cda401
Fix torch.utils.cpp_extension parser for clang version 20.1.7+libcxx …
AngryLoki Jul 6, 2025
17687eb
[BE][4/6] fix typos in test/ (test/inductor/) (#157638)
XuehaiPan Jul 6, 2025
02715d0
[BE][5/6] fix typos in test/ (test/dynamo/) (#157639)
XuehaiPan Jul 6, 2025
2022588
Fix: Ensure writeback handles NO_SHARD correctly by flattening tensor…
ccchow Jul 6, 2025
d26ca5d
Support transpose and pack for bit8 (#156065)
Valentine233 Jul 7, 2025
815545f
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor…
zhuhaozhe Jul 7, 2025
836ed81
add options for xccl work
Chao1Han Jun 4, 2025
da44942
add comm split support
Chao1Han Jul 7, 2025
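The two commits at the tail of this list, "add options for xccl work" and "add comm split support", are the ones this PR actually contributes; everything above comes from the upstream branch it was rebased onto. As a rough, non-authoritative sketch of the area they touch, the snippet below drives an XCCL (oneCCL-backed, XPU) process group and splits a sub-communicator out of it using only generic torch.distributed calls. The "xccl" backend string exists in current XPU-enabled PyTorch builds, but nothing here reflects the specific options this PR introduces.

```python
# Hedged sketch: exercise the XCCL backend and a split sub-communicator through
# the generic torch.distributed API. Not taken from this PR; any option handling
# specific to ProcessGroupXCCL is intentionally left out.
import os
import torch
import torch.distributed as dist

def init_xccl_and_split():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    device = torch.device("xpu", rank % torch.xpu.device_count())

    # "xccl" routes collectives through oneCCL on Intel GPUs.
    dist.init_process_group(
        backend="xccl",
        rank=rank,
        world_size=world_size,
        device_id=device,  # lets the backend bind to the device eagerly
    )

    # Communicator splitting: derive a smaller group (the even ranks here)
    # from the default one instead of bootstrapping a brand-new communicator.
    even_ranks = list(range(0, world_size, 2))
    subgroup = dist.new_group(ranks=even_ranks)

    t = torch.ones(4, device=device)
    dist.all_reduce(t)                       # default (world) group
    if rank in even_ranks:
        dist.all_reduce(t, group=subgroup)   # split communicator
    dist.destroy_process_group()
```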
The diff you're trying to view is too large. We only load the first 3000 changed files.
5 changes: 2 additions & 3 deletions .ci/aarch64_linux/aarch64_ci_build.sh
@@ -3,9 +3,7 @@ set -eux -o pipefail

GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}

if [[ "$GPU_ARCH_VERSION" == *"12.6"* ]]; then
export TORCH_CUDA_ARCH_LIST="9.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.8"* ]]; then
if [[ "$GPU_ARCH_VERSION" == *"12.9"* ]]; then
export TORCH_CUDA_ARCH_LIST="9.0;10.0;12.0"
fi

@@ -27,6 +25,7 @@ if [ "$DESIRED_CUDA" = "cpu" ]; then
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
else
echo "BASE_CUDA_VERSION is set to: $DESIRED_CUDA"
export USE_SYSTEM_NCCL=1
#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
fi
7 changes: 4 additions & 3 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -79,6 +79,7 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
os.system(f"unzip {wheel_path} -d {folder}/tmp")
libs_to_copy = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so",
"/usr/local/cuda/lib64/libcudnn.so.9",
"/usr/local/cuda/lib64/libcublas.so.12",
"/usr/local/cuda/lib64/libcublasLt.so.12",
@@ -88,7 +89,7 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
"/usr/local/cuda/lib64/libcusparseLt.so.0",
"/usr/local/cuda/lib64/libcusolver.so.11",
"/usr/local/cuda/lib64/libcurand.so.10",
"/usr/local/cuda/lib64/libnvToolsExt.so.1",
"/usr/local/cuda/lib64/libnccl.so.2",
"/usr/local/cuda/lib64/libnvJitLink.so.12",
"/usr/local/cuda/lib64/libnvrtc.so.12",
"/usr/local/cuda/lib64/libcudnn_adv.so.9",
@@ -108,9 +109,9 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
"/usr/local/lib/libnvpl_blas_core.so.0",
]

if "128" in desired_cuda:
if "129" in desired_cuda:
libs_to_copy += [
"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.8",
"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.9",
"/usr/local/cuda/lib64/libcufile.so.0",
"/usr/local/cuda/lib64/libcufile_rdma.so.1",
]
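The hunks above add libnvperf_host.so and libnccl.so.2 to the bundled libraries and move the version-specific extras from CUDA 12.8 to 12.9. For context, a simplified sketch of the pattern package_cuda_wheel() follows is shown below: unpack the wheel, copy the pinned shared libraries in, repack. The wheel layout and re-zip step are assumptions, and the real script's RPATH and tag fixups are omitted.

```python
# Simplified sketch of the wheel-bundling pattern shown in the diff above.
# Library paths mirror the snippet; the directory layout inside the wheel and
# the re-zip step are assumptions, and RPATH/tag fixups are intentionally omitted.
import os
import shutil
import subprocess
from pathlib import Path

def bundle_cuda_libs(wheel_path: str, desired_cuda: str) -> None:
    wheel = Path(wheel_path).resolve()
    tmp = wheel.parent / "tmp"
    subprocess.run(["unzip", "-q", str(wheel), "-d", str(tmp)], check=True)

    libs_to_copy = [
        "/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
        "/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so",
        "/usr/local/cuda/lib64/libcudnn.so.9",
        "/usr/local/cuda/lib64/libnccl.so.2",
    ]
    if "129" in desired_cuda:  # CUDA 12.9 builds carry extra libraries
        libs_to_copy += [
            "/usr/local/cuda/lib64/libnvrtc-builtins.so.12.9",
            "/usr/local/cuda/lib64/libcufile.so.0",
        ]

    target = tmp / "torch" / "lib"
    target.mkdir(parents=True, exist_ok=True)
    for lib in libs_to_copy:
        shutil.copy2(lib, target / os.path.basename(lib))

    # Repack the patched tree under the original wheel file name.
    wheel.unlink()
    subprocess.run(["zip", "-rq", str(wheel), "."], cwd=tmp, check=True)
```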
13 changes: 7 additions & 6 deletions .ci/docker/almalinux/Dockerfile
@@ -1,4 +1,4 @@
ARG CUDA_VERSION=12.4
ARG CUDA_VERSION=12.6
ARG BASE_TARGET=cuda${CUDA_VERSION}
ARG ROCM_IMAGE=rocm/dev-almalinux-8:6.3-complete
FROM amd64/almalinux:8.10-20250519 as base
@@ -52,10 +52,6 @@ ENV CUDA_VERSION=${CUDA_VERSION}
# Make things in our path by default
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:$PATH

FROM cuda as cuda11.8
RUN bash ./install_cuda.sh 11.8
ENV DESIRED_CUDA=11.8

FROM cuda as cuda12.6
RUN bash ./install_cuda.sh 12.6
ENV DESIRED_CUDA=12.6
@@ -64,6 +60,10 @@ FROM cuda as cuda12.8
RUN bash ./install_cuda.sh 12.8
ENV DESIRED_CUDA=12.8

FROM cuda as cuda12.9
RUN bash ./install_cuda.sh 12.9
ENV DESIRED_CUDA=12.9

FROM ${ROCM_IMAGE} as rocm
ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
ADD ./common/install_mkl.sh install_mkl.sh
@@ -78,7 +78,8 @@ RUN bash ./install_mnist.sh
FROM base as all_cuda
COPY --from=cuda11.8 /usr/local/cuda-11.8 /usr/local/cuda-11.8
COPY --from=cuda12.6 /usr/local/cuda-12.6 /usr/local/cuda-12.6
COPY --from=cuda12.4 /usr/local/cuda-12.8 /usr/local/cuda-12.8
COPY --from=cuda12.8 /usr/local/cuda-12.8 /usr/local/cuda-12.8
COPY --from=cuda12.9 /usr/local/cuda-12.9 /usr/local/cuda-12.9

# Final step
FROM ${BASE_TARGET} as final
93 changes: 44 additions & 49 deletions .ci/docker/build.sh
@@ -50,30 +50,21 @@ if [[ "$image" == *xla* ]]; then
exit 0
fi

if [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *-jammy* ]]; then
if [[ "$image" == *-jammy* ]]; then
UBUNTU_VERSION=22.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
extract_version_from_image_name centos CENTOS_VERSION
fi

if [ -n "${UBUNTU_VERSION}" ]; then
OS="ubuntu"
elif [ -n "${CENTOS_VERSION}" ]; then
OS="centos"
else
echo "Unable to derive operating system base..."
exit 1
fi

DOCKERFILE="${OS}/Dockerfile"
# When using ubuntu - 22.04, start from Ubuntu docker image, instead of nvidia/cuda docker image.
if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
if [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
elif [[ "$image" == *xpu* ]]; then
DOCKERFILE="${OS}-xpu/Dockerfile"
@@ -98,8 +89,8 @@ tag=$(echo $image | awk -F':' '{print $2}')
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$tag" in
pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc11)
CUDA_VERSION=12.6.3
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
@@ -110,7 +101,7 @@ case "$tag" in
TRITON=yes
;;
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.8
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
Expand All @@ -121,7 +112,31 @@ case "$tag" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc9)
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda12.8-cudnn9-py3.13-gcc9-inductor-benchmarks)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.13
GCC_VERSION=9
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda12.6-cudnn9-py3-gcc9)
CUDA_VERSION=12.6.3
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
@@ -168,8 +183,8 @@ case "$tag" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda11.8-cudnn9-py3-gcc9)
CUDA_VERSION=11.8.0
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
Expand All @@ -179,25 +194,25 @@ case "$tag" in
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
pytorch-linux-jammy-py3-clang12-onnx)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=10
CLANG_VERSION=12
VISION=yes
ONNX=yes
;;
pytorch-linux-focal-py3.9-clang10)
pytorch-linux-jammy-py3.9-clang12)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=10
CLANG_VERSION=12
VISION=yes
TRITON=yes
;;
pytorch-linux-focal-py3.11-clang10)
pytorch-linux-jammy-py3.11-clang12)
ANACONDA_PYTHON_VERSION=3.11
CLANG_VERSION=10
CLANG_VERSION=12
VISION=yes
TRITON=yes
;;
pytorch-linux-focal-py3.9-gcc9)
pytorch-linux-jammy-py3.9-gcc9)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=9
VISION=yes
@@ -252,25 +267,14 @@ case "$tag" in
DOCS=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-clang12)
pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-clang12)
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
CLANG_VERSION=12
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang12-asan)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=12
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang15-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=15
VISION=yes
;;
pytorch-linux-jammy-py3-clang18-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=18
@@ -303,15 +307,15 @@ case "$tag" in
GCC_VERSION=11
TRITON_CPU=yes
;;
pytorch-linux-focal-linter)
pytorch-linux-jammy-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
# would be to upgrade mypy to 1.0.0 with Python 3.11
PYTHON_VERSION=3.9
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-linter)
pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-linter)
PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CUDA_VERSION=12.8.1
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
ANACONDA_PYTHON_VERSION=3.10
@@ -370,14 +374,6 @@ esac

tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')

#when using cudnn version 8 install it separately from cuda
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ ${CUDNN_VERSION} == 9 ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
fi
fi

no_cache_flag=""
progress_flag=""
# Do not use cache and progress=plain when in CI
@@ -394,7 +390,6 @@ docker build \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
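Most of the build.sh churn above retires the focal/CentOS paths and the CUDA 11.8 and clang-10 image tags in favor of jammy, CUDA 12.8.1, and clang 12. Purely as an illustration (this dictionary and helper are not part of the CI), the shell case statement amounts to a mapping from image tag to docker build arguments along these lines:

```python
# Illustration only: the case statement in build.sh is effectively a lookup
# from image tag to docker build arguments. A few entries from the diff above
# are reproduced; the helper below is hypothetical, not part of the script.
IMAGE_CONFIGS = {
    "pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11": {
        "CUDA_VERSION": "12.8.1", "CUDNN_VERSION": "9",
        "ANACONDA_PYTHON_VERSION": "3.10", "GCC_VERSION": "11", "TRITON": "yes",
    },
    "pytorch-linux-jammy-py3.9-clang12": {
        "ANACONDA_PYTHON_VERSION": "3.9", "CLANG_VERSION": "12",
        "VISION": "yes", "TRITON": "yes",
    },
    "pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-linter": {
        "PYTHON_VERSION": "3.9", "CUDA_VERSION": "12.8.1",
    },
}

def docker_build_args(tag: str) -> list[str]:
    """Expand one tag's configuration into --build-arg flags."""
    flags = []
    for key, value in IMAGE_CONFIGS[tag].items():
        flags += ["--build-arg", f"{key}={value}"]
    return flags

print(docker_build_args("pytorch-linux-jammy-py3.9-clang12"))
```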
1 change: 1 addition & 0 deletions .ci/docker/centos-rocm/Dockerfile
@@ -39,6 +39,7 @@ RUN bash ./install_user.sh && rm install_user.sh

# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG BUILD_ENVIRONMENT
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/executorch.txt
@@ -1 +1 @@
b173722085b3f555d6ba4533d6bbaddfd7c71144
56392aa978594cc155fa8af48cd949f5b5f1823a
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/nccl-cu12.txt
@@ -1 +1 @@
v2.26.5-1
v2.27.3-1
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton-xpu.txt
@@ -1 +1 @@
b0e26b7359c147b8aa0af686c20510fb9b15990a
ae324eeac8e102a2b40370e341460f3791353398
13 changes: 0 additions & 13 deletions .ci/docker/common/install_base.sh
@@ -30,18 +30,6 @@ install_ubuntu() {
maybe_libomp_dev=""
fi

# HACK: UCC testing relies on libnccl library from NVIDIA repo, and version 2.16 crashes
# See https://github.com/pytorch/pytorch/pull/105260#issuecomment-1673399729
# TODO: Eliminate this hack, we should not relay on apt-get installation
# See https://github.com/pytorch/pytorch/issues/144768
if [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "11.8"* ]]; then
maybe_libnccl_dev="libnccl2=2.15.5-1+cuda11.8 libnccl-dev=2.15.5-1+cuda11.8 --allow-downgrades --allow-change-held-packages"
elif [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "12.4"* ]]; then
maybe_libnccl_dev="libnccl2=2.26.2-1+cuda12.4 libnccl-dev=2.26.2-1+cuda12.4 --allow-downgrades --allow-change-held-packages"
else
maybe_libnccl_dev=""
fi

# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
@@ -70,7 +58,6 @@ install_ubuntu() {
libasound2-dev \
libsndfile-dev \
${maybe_libomp_dev} \
${maybe_libnccl_dev} \
software-properties-common \
wget \
sudo \
7 changes: 6 additions & 1 deletion .ci/docker/common/install_conda.sh
@@ -6,7 +6,7 @@ set -ex
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.anaconda.com/miniconda"
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]] || [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download" # @lint-ignore
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
fi
@@ -64,6 +64,11 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 --update-deps -c conda-forge

# Miniforge installer doesn't install sqlite by default
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
conda_install sqlite
fi

# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
if [[ $(uname -m) == "aarch64" ]]; then
conda_install "openblas==0.3.29=*openmp*"