backends/cuda: use async malloc/free #14976

swolchok · 2025-10-10T00:18:06Z

A Linux perf profile showed a device synchronize inside aoti_torch_delete_tensor_object. Using async malloc/free appears to significantly improve the latency self-reported by voxtral_runner, as run in https://github.com/pytorch/executorch/blob/main/.github/workflows/cuda.yml#L111-L172:

Baseline:
Run latency (ms):
audio_encoder: 575.797
token_embedding: 14.571
text_decoder: 3095.356

With this PR:
Run latency (ms):
audio_encoder: 175.807
token_embedding: 8.799
text_decoder: 344.367
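
To put the two sets of numbers side by side, the short Python sketch below computes the per-stage speedup (baseline latency divided by new latency) from the figures reported above. The variable and function names are illustrative and not part of the PR, and summing the three stages into one total is an assumption made here for comparison only; the description does not report such a total.

```python
# Latencies in milliseconds, copied from the PR description above.
baseline_ms = {
    "audio_encoder": 575.797,
    "token_embedding": 14.571,
    "text_decoder": 3095.356,
}
with_pr_ms = {
    "audio_encoder": 175.807,
    "token_embedding": 8.799,
    "text_decoder": 344.367,
}


def speedup(before_ms, after_ms):
    """Return how many times faster `after_ms` is than `before_ms`."""
    return before_ms / after_ms


# Per-stage speedup: baseline latency divided by latency with this PR.
speedups = {
    stage: speedup(baseline_ms[stage], with_pr_ms[stage])
    for stage in baseline_ms
}

# Assumption: treat the sum of the three stages as a rough end-to-end figure.
total_speedup = speedup(sum(baseline_ms.values()), sum(with_pr_ms.values()))

for stage, ratio in speedups.items():
    print(f"{stage}: {ratio:.2f}x")
print(f"total (sum of stages): {total_speedup:.2f}x")
```

On these numbers, text_decoder improves the most (roughly 9x), followed by audio_encoder (roughly 3.3x) and token_embedding (roughly 1.7x).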

[ghstack-poisoned]

swolchok · 2025-10-10T00:18:07Z

Stack from ghstack (oldest at bottom):

-> backends/cuda: use async malloc/free #14976

pytorch-bot · 2025-10-10T00:18:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14976

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit a0e9faa with merge base 9764269:

NEW FAILURES - The following jobs have failed:

pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
RuntimeError: Command docker exec -t 13ac699ae99d3c0e8386226c0dc49f71c4763973f221f6b9fdadb2f62f7d0a88 /exec failed with exit code 139
pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 2c3235bd8701950fdcaeaddd3a5c9506c3246df44babe6489f4c371842956ed7 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t d7a5e3e75c640e83588e38c0cee1fc36c3f764c2d7937d8437e36cfa79de9f31 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t 1125a1b6af9c09c04d2dbebe4449e680601af5c47481655929891d08e8600c2e /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Found device synchronize in aoti_torch_delete_tensor_object via Linux perf. This change appears to significantly improve latency.

ghstack-source-id: 8083f85
ghstack-comment-id: 3387849830
Pull-Request: #14976

swolchok · 2025-10-10T15:08:28Z

test-qnn-wheel-packages-linux is broken on main. test-multimodal-linux (gemma3-4b) looks a bit flaky on main, where it is also segfaulting, so the segfault here is not blocking: the more specific tests for this PR are passing, and this PR should be unrelated. Merging.

larryliu0820 · 2025-10-10T17:21:19Z

> test-qnn-wheel-packages-linux is broken on main. test-multimodal-linux (gemma3-4b) looks a bit flaky on main, where it is also segfaulting, so the segfault here is not blocking: the more specific tests for this PR are passing, and this PR should be unrelated. Merging.

Let me see if I can fix the gemma3 issue

Update

a0e9faa

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Oct 10, 2025

swolchok mentioned this pull request Oct 10, 2025

backends/cuda: use async malloc/free #14963

Closed

swolchok requested a review from larryliu0820 October 10, 2025 00:18

swolchok added the release notes: desktop for desktop/laptop workstream label Oct 10, 2025

Gasoonjia approved these changes Oct 10, 2025

larryliu0820 approved these changes Oct 10, 2025

swolchok merged commit caa0094 into main Oct 10, 2025
133 of 146 checks passed

swolchok deleted the gh/swolchok/587/head branch October 10, 2025 15:08