@swolchok swolchok commented Oct 10, 2025

Found a device synchronize in aoti_torch_delete_tensor_object via Linux perf. This change appears to significantly improve the latency self-reported by voxtral_runner, as run in https://github.com/pytorch/executorch/blob/main/.github/workflows/cuda.yml#L111-L172:

Baseline:
Run latency (ms):
audio_encoder: 575.797
token_embedding: 14.571
text_decoder: 3095.356

With this PR:
Run latency (ms):
audio_encoder: 175.807
token_embedding: 8.799
text_decoder: 344.367
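
For a quick read of the numbers above, here is a minimal sketch that turns the two latency reports into per-stage speedup ratios. The dictionary and function names are illustrative only and are not part of voxtral_runner; the figures are copied from this description and are assumed to come from single runs.

```python
# Latencies in milliseconds, copied from the PR description above.
baseline_ms = {
    "audio_encoder": 575.797,
    "token_embedding": 14.571,
    "text_decoder": 3095.356,
}
with_pr_ms = {
    "audio_encoder": 175.807,
    "token_embedding": 8.799,
    "text_decoder": 344.367,
}


def speedups(before, after):
    """Return the before/after latency ratio per stage (higher means faster after)."""
    return {name: before[name] / after[name] for name in before}


ratios = speedups(baseline_ms, with_pr_ms)
# Ratio of summed stage latencies; a rough overall figure, not a measured end-to-end time.
total_ratio = sum(baseline_ms.values()) / sum(with_pr_ms.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"total: {total_ratio:.2f}x")
```

This works out to roughly 3.3x for audio_encoder, 1.7x for token_embedding, and 9.0x for text_decoder, or about 7.0x on the summed latencies.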

swolchok commented Oct 10, 2025

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Oct 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14976

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit a0e9faa with merge base 9764269 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Oct 10, 2025
Found device synchronize in aoti_torch_delete_tensor_object via Linux perf. This change appears to significantly improve latency.


ghstack-source-id: 8083f85
ghstack-comment-id: 3387849830
Pull-Request: #14976
@meta-cla meta-cla bot added the CLA Signed label Oct 10, 2025
@swolchok swolchok requested a review from larryliu0820 October 10, 2025 00:18
@swolchok swolchok added the release notes: desktop label Oct 10, 2025
@swolchok

test-qnn-wheel-packages-linux is broken on main. test-multimodal-linux (gemma3-4b) looks a bit flaky on main and is segfaulting there, so the segfault here is not blocking: the more specific tests for this PR are passing, and this PR should be unrelated. Merging.

@swolchok swolchok merged commit caa0094 into main Oct 10, 2025
133 of 146 checks passed
@swolchok swolchok deleted the gh/swolchok/587/head branch October 10, 2025 15:08
@larryliu0820

> test-qnn-wheel-packages-linux is broken on main. test-multimodal-linux (gemma3-4b) looks a bit flaky on main and is segfaulting there, so the segfault here is not blocking: the more specific tests for this PR are passing, and this PR should be unrelated. Merging.

Let me see if I can fix the gemma3 issue.
