Found a device synchronize in aoti_torch_delete_tensor_object using Linux
perf. This change appears to significantly improve the self-reported latency
from voxtral_runner when run as in
https://github.com/pytorch/executorch/blob/main/.github/workflows/cuda.yml#L111-L172:
Run latency (ms)      Baseline  With this PR
audio_encoder          575.797       175.807
token_embedding         14.571         8.799
text_decoder          3095.356       344.367
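To put the latencies above in relative terms, the per-stage speedup is the baseline latency divided by the new latency. The sketch below uses only the numbers reported in this commit message. The `speedups` helper is illustrative and not part of ExecuTorch.

```python
# Latencies (ms) self-reported by voxtral_runner, copied from the runs above.
baseline = {"audio_encoder": 575.797, "token_embedding": 14.571, "text_decoder": 3095.356}
with_pr = {"audio_encoder": 175.807, "token_embedding": 8.799, "text_decoder": 344.367}


def speedups(before, after):
    """Return the before/after latency ratio per stage (> 1 means faster)."""
    return {stage: before[stage] / after[stage] for stage in before}


for stage, ratio in speedups(baseline, with_pr).items():
    print(f"{stage}: {ratio:.2f}x")
```

These numbers work out to roughly 3.3x for audio_encoder, 1.7x for token_embedding, and 9x for text_decoder. The text_decoder gains the most, which fits a per-tensor-deletion cost: it would pay that cost on every decoded token.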