
Conversation

zhaozheng09

We have a host-bound model on A100.
I found that embedding key preprocessing incurs a long wait, so the subsequent CPU scheduling cannot keep up with the GPU work, which results in low GPU utilization.
So we moved the unique and wait ops from the main thread to a separate thread.
before: [trace screenshot]
after: [trace screenshot]

I have only roughly sketched a multi-threaded solution and am not sure whether it is feasible. If it is, I will provide the complete code.
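
Roughly, the idea looks like the sketch below. It is only an illustration of the threading pattern, not the actual patch: the function and variable names (`preprocess_keys`, `train_loop`, `batch["keys"]`) are hypothetical, and the real change would live inside the training pipeline code.

```python
# Hypothetical sketch: run the key preprocessing (torch.unique + the H2D copy
# wait) on a single worker thread so the main thread can keep scheduling GPU
# work for the current batch. Names are illustrative only, not from this PR.
from concurrent.futures import ThreadPoolExecutor

import torch

_preproc_pool = ThreadPoolExecutor(max_workers=1)


def preprocess_keys(keys_cpu: torch.Tensor) -> torch.Tensor:
    """Dedup embedding keys on CPU and copy them to the GPU on a side stream.

    This is the part that previously blocked the main thread.
    """
    unique_keys = torch.unique(keys_cpu)
    side_stream = torch.cuda.Stream()
    with torch.cuda.stream(side_stream):
        unique_keys_gpu = unique_keys.to("cuda", non_blocking=True)
    side_stream.synchronize()  # the "wait" now happens off the main thread
    return unique_keys_gpu


def train_loop(batches, model, optimizer):
    """Overlap preprocessing of the next batch with GPU work on the current one.

    Assumes `model(keys, dense)` returns the loss directly, for brevity.
    """
    it = iter(batches)
    batch = next(it)
    future = _preproc_pool.submit(preprocess_keys, batch["keys"])
    for next_batch in it:
        unique_keys_gpu = future.result()  # usually ready by now
        # Kick off preprocessing for the *next* batch before running this one.
        future = _preproc_pool.submit(preprocess_keys, next_batch["keys"])
        loss = model(unique_keys_gpu, batch["dense"].to("cuda", non_blocking=True))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        batch = next_batch
    # Flush the final batch.
    loss = model(future.result(), batch["dense"].to("cuda", non_blocking=True))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A single worker thread keeps the preprocessing ops ordered, which avoids having to synchronize multiple preprocessing threads against each other.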

meta-cla bot added the CLA Signed label on Sep 11, 2025.
@TroyGarden (Contributor)

Hi @zhaozheng09, thanks for the PR. I'm wondering if you can share the trace files (before vs. after) so that we have better context. Usually the data preprocessing (PreProc) is done in a remote worker, but we'll consider your use case as well.

@zhaozheng09 (Author)

> Hi @zhaozheng09, thanks for the PR. I'm wondering if you can share the trace files (before vs. after) so that we have better context. Usually the data preprocessing (PreProc) is done in a remote worker, but we'll consider your use case as well.

I tried to upload the traces but got: "File size too big: 25 MB are allowed, 1066 MB were attempted to upload." Is there another way to share them, or do I need to provide any additional information?
