refactor: Update decoder buffer and logits management #4450
/bot run
PR_Github #5746 [ run ] triggered by Bot
PR_Github #5746 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #5849 [ run ] triggered by Bot
PR_Github #5849 [ run ] completed with state
cc5e39b to ce13b9e
/bot run
PR_Github #6628 [ run ] triggered by Bot
PR_Github #6628 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #6783 [ run ] triggered by Bot
PR_Github #6783 [ run ] completed with state
51f8341 to 693609b
/bot run
/bot run
PR_Github #7319 [ run ] triggered by Bot
PR_Github #7322 [ run ] triggered by Bot
PR_Github #7319 [ run ] completed with state
PR_Github #7322 [ run ] completed with state
f1c1043 to eabbdf8
/bot run
PR_Github #7365 [ run ] triggered by Bot
PR_Github #7365 [ run ] completed with state
eabbdf8 to b82d4dd
/bot run
PR_Github #7465 [ run ] triggered by Bot
PR_Github #7465 [ run ] completed with state
b82d4dd to 2ada5c9
PR_Github #8890 [ run ] completed with state
47f304e to 2d0fb6b
/bot run
PR_Github #8999 [ run ] triggered by Bot
PR_Github #8999 [ run ] completed with state
2d0fb6b to 68edea3
/bot run
PR_Github #9058 [ run ] triggered by Bot
PR_Github #9058 [ run ] completed with state
68edea3 to 02d2b71
/bot run
PR_Github #9131 [ run ] triggered by Bot
- Moved the handling of logits from `DecoderBuffers` to `DecoderInputBuffers`.
- Separated the `DraftBuffers` from `DecoderBuffers` to manage draft token buffers, enhancing the organization of tensor management.
- Removed `DecoderBuffers` from function signatures across various components.
- Updated Python bindings to reflect changes, maintaining compatibility with existing interfaces.

These changes improve the maintainability and clarity of the decoding process in the batch manager.

Signed-off-by: Robin Kobus <[email protected]>
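The ownership change described in this commit can be pictured with a minimal sketch, written in Python for brevity (the classes named in the commit are C++). Everything beyond the two class names is an assumption made for illustration: the fields `logits` and `draft_tokens` and the helper `setup_step` are hypothetical and do not come from the PR. The point is only that logits travel with the decoder inputs, draft data has its own container, and callers no longer need a combined `DecoderBuffers` argument.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecoderInputBuffers:
    """Per-step decoder inputs; in this sketch it also owns the logits."""
    # Hypothetical field: one logits row (list of floats) per active request.
    logits: List[List[float]] = field(default_factory=list)


@dataclass
class DraftBuffers:
    """Draft-token state, kept separate from the decoder inputs."""
    # Hypothetical field: draft token ids per request.
    draft_tokens: List[List[int]] = field(default_factory=list)


def setup_step(
    inputs: DecoderInputBuffers,
    drafts: DraftBuffers,
    step_logits: List[List[float]],
    step_drafts: List[List[int]],
) -> None:
    """Hypothetical helper: fills both containers for one decoding step.

    The signature takes the two focused containers instead of a single
    combined buffers object.
    """
    inputs.logits = [list(row) for row in step_logits]
    drafts.draft_tokens = [list(tokens) for tokens in step_drafts]
```

A component that only samples tokens would take `DecoderInputBuffers`, while speculative-decoding code would take `DraftBuffers`, which is one plausible reading of why the split simplifies function signatures.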
Signed-off-by: Robin Kobus <[email protected]>
Signed-off-by: Robin Kobus <[email protected]>
- Modified the `HandleContextLogits` class to accept `max_num_sequences` and return a list of logits tensors along with the logits index.
- Adjusted the `HandleGenerationLogits` class to work with the updated logits handling, removing the dependency on `DecoderInputBuffers`.
- Updated the `TRTLLMSampler` to accommodate changes in logits management, ensuring proper buffer handling.

These changes enhance the clarity and maintainability of the logits processing workflow.

Signed-off-by: Robin Kobus <[email protected]>
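As a rough illustration of the interface described in this commit, the sketch below shows a context step that takes `max_num_sequences` and returns a per-slot list of logits plus a running logits index, which a generation step then continues from. This is not the PR's implementation: the function names, the `(seq_slot, num_rows)` request layout, the "context rows first" ordering, and the choice to keep only the last row per context request are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def handle_context_logits(
    context_requests: Sequence[Tuple[int, int]],
    flat_logits: Sequence[Row],
    max_num_sequences: int,
) -> Tuple[List[Optional[Row]], int]:
    """Hypothetical stand-in for the context-logits step.

    context_requests: (seq_slot, num_rows) pairs, an assumed layout.
    flat_logits: logits rows for the whole batch, context requests first.
    Returns per-slot logits (None for unused slots) and the index of the
    first row not consumed here.
    """
    logits_per_slot: List[Optional[Row]] = [None] * max_num_sequences
    logits_index = 0
    for seq_slot, num_rows in context_requests:
        if not 0 <= seq_slot < max_num_sequences:
            raise ValueError(f"seq_slot {seq_slot} out of range")
        if num_rows < 1:
            raise ValueError("each request needs at least one logits row")
        logits_index += num_rows
        # Assumption: only the last row of a context request is sampled from.
        logits_per_slot[seq_slot] = list(flat_logits[logits_index - 1])
    return logits_per_slot, logits_index


def handle_generation_logits(
    generation_slots: Sequence[int],
    flat_logits: Sequence[Row],
    logits_per_slot: List[Optional[Row]],
    logits_index: int,
) -> int:
    """Hypothetical: continues from logits_index, one row per request."""
    for seq_slot in generation_slots:
        logits_per_slot[seq_slot] = list(flat_logits[logits_index])
        logits_index += 1
    return logits_index
```

In this sketch the generation step needs only the list and the index returned by the context step, so it has no reason to see the input buffers at all, which mirrors the dependency removal the commit message mentions.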
02d2b71 to c90f89b
PR_Github #9131 [ run ] completed with state
/bot run
PR_Github #9192 [ run ] triggered by Bot
/bot run
PR_Github #9231 [ run ] triggered by Bot
PR_Github #9231 [ run ] completed with state
/bot run
PR_Github #9235 [ run ] triggered by Bot
PR_Github #9235 [ run ] completed with state
Description
- Moved the handling of logits from `DecoderBuffers` to `DecoderInputBuffers`.
- Separated the `DraftBuffers` from `DecoderBuffers` to manage draft token buffers, enhancing the organization of tensor management.
- Removed `DecoderBuffers` from function signatures across various components.

Changes to TRTLLMSampler:
Test Coverage
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.

`run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

`kill`

Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
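
For anyone who posts these comments from a script, the `run` syntax above can be assembled with a small helper. The sketch below is illustrative only: `bot_run_comment` is a hypothetical function, not part of the bot or of this repository, and it covers just four of the flags listed in the help text. It only builds the comment string; posting it to the pull request is left out.

```python
from typing import Optional, Sequence


def bot_run_comment(
    disable_fail_fast: bool = False,
    stage_list: Optional[Sequence[str]] = None,
    gpu_type: Optional[Sequence[str]] = None,
    post_merge: bool = False,
) -> str:
    """Builds a `/bot run` comment from flags listed in the help text."""
    parts = ["/bot", "run"]
    if disable_fail_fast:
        parts.append("--disable-fail-fast")
    if stage_list:
        # The help text shows comma-separated values inside double quotes.
        parts.append('--stage-list "{}"'.format(", ".join(stage_list)))
    if gpu_type:
        parts.append('--gpu-type "{}"'.format(", ".join(gpu_type)))
    if post_merge:
        parts.append("--post-merge")
    return " ".join(parts)
```

Calling `bot_run_comment(disable_fail_fast=True)` yields the same `/bot run --disable-fail-fast` comment that appears in the conversation above.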