Some questions about the parameter --chunked-prefill-size #2815
Unanswered
yuki252111
asked this question in
Q&A
Replies: 0 comments
The above is part of the source code from the file scheduler.py. As I understand it, self.rem_chunk_tokens plays the same role as self.rem_input_tokens: both limit the total number of prefill tokens. Both are modified in the following function.
In addition, self.rem_chunk_tokens is used to decide whether the prompt of a request needs to be truncated.
What confuses me is this: I would expect self.rem_chunk_tokens to be used only for splitting the prompt of the last request (or of each request) into chunks. However, every request that is added decrements self.rem_chunk_tokens, and the check self.rem_chunk_tokens > 0 then decides whether more requests can still be added to the batch. So I don't understand what self.rem_chunk_tokens is actually for.
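To make my reading concrete, here is a minimal toy sketch of the behaviour I described. It is not the actual SGLang code: the class ToyPrefillBudget, the method try_add, and the numbers are all hypothetical, and I am assuming that both counters are decremented by the number of tokens actually scheduled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefillBudget:
    """Toy model of the two budgets (hypothetical, NOT the SGLang implementation)."""

    rem_input_tokens: int  # assumption: overall cap on prefill tokens per batch
    rem_chunk_tokens: int  # assumption: initialised from --chunked-prefill-size
    batch: list = field(default_factory=list)  # (req_id, tokens_scheduled, truncated)

    def try_add(self, req_id: str, prompt_len: int) -> bool:
        """Add a request if budget remains; return False once a budget is used up."""
        if self.rem_chunk_tokens <= 0 or self.rem_input_tokens <= 0:
            return False
        # The prompt is truncated to whatever is left of the chunk budget.
        take = min(prompt_len, self.rem_chunk_tokens)
        truncated = take < prompt_len
        self.batch.append((req_id, take, truncated))
        # Every added request decrements BOTH counters.
        self.rem_input_tokens -= take
        self.rem_chunk_tokens -= take
        return True


budget = ToyPrefillBudget(rem_input_tokens=16384, rem_chunk_tokens=8192)
for req_id, prompt_len in [("a", 3000), ("b", 4000), ("c", 5000), ("d", 1000)]:
    if not budget.try_add(req_id, prompt_len):
        break  # chunk budget exhausted, stop adding requests

print(budget.batch)
print(budget.rem_chunk_tokens)
```

In this toy model the chunk budget is shared by the whole batch: requests "a" and "b" fit completely, "c" is truncated to the remaining tokens, and "d" is not added at all. That is the behaviour I think I am seeing, and it is why the counter looks to me like a second total-token limit rather than a per-request chunk size.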
Hope to receive your feedback, thank you!