Use Tokio's task budget consistently, better APIs to support task cancellation #16398
Conversation
Force-pushed from 75fd648 to 935db91
Thanks for the draft -- this is in line with my understanding from your description. I think it will inch us closer to a good, lasting solution (especially after your upstream Tokio PR also merges). Feel free to ping me for a more detailed review once you are done with it.
@ozankabak I've pushed the optimizer rule changes I had in mind. This introduces two new execution plan properties that capture the evaluation type (how children are evaluated: eager vs lazy) and the scheduling type (how poll_next behaves with respect to scheduling: blocking vs cooperative). With those two combined, the tree can be rewritten in a bottom-up fashion: every leaf that is not cooperative gets wrapped as before, and additionally any eagerly evaluating node (i.e. an exchange) that is not cooperative is wrapped. This should ensure the entire plan participates in cooperative scheduling.

The only caveat that remains is dynamic stream creation; operators that do that need to take the necessary precautions themselves. I already updated the spill manager for this in the previous commit.

While writing this I started wondering whether the evaluation type should be a per-child thing. In my spawn experiment branch, for instance, hash join is eager for the build side but lazy for the probe side. Perhaps it would be best to leave room for that.
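A rough sketch of the idea in code (all names here are hypothetical and for illustration only; the actual enums, variants, and accessors in the PR may differ):

```rust
/// Hypothetical property enums; the real names and variants may differ.
#[derive(Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    /// The operator drives its children itself (exchange-like operators).
    Eager,
    /// The operator only pulls from its children when it is polled.
    Lazy,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum SchedulingType {
    /// poll_next may keep computing without returning to the scheduler.
    Blocking,
    /// poll_next participates in cooperative scheduling.
    Cooperative,
}

/// Bottom-up rewrite decision: wrap non-cooperative leaves (as before) and
/// non-cooperative eagerly evaluating nodes so the whole plan cooperates.
fn needs_cooperative_wrapper(
    is_leaf: bool,
    evaluation: EvaluationType,
    scheduling: SchedulingType,
) -> bool {
    scheduling == SchedulingType::Blocking
        && (is_leaf || evaluation == EvaluationType::Eager)
}
```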
Force-pushed from b593bfa to c648c0c
This is in alignment with what I was thinking, let's do it that way.
Thinking about it some more. The evaluation type is intended to describe how the operator computes record batches itself: lazily on demand, or by driving things itself. I'm kind of trying to refer to the terminology from the Volcano paper, which talks about demand-driven and data-driven operators. I had first called this 'drive type' with values 'demand' and 'data', but that felt a bit awkward. Since this is actually a property of how the operator prepares its output, one value per operator is probably fine after all. What I'm trying to do with this is find the exchanges in the plan. The current set that's present in DataFusion is all fine, but if you were to implement one using
Open to suggestions on better names for these properties.
🤖: Benchmark completed

🤖: Benchmark completed
I took the liberty of merging up from main to resolve a logical conflict.
Hmm, there seem to be some regressions there...
Yeah, the ClickBench benchmark does show it running a little slower; it seems reproducible, about 1000 ms slower in total time. I am not sure if it's just noise.
Which issue does this PR close?
Rationale for this change
RecordBatchReceiverStream supports cooperative scheduling implicitly by using Tokio's task budget. YieldStream currently uses a custom mechanism. It would be better to use a single mechanism consistently.

What changes are included in this PR?
Note that the implementation of CooperativeStream in this PR is suboptimal. The final implementation requires tokio-rs/tokio#7405 which I'm trying to move along as best I can.
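As a rough illustration of the general approach (a minimal sketch, not the actual CooperativeStream implementation in this PR; the wrapper name and structure here are assumptions), a stream can participate in Tokio's task budget by polling tokio::task::consume_budget before polling the inner stream:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream;

/// Illustrative wrapper: charges one unit of Tokio's task budget per item
/// before polling the inner stream.
struct BudgetedStream<S> {
    inner: S,
}

impl<S: Stream + Unpin> Stream for BudgetedStream<S> {
    type Item = S::Item;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // `consume_budget` completes immediately while budget remains and
        // returns Pending (after arranging a wake-up) once it is exhausted,
        // forcing this task to yield back to the Tokio scheduler.
        let mut budget = std::pin::pin!(tokio::task::consume_budget());
        if budget.as_mut().poll(cx).is_pending() {
            return Poll::Pending;
        }
        Pin::new(&mut self.inner).poll_next(cx)
    }
}
```

One reason such a per-poll approach can be suboptimal is the fresh `consume_budget` future created on every poll; the tighter integration mentioned above depends on tokio-rs/tokio#7405.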
Are these changes tested?
Covered by the infinite_cancel and coop tests.

Are there any user-facing changes?
Yes, the datafusion.optimizer.yield_period configuration option is removed, but at the time of writing this option has not been released yet.