We have discussed some comparisons of other schedulers (#29).
I think it would be worth describing how a Kueue integration would work.
KAI could support Jobs/JobSet/PyTorch jobs without much effort for Kueue.
For KAI support of services I think #63 is needed.
To expand on batch jobs, I think one needs to investigate whether it is possible to use Kueue's ClusterQueues/LocalQueues in place of KAI Queues. To put it simply, a Kueue integration (sans Topology Aware Scheduling) could have Kueue handle queueing (holding workloads and resuming them once there is capacity in the cluster) while KAI handles scheduling.
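To make the proposed split concrete, here is a minimal sketch of what a Job could look like under it: Kueue picks the Job up through its queue label and controls `suspend`, while the pod template asks for KAI as the scheduler. This is only an illustration of the idea, not a tested integration. The `kueue.x-k8s.io/queue-name` label is Kueue's documented queue label; the scheduler name `kai-scheduler` and the helper `build_job` are assumptions for illustration.

```python
from typing import Any

# Kueue's documented label for assigning a workload to a LocalQueue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"
# Assumption: the name the KAI scheduler is deployed under.
KAI_SCHEDULER_NAME = "kai-scheduler"


def build_job(name: str, local_queue: str, image: str) -> dict[str, Any]:
    """Build a batch/v1 Job manifest (hypothetical helper).

    Kueue would own queueing: the Job starts suspended and Kueue
    unsuspends it once the queue has capacity. KAI would own
    scheduling: the pods request it via schedulerName.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "labels": {KUEUE_QUEUE_LABEL: local_queue},
        },
        "spec": {
            # Created suspended so that Kueue decides when it may start.
            "suspend": True,
            "template": {
                "spec": {
                    "schedulerName": KAI_SCHEDULER_NAME,
                    "restartPolicy": "Never",
                    "containers": [{"name": "main", "image": image}],
                }
            },
        },
    }
```

The resulting dict can be dumped to YAML or JSON and applied as usual. Whether KAI would schedule such pods without a matching KAI Queue is exactly the open question in this issue.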
For KAI maintainers, the main request would be to figure out what would be lost if KAI's queueing logic were folded into Kueue. Is there anything missing in Kueue that would prevent KAI from using Kueue for queueing while leaving scheduling to KAI?
Another feature in KAI is that jobs can be defined as non-preemptible, and we will want to refine the way it is used in the future.
KAI is pretty modular so you could run it with many actions/plugins turned off or configured differently.
For example, if you want to integrate them today, some of the actions (reclaim, preempt) and plugins (proportion) can be turned off, and you could duplicate all the Kueue queues into KAI queues with infinite quota. I think that would let Kueue control pod creation while KAI tries to schedule everything that is created.
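The queue duplication could be sketched roughly like this: for each Kueue queue name, generate a KAI Queue manifest of the same name with no effective quota. This is a sketch under assumptions, not a verified recipe. The `scheduling.run.ai/v2` Queue shape and the use of `-1` for "unlimited" follow my reading of the KAI examples and should be checked against the installed CRD; `mirror_queue` and `mirror_all` are hypothetical helpers.

```python
from typing import Any, Iterable

# Assumption: KAI treats -1 as "no quota / no limit" for a resource.
UNLIMITED = -1
RESOURCES = ("cpu", "gpu", "memory")


def mirror_queue(kueue_queue_name: str) -> dict[str, Any]:
    """Build a KAI Queue manifest mirroring one Kueue queue (hypothetical)."""
    unlimited = {"quota": UNLIMITED, "limit": UNLIMITED, "overQuotaWeight": 1}
    return {
        # Assumption: API group/version of the KAI Queue CRD.
        "apiVersion": "scheduling.run.ai/v2",
        "kind": "Queue",
        "metadata": {"name": kueue_queue_name},
        # Copy the settings per resource so the entries stay independent.
        "spec": {"resources": {r: dict(unlimited) for r in RESOURCES}},
    }


def mirror_all(kueue_queue_names: Iterable[str]) -> list[dict[str, Any]]:
    """Mirror every distinct Kueue queue name, in sorted order."""
    return [mirror_queue(name) for name in sorted(set(kueue_queue_names))]
```

With unlimited quotas on the KAI side, admission decisions stay with Kueue and KAI only places the pods that Kueue lets through. Turning off the reclaim and preempt actions and the proportion plugin would still have to be done in the KAI scheduler configuration, which is not shown here.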