Feature/cron scheduling rayjob 2426 #3836

DW-Han · 2025-06-26T09:06:52Z

Why are these changes needed?

Adding cron scheduling for RayJob

Related issue number

Resolves #2426 [Feature] Support cron scheduling for RayJob

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

Explanation

I considered using the built-in Kubernetes CronJob resource to run a RayJob from a cron string. That might have been fine, but it would have changed a lot of the current RayJob controller's structure, or required something like a CronRayJob CRD. I decided to go with a lighter-weight solution: adding JobDeploymentStatusScheduling, JobDeploymentStatusScheduled, and JobStatusScheduled (JobStatusScheduled is not strictly needed in the logic, but I wanted to be explicit).

Whenever the job is complete, instead of not requeueing, we check whether Spec.Schedule is empty. If it is not, we set the job to JobDeploymentStatusScheduling. This goes through the same logic as the existing JobDeploymentStatusSuspending and JobDeploymentStatusRetrying, deleting the cluster, job, etc. to free up resources until the next scheduled job.
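
A minimal sketch of the completion check described above. The status names follow this description, but the string values and the helper `nextAfterComplete` are illustrative assumptions, not the controller's actual code:

```go
package main

import "fmt"

// JobDeploymentStatus is a simplified stand-in for the RayJob status type.
// The string values below are assumptions made for this sketch.
type JobDeploymentStatus string

const (
	StatusComplete   JobDeploymentStatus = "Complete"
	StatusScheduling JobDeploymentStatus = "Scheduling"
)

// nextAfterComplete is a hypothetical helper. Given the schedule string of a
// completed job, it returns the next status and whether to keep reconciling.
func nextAfterComplete(schedule string) (next JobDeploymentStatus, requeue bool) {
	if schedule == "" {
		// No schedule: the job stays complete and is not requeued.
		return StatusComplete, false
	}
	// A schedule is set: enter Scheduling so resources are cleaned up
	// before waiting for the next cron tick.
	return StatusScheduling, true
}

func main() {
	for _, schedule := range []string{"", "*/5 * * * *"} {
		next, requeue := nextAfterComplete(schedule)
		fmt.Printf("schedule=%q next=%s requeue=%t\n", schedule, next, requeue)
	}
}
```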

JobDeploymentStatusScheduling transitions to JobDeploymentStatusScheduled, which checks whether it is time to run by seeing whether we are within ScheduleDelta of a cron tick. If so, we run a job, setting JobStatusNew and JobDeploymentStatusNew again. If not, we requeue after NextScheduleTimeDuration.

I used a ScheduleDelta to make scheduling more robust, since the requeue can drift (for example, because of processing time), and because it is the only way to know when to transition out of the scheduled state.
I thought about just requeueing and transitioning state directly, but realized that would be difficult in the current logic: all status updates are done in a single "updateRayJobStatus" call after the large switch statement, which means I have to requeue with NextScheduleTimeDuration at the end of the reconcile function. This approach seems fine, although I was facing some issues with reconcile errors. I don't know if this approach is optimal, so any feedback would be great.

Updates

Made it so that the job doesn't run immediately when created; instead, it goes straight to scheduled and waits for the next cron tick
Added unit tests
Cleaned up code
I realized we only need to check whether the current time is within a buffer of the previous tick, since we will reconcile at the future tick anyway
Now has the expected behavior of deleting or reusing clusters with the shutdownAfterJobFinishes spec
Added unit tests for the schedule.go util file

Signed-off-by: Kenny Han <[email protected]>

chiayi

Have some initial comments and questions. pls add unit tests as well.

ray-operator/apis/ray/v1/rayjob_types.go

ray-operator/config/samples/ray-job.schedule.yaml

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/controllers/ray/utils/cron_helper.go

ray-operator/local_deploy.sh

ray-operator/controllers/ray/rayjob_controller.go

chiayi

Few more comments, PTAL

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/controllers/ray/utils/schedule.go

ray-operator/local_deploy.sh

chiayi

LGTM

ray-operator/config/samples/ray-job.schedule.yaml

ray-operator/controllers/ray/utils/schedule.go

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/apis/ray/v1/rayjob_types.go

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/apis/ray/v1/rayjob_types.go

docs/reference/api.md

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/apis/ray/v1/rayjob_types.go

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/controllers/ray/utils/schedule_test.go

andrewsykim · 2025-07-18T03:45:36Z

Please add unit tests in https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/rayjob_controller_test.go as well to cover the cron scheduling case

ray-operator/controllers/ray/rayjob_controller.go

ray-operator/controllers/ray/rayjob_controller_scheduled_test.go

DW-Han and others added 2 commits June 23, 2025 03:28

init cron sheduling for ray jobs, transferring for local dev

3c6a9da

initial cron scheduling for ray jobs

0d986a9

DW-Han marked this pull request as ready for review June 26, 2025 09:08

DW-Han marked this pull request as draft June 26, 2025 09:10

DW-Han added 4 commits June 26, 2025 02:21

Delete ray-operator/controllers/ray/utils/cron_helpers.go

acdc061

Signed-off-by: Kenny Han <[email protected]>

updating dependencies

b4330c9

Delete ray-operator/test/e2e/rayjob_scheduling_test.go

f7da0f3

Signed-off-by: Kenny Han <[email protected]>

Merge branch 'ray-project:master' into feature/cron-scheduling-rayjob…

1a87974

…-2426

DW-Han marked this pull request as ready for review June 26, 2025 20:15

DW-Han marked this pull request as draft June 26, 2025 20:19

andrewsykim self-requested a review June 26, 2025 20:42

andrewsykim self-assigned this Jun 26, 2025

ryanaoleary self-requested a review June 26, 2025 22:23

chiayi reviewed Jun 27, 2025

DW-Han added 2 commits July 7, 2025 05:34

unit tests and cleaning up scheduling

d2f294c

adding cluster delete option and cleaning code

9b4db80

chiayi suggested changes Jul 11, 2025

cleaning

d6f0076

chiayi approved these changes Jul 14, 2025

DW-Han marked this pull request as ready for review July 14, 2025 20:55

andrewsykim reviewed Jul 14, 2025

DW-Han added 2 commits July 15, 2025 01:05

cleaning and adding schedule util unit tests

b6b3989

cleaning up comment

e449c92

andrewsykim reviewed Jul 17, 2025

seperate case for scheduling

d6a2bfc

andrewsykim reviewed Jul 18, 2025

ray-operator/apis/ray/v1/rayjob_types.go (outdated, resolved)

ray-operator/apis/ray/v1/rayjob_types.go (outdated, resolved)

cleaning doc string

d369a93

andrewsykim reviewed Jul 18, 2025

docs/reference/api.md (outdated, resolved)

ray-operator/controllers/ray/rayjob_controller.go (outdated, resolved)

updating api.md

7a9f462

andrewsykim reviewed Jul 18, 2025

ray-operator/controllers/ray/rayjob_controller.go (outdated, resolved)

ray-operator/controllers/ray/rayjob_controller.go (resolved)

andrewsykim reviewed Jul 18, 2025

ray-operator/apis/ray/v1/rayjob_types.go (outdated, resolved)

andrewsykim reviewed Jul 18, 2025

ray-operator/controllers/ray/rayjob_controller.go (outdated, resolved)

andrewsykim reviewed Jul 18, 2025

ray-operator/controllers/ray/utils/schedule_test.go (outdated, resolved)

ray-operator/controllers/ray/utils/schedule_test.go (outdated, resolved)

ray-operator/controllers/ray/utils/schedule_test.go (outdated, resolved)

cleaning up scheduling state, controller, etc.

c06bbed

andrewsykim reviewed Jul 19, 2025

ray-operator/controllers/ray/rayjob_controller.go (resolved)

ray-operator/controllers/ray/rayjob_controller.go (outdated, resolved)

DW-Han added 2 commits July 21, 2025 01:25

integration tests

6da9226

cleaning controller

93adf7c

andrewsykim reviewed Jul 21, 2025

ray-operator/controllers/ray/rayjob_controller_scheduled_test.go (resolved)

DW-Han added 5 commits July 21, 2025 21:30

cleaning and lint

8055fa7

deepcopy function

23f5e28

working integration tests and cleaning

1a032b9

making tests more air tight

95cd767

making tests more air tight

5f176a3
