@win5923 (Collaborator) commented Sep 2, 2025

Why are these changes needed?

Refactor the job submit command to be more readable.

  • Simplify complex inline logic into dedicated helper functions
  • Add constants for timeout values and polling intervals

E2E test:

$ kubectl ray job submit --name ray-job-sample --working-dir ./ --runtime-env ./runtimeEnv.yaml -- python test.py
Submitted RayJob ray-job-sample.
Waiting for RayCluster ray-job-sample-fdzvv to be ready...
Waiting for port forwarding...Port forwarding service ray-job-sample-fdzvv-head-svc
Forwarding from 127.0.0.1:8265 -> 8265
Forwarding from [::1]:8265 -> 8265
Handling connection for 8265
Port forwarding started on http://localhost:8265
Generated submission ID for Ray job: raysubmit_UwJ5PzzqGzcJDbyc
Ray command: [ray job submit --address http://localhost:8265 --runtime-env runtimeEnv.yaml --submission-id raysubmit_UwJ5PzzqGzcJDbyc --working-dir ./ -- python test.py]
Running Ray submit job command...
Handling connection for 8265
Handling connection for 8265
Handling connection for 8265
2025-09-02 16:23:50,051 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_833f649c1ed96aac.zip.
2025-09-02 16:23:50,052 INFO packaging.py:588 -- Creating a file package for local module './'.
Handling connection for 8265
Handling connection for 8265
2025-09-02 16:23:50,026 INFO cli.py:41 -- Job submission server address: http://localhost:8265
2025-09-02 16:23:50,628 SUCC cli.py:65 -- -------------------------------------------------------
2025-09-02 16:23:50,628 SUCC cli.py:66 -- Job 'raysubmit_UwJ5PzzqGzcJDbyc' submitted successfully
2025-09-02 16:23:50,628 SUCC cli.py:67 -- -------------------------------------------------------
2025-09-02 16:23:50,628 INFO cli.py:291 -- Next steps
2025-09-02 16:23:50,628 INFO cli.py:292 -- Query the logs of the job:
2025-09-02 16:23:50,628 INFO cli.py:294 -- ray job logs raysubmit_UwJ5PzzqGzcJDbyc
2025-09-02 16:23:50,628 INFO cli.py:296 -- Query the status of the job:
2025-09-02 16:23:50,628 INFO cli.py:298 -- ray job status raysubmit_UwJ5PzzqGzcJDbyc
2025-09-02 16:23:50,628 INFO cli.py:300 -- Request the job to be stopped:
2025-09-02 16:23:50,629 INFO cli.py:302 -- ray job stop raysubmit_UwJ5PzzqGzcJDbyc
Handling connection for 8265
Handling connection for 8265
Handling connection for 8265
2025-09-02 16:23:50,675 INFO cli.py:312 -- Tailing logs until the job exits (disable with --no-wait):
2025-09-02 09:23:50,446 INFO job_manager.py:531 -- Runtime env is setting up.
2025-09-02 09:23:55,338 INFO worker.py:1554 -- Using address 10.244.0.18:6379 set in the environment variable RAY_ADDRESS
2025-09-02 09:23:55,338 INFO worker.py:1694 -- Connecting to existing Ray cluster at address: 10.244.0.18:6379...
2025-09-02 09:23:55,350 INFO worker.py:1879 -- Connected to Ray cluster. View the dashboard at 10.244.0.18:8265 
test_counter got 1
test_counter got 2
test_counter got 3
test_counter got 4
test_counter got 5
2025-09-02 16:24:00,728 SUCC cli.py:65 -- ------------------------------------------
2025-09-02 16:24:00,728 SUCC cli.py:66 -- Job 'raysubmit_UwJ5PzzqGzcJDbyc' succeeded
2025-09-02 16:24:00,728 SUCC cli.py:67 -- ------------------------------------------
Waiting for job to finish...
Current status: SUCCEEDED (RayJob: ray-job-sample, JobID: raysubmit_UwJ5PzzqGzcJDbyc)
Job raysubmit_UwJ5PzzqGzcJDbyc finished with status SUCCEEDED.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@win5923 force-pushed the refactor-job-submit branch 3 times, most recently from e368922 to 53dccee, on September 2, 2025 16:40
@win5923 force-pushed the refactor-job-submit branch from 53dccee to b7b4f1a on September 2, 2025 16:48
@win5923 marked this pull request as ready for review on September 7, 2025 17:11