-
Notifications
You must be signed in to change notification settings - Fork 3
Docker profiles, container omission, fixed job cancellation #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -5,7 +5,8 @@ | |||
from pymonad.promise import Promise | |||
from fastapi.params import Depends | |||
from fastapi import APIRouter, Body | |||
from fastapi.responses import Response | |||
# MODIFIED: Import JSONResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please drop all LLM descriptive comments and guides, instead you can include include short inline documentation and comments where you deem necessary and longer descriptions in docs for methods and classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reply to the comments @micoleaoo, thank you :)
tesp_api/service/event_actions.py
Outdated
print("Volumes:") | ||
print(volumes) | ||
output_confs, volume_confs = map_volumes(str(job_id), volumes, outputs) | ||
mapped_outputs, mapped_volumes = map_volumes(str(job_id), volumes, outputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary temporary variables, keep the original, or what was the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an incorrect assumption that the list is already containing data. I'll keep the original.
tesp_api/service/event_actions.py
Outdated
await Promise(lambda resolve, reject: resolve(None))\ | ||
.then(lambda nothing: task_repository.update_task_state( | ||
task_id, | ||
TesTaskState.QUEUED, | ||
TesTaskState.INITIALIZING | ||
)).map(lambda updated_task: get_else_throw( | ||
updated_task, TaskNotFoundError(task_id, Just(TesTaskState.QUEUED)) | ||
)).then(lambda updated_task: setup_data( | ||
)).then(lambda updated_task_val: setup_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explain this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistic mishap, I'll discard this change
task = await task_repository.update_task_state( | ||
task_id, | ||
TesTaskState.RUNNING, | ||
TesTaskState.EXECUTOR_ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TesTaskState.EXECUTOR_ERROR change is not included in latest changes
|
||
if command_status.get('returncode', -1) != 0: | ||
print(f"Task {task_id} executor error (return code: {command_status.get('returncode', -1)}). Setting state to EXECUTOR_ERROR.") | ||
await task_repository.update_task_state(task_id, TesTaskState.RUNNING, TesTaskState.EXECUTOR_ERROR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or does it work now logically in the same way as before with the TesTaskState.EXECUTOR_ERROR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the logic should be preserved and made more robust:
-
After the Pulsar job finishes (or
job_status_complete
returns), we checkcommand_status.get('returncode', -1) != 0
. -
Before declaring it an
EXECUTOR_ERROR
, there is a check if the task has been canceled by the API, because a cancelled job might also result in a non-zero exit code from Pulsar's perspective (e.g., if it was killed), so we prioritize theCANCELED
state set by the user. -
If it's not
CANCELED
and the return code is non-zero, then we explicitly update the task state toTesTaskState.EXECUTOR_ERROR
(fromTesTaskState.RUNNING
). -
We also now explicitly call
pulsar_operations.erase_job(task_id)
in this path to ensure the failed Pulsar job is cleaned up. -
Then, we return to prevent the code from falling through to the logic that sets the state to
COMPLETE
.
…lambda parameter name back to updated_task
README.md
Outdated
|
||
#### All services (default): | ||
``` | ||
docker compose --profile all up -d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the readme, no longer applies
Changes made:
input_confs
oroutput_confs
is not presentProblems & Fixes regarding 3rd point:
Galaxy
JSONDecodeError
on Cancel:/cancel
endpoint sent an empty response body; Galaxy expected JSON ({}
)./cancel
endpoint (task_endpoints.py
) to returnJSONResponse(content={})
, sending"{}"
.TESP API Kept Polling Cancelled Jobs:
handle_run_task
(inevent_actions.py
) continued polling Pulsar for jobs already cancelled by the API, leading toLookupError
s and incorrectly changing task state fromCANCELED
toSYSTEM_ERROR
.event_actions.py
):LookupError
),handle_run_task
now checks the DB. If task isCANCELED
, it exits gracefully (as the cancel API handled it).handle_run_task
to stop processing if a task is found to beCANCELED
early in its execution or immediately after the Pulsar job finishes/fails.Outcome:
CANCELED
tasks from becomingSYSTEM_ERROR
.