Skip to content

Unify subprocess calls (to asyncio.create_subprocess_exec where possible) #464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Sourenm
Copy link
Contributor

@Sourenm Sourenm commented Jul 16, 2025

No description provided.

Copy link

codecov bot commented Jul 16, 2025

Codecov Report

❌ Patch coverage is 5.88235% with 48 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
transformerlab/shared/shared.py 0.00% 28 Missing ⚠️
transformerlab/db/jobs.py 12.50% 7 Missing ⚠️
transformerlab/routers/train.py 12.50% 7 Missing ⚠️
transformerlab/routers/tools.py 14.28% 6 Missing ⚠️

📢 Thoughts on this report? Let us know!

@Sourenm Sourenm marked this pull request as ready for review July 18, 2025 16:17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that we deprecated this plugin!

return decorator


def async_exporter_job_wrapper(self, progress_start: int = 0, progress_end: int = 100):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a ask since you're working on this code. We don't need to have a "exported_job_wrapper" as everywhere else the pattern is to inherit the job_wrapper classes from the parent tlabplugin class. Can we get rid of the exporter_job_wrapper and just use the native ones in all export plugins?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you missed updating the version number for this one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version number here too

if num_gpus == 0 or not isnum(gpu_ids_formatted[0]):
num_gpus = torch.cuda.device_count()
gpu_ids = ""
async def __bootstrap():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick but maybe we name this something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note we don't need to update version for this as we have deprecated this plugin, incase you end up incrementing for all missing ones

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version number for this one too!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not do the hardware checks in this PR, I'll do a separate PR for these as we seem to change some core functionality and raise errors for rocminfo which is not the way it was intended

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we revert this file back

tensorboard_process.terminate()
if tensorboard_process.returncode is None: # Process is still running
print("Stopping Tensorboard")
tensorboard_process.kill()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we changed this to kill()?
I think you mentioned some weird behaviour otherwise but just want to get it recorded on here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, syncio.create_subprocess_exec does not play well with the kill() command, that's why it had to change to terminate()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(run())
loop.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that we have some db sync operations we used to run because of popen, maybe we change them to the async ones if necessary now that we do this change?

deep1401
deep1401 previously approved these changes Jul 25, 2025
@deep1401 deep1401 dismissed their stale review July 25, 2025 20:43

Stashing my own review so Tony can provide a review once, I checked and everything works!

@deep1401 deep1401 requested a review from dadmobile July 25, 2025 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants