Skip to content

Conversation

reggie-firework
Copy link

When trying to use TaskIQ for distributing slow-to-start processes (due to AI imports) meant for long-running tasks, we kept running into issues where TaskIQ would start our worker subprocesses from ProcessManager and then within 10-15 seconds would erroneously deem them "not alive" and restart them. After some experimentation with a forked repo we discovered that this was because the _wait_for_worker_startup was not properly waiting for our python process to report alive.

This PR fixes that method and removes the unused Events and imports, which from what I could tell were never being signaled from anywhere anyways. I tested a simplified version of this change on our dev system, so it seems like this should work, but after cleaning up this change I have not retested other than running unit tests. We are likely going to switch to Celery for observability benefits, but I figured giving you folks this PR would be the right thing to do for our OpenSource community :)

@s3rius
Copy link
Member

s3rius commented Jun 9, 2025

This event was used by previous versions of taskiq, where we were setting this even after broker startup.

Although, since they were removed, I guess it's safe to remove events from process manager as well, as you have suggested.

Also, in regards to observability, what do you miss? Have you tried https://github.com/taskiq-python/taskiq-admin?

@reggie-firework
Copy link
Author

This event was used by previous versions of taskiq, where we were setting this even after broker startup.

Although, since they were removed, I guess it's safe to remove events from process manager as well, as you have suggested.

Also, in regards to observability, what do you miss? Have you tried https://github.com/taskiq-python/taskiq-admin?

Thank you for sharing this, this looks promising! We did not find this in our initial search for observability. We will investigate this further. I think what our company is most interested in in terms of observability is the ability to hook up to lots of worker-state related events (started, ended, failed, idle, busy, process cpu, process memory, etc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants