Skip to content

Conversation

nuclearcat
Copy link
Member

An exception was observed in the logs when retrieving messages from the Redis subscription. This change adds a try-except block to catch potential exceptions during the get_message call and handle them gracefully by returning None.

This prevents the subscription loop from crashing and improves the overall stability of the PubSub system.

Comment on lines +145 to +154
except aioredis.ConnectionError:
async with self._lock:
channel = self._subscriptions[sub_id]['sub'].channel
new_redis_sub = self._redis.pubsub()
await new_redis_sub.subscribe(channel)
self._subscriptions[sub_id]['redis_sub'] = new_redis_sub
sub['redis_sub'] = new_redis_sub
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the method creates a new subscription on connection error. How will the client receive the new sub ID?
The client will need it to keep listening and also to unsubscribe.
I think it's better to return None instead of creating a new subscription.
The client will need to subscribe again in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’re not creating a new subscription ID here—only a new redis_sub object bound to the same sub_id.

On ConnectionError we resubscribe to the same channel and update self._subscriptions[sub_id]['redis_sub'], so the client keeps using the original sub_id for both listening and unsubscribe(). No new ID is generated or needed.
As i recall subscription id we generate artificially in python using counter in redis named kernelci-api-pubsub-id (ID_KEY). Redis Pub/Sub has no subscription IDs or message IDs.

sub['redis_sub'] = new_redis_sub
continue
except aioredis.RedisError:
return None # Handle any exceptions gracefully
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to log something before returning None just to know afterwards what happened from the logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nuclearcat nuclearcat force-pushed the fix-exception branch 2 times, most recently from b60a921 to 8064d6b Compare September 24, 2025 07:16
This commit introduces several improvements to the Pub/Sub system to make it more resilient to connection issues and to handle exceptions more gracefully.

The following changes are included:

- A `health_check_interval` is added to the Redis connection to keep it alive and prevent timeouts.
- Reconnection logic is implemented in the `listen` method. If the connection to Redis is lost, the client will now attempt to reconnect and re-subscribe to the channel automatically.
- The exception handling in the `listen` method is made more specific, catching `ConnectionError` for reconnection and other `RedisError` exceptions for graceful exit.

These changes improve the overall stability and robustness of the Pub/Sub system, especially in environments where the Redis server might be restarted or the network connection is not perfectly stable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
@nuclearcat nuclearcat added this pull request to the merge queue Sep 30, 2025
Merged via the queue into kernelci:main with commit 770ec7d Sep 30, 2025
4 checks passed
@nuclearcat nuclearcat deleted the fix-exception branch September 30, 2025 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants