NATS Service Failure During the Product Upgrade Process #6893

Open
Inkathu opened this issue May 14, 2025 · 2 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@Inkathu

Inkathu commented May 14, 2025

Observed behavior

While installing a new version of the product, the NATS service failed, which caused the upgrade attempts to fail. The failure occurred while a large number of running product jobs were being stopped.

Symptoms:

  • NATS service failed to respond to multiple StopJobRequest messages.
  • Performance warnings indicated delayed internal subscriptions on various streams (e.g., $JS.API.STREAM.PURGE.SessionEventsStream).
  • The product service failed to start due to no API response from NATS.
  • The NATS service was found to be down during the second upgrade attempt (with an "incorrect function" error), and manual restart attempts were initially unsuccessful.

Performance log example:

2025/05/07 14:13:59.418304 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.0468125s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3086614s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3689963s
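
To get a quick overview of how long these internal subscriptions were stalled across a full log file, the warnings can be summarized with a small script. This is a minimal sketch, assuming the log lines have exactly the format shown above; the function names (`parse_go_duration`, `slow_subscriptions`) are hypothetical helpers and not part of any NATS tooling.

```python
import re

# Matches warnings in the format shown in this report, e.g.
# 2025/05/07 14:13:59.418304 [WRN] Internal subscription on "<subject>" took too long: 3m13.0468125s
SLOW_SUB = re.compile(
    r'^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[WRN\] '
    r'Internal subscription on "(?P<subject>[^"]+)" took too long: (?P<dur>\S+)$'
)

# Components of a Go-style duration string such as "3m13.0468125s" or "250ms".
DUR_PART = re.compile(r'(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)')
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}


def parse_go_duration(text):
    """Convert a Go-style duration string to seconds (float)."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in DUR_PART.findall(text))


def slow_subscriptions(lines):
    """Return (timestamp, subject, seconds) for every matching warning line."""
    results = []
    for line in lines:
        match = SLOW_SUB.match(line.strip())
        if match:
            results.append((match["ts"], match["subject"],
                            parse_go_duration(match["dur"])))
    return results


# The three warning lines quoted in this report.
SAMPLE = [
    '2025/05/07 14:13:59.418304 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.0468125s',
    '2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3086614s',
    '2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3689963s',
]

if __name__ == "__main__":
    for ts, subject, seconds in slow_subscriptions(SAMPLE):
        print(f"{ts}  {subject}  {seconds:.1f}s")
```

Run against the full log, grouping the results by subject would show whether the delays are limited to the purge requests or affect other JetStream API subjects as well.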

Resolution Attempts:
Cleaning up the NATS folder and restarting the service resolved the issue, allowing a successful upgrade to the new version of the product.
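
For a scripted upgrade, a pre-flight check could confirm that the server answers before the product services are started. The following is only a sketch under assumptions that this report does not confirm: it assumes the NATS HTTP monitoring port is enabled and reachable at `http://127.0.0.1:8222` (the conventional monitoring port), and it queries the server's `/healthz` monitoring endpoint. The function name `nats_is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def nats_is_healthy(base_url="http://127.0.0.1:8222", timeout=2.0):
    """Return True if the monitoring endpoint /healthz answers with HTTP 200.

    base_url is an assumption: monitoring must be enabled on the server,
    and the host and port must match the local configuration.
    """
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```

An upgrade script could call `nats_is_healthy()` in a retry loop and abort early, instead of failing later when the product service gets no API response from NATS.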

Screenshot of the service failure between unsuccessful restarts:

Image

Expected behavior

The NATS server runs without problems, and there is no need to clean up streams.

Server and client version

Server Version: 2.10.22
Client: .NET client, nats.net v2.2.3

Host environment

No response

Steps to reproduce

No response

@Inkathu added the "defect" label (Suspected defect such as a bug or regression) on May 14, 2025
@neilalexander
Member

Difficult to say much without more complete logs and more information about what your application does on a restart, but what exactly was restarted here? Were the NATS Servers included in the restart jobs?

Also, you are a bit out of date on 2.10.22, which is 8 months old at this point. I recommend you try to stay up to date with NATS Server releases, as we regularly fix bugs and improve performance.

@Inkathu
Author

Inkathu commented May 20, 2025

> Difficult to say much without more complete logs and more information about what your application does on a restart, but what exactly was restarted here? Were the NATS Servers included in the restart jobs?
>
> Also, you are a bit out of date on 2.10.22, which is 8 months old at this point. I recommend you try to stay up to date with NATS Server releases, as we regularly fix bugs and improve performance.

It is difficult to provide more details, but I have attached the NATS log file.
About the update: when the product is updated, its services are restarted. In short, stopping the product's internal tasks while it is actively working puts a high load on NATS.
As for updating the NATS version: our product is large, and we can update NATS only after testing the product against the new NATS version (this is in progress).

nats-server_111124.log
