Skip to content

Refactor telemetry clients to bump diagnostics version #595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

daniyelnnr
Copy link
Contributor

What is the purpose of this pull request?

This pull request updates the telemetry and logging infrastructure to use the latest @vtex/diagnostics-nodejs package and refactors how telemetry clients are initialized and used throughout the codebase. This change only provides the necessary infrastructure for using the diagnostics library and will not yet make it fully available. However, this change has been validated in the test environment and the implemented diagnostics-based telemetry signals (logging) have been verified to be working correctly.

What problem is this solving?

The focus of this change is to update the diagnostics version, but we're also simplifying client initialization, consolidating telemetry logic, and improving type safety. For additional, more detailed context, see below:

Dependency and Telemetry Refactor:
  • Upgraded @vtex/diagnostics-nodejs to version 0.1.0-io-beta.19 and added a resolution for @grpc/grpc-js to ensure compatibility. [1] [2]
  • Refactored the telemetry client implementation in src/service/telemetry/client.ts to initialize and expose a single object containing logs, metrics, and traces clients, instead of initializing them separately. The new initializeTelemetry function replaces the previous getTelemetryClient. [1] [2] [3]
Logger Refactor and Type Improvements:
  • Updated logger logic in src/service/logger/client.ts and src/service/logger/logger.ts to use the new telemetry initialization flow and improved type references from @vtex/diagnostics-nodejs.Types. The logger now retrieves the log client from the consolidated telemetry initialization. [1] [2] [3] [4] [5] [6]
Versioning:
  • Bumped package version to 6.49.8-beta.0 in package.json.

How should this be manually tested?

Screenshots or example usage

Types of changes

  • Bug fix (a non-breaking change which fixes an issue)
  • New feature (a non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Requires change to documentation, which has been updated accordingly.

@daniyelnnr daniyelnnr self-assigned this Aug 5, 2025
@daniyelnnr daniyelnnr added the dependencies Pull requests that update a dependency file label Aug 5, 2025
Enables this client to support logs, metrics, and tracing signals from the diagnostics-nodejs library, making them available already initialized. Additionally, this change enables registration of built-in and community instrumentation.
@daniyelnnr daniyelnnr force-pushed the chore/bump-diagnostics branch from 5b04146 to 95c63b4 Compare August 5, 2025 14:44
Remove unused parameters and use logClient already initialized
Refactor client types to use Types.LogClient for consistency
@daniyelnnr daniyelnnr force-pushed the chore/bump-diagnostics branch from 95c63b4 to ec13074 Compare August 5, 2025 17:32
Copy link

@juliobguedes juliobguedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides comment and question, LGTM. Leaving it pre-approved since the question is not something that we should solve in this PR. If it is a problem, we should open a new one with the fix to avoid having an overly complex PR :)

Comment on lines +15 to +19
interface TelemetryClients {
logsClient: Logs.LogClient;
metricsClient: Metrics.MetricsClient;
tracesClient: Traces.TraceClient;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: are all of these clients fail-safe? i.e. are they resilient to failures when sending logs/metrics/traces?

Copy link
Contributor Author

@daniyelnnr daniyelnnr Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diagnostics library provides several configurations for defining resilience mechanisms in this regard (e.g., failover and circuit-breaker), but this has not yet been explored, considering that these changes will be validated in the proof of concept we're developing. However, behind the scenes, the library takes care of some aspects by default that avoid overhead during signal export (e.g., batch size, queue size, etc.), and this can contribute to improving resilience.
In short, the clients are fault-tolerant to some extent, but we can evolve this further if necessary.

Copy link
Contributor

@silvadenisaraujo silvadenisaraujo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@filafb filafb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small suggestion that might be a small win.

Improve telemetry client initialization using Promise.all to optimize asynchronous calls
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants