Skip to content

[DRAFT - DO NOT MERGE] Basic OpenTelemetry integration #1201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

nshalman
Copy link
Contributor

@nshalman nshalman commented Dec 12, 2024

OpenTelemetry for Dropshot

This is now version 2. It has leaned into the use of tokio-tracing and opentelemetry-tracing.
The tracing feature funnels tracing logs out to slog for unified logging.
The otel-tracing feature enables shipping traces via OTLP.
They can be used together or individually.

Checklist of things that are needed (note that much of it is currently incomplete):

  • Documentation
  • Tests
  • All desired fields properly traced
  • Examples including how to do propagation

Known Issues

- If a handler panics, currently we get a 499 error span. I don't really like that. See #1359

dtrace.conf(24) References

Slides:
https://docs.google.com/presentation/d/1AW_ugVrmkXDTROgPbEt7bCLrLhFo8-z13PoJAdfPO78/

Raw video link:
https://www.youtube.com/watch?v=KQN1t2qlhaw&t=11760s

Screenshot

image

@nshalman nshalman force-pushed the otel-tracing branch 3 times, most recently from c86d59b to 1db15e7 Compare December 12, 2024 21:11
@nshalman
Copy link
Contributor Author

nshalman commented Feb 5, 2025

As I've been struggling to make this cleaner I think I have had a realization. I want to be able to trace futures which means I need to use tracing-futures which means that I have an explicit opinion on the open-telemetry/opentelemetry-rust#1571 debate. I want to just use tracing and tracing-opentelemetry rather than doing quite so much low-level otel stuff.
This PR needs a refactor to reflect that, but that should also clean it up tremendously.

nshalman and others added 2 commits June 25, 2025 15:43
Handler panics in HandlerTaskMode::Detached were incorrectly being
reported as 499 (Client Closed Request) instead of 500 (Internal
Server Error) in telemetry and traces.

This occurred because the panic handling code called panic::resume_unwind(),
which caused the request handling task to panic and trigger the disconnect
guard, leading to incorrect status code assignment.

Changes:
- Extract panic message from JoinError and convert to HttpError
- Return 500 Internal Server Error instead of propagating panic
- Add integration test to verify panic handling returns proper 500 status
- Preserve existing behavior for HandlerTaskMode::CancelOnDisconnect

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add an optional "tracing" feature that enables a bridge between the
tracing crate and slog, allowing users to use tracing macros while
maintaining slog's structured logging and bunyan format output.

Key changes:
- Add optional tracing dependencies behind "tracing" feature flag
- Implement SlogTracingBridge that converts tracing events to slog records
- Preserve structured data types (numbers, booleans) in log output
- Use proper detection with tracing::dispatcher::has_been_set()
- Initialize bridge automatically when feature is enabled
- Add comprehensive tests for bridge functionality

The bridge maintains backwards compatibility - existing slog usage
continues to work unchanged, and tracing is only available when
explicitly enabled via the feature flag.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant