Skip to content

Better span sampler #903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: span-processor-with-new-sampler
Choose a base branch
from

Conversation

solnic
Copy link
Collaborator

@solnic solnic commented Jun 4, 2025

This updates OpenTelemetry.Sampler so that it uses traces_sample_rate for sampling spans before they get sent to the span processor.

This way we're not processing spans when they are
dropped during sampling, which was previously the
case as the Client was responsible for sampling
just before attempting to send a transaction.

This is based on:

#skip-changelog

@solnic solnic force-pushed the new-span-sampler branch 4 times, most recently from 5a841bb to 01a7066 Compare June 4, 2025 12:43
@solnic solnic force-pushed the new-span-sampler branch from 8538c0b to af64d0f Compare June 5, 2025 10:43
This updates OpenTelemetry.Sampler so that it uses
`traces_sample_rate` for sampling spans before they
get sent to the span processor.

This way we're not processing spans when they are
dropped during sampling, which was previously the
case as the Client was responsible for sampling
just before attemping to send a transaction.
@solnic solnic force-pushed the new-span-sampler branch from af64d0f to 7e7718f Compare June 5, 2025 10:55
solnic added 3 commits June 5, 2025 12:09
Previously it incorrectly referred to "parent" where
in reality we're dealing with spans from the same
trace and respect sampling decisions that were
already made for a given trace.

This is not the same thing as parent/child spans
that we're handling in the SpanProcessor as this
type of relationship is established post-sampling.
@solnic solnic marked this pull request as ready for review June 5, 2025 13:23
Comment on lines +84 to +100
cond do
sample_rate == 0.0 ->
tracestate = build_tracestate(sample_rate, 1.0, false)
{:drop, [], tracestate}

sample_rate == 1.0 ->
tracestate = build_tracestate(sample_rate, 0.0, true)
{:record_and_sample, [], tracestate}

true ->
random_value = :rand.uniform()
sampled = random_value < sample_rate

tracestate = build_tracestate(sample_rate, random_value, sampled)
decision = if sampled, do: :record_and_sample, else: :drop
{decision, [], tracestate}
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring suggestion:

{sampled, random_value} =
  cond do
    sample_rate == 0.0 -> {false, 1.0}
    sample_rate == 1.0 -> {true, 0.0}
    true -> random_value = :rand.uniform(); {random_value < sample_rate, random_value}
  end

tracestate = build_tracestate(sample_rate, random_value, sampled)
decision = if sampled, do: :record_and_sample, else: :drop
{decision, [], tracestate}

@sl0thentr0py
Copy link
Member

just noting the first change that is necessary, but we can defer this to the future because we don't have trace propagation implemented right now:

  • traces_sample_rate needs to default to nil, because when we do have distributed tracing:
    • traces_sample_rate=0 implies the elixir service will not start new traces but honor incoming sentry-trace headers and will still sample a transaction based on incoming headers
    • traces_sample_rate=nil means there will be no transactions ever, and even incoming headers will be ignored

Copy link
Member

@sl0thentr0py sl0thentr0py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one comment about the client report, rest we can postpone when we implement propagation.

If you want, you can already set the default traces_sample_rate to nil and tweak the logic a bit to account for that

span_ctx -> Span.span_id(span_ctx)
end

transaction =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a useless allocation, can you just make a new method on ClientReport that just takes a single record_discarded_event(reason, data_category)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants