From 3b7d78368aa3addea860f9a93aff8073125a7794 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gy=C3=B6rgy=20Krajcsovits?=
Date: Thu, 17 Apr 2025 18:25:16 +0200
Subject: [PATCH 1/4] fix(om2): histograms and negative observed values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OM1.0 required that the Sum of Histograms is not represented when there are negative observations in a histogram. This PR removes this requirement in OM2.0 because:

- The requirement was never implemented by the Go and Java instrumentation libraries. Enforcing it now would be breaking.
- The requirement makes it impossible to implement the use case where the user wants to measure the Sum anyway.
- We already warned users in the documentation about the possibility of Sum decreasing and not being usable for rate() 10 years ago: #43.
- Native histograms will not take Sum into account when calculating counter resets during rate(), so this problem won't come up.

Note: this PR does not make Sum mandatory; that is a different question.

Signed-off-by: György Krajcsovits
---
 content/docs/specs/om/open_metrics_spec_2_0.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/content/docs/specs/om/open_metrics_spec_2_0.md b/content/docs/specs/om/open_metrics_spec_2_0.md
index bc8b98382..a48e1a190 100644
--- a/content/docs/specs/om/open_metrics_spec_2_0.md
+++ b/content/docs/specs/om/open_metrics_spec_2_0.md
@@ -48,6 +48,10 @@ OpenMetrics is primarily a wire format, independent of any particular transport
 
 Implementers MUST expose metrics in the OpenMetrics text format in response to a simple HTTP GET request to a documented URL for a given process or device. This endpoint SHOULD be called "/metrics". Implementers MAY also expose OpenMetrics formatted metrics in other ways, such as by regularly pushing metric sets to an operator-configured endpoint over HTTP.
 
+## Changes from version 1.0
+
+In the data model, histograms are no longer required to omit the Sum if there are negative measured event values. #2627.
+
 ### Metrics and Time Series
 
 This standard expresses all system states as numerical values; counts, current values, enumerations, and boolean states being common examples. Contrary to metrics, singular events occur at a specific time. Metrics tend to aggregate data temporally. While this can lose information, the reduction in overhead is an engineering trade-off commonly chosen in many modern monitoring systems.
@@ -220,12 +224,16 @@ Histograms measure distributions of discrete events. Common examples are the lat
 
 A Histogram MetricPoint MUST contain at least one bucket, and SHOULD contain Sum, and Created values. Every bucket MUST have a threshold and a value.
 
-Histogram MetricPoints MUST have one bucket with an +Inf threshold. Buckets MUST be cumulative. As an example for a metric representing request latency in seconds its values for buckets with thresholds 1, 2, 3, and +Inf MUST follow value_1 <= value_2 <= value_3 <= value_+Inf. If ten requests took 1 second each, the values of the 1, 2, 3, and +Inf buckets MUST equal 10.
+Histogram MetricPoints MUST have one bucket with threshold equal to +Inf. Buckets MUST be cumulative.
+As an example: for a metric representing request latency in seconds that has the following bucket thresholds: 1, 2, 3, and +Inf,
+it MUST follow that value_1 <= value_2 <= value_3 <= value_+Inf. If ten requests took 1 second each, the values of the 1, 2, 3, and +Inf buckets MUST equal 10.
+Or in other words, the count of measured event values that are >1 and <=2 is equal to value_2 - value_1.
+
+The +Inf bucket counts all requests. Bucket thresholds within a MetricPoint MUST be unique. Negative threshold buckets MAY be used. Bucket thresholds MUST NOT equal NaN.
 
-The +Inf bucket counts all requests. If present, the Sum value MUST equal the Sum of all the measured event values. Bucket thresholds within a MetricPoint MUST be unique.
+Semantically, buckets values are counters so MUST NOT be NaN or negative. Bucket values MUST be integers.
 
-Semantically, Sum, and buckets values are counters so MUST NOT be NaN or negative.
-Negative threshold buckets MAY be used, but then the Histogram MetricPoint MUST NOT contain a sum value as it would no longer be a counter semantically. Bucket thresholds MUST NOT equal NaN. Count and bucket values MUST be integers.
+If present, the Sum value MUST equal the Sum of all the measured event values. The histogram MAY count negative event values, which means that the Sum may decrease.
 
 A Histogram MetricPoint SHOULD have a Timestamp value called Created. This can help ingestors discern between new metrics and long-running ones it did not see before.
 

From 45476ecbc05bb5216ae90eadb4ea218e9ba3fd90 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gy=C3=B6rgy=20Krajcsovits?=
Date: Wed, 23 Apr 2025 10:46:24 +0200
Subject: [PATCH 2/4] Reduce the diff
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: György Krajcsovits
---
 content/docs/specs/om/open_metrics_spec_2_0.md | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/content/docs/specs/om/open_metrics_spec_2_0.md b/content/docs/specs/om/open_metrics_spec_2_0.md
index a48e1a190..391e804a3 100644
--- a/content/docs/specs/om/open_metrics_spec_2_0.md
+++ b/content/docs/specs/om/open_metrics_spec_2_0.md
@@ -50,7 +50,7 @@ Implementers MUST expose metrics in the OpenMetrics text format in response to a
 
 ## Changes from version 1.0
 
-In the data model, histograms are no longer required to omit the Sum if there are negative measured event values. #2627.
+In the data model it is no longer recommended that histograms omit the Sum if there are negative measured event values. #2627.
 
 ### Metrics and Time Series
 
@@ -224,16 +224,15 @@ Histograms measure distributions of discrete events. Common examples are the lat
 
 A Histogram MetricPoint MUST contain at least one bucket, and SHOULD contain Sum, and Created values. Every bucket MUST have a threshold and a value.
 
-Histogram MetricPoints MUST have one bucket with threshold equal to +Inf. Buckets MUST be cumulative.
-As an example: for a metric representing request latency in seconds that has the following bucket thresholds: 1, 2, 3, and +Inf,
-it MUST follow that value_1 <= value_2 <= value_3 <= value_+Inf. If ten requests took 1 second each, the values of the 1, 2, 3, and +Inf buckets MUST equal 10.
-Or in other words, the count of measured event values that are >1 and <=2 is equal to value_2 - value_1.
+Histogram MetricPoints MUST have one bucket with an +Inf threshold. Buckets MUST be cumulative. As an example for a metric representing request latency in seconds its values for buckets with thresholds 1, 2, 3, and +Inf MUST follow value_1 <= value_2 <= value_3 <= value_+Inf. If ten requests took 1 second each, the values of the 1, 2, 3, and +Inf buckets MUST equal 10.
 
-The +Inf bucket counts all requests. Bucket thresholds within a MetricPoint MUST be unique. Negative threshold buckets MAY be used. Bucket thresholds MUST NOT equal NaN.
+The +Inf bucket counts all requests. If present, the Sum value MUST equal the Sum of all the measured event values. Bucket thresholds within a MetricPoint MUST be unique.
 
-Semantically, buckets values are counters so MUST NOT be NaN or negative. Bucket values MUST be integers.
+Semantically, buckets values are counters so MUST NOT be NaN or negative.
 
-If present, the Sum value MUST equal the Sum of all the measured event values. The histogram MAY count negative event values, which means that the Sum may decrease.
+The Sum is only a counter semantically as long as there are no negative event values measured by the Histogram MetricPoint. The Sum MUST NOT be NaN.
+
+Negative threshold buckets MAY be used. Bucket thresholds MUST NOT equal NaN. Count and bucket values MUST be integers.
 
 A Histogram MetricPoint SHOULD have a Timestamp value called Created. This can help ingestors discern between new metrics and long-running ones it did not see before.
 

From c5b2799e3d6ad0ca9fd97f25bbda456aec006638 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gy=C3=B6rgy=20Krajcsovits?=
Date: Wed, 23 Apr 2025 10:49:24 +0200
Subject: [PATCH 3/4] fix wording
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: György Krajcsovits
---
 content/docs/specs/om/open_metrics_spec_2_0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specs/om/open_metrics_spec_2_0.md b/content/docs/specs/om/open_metrics_spec_2_0.md
index 391e804a3..2ec904e3b 100644
--- a/content/docs/specs/om/open_metrics_spec_2_0.md
+++ b/content/docs/specs/om/open_metrics_spec_2_0.md
@@ -50,7 +50,7 @@ Implementers MUST expose metrics in the OpenMetrics text format in response to a
 
 ## Changes from version 1.0
 
-In the data model it is no longer recommended that histograms omit the Sum if there are negative measured event values. #2627.
+In the data model it is no longer required that histograms omit the Sum if there are negative measured event values. #2627.
 
 ### Metrics and Time Series
 

From d06bcc851ebc6891fa37c4734dd19596c1337ada Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gy=C3=B6rgy=20Krajcsovits?=
Date: Thu, 1 May 2025 12:58:13 +0200
Subject: [PATCH 4/4] fix(om2) remove the changelog
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We agreed to just have good PR descriptions.

Signed-off-by: György Krajcsovits
---
 content/docs/specs/om/open_metrics_spec_2_0.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/content/docs/specs/om/open_metrics_spec_2_0.md b/content/docs/specs/om/open_metrics_spec_2_0.md
index 2ec904e3b..b793808c2 100644
--- a/content/docs/specs/om/open_metrics_spec_2_0.md
+++ b/content/docs/specs/om/open_metrics_spec_2_0.md
@@ -48,10 +48,6 @@ OpenMetrics is primarily a wire format, independent of any particular transport
 
 Implementers MUST expose metrics in the OpenMetrics text format in response to a simple HTTP GET request to a documented URL for a given process or device. This endpoint SHOULD be called "/metrics". Implementers MAY also expose OpenMetrics formatted metrics in other ways, such as by regularly pushing metric sets to an operator-configured endpoint over HTTP.
 
-## Changes from version 1.0
-
-In the data model it is no longer required that histograms omit the Sum if there are negative measured event values. #2627.
-
 ### Metrics and Time Series
 
 This standard expresses all system states as numerical values; counts, current values, enumerations, and boolean states being common examples. Contrary to metrics, singular events occur at a specific time. Metrics tend to aggregate data temporally. While this can lose information, the reduction in overhead is an engineering trade-off commonly chosen in many modern monitoring systems.
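
After this series, the Histogram rules are mechanical enough to check in code. The following is a minimal Python sketch, not part of the OpenMetrics specification or of any client library. `HistogramPoint`, `build_histogram`, and `validate_histogram` are hypothetical names. Buckets are modelled as (threshold, cumulative count) pairs, and parsing the text exposition format is out of scope. Following the OM2.0 wording introduced here, the sketch rejects a NaN Sum but deliberately does not reject a negative one.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class HistogramPoint:
    """Hypothetical in-memory Histogram MetricPoint (not a library type)."""

    # (threshold, cumulative count) pairs, in any order.
    buckets: Sequence[Tuple[float, int]]
    # Optional Sum of all observed values; may be negative under OM2.0.
    sum_value: Optional[float] = None


def build_histogram(observations: Sequence[float], thresholds: Sequence[float]) -> HistogramPoint:
    """Build cumulative buckets (always adding +Inf) and the Sum from raw observations."""
    bounds = sorted(set(thresholds) | {math.inf})
    # Each bucket counts every observation <= its threshold, so buckets are cumulative.
    buckets = [(le, sum(1 for x in observations if x <= le)) for le in bounds]
    return HistogramPoint(buckets=buckets, sum_value=math.fsum(observations))


def validate_histogram(point: HistogramPoint) -> list[str]:
    """Return the histogram rule violations found (empty list if none)."""
    if not point.buckets:
        return ["at least one bucket is required"]
    errors: list[str] = []
    thresholds = [t for t, _ in point.buckets]
    if any(math.isnan(t) for t in thresholds):
        errors.append("bucket thresholds must not be NaN")
    if len(set(thresholds)) != len(thresholds):
        errors.append("bucket thresholds must be unique")
    if math.inf not in thresholds:
        errors.append("a bucket with threshold +Inf is required")
    for t, v in point.buckets:
        if not isinstance(v, int) or v < 0:
            errors.append(f"bucket {t} value {v!r} must be a non-negative integer")
    # Cumulative buckets: ordered by threshold, counts never decrease.
    ordered = sorted((b for b in point.buckets if not math.isnan(b[0])), key=lambda b: b[0])
    for (t_prev, v_prev), (t, v) in zip(ordered, ordered[1:]):
        if v < v_prev:
            errors.append(f"bucket {t} ({v}) is below bucket {t_prev} ({v_prev})")
    # The Sum must not be NaN; no sign check, since negative observations are allowed.
    if point.sum_value is not None and math.isnan(point.sum_value):
        errors.append("Sum must not be NaN")
    return errors
```

Take observations -2.0, -0.5 and 0.25 with thresholds -1, 0 and 1. `build_histogram` produces the buckets (-1, 1), (0, 2), (1, 3) and (+Inf, 3), with a Sum of -2.25, and `validate_histogram` reports no violations. Under the OM1.0 rule, this point would have had to omit the Sum because it uses a negative threshold bucket. The difference between adjacent buckets gives the count for a range: the (0, 1] range holds 3 - 2 = 1 observation.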