-
Notifications
You must be signed in to change notification settings - Fork 132
Instrument http add on operator #1328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi folks, trying to get some eyes here but not really sure who to tag, thanks for any pointer in advance |
func (r *HTTPScaledObjectReconciler) updatePromMetricsOnDelete(ctx context.Context, scaledObject *httpv1alpha1.HTTPScaledObject, namespacedName string) { | ||
logger := log.FromContext(ctx, "updatePromMetricsOnDelete", namespacedName) | ||
logger.Info("updatePromMetricsOnDelete") | ||
metrics.RecordDeleteHTTPScaledObjectCount(namespacedName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is never called, wouldn't this result in always growing metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, forgot to call this, will fix
assert.True(t, ok, "operator_http_scaled_object_count_total is available") | ||
|
||
requestCount := getMetricsValue(val) | ||
assert.GreaterOrEqual(t, requestCount, float64(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the whole e2e suite is executed, multiple e2e tests can run in parallel (iirc it could be like 3?). Perhaps the test could create 100 HSOs check if the metric is equal or over 100. And then we may want to delete that 100, wait for the metric to propagate and check again if it's under some reasonably low number, e.g. 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, meanwhile we don't execute sequential tests (I'd like to avoid it tbh, but maybe we have to 🤔)
I guess that we can just execute both checks at once in the same tests, something like:
- spawn 20 HTTPScaledObjects
- check prometheus
- check otel
- remove the HTTPScaledObjects
- check prometheus
- check otel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm updating the e2e to do some sequencing for testing metric going up and down. also including testing for http and grpc for otel
config/operator/deployment.yaml
Outdated
ports: | ||
- name: metrics | ||
containerPort: 8080 | ||
containerPort: 2223 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this? Personally, I'd prefer to not change an already existing port
docs/operate.md
Outdated
This endpoint can be enabled by setting the `OTEL_PROM_EXPORTER_ENABLED` environment variable to `true` on the operator deployment (`true` by default) and by setting `OTEL_PROM_EXPORTER_PORT` to an unused port for the endpoint to be made avaialble on (`2223` by default). | ||
|
||
### Configuring the OTEL HTTP exporter | ||
When configured, the ioperator can export metrics to a OTEL HTTP collector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When configured, the ioperator can export metrics to a OTEL HTTP collector. | |
When configured, the operator can export metrics to a OTEL HTTP collector. |
docs/operate.md
Outdated
### Configuring the OTEL HTTP exporter | ||
When configured, the ioperator can export metrics to a OTEL HTTP collector. | ||
|
||
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). | |
The OTEL exporter can be enabled by setting the `OTEL_EXPORTER_OTLP_METRICS_ENABLED` environment variable to `true` on the operator deployment (`false` by default). When enabled, the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable must also be configured so the exporter knows what collector to send the metrics to (e.g. http://opentelemetry-collector.open-telemetry-system:4318). |
operator/main.go
Outdated
if metricsCfg.OtelPrometheusExporterEnabled { | ||
// start the prometheus compatible metrics server | ||
// serves a prometheus compatible metrics endpoint on the configured port | ||
eg.Go(func() error { | ||
if err := runMetricsServer(ctx, ctrl.Log, metricsCfg); !util.IsIgnoredErr(err) { | ||
setupLog.Error(err, "could not start the Prometheus metrics server") | ||
return err | ||
} | ||
|
||
return nil | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operator is already exposing a metrics server in prom format. Let's use it instead of starting another one, you can register more metrics on that server, for example, that's what we do for KEDA -> https://github.com/kedacore/keda/blob/main/pkg/metricscollector/prommetrics.go#L161
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that make sense, will use it
operator/metrics/otelmetrics.go
Outdated
func NewOtelMetrics(options ...metric.Option) *OtelMetrics { | ||
ctx := context.Background() | ||
|
||
exporter, err := otlpmetrichttp.New(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opentelemetry supports HTTP and gRCP protocols, I prefer if we support both tbh. This is how we do it in KEDA, you can use it as example https://github.com/kedacore/keda/blob/main/pkg/metricscollector/opentelemetry.go#L67-L86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
including it in upcoming changes, thnaks
operator/metrics/otelmetrics.go
Outdated
provider := metric.NewMeterProvider(options...) | ||
meter := provider.Meter(meterName) | ||
|
||
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in opentelemetry context, dots are used instead of underscores
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) | |
httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda.http.scaled.object.count", api.WithDescription("a counter of HttpScaledObjects processed by the operator")) |
operator/metrics/prommetrics.go
Outdated
) | ||
meter := provider.Meter(meterName) | ||
|
||
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
httpScaledObjectCounter, err := meter.Int64UpDownCounter("operator_http_scaled_object_count", api.WithDescription("a counter of http_scaled_objects processed by the operator")) | |
httpScaledObjectCounter, err := meter.Int64UpDownCounter("keda_http_scaled_object_total", api.WithDescription("a counter of http_scaled_objects processed by the operator")) |
assert.True(t, ok, "operator_http_scaled_object_count_total is available") | ||
|
||
requestCount := getMetricsValue(val) | ||
assert.GreaterOrEqual(t, requestCount, float64(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, meanwhile we don't execute sequential tests (I'd like to avoid it tbh, but maybe we have to 🤔)
I guess that we can just execute both checks at once in the same tests, something like:
- spawn 20 HTTPScaledObjects
- check prometheus
- check otel
- remove the HTTPScaledObjects
- check prometheus
- check otel
Thanks a lot for the contribution! |
723bbb9
to
176de5d
Compare
@elieser1101 could you please check those failing e2e tests?
|
Signed-off-by: Elieser Pereira <[email protected]> Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
…o fit meaning Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira [email protected] Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
176de5d
to
f56c1bd
Compare
I was able to reproduce the error in my local env. Was because of misconfig on go.mod. Also I rebased latest changes from main so this should work. lets see how e2e goes. @zroubalik @wozniakjan @JorTurFer for workflow approval. |
e2e went good on previous run, so fixing lints and some broken links from docs |
Signed-off-by: Elieser Pereira <[email protected]>
Signed-off-by: Elieser Pereira <[email protected]>
bbad4f2
to
bb94b8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds instrumentation for Prometheus-compatible endpoints and OpenTelemetry (OTEL) metrics to the KEDA HTTP Add-on operator. The implementation provides monitoring capabilities for HTTP scaled objects through both metrics collection methods.
- Added metrics collection infrastructure with Prometheus and OTEL exporters
- Implemented HTTP scaled object count tracking in the operator controller
- Added comprehensive end-to-end tests for both metrics collection methods
Reviewed Changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 12 comments.
Show a summary per file
File | Description |
---|---|
operator/metrics/ |
New package containing metrics collection infrastructure with Prometheus and OTEL exporters |
operator/controllers/http/httpscaledobject_controller.go |
Updated controller to track HTTP scaled object metrics with thread-safe map |
operator/controllers/http/config/config.go |
Added metrics configuration structure with environment variable parsing |
operator/main.go |
Integrated metrics collectors initialization into main function |
tests/checks/operator_*_metrics/ |
Comprehensive E2E tests for both Prometheus and OTEL metrics |
docs/ |
Updated documentation with metrics configuration instructions |
config/ |
Added Kubernetes manifests for metrics service and E2E test configurations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
tests/checks/operator_prometheus_metrics/operator_prometheus_metrics_test.go
Show resolved
Hide resolved
tests/checks/operator_prometheus_metrics/operator_prometheus_metrics_test.go
Show resolved
Hide resolved
namespacedName := scaledObject.Name + scaledObject.Namespace | ||
metricsData, ok := httpScaledObjectPromMetricsMap[namespacedName] | ||
if ok { | ||
metrics.RecordDeleteHTTPScaledObjectCount(scaledObject.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the map entry already exists, the code calls RecordDeleteHTTPScaledObjectCount before RecordHTTPScaledObjectCount, which could lead to incorrect metrics if the namespace changed. This logic appears to be attempting to handle namespace changes, but it's unclear and potentially buggy.
metrics.RecordDeleteHTTPScaledObjectCount(scaledObject.Namespace) | |
// Record deletion for the old namespace before updating | |
metrics.RecordDeleteHTTPScaledObjectCount(metricsData.namespace) |
Copilot uses AI. Check for mistakes.
Signed-off-by: Jan Wozniak <[email protected]>
Added instrumentation for promethues compatible endpoints and otel.
Added tests and e2e tests.
I went forward this ways as I was trying to get familiar with codebase. Happy to go in any direction reviewers advise.
Checklist
README.md
docs/
directoryPart of #965