@bharos bharos commented Oct 25, 2025

What changes were proposed in this pull request?

This PR adds observability for Iceberg client operations by bridging Iceberg's metrics reporting to Gravitino's MetricsSystem.

Key Changes:

IcebergClientMetricsSource: New metrics source with iceberg-client namespace (separate from iceberg-rest-server HTTP metrics)
IcebergRestMetricsStore: Implements MetricsStore to parse and record Iceberg commit/scan metrics using Iceberg's public APIs
Configuration: Enabled by setting gravitino.iceberg-rest.metricsStore = rest
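
To illustrate the bridging idea in the list above, here is a minimal, stdlib-only sketch of how counter values from a commit or scan report could be accumulated under a namespaced prefix. It is not the code in this PR: the class ClientMetricsBridgeSketch and its methods are hypothetical, the report is modelled as a plain map rather than Iceberg's report types, and LongAdder counters stand in for Gravitino's MetricsSystem. Only the iceberg-client.iceberg. prefix and the metric names are taken from the output shown in this PR.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metrics store; NOT the PR's IcebergRestMetricsStore. */
public class ClientMetricsBridgeSketch {
  // Prefix copied from the metric names in the PR's curl output.
  static final String PREFIX = "iceberg-client.iceberg.";

  // Assumption: simple thread-safe counters replace the real metrics registry.
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Records one report: bumps reports.<type>, then adds every counter value. */
  public void record(String reportType, Map<String, Long> counterValues) {
    increment("reports." + reportType, 1L);
    counterValues.forEach(this::increment);
  }

  private void increment(String name, long delta) {
    counters.computeIfAbsent(PREFIX + name, key -> new LongAdder()).add(delta);
  }

  /** Returns the accumulated value for a metric, or 0 if it was never recorded. */
  public long count(String name) {
    LongAdder adder = counters.get(PREFIX + name);
    return adder == null ? 0L : adder.sum();
  }

  /** Returns a name-sorted copy of all counters, keyed by the full prefixed name. */
  public Map<String, Long> snapshot() {
    Map<String, Long> copy = new TreeMap<>();
    counters.forEach((name, adder) -> copy.put(name, adder.sum()));
    return copy;
  }

  public static void main(String[] args) {
    ClientMetricsBridgeSketch store = new ClientMetricsBridgeSketch();
    store.record("commit", Map.of("added-data-files", 1L, "added-files-size-bytes", 960L));
    store.record("scan", Map.of("result-data-files", 5L));
    store.snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

Keeping the prefix in one place mirrors the separation described above between client-reported metrics and the iceberg-rest-server HTTP metrics; the real store may structure this differently.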

Why are the changes needed?

Metrics that clients send to /v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics are silently dropped when the dummy store is used. This PR enables monitoring of:
Iceberg table operations (commits, scans)
Data file operations (added/removed files, sizes)
Query performance metrics sent through the metrics API

Fix: #(issue)

Does this PR introduce any user-facing change?

Yes, new configuration and metrics:

# Server configuration
gravitino.iceberg-rest.metricsStore = rest
# Client configuration (Spark)
spark.sql.catalog.<catalog-name>.rest-metrics-impl = org.apache.iceberg.rest.RESTMetricsReporter

Exposed metrics (under the iceberg-client namespace) include commit reports, scan reports, data files added/removed, file sizes, scan/commit durations, and 27+ additional metrics.

How was this patch tested?

  • Unit tests:
./gradlew :iceberg:iceberg-rest-server:test --tests TestIcebergRestMetricsStore
  • Production verification: Deployed to K8s with a Spark SQL workload and confirmed that 32 metrics were tracked correctly:
curl -s http://localhost:9001/metrics | jq '.histograms | with_entries(select(.key | startswith("iceberg-client")))'
{
  "iceberg-client.iceberg.total-duration": {
    "count": 3,
    "max": 0,
    "mean": 0,
    "min": 0,
    "p50": 0,
    "p75": 0,
    "p95": 0,
    "p98": 0,
    "p99": 0,
    "p999": 0,
    "stddev": 0
  },
  "iceberg-client.iceberg.total-planning-duration": {
    "count": 9,
    "max": 0,
    "mean": 0,
    "min": 0,
    "p50": 0,
    "p75": 0,
    "p95": 0,
    "p98": 0,
    "p99": 0,
    "p999": 0,
    "stddev": 0
  }
}
curl -s http://localhost:9001/metrics | jq '.counters | with_entries(select(.key | startswith("iceberg-client")))'
{
  "iceberg-client.iceberg.added-data-files": {
    "count": 1
  },
  "iceberg-client.iceberg.added-files-size-bytes": {
    "count": 960
  },
  "iceberg-client.iceberg.added-records": {
    "count": 1
  },
  "iceberg-client.iceberg.attempts": {
    "count": 3
  },
  "iceberg-client.iceberg.dvs": {
    "count": 0
  },
  "iceberg-client.iceberg.equality-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.indexed-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.positional-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.removed-data-files": {
    "count": 1
  },
  "iceberg-client.iceberg.removed-files-size-bytes": {
    "count": 923
  },
  "iceberg-client.iceberg.removed-records": {
    "count": 1
  },
  "iceberg-client.iceberg.reports.commit": {
    "count": 3
  },
  "iceberg-client.iceberg.reports.scan": {
    "count": 9
  },
  "iceberg-client.iceberg.result-data-files": {
    "count": 5
  },
  "iceberg-client.iceberg.result-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.scanned-data-manifests": {
    "count": 5
  },
  "iceberg-client.iceberg.scanned-delete-manifests": {
    "count": 0
  },
  "iceberg-client.iceberg.skipped-data-files": {
    "count": 0
  },
  "iceberg-client.iceberg.skipped-data-manifests": {
    "count": 2
  },
  "iceberg-client.iceberg.skipped-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.skipped-delete-manifests": {
    "count": 0
  },
  "iceberg-client.iceberg.total-data-files": {
    "count": 1
  },
  "iceberg-client.iceberg.total-data-manifests": {
    "count": 7
  },
  "iceberg-client.iceberg.total-delete-file-size-in-bytes": {
    "count": 0
  },
  "iceberg-client.iceberg.total-delete-files": {
    "count": 0
  },
  "iceberg-client.iceberg.total-delete-manifests": {
    "count": 0
  },
  "iceberg-client.iceberg.total-equality-deletes": {
    "count": 0
  },
  "iceberg-client.iceberg.total-file-size-in-bytes": {
    "count": 4615
  },
  "iceberg-client.iceberg.total-files-size-bytes": {
    "count": 960
  },
  "iceberg-client.iceberg.total-positional-deletes": {
    "count": 0
  },
  "iceberg-client.iceberg.total-records": {
    "count": 1
  }
}

- Add IcebergClientMetricsSource with 'iceberg-client' namespace
- Add IcebergRestMetricsStore to bridge Iceberg metrics to Gravitino MetricsSystem
- Configure with metricsStore = rest
@bharos bharos requested a review from FANNG1 October 25, 2025 00:02
@bharos bharos self-assigned this Oct 25, 2025
@bharos bharos force-pushed the iceberg-client-metrics branch from 1c178af to 6a6a024 Compare October 25, 2025 00:17