@patapenka-alexey patapenka-alexey commented Oct 8, 2025

This patch adds CPU/memory/virtual memory utilization panels per instance and total.

Closes #TNTP-4365

(two screenshots attached)

@patapenka-alexey patapenka-alexey force-pushed the patapenka-alexey/tntp-4365-resources-utilization branch 4 times, most recently from ce8eeda to 7a9bc4f Compare October 8, 2025 12:54
@patapenka-alexey patapenka-alexey changed the title from "Patapenka alexey/tntp 4365 resources utilization" to "dashboard: cpu/memory/virtual memory panels" Oct 8, 2025
@patapenka-alexey patapenka-alexey force-pushed the patapenka-alexey/tntp-4365-resources-utilization branch from 7a9bc4f to d1b1905 Compare October 8, 2025 13:47
@oleg-jukovec oleg-jukovec requested a review from bigbes October 8, 2025 15:14
Contributor

@oleg-jukovec oleg-jukovec left a comment

Please squash the commits.

Comment on lines 21 to 26
#RUN DEBIAN_FRONTEND=noninteractive apt install -y git patch
#RUN git clone https://github.com/magefile/mage && \
# cd mage && \
# go run bootstrap.go
#RUN tt install tt master
#RUN tt install tarantool master
Contributor

Please revert these changes or fix the underlying problem.

Author

It's temporary debug stuff.

if cfg.type == variable.datasource_type.prometheus then
prometheus.target(expr=aggregate_expr(cfg, 'tnt_memory_virt'), legendFormat=title)
else if cfg.type == variable.datasource_type.influxdb then
influxdb.target()
Member

So, do you plan to support it for InfluxDB later or not? If not, let's not add an empty panel to the InfluxDB dashboard.

Author

InfluxDB panels have been added.

).addTarget(
if cfg.type == variable.datasource_type.prometheus then
prometheus.target(
expr='rate(tnt_cpu_user_time[$__rate_interval]) + rate(tnt_cpu_system_time[$__rate_interval])',
Member

To properly support non-default configurations, targets must take into account the pre-configured filters (either static or dynamic) and the metrics prefix.

A basic example is as follows. By default, the Tarantool dashboard expects the user to have one Prometheus job per application, so every query must select metrics only for a specific job. Otherwise, we will display data for all Tarantool applications known to Prometheus.

tnt_metric{job='myjob'}

The grafana-dashboard library allows you to provide various additional filters at build time. The dashboard we publish to grafana.com uses the following ones:

filters: {
job: ['=~', '$job'],
alias: ['=~', '$alias'],
}

This lets you choose a job for your application and, if you wish, display data only for certain application instances instead of all of them.

The grafana-dashboard library has a couple of helpers that compute such things for you, but for non-trivial queries (with additional sums) we'll need to reimplement them a little.
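
As a rough illustration of how such a filter table could end up inside a query, here is a minimal Jsonnet sketch. `render_filters` is a hypothetical stand-in written for this example only; the library's real helpers may be implemented differently.

```jsonnet
// Hypothetical stand-in for the library's filter helper (an assumption,
// not the actual grafana-dashboard implementation).
local render_filters(filters) = std.join(',', [
  // Each filter value is an [operator, value] pair, e.g. ['=~', '$job'].
  std.format('%s%s"%s"', [name, filters[name][0], filters[name][1]])
  for name in std.objectFields(filters)
]);

local filters = {
  job: ['=~', '$job'],
  alias: ['=~', '$alias'],
};

{
  // std.objectFields returns names in sorted order, so alias precedes job.
  selector: render_filters(filters),
  // The rendered selector is placed between the braces of the metric name.
  query: std.format('tnt_memory_virt{%s}', [render_filters(filters)]),
}
```

Evaluated with the `jsonnet` CLI, the `selector` field should come out as `alias=~"$alias",job=~"$job"`, which Grafana then expands using the dashboard variables.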

Member

(I have used this Prometheus target as an example.)

So, let's rewrite this one. To add a prefix, we simply concatenate it:

std.format(|||
    rate(%(metrics_prefix)stnt_cpu_user_time[$__rate_interval]) + rate(%(metrics_prefix)stnt_cpu_system_time[$__rate_interval])
|||, {
    metrics_prefix: cfg.metrics_prefix,
})

I use the named form of std.format here because the positional one would look more complicated.

Next, we need to add the common filters:

std.format(|||
    rate(%(metrics_prefix)stnt_cpu_user_time{%(filters)s}[$__rate_interval]) + rate(%(metrics_prefix)stnt_cpu_system_time{%(filters)s}[$__rate_interval])
|||, {
    metrics_prefix: cfg.metrics_prefix,
    filters: common.prometheus_query_filters(cfg.filters),
})
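
To see what the prefixed and filtered expression expands to, here is a self-contained sketch that can be evaluated on its own. `render_filters` is a hypothetical stand-in for `common.prometheus_query_filters`, and the `myapp_` prefix is made up for the example. Note that every named placeholder needs a conversion type, so `%(metrics_prefix)s` is immediately followed by the metric name and reads as `%(metrics_prefix)stnt_...`.

```jsonnet
// Hypothetical stand-in for common.prometheus_query_filters; the real
// helper may differ.
local render_filters(filters) = std.join(',', [
  std.format('%s%s"%s"', [name, filters[name][0], filters[name][1]])
  for name in std.objectFields(filters)
]);

// Per-instance CPU expression: user time rate plus system time rate.
local cpu_expr(cfg) = std.format(
  'rate(%(metrics_prefix)stnt_cpu_user_time{%(filters)s}[$__rate_interval]) + ' +
  'rate(%(metrics_prefix)stnt_cpu_system_time{%(filters)s}[$__rate_interval])',
  {
    metrics_prefix: cfg.metrics_prefix,
    filters: render_filters(cfg.filters),
  }
);

// Illustrative filter set, matching the one quoted earlier in this thread.
local filters = { job: ['=~', '$job'], alias: ['=~', '$alias'] };

{
  // Empty prefix: metric names stay tnt_cpu_user_time / tnt_cpu_system_time.
  default_prefix: cpu_expr({ metrics_prefix: '', filters: filters }),
  // 'myapp_' is an assumed prefix used only for this example.
  custom_prefix: cpu_expr({ metrics_prefix: 'myapp_', filters: filters }),
}
```

Comparing the two output fields shows that only the metric names change with the prefix, while the label selector stays the same.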

Author

Thanks a lot for your tips!

cfg=cfg,
title=title,
description=description,
expr='sum(rate(tnt_cpu_user_time{job=~"$job"}[$__rate_interval])) + sum(rate(tnt_cpu_system_time{job=~"$job"}[$__rate_interval]))',
Member

Next, to sum over all instances, we need to exclude the alias filter:

std.format(|||
    sum(rate(%(metrics_prefix)stnt_cpu_user_time{%(filters)s}[$__rate_interval])) + sum(rate(%(metrics_prefix)stnt_cpu_system_time{%(filters)s}[$__rate_interval]))
|||, {
    metrics_prefix: cfg.metrics_prefix,
    filters: common.prometheus_query_filters(common.remove_field(cfg.filters, 'alias')),
})
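
The reason for dropping the alias filter is that a "total" panel should keep summing over every instance of the job even when a single instance is selected in the `$alias` variable. Below is a minimal sketch of the field removal itself; `drop_field` is a hypothetical stand-in for `common.remove_field`, whose actual implementation may differ.

```jsonnet
// Hypothetical stand-in for common.remove_field: copy every visible field
// of obj except the one named by field.
local drop_field(obj, field) = {
  [name]: obj[name]
  for name in std.objectFields(obj)
  if name != field
};

// Illustrative filter set, matching the one quoted earlier in this thread.
local filters = { job: ['=~', '$job'], alias: ['=~', '$alias'] };

{
  // Per-instance panels keep both filters.
  per_instance_filters: filters,
  // Total panels keep only the job filter, so the sum covers all instances.
  total_filters: drop_field(filters, 'alias'),
}
```

The original `filters` object is left untouched, so the same configuration can feed both the per-instance and the total panels.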

Author

Oops, forgot about that. Updated.

@Satbek
Contributor

Satbek commented Oct 13, 2025

Is it possible to sum system and user time for cpu graphs by thread?

@patapenka-alexey patapenka-alexey force-pushed the patapenka-alexey/tntp-4365-resources-utilization branch from fd41f66 to 4bfde95 Compare October 13, 2025 15:44
@patapenka-alexey
Author

> Is it possible to sum system and user time for cpu graphs by thread?

Yes, all the required metrics are provided by the metrics module, but I think that should be done in a separate ticket.

@patapenka-alexey patapenka-alexey force-pushed the patapenka-alexey/tntp-4365-resources-utilization branch from 4bfde95 to bc4b63e Compare October 14, 2025 06:25
@patapenka-alexey patapenka-alexey marked this pull request as ready for review October 14, 2025 06:29
This patch adds `CPU/memory/virtual memory` utilization panels
per instance and total.

Closes #TNTP-4365
@patapenka-alexey patapenka-alexey force-pushed the patapenka-alexey/tntp-4365-resources-utilization branch from bc4b63e to d1d8047 Compare October 14, 2025 11:08
4 participants