-
Notifications
You must be signed in to change notification settings - Fork 172
MSSQL Modernization #1450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
MSSQL Modernization #1450
Conversation
mssql-mixin/rows.libsonnet
Outdated
database: | ||
[ | ||
g.panel.row.new('Database'), | ||
this.grafana.panels.databaseWriteStallDurationPanel | ||
+ g.panel.timeSeries.gridPos.withW(12), | ||
this.grafana.panels.databaseReadStallDurationPanel | ||
+ g.panel.timeSeries.gridPos.withW(12), | ||
this.grafana.panels.transactionLogExpansionsPanel | ||
+ g.panel.timeSeries.gridPos.withW(24), | ||
], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
best if row panels are added not as array but inside
g.panel.row.withPanels()
function. This way it is possible to toggle rows collapsed/uncollapsed.
example:
https://github.com/grafana/jsonnet-libs/blob/master/kafka-observ-lib/rows.libsonnet#L25-L33.
and then:
https://github.com/grafana/jsonnet-libs/blob/master/kafka-observ-lib/dashboards.libsonnet#L33-L52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple minor comments but overall this looks great to me!
Admittedly a bit rusty on this what with PTO and all so leaving final approval up to @v-zhuravlev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing .json filetype on this and the mssql_overview files
"keepTime": true, | ||
"title": "MSSQL Overview", | ||
"type": "link", | ||
"url": "/d/mssql_mssql_overview" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is generated, but it feels like a weird change to have mssql twice in the URL?
@v-zhuravlev just curious, do you think we should just default to using signals for this PR as well? I think the API is nice to use and gives us more flexibility probably moving forward. |
yes, it would be great using Signals where possible. I also see asTarget() as first transitional step for easier migration, using asTimeSeries() and asPanelMixin(), asTable(), asTableRow() (which propogate units, value mappings from signals) are even better for panel constructions from signals, I encourage you take a look in README or other observ-libs, like kafka, jvm etc. But let's finish off Clickhouse first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, some comments inline.
mssql-mixin/config.libsonnet
Outdated
local this = self, | ||
enableMultiCluster: false, | ||
filteringSelector: 'job=~"integrations/mssql"', | ||
groupLabels: ['job', 'cluster'], // Remove 'db' from here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like db is removed.
mssql-mixin/config.libsonnet
Outdated
|
||
// logs lib related | ||
enableLokiLogs: true, | ||
logLabels: ['job', 'cluster', 'instance'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logLabels: ['job', 'cluster', 'instance'], | |
logLabels: self.groupLabels + self.instanceLabels, |
would it be a better default?
mssql-mixin/jsonnetfile.json
Outdated
"source": { | ||
"git": { | ||
"remote": "https://github.com/grafana/grafonnet.git", | ||
"subdir": "gen/grafonnet-v10.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it be bumped to 11.0.0 or 11.4.0?
mssql-mixin/main.libsonnet
Outdated
local dbVariable = { | ||
label: 'Database', | ||
name: 'db', | ||
query: 'label_values(mssql_io_stall_seconds_total{%s}, db)' % this.config.filteringSelector, | ||
datasource: { | ||
type: 'prometheus', | ||
uid: '${prometheus_datasource}', | ||
}, | ||
refresh: 2, | ||
sort: 1, | ||
type: 'query', | ||
includeAll: true, | ||
allValue: '.*', | ||
multi: true, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need special value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this because only some targets need the db
field as part of the filter in the original design.
Old Source:
'100 * mssql_server_total_memory_bytes{' + matcher + '} / clamp_min(mssql_available_commit_memory_bytes{' + matcher + '},1)', |
vs
'increase(mssql_io_stall_seconds_total{' + matcher + ', db=~"$database", operation="write"}[$__rate_interval:])', |
Dropdown:
jsonnet-libs/mssql-mixin/dashboards/mssql-overview.libsonnet
Lines 892 to 901 in 1cd23af
template.new( | |
'database', | |
promDatasource, | |
'label_values(mssql_log_growths_total{%(mssqlSelector)s, instance=~"$instance"}, db)' % $._config, | |
label='Database', | |
refresh=1, | |
includeAll=true, | |
multi=true, | |
allValues='.+', | |
sort=2 |
As you can see we got around this last time by selectively using the filter on the specific targets we needed.
With signals it seems a tad bit harder to specify these kind of scenarios where we want a filter dropdown but also want all the filters aside from the db
selector. So changing the default value to .*
makes it so that the filter still works but does not break other panels when filtering.
Let me know if you think there's an easier way of accomplishing this but this was my solution thus far to keep this going forward!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you just need db var to accept .*
as all value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but its moreso that if any database filter is applied then it kind of breaks the other panels if I do it as an instanceLabel...
I suppose is this behavior fine from your perspective? I didn't want to remove the funcitionality of this being a single pane where a user could correlate database data with instance data.
Screen.Recording.2025-07-15.at.12.34.14.PM.mov
if that doesn't make sense let me know but I figured the currently written way was more user friendly, but happy to make the change either way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok got it. You check this example with Kafka instance and Kafka instance topics, very similar to yours.
- First topic is added(hardcoded) as instance label to topics signals.
https://github.com/grafana/jsonnet-libs/blob/master/kafka-observ-lib/signals/topic.libsonnet#L7
note that for your case it could be instanceLabels + ['db'], there instanceLabels are ommitted for completely different reason. - Second, union of variables generated is used give variables for topics dashboard:
https://github.com/grafana/jsonnet-libs/blob/master/kafka-observ-lib/dashboards.libsonnet#L7-L13
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've implemented based off what you're saying. Let me know if you think its adequate!
severeErrors: { | ||
name: 'Severe errors', | ||
nameShort: 'Severe Errors', | ||
type: 'raw', | ||
description: 'Rate of severe connection errors per second.', | ||
unit: '/ sec', | ||
sources: { | ||
prometheus: { | ||
expr: 'increase(mssql_kill_connection_errors_total{%(queriesSelector)s}[$__rate_interval:])', | ||
legendCustomTemplate: '{{ instance }}', | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be simplier to use type=counter and simple expression?
Alos, AFAIK, increase doesn't work well with rate_interval. Best if combined just $__interval to avoid increments duplication and wrong "Total" aggregations in the legend etc.
name: 'Deadlocks', | ||
nameShort: 'Deadlocks', | ||
type: 'raw', | ||
description: 'Rate of deadlocks per second.', | ||
unit: '/ sec', | ||
sources: { | ||
prometheus: { | ||
expr: 'rate(mssql_deadlocks_total{%(queriesSelector)s}[$__rate_interval])', | ||
legendCustomTemplate: '{{ instance }}', | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, why not to use type=counter?
unit: 'seconds', | ||
sources: { | ||
prometheus: { | ||
expr: 'increase(mssql_io_stall_seconds_total{%(queriesSelector)s, db=~"$db", operation="write"}[$__rate_interval:])', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can db
just added as the last element of instanceLabels? Can it be considered the smallest instnance you want to observ (single db within MSSQL host?) would it work?
}, | ||
databaseReadStallDuration: { | ||
name: 'Database read stall duration', | ||
nameShort: 'Read Stall', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nameShort: 'Read Stall', | |
nameShort: 'Read stall', |
Please use Sentence case.
type: 'raw', | ||
description: 'Time spent waiting for database read operations.', | ||
unit: 'seconds', | ||
sources: { | ||
prometheus: { | ||
expr: 'increase(mssql_io_stall_seconds_total{%(queriesSelector)s, db=~"$db", operation="read"}[$__rate_interval:])', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: 'raw', | |
description: 'Time spent waiting for database read operations.', | |
unit: 'seconds', | |
sources: { | |
prometheus: { | |
expr: 'increase(mssql_io_stall_seconds_total{%(queriesSelector)s, db=~"$db", operation="read"}[$__rate_interval:])', | |
type: 'counter', | |
description: 'Time spent waiting for database read operations.', | |
unit: 'seconds', | |
sources: { | |
prometheus: { | |
expr: 'mssql_io_stall_seconds_total{%(queriesSelector)s, db=~"$db", operation="read"})', |
again, I would defined counters as type=counters and simple expression. See signals README, you can swap rate to increase() as the default function that is applied to the counter if needed.
mssql-mixin/signals/memory.libsonnet
Outdated
}, | ||
bufferCacheHitPercentage: { | ||
name: 'Buffer cache hit percentage', | ||
nameShort: 'Buffer Cache Hit', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and similar, please check it is Sentence case.
"expr": "increase(mssql_kill_connection_errors_total{job=~\"integrations/mssql\",job=~\"$job\",cluster=~\"$cluster\",instance=~\"$instance\"}[$__rate_interval:])", | ||
"expr": "rate(mssql_kill_connection_errors_total{job=~\"integrations/mssql\",job=~\"$job\",cluster=~\"$cluster\",instance=~\"$instance\"}[$__rate_interval])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some expr got changed function from increase to rate at the end. Just to check that this is what you want. If not, you can change rateFunction to be increase for some signals. See Signals README.
This involves separating out the MSSQL Overview dashboard to separate logs and metrics. This would require some new screenshots.