advanced metrics for starter and essential clusters #21894
base: release-8.5
Conversation
Hi @zuowei593. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Summary of Changes
Hello @zuowei593, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request significantly improves the documentation for built-in monitoring of TiDB Cloud Starter and Essential clusters. It introduces a new 'Overview' section to present core metrics concisely and expands the 'Cluster Status' section with more advanced and granular performance metrics. The goal is to offer users a more comprehensive and insightful view into their cluster's operational health and resource utilization.
Code Review
This pull request restructures the metrics documentation for Starter and Essential clusters, splitting them into 'Overview' and 'Cluster Status' sections. My review focuses on improving the clarity, consistency, and completeness of the metric descriptions, in line with the repository's style guide. I've suggested a few wording adjustments and pointed out areas where descriptions could be more comprehensive or less redundant.
tidb-cloud/built-in-monitoring.md
Outdated
| Transaction Duration | avg, P99 | The average and the 99th percentile execution duration of transactions. |
| Lock wait | P95, P99 | The 95th and the 99th percentile durations are the times taken by transactions waiting to acquire pessimistic locks. High values indicate contention for the same rows or keys. |
| Total Connection | All | The number of connections to the {{{ .starter }}} or {{{ .essential }}} cluster. |
| Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile time connections remained idle while inside an open transaction. Long values usually indicate slow app logic or long-running transactions. |
The current description only explains the `P99(in-txn)` label, but the labels also include `P99` and `P99(not-in-txn)`. To make the description complete, it should explain all the labels.
Suggested change:
- | Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile time connections remained idle while inside an open transaction. Long values usually indicate slow app logic or long-running transactions. |
+ | Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile of time that connections remained idle. `P99(in-txn)` shows idle time within an open transaction, while `P99(not-in-txn)` shows idle time outside of a transaction. Long values usually indicate slow app logic or long-running transactions. |
Style guide reference: The documentation should be complete.
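To make the difference between the three labels concrete, here is a minimal Python sketch. It is not how TiDB Cloud computes the metric (this thread does not describe that); the nearest-rank percentile method, the sample data, and the names `percentile`, `idle_p99`, and `SAMPLES` are all assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank of the smallest value that covers at least p percent of the samples.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical idle-time samples: (idle seconds, connection was inside an open transaction)
SAMPLES = [(0.2, True), (1.5, False), (0.4, True), (30.0, False), (0.1, True), (2.0, False)]


def idle_p99(samples):
    """Return one value per label: all connections, in-txn only, not-in-txn only."""
    every = [seconds for seconds, _ in samples]
    in_txn = [seconds for seconds, inside in samples if inside]
    not_in_txn = [seconds for seconds, inside in samples if not inside]
    return {
        "P99": percentile(every, 99),
        "P99(in-txn)": percentile(in_txn, 99),
        "P99(not-in-txn)": percentile(not_in_txn, 99),
    }


if __name__ == "__main__":
    for label, value in idle_p99(SAMPLES).items():
        print(f"{label}: {value}s")
```

In this made-up data set the overall `P99` is dominated by a connection idling outside a transaction, which is why a description that covers only `P99(in-txn)` would leave the other two series unexplained.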
tidb-cloud/built-in-monitoring.md
Outdated
| Transaction Per Second | All | The number of transactions executed per second. |
| Average Transaction Duration | All | The average execution duration of transactions. |
| Transaction Duration | avg, P99 | The average and the 99th percentile execution duration of transactions. |
| Lock wait | P95, P99 | The 95th and the 99th percentile durations are the times taken by transactions waiting to acquire pessimistic locks. High values indicate contention for the same rows or keys. |
The phrasing 'durations are the times taken' is a bit redundant. For better readability, consider rephrasing this description.
Suggested change:
- | Lock wait | P95, P99 | The 95th and the 99th percentile durations are the times taken by transactions waiting to acquire pessimistic locks. High values indicate contention for the same rows or keys. |
+ | Lock wait | P95, P99 | The 95th and 99th percentile of time that transactions spend waiting to acquire pessimistic locks. High values indicate contention for the same rows or keys. |
Style guide reference: Avoid unnecessary words and repetition.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/ok-to-test
LGTM
@baiyuqing: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this: "LGTM". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Co-authored-by: Grace Cai <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED
The full list of commands accepted by this bot can be found here.
Co-authored-by: Grace Cai <[email protected]>
Rest LGTM
tidb-cloud/built-in-monitoring.md
Outdated
### Overview
| Metric name | Labels | Description |
| :------------| :------| :-------------------------------------------- |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
Suggested change:
- | Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
+ | Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction in a {{{ .starter }}} cluster. Besides user queries, background activities can also consume RUs, so when QPS is 0, RU usage per second might still be nonzero. |
tidb-cloud/built-in-monitoring.md
Outdated
| Metric name | Labels | Description |
| :------------| :------| :-------------------------------------------- |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
| Capacity vs Usage (RU/s) | Provisioned capacity (RCU), Consumed RU/s | The provisioned capacity (RCU) and the consumed Request Units (RU) per second to the {{{ .essential }}} clusters. |
Suggested change:
- | Capacity vs Usage (RU/s) | Provisioned capacity (RCU), Consumed RU/s | The provisioned capacity (RCU) and the consumed Request Units (RU) per second to the {{{ .essential }}} clusters. |
+ | Capacity vs Usage (RU/s) | Provisioned capacity (RCU), Consumed RU/s | The provisioned Request Capacity Units (RCUs) and the consumed Request Units (RU) per second in a {{{ .essential }}} cluster. |
Co-authored-by: Grace Cai <[email protected]>
| Transaction Duration | Avg, P99 | The execution duration of transactions. |
| Lock wait | P95, P99 | Time spent by transactions waiting to acquire pessimistic locks. High values indicate contention on the same rows or keys. |
| Total Connection | All | The number of connections to the {{{ .starter }}} or {{{ .essential }}} cluster. |
| Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The time connections remained idle while inside an open transaction. Long durations typically indicate slow application logic or long-running transactions. |
Suggested change:
- | Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The time connections remained idle while inside an open transaction. Long durations typically indicate slow application logic or long-running transactions. |
+ | Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The time that connections remain idle while inside an open transaction. Long durations typically indicate slow application logic or long-running transactions. |
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?