Create monitor-new-relic-integration-preview.md #21461

Open: wants to merge 3 commits into base release-8.5

Conversation

@huoyao1125 huoyao1125 commented Jul 24, 2025

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jul 24, 2025

ti-chi-bot bot commented Jul 24, 2025

Hi @huoyao1125. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 24, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @huoyao1125, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new documentation for a preview feature: the integration of TiDB Cloud with New Relic. The added document serves as a guide for users to set up this monitoring integration, detailing the necessary steps, prerequisites, and a comprehensive list of metrics that can be observed in New Relic dashboards.

Highlights

  • New Documentation: A new documentation page, monitor-new-relic-integration-preview.md, has been added to guide users on integrating TiDB Cloud with New Relic.
  • Integration Guide: The document provides step-by-step instructions for configuring TiDB Cloud to send metric data to New Relic, including prerequisites like a New Relic API key and specific user permissions.
  • Metrics Reference: A comprehensive table is included, detailing the various TiDB Cloud metrics (e.g., database time, queries per second, connection stats, node resource usage, changefeed metrics, resource manager metrics) that are available for monitoring within New Relic.
  • Limitations Noted: The documentation clearly outlines limitations, such as the New Relic integration not being available for TiDB Cloud Serverless clusters or during certain cluster statuses (CREATING, RESTORING, PAUSED, RESUMING).
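
The limitations in the last highlight can be read as a simple availability check. The following is a minimal sketch only: the function name, the `cluster_type` argument, and the `"dedicated"` and `"AVAILABLE"` values used in the usage note are hypothetical and are not taken from the PR or from any TiDB Cloud API. Only the Serverless restriction and the four status names come from the summary above.

```python
# Hypothetical helper for illustration; not part of any TiDB Cloud API.
# The four statuses below are the ones named in the documented limitations.
UNAVAILABLE_STATUSES = {"CREATING", "RESTORING", "PAUSED", "RESUMING"}


def new_relic_integration_available(cluster_type: str, status: str) -> bool:
    """Return True if none of the documented limitations rules out the integration."""
    # The integration is documented as unavailable for Serverless clusters.
    if cluster_type.lower() == "serverless":
        return False
    # It is also documented as unavailable while the cluster is in these statuses.
    return status.upper() not in UNAVAILABLE_STATUSES
```

For example, under these assumptions `new_relic_integration_available("dedicated", "AVAILABLE")` is true, while a paused cluster or any Serverless cluster yields false.
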
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new documentation page for integrating TiDB Cloud with New Relic, marked as a "Preview" feature. The new page outlines the prerequisites, limitations, and steps for setting up the integration, and includes a comprehensive list of available metrics.

My review focuses on improving clarity, consistency, and technical accuracy, in line with the documentation style guide. I've provided suggestions to:

  • Enhance readability by simplifying sentence structures.
  • Ensure consistent terminology and formatting (e.g., "API key", heading capitalization).
  • Correct significant inconsistencies in the metrics table, including standardizing metric name prefixes and formatting of labels.
  • Flag a potential error in the documented metric statuses for verification.

These changes will help ensure the documentation is clear, accurate, and easy for users to follow.

Referenced style guide elements:

  • Clarity and simplicity (Line 14)
  • Consistent terminology (Line 25)
  • Avoiding unnecessary words (Line 24)

Comment on lines 71 to 80
| tidbcloud.node_storage_available_bytes | gauge | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>` | The available disk space in bytes for TiKV/TiFlash nodes |
| tidbcloud.disk_read_latency | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>`<br/>`device`: `nvme.*\|dm.*` | The read latency in seconds per storage device |
| tidbcloud.disk_write_latency | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>`<br/>`device`: `nvme.*\|dm.*` | The write latency in seconds per storage device |
| tidbcloud.kv_request_duration | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv`<br/>cluster_name: `<cluster name>`<br/>`type`: `BatchGet\|Commit\|Prewrite\|...` | The duration in seconds of TiKV requests by type |
| tidbcloud.component_uptime | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tidb\|tikv\|tiflash`<br/>cluster_name: `<cluster name>` | The uptime in seconds of TiDB components |
| tidbcloud.changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidbcloud.ticdc_owner_resolved_ts_lag | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | The resolved timestamp lag in seconds for changefeed owner |
| tidbcloud.changefeed_status | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | Changefeed status:<br/>`-1`: Unknown<br/>`0`: Normal<br/>`1`: Warning<br/>`2`: Failed<br/>`3`: Stopped<br/>`4`: Finished<br/>`6`: Warning<br/>`7`: Other |
| tidbcloud.resource_manager_resource_unit_read_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The read request units consumed by Resource Manager |
| tidbcloud.resource_manager_resource_unit_write_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The write request units consumed by Resource Manager |

high

I've noticed a few inconsistencies in this part of the metrics table compared to the rows above. To improve technical accuracy and consistency, I've made the following changes in the suggestion:[1]

  • Changed the metric prefix from tidbcloud. to tidb_cloud. to match the other metrics.
  • Removed backticks from label values (e.g., instance: 'tidb-0|...') and label keys (e.g., device:) for consistency within the table.

Style Guide References

Suggested change
| tidbcloud.node_storage_available_bytes | gauge | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>` | The available disk space in bytes for TiKV/TiFlash nodes |
| tidbcloud.disk_read_latency | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>`<br/>`device`: `nvme.*\|dm.*` | The read latency in seconds per storage device |
| tidbcloud.disk_write_latency | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv\|tiflash`<br/>cluster_name: `<cluster name>`<br/>`device`: `nvme.*\|dm.*` | The write latency in seconds per storage device |
| tidbcloud.kv_request_duration | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tikv`<br/>cluster_name: `<cluster name>`<br/>`type`: `BatchGet\|Commit\|Prewrite\|...` | The duration in seconds of TiKV requests by type |
| tidbcloud.component_uptime | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tidb\|tikv\|tiflash`<br/>cluster_name: `<cluster name>` | The uptime in seconds of TiDB components |
| tidbcloud.changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidbcloud.ticdc_owner_resolved_ts_lag | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | The resolved timestamp lag in seconds for changefeed owner |
| tidbcloud.changefeed_status | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | Changefeed status:<br/>`-1`: Unknown<br/>`0`: Normal<br/>`1`: Warning<br/>`2`: Failed<br/>`3`: Stopped<br/>`4`: Finished<br/>`6`: Warning<br/>`7`: Other |
| tidbcloud.resource_manager_resource_unit_read_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The read request units consumed by Resource Manager |
| tidbcloud.resource_manager_resource_unit_write_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The write request units consumed by Resource Manager |
| tidb_cloud.node_storage_available_bytes | gauge | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: `<cluster name>` | The available disk space in bytes for TiKV/TiFlash nodes |
| tidb_cloud.disk_read_latency | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: `<cluster name>`<br/>device: nvme.*\|dm.* | The read latency in seconds per storage device |
| tidb_cloud.disk_write_latency | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv\|tiflash<br/>cluster_name: `<cluster name>`<br/>device: nvme.*\|dm.* | The write latency in seconds per storage device |
| tidb_cloud.kv_request_duration | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tikv<br/>cluster_name: `<cluster name>`<br/>type: BatchGet\|Commit\|Prewrite\|... | The duration in seconds of TiKV requests by type |
| tidb_cloud.component_uptime | histogram | instance: tidb-0\|tidb-1\|...<br/>component: tidb\|tikv\|tiflash<br/>cluster_name: `<cluster name>` | The uptime in seconds of TiDB components |
| tidb_cloud.changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidb_cloud.ticdc_owner_resolved_ts_lag | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | The resolved timestamp lag in seconds for changefeed owner |
| tidb_cloud.changefeed_status | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | Changefeed status:<br/>`-1`: Unknown<br/>`0`: Normal<br/>`1`: Warning<br/>`2`: Failed<br/>`3`: Stopped<br/>`4`: Finished<br/>`6`: Warning<br/>`7`: Other |
| tidb_cloud.resource_manager_resource_unit_read_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The read request units consumed by Resource Manager |
| tidb_cloud.resource_manager_resource_unit_write_request_unit | gauge | cluster_name: `<cluster name>`<br/>resource_group: `<group-name>` | The write request units consumed by Resource Manager |

Footnotes

  1. The style guide requires maintaining technical accuracy and consistency in terminology. (link)
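
The two changes in this suggestion are mechanical, so they could be applied to each row with a small script. This is a sketch under assumptions: `normalize_row` is a hypothetical helper, and it assumes four-column rows in which pipes inside a cell are escaped as `\|`, as in the rows quoted above. It leaves backticks around `<placeholder>` values and in the description column untouched, matching the suggested rows.

```python
import re


def normalize_row(row: str) -> str:
    """Apply the suggested prefix and backtick changes to one Markdown table row."""
    # Split on unescaped pipes only, so `\|` inside a cell is preserved.
    cells = re.split(r"(?<!\\)\|", row)
    if len(cells) < 5:
        return row  # not a four-column table row; leave it unchanged
    # Metric name column: tidbcloud. -> tidb_cloud.
    cells[1] = cells[1].replace("tidbcloud.", "tidb_cloud.", 1)
    # Label column: drop backticks unless they wrap a <placeholder>.
    cells[3] = re.sub(r"`([^`<>]+)`", r"\1", cells[3])
    return "|".join(cells)
```

Restricting the backtick removal to the label column is a deliberate choice here, since the suggested `changefeed_status` row keeps the backticked status values in its description.
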

| tidbcloud.component_uptime | histogram | instance: `tidb-0\|tidb-1\|...`<br/>component: `tidb\|tikv\|tiflash`<br/>cluster_name: `<cluster name>` | The uptime in seconds of TiDB components |
| tidbcloud.changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidbcloud.ticdc_owner_resolved_ts_lag | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | The resolved timestamp lag in seconds for changefeed owner |
| tidbcloud.changefeed_status | gauge | changefeed_id: `<changefeed-id>`<br/>cluster_name: `<cluster name>` | Changefeed status:<br/>`-1`: Unknown<br/>`0`: Normal<br/>`1`: Warning<br/>`2`: Failed<br/>`3`: Stopped<br/>`4`: Finished<br/>`6`: Warning<br/>`7`: Other |

medium

In the description for changefeed_status, the status Warning is listed for both value 1 and 6. This could be confusing for users. If this is correct, it might be helpful to add a note clarifying the difference between the two warning states. If it's a typo, please correct it.
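
A quick way to spot this kind of ambiguity is to load the documented values into a mapping and look for repeated labels. The sketch below copies the values exactly as they appear in the row under review. The names `CHANGEFEED_STATUS` and `duplicate_labels` are hypothetical, and the code does not decide which `Warning` entry is correct or why value `5` is absent.

```python
# Values copied from the changefeed_status row as written in the PR.
# 1 and 6 both read "Warning" there, and 5 is not listed; neither is resolved here.
CHANGEFEED_STATUS = {
    -1: "Unknown",
    0: "Normal",
    1: "Warning",
    2: "Failed",
    3: "Stopped",
    4: "Finished",
    6: "Warning",
    7: "Other",
}


def duplicate_labels(mapping: dict[int, str]) -> dict[str, list[int]]:
    """Return every label that is assigned to more than one status value."""
    values_by_label: dict[str, list[int]] = {}
    for value, label in mapping.items():
        values_by_label.setdefault(label, []).append(value)
    return {label: vals for label, vals in values_by_label.items() if len(vals) > 1}
```

Calling `duplicate_labels(CHANGEFEED_STATUS)` returns `{"Warning": [1, 6]}`, which is the pair flagged in this comment.
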


If you do not have a New Relic account, sign up [here](https://newrelic.com/signup).

- To set up third-party metrics integration for TiDB Cloud, you must have the **Organization Owner** access to your organization in TiDB Cloud, while other users have read-only access to the integration page and can access configured dashboards via the provided links.

low

This sentence is a bit long. For better readability, consider splitting it into two separate sentences.[1]

Style Guide References

Suggested change
- To set up third-party metrics integration for TiDB Cloud, you must have the **Organization Owner** access to your organization in TiDB Cloud, while other users have read-only access to the integration page and can access configured dashboards via the provided links.
- To set up third-party metrics integration for TiDB Cloud, you must have the **Organization Owner** access to your organization in TiDB Cloud.
Other users have read-only access to the integration page and can access configured dashboards via the provided links.

Footnotes

  1. The style guide emphasizes writing documentation that is clear and simple to read. (link)


- To set up third-party metrics integration for TiDB Cloud, you must have the **Organization Owner** access to your organization in TiDB Cloud, while other users have read-only access to the integration page and can access configured dashboards via the provided links.

## Limitation

low

Since this section introduces a list of limitations, using the plural form "Limitations" is more conventional and slightly clearer.

Suggested change
## Limitation
## Limitations


## Steps

### Step 1. Integrate with your New Relic API Key

low

For consistency and standard capitalization, it's better to write "API key" instead of "API Key".[1]

Style Guide References

Suggested change
### Step 1. Integrate with your New Relic API Key
### Step 1. Integrate with your New Relic API key

Footnotes

  1. The style guide requires using consistent terminology. (link)

1. In the [TiDB Cloud console](https://tidbcloud.com), navigate to the target cluster by clicking on it from the cluster list page.
2. After entering the target cluster details page, in the left navigation pane, click **Settings** > **Integrations**.
3. On the **Integrations** page, click **Integration to New Relic (Preview)**.
4. Enter your API key of New Relic and choose the site of New Relic.

low

This phrasing is a bit wordy. Let's make it more concise to improve readability.[1]

Style Guide References

Suggested change
4. Enter your API key of New Relic and choose the site of New Relic.
4. Enter your New Relic API key and choose the New Relic site.

Footnotes

  1. The style guide recommends avoiding unnecessary words. (link)


## Pre-built dashboard

Click the **Dashboard** link in the **New Relic** card of the integrations. You can see the pre-built dashboard of your TiDB clusters.

low

The phrase "of the integrations" is a little vague. Specifying that the link is on the Integrations page makes the instruction more precise for the user.

Suggested change
Click the **Dashboard** link in the **New Relic** card of the integrations. You can see the pre-built dashboard of your TiDB clusters.
Click the **Dashboard** link in the **New Relic** card on the **Integrations** page. You can see the pre-built dashboard of your TiDB clusters.

| tidb_cloud.db_database_time| gauge | sql_type: Select\|Insert\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The total time consumed by all SQL statements running in TiDB per second, including the CPU time of all processes and the non-idle waiting time. |
| tidb_cloud.db_query_per_second| gauge | type: Select\|Insert\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The number of SQL statements executed per second on all TiDB instances, which is counted according to `SELECT`, `INSERT`, `UPDATE`, and other types of statements. |
| tidb_cloud.db_average_query_duration| gauge | sql_type: Select\|Insert\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The duration between the time that the client's network request is sent to TiDB and the time that the request is returned to the client after TiDB has executed it. |
| tidb_cloud.db_failed_queries| gauge | type: executor:xxxx\|parser:xxxx\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The statistics of error types (such as syntax errors and primary key conflicts) according to the SQL execution errors that occur per second on each TiDB instance. |

low

The current description is slightly awkward. Rephrasing it will make it easier for users to understand.

Suggested change
| tidb_cloud.db_failed_queries| gauge | type: executor:xxxx\|parser:xxxx\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The statistics of error types (such as syntax errors and primary key conflicts) according to the SQL execution errors that occur per second on each TiDB instance. |
| tidb_cloud.db_failed_queries| gauge | type: executor:xxxx\|parser:xxxx\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The number of SQL execution errors per second on each TiDB instance, categorized by error type (such as syntax errors and primary key conflicts). |

@qiancai
Collaborator

qiancai commented Jul 25, 2025

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jul 25, 2025
@qiancai qiancai added translation/no-need No need to translate this PR. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. cherry-pick-release-cloud/no-need No need to cherry pick this PR to the "release-cloud" branch. and removed missing-translation-status This PR does not have translation status info. ok-to-test Indicates a PR is ready to be tested. labels Jul 25, 2025
@qiancai qiancai self-assigned this Jul 25, 2025
@qiancai qiancai self-requested a review July 25, 2025 07:55

ti-chi-bot bot commented Jul 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from qiancai. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
cherry-pick-release-cloud/no-need No need to cherry pick this PR to the "release-cloud" branch. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. translation/no-need No need to translate this PR.
2 participants