From 3aafed6c1541b3795093313b04fd9583d53a1f21 Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Sun, 27 Apr 2025 10:24:33 +0800 Subject: [PATCH 01/15] update doc --- sql-statements/sql-statement-import-into.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 1bb57618741ae..f36de7e445ad7 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. 
| | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. | From 2f862f0b47a3e64e27920cd8aee98bef076c603a Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Sun, 27 Apr 2025 11:54:59 +0800 Subject: [PATCH 02/15] update doc --- sql-statements/sql-statement-admin-alter-ddl.md | 2 +- sql-statements/sql-statement-import-into.md | 2 +- system-variables.md | 3 ++- tidb-lightning/tidb-lightning-configuration.md | 2 +- tidb-lightning/tidb-lightning-physical-import-mode-usage.md | 6 +++--- 5 files changed, 8 insertions(+), 7 deletions(-) diff --git a/sql-statements/sql-statement-admin-alter-ddl.md b/sql-statements/sql-statement-admin-alter-ddl.md index 3f8d77eed77b0..2e1c98aba5db3 100644 --- a/sql-statements/sql-statement-admin-alter-ddl.md +++ b/sql-statements/sql-statement-admin-alter-ddl.md @@ -23,7 +23,7 @@ The following are the supported parameters for different DDL jobs and their corr - `ADD INDEX`: - `THREAD`: the concurrency of the DDL job. The initial value is set by `tidb_ddl_reorg_worker_cnt`. - `BATCH_SIZE`: the batch size. The initial value is set by [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size). - - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into each TiKV. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). + - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into TiKV on each TiDB node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). Currently, the preceding parameters only work for `ADD INDEX` jobs that are submitted and running after [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is disabled. diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index f36de7e445ad7..88b75c7405986 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. 
When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the total write speed to 10 MiB/s. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). 
When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. | diff --git a/system-variables.md b/system-variables.md index c31bde4afb2b9..bbd3819e8bfde 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1741,11 +1741,12 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Type: String - Default value: `0` - Range: `[0, 1PiB]` -- This variable limits the write bandwidth for each TiKV node and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads. +- This variable limits the write bandwidth of each TiDB node to TiKV and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads. - The default value `0` means no write bandwidth limit. - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. +- When the Distributed eXecution Framework (DXF) is enabled, this limit applies to each TiDB node seperately. For example, if you add index using 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to TiKV is `256MiB/s`. ### tidb_ddl_reorg_worker_cnt diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index d9fba98de07e5..0054cce6703da 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -280,7 +280,7 @@ The `security` section specifies certificates and keys for TLS connections withi #### `store-write-bwlimit` -- Limits the bandwidth in which TiDB Lightning writes data into each TiKV node in the physical import mode. +- Limits the per-table bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. - Default value: `0`, which means no limit. #### `disk-quota` diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 6caf6e4357817..742014fd80e04 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -57,8 +57,8 @@ duplicate-resolution = 'none' # The directory of local KV sorting. sorted-kv-dir = "./some-dir" -# Limits the bandwidth in which TiDB Lightning writes data into each TiKV -# node in the physical import mode. 0 by default, which means no limit. +# Limits the per-table bandwidth to write data into TiKV for each +# TiDB Lightning instance in the physical import mode. 
# store-write-bwlimit = "128MiB" # Specifies whether Physical Import Mode adds indexes via SQL. The default value is `false`, which means that TiDB Lightning will encode both row data and index data into KV pairs and import them into TiKV together. This mechanism is consistent with that of the historical versions. If you set it to `true`, it means that TiDB Lightning adds indexes via SQL after importing the row data. @@ -206,7 +206,7 @@ By default, TiDB Lightning pauses the cluster scheduling for the minimum range p ```toml [tikv-importer] -# Limits the bandwidth in which TiDB Lightning writes data into each TiKV node in the physical import mode. +# Limits the per-table bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. store-write-bwlimit = "128MiB" [tidb] From 87dcde0307c4a662fc3fba54b1617d14ce457b57 Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Sun, 27 Apr 2025 13:41:21 +0800 Subject: [PATCH 03/15] update doc --- system-variables.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system-variables.md b/system-variables.md index bbd3819e8bfde..9d7e3b5feb73e 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this limit applies to each TiDB node seperately. For example, if you add index using 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to TiKV is `256MiB/s`. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node respectively. For example, if you add index using 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to TiKV is `256MiB/s`. ### tidb_ddl_reorg_worker_cnt From 408cb3241c61e7e1c794bf4521f7b41009bc68fd Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Tue, 6 May 2025 11:57:47 +0800 Subject: [PATCH 04/15] update doc --- sql-statements/sql-statement-import-into.md | 9 ++++++++- system-variables.md | 2 +- tidb-lightning/tidb-lightning-configuration.md | 2 +- .../tidb-lightning-physical-import-mode-usage.md | 6 +++--- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 88b75c7405986..3439bd455c400 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. 
| | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the total write speed to 10 MiB/s. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). 
When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. | @@ -333,6 +333,13 @@ To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL s IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` +Importing data may impact the performance of foreground workloads. To mitigate this impact, it is recommended to configure `MAX_WRITE_SPEED` as follows: + +1. Import a small dataset without speed restrictions. And you can monitor the import speed in Grafana: TiDB -> Import Into -> Total encode/deliver/import-kv speed -> Import KV. +2. Use this import speed to determine the upper limit of `MAX_WRITE_SPEED` with this formula: + - (Import Speed) × (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) +3. Set `MAX_WRITE_SPEED` to a lower value than the upper limit. For example, reduce the result from Step 2 by 4–8X to reduce the impact on workload performance. + ## `IMPORT INTO ... FROM SELECT` usage `IMPORT INTO ... FROM SELECT` lets you import the query result of a `SELECT` statement to an empty table in TiDB. You can also use it to import historical data queried with [`AS OF TIMESTAMP`](/as-of-timestamp.md). diff --git a/system-variables.md b/system-variables.md index 9d7e3b5feb73e..cf50d086fb1f4 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node respectively. For example, if you add index using 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to TiKV is `256MiB/s`. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. ### tidb_ddl_reorg_worker_cnt diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 0054cce6703da..9c91d360a0209 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -280,7 +280,7 @@ The `security` section specifies certificates and keys for TLS connections withi #### `store-write-bwlimit` -- Limits the per-table bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. +- Limits the bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. - Default value: `0`, which means no limit. 
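As a minimal sketch of how the option above is typically set (the value is illustrative rather than a recommendation), `store-write-bwlimit` goes under the `[tikv-importer]` section of the TiDB Lightning configuration file, using a bandwidth string such as the `"128MiB"` shown elsewhere in this document:

```toml
[tikv-importer]
# Illustrative cap on the write bandwidth from this TiDB Lightning instance to TiKV.
# The default value 0 means no limit.
store-write-bwlimit = "128MiB"
```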
#### `disk-quota` diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 742014fd80e04..958c99ac38b97 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -57,8 +57,8 @@ duplicate-resolution = 'none' # The directory of local KV sorting. sorted-kv-dir = "./some-dir" -# Limits the per-table bandwidth to write data into TiKV for each -# TiDB Lightning instance in the physical import mode. +# Limits the bandwidth to write data into TiKV for each TiDB Lightning instance +# in the physical import mode. # store-write-bwlimit = "128MiB" # Specifies whether Physical Import Mode adds indexes via SQL. The default value is `false`, which means that TiDB Lightning will encode both row data and index data into KV pairs and import them into TiKV together. This mechanism is consistent with that of the historical versions. If you set it to `true`, it means that TiDB Lightning adds indexes via SQL after importing the row data. @@ -206,7 +206,7 @@ By default, TiDB Lightning pauses the cluster scheduling for the minimum range p ```toml [tikv-importer] -# Limits the per-table bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. +# Limits the bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. store-write-bwlimit = "128MiB" [tidb] From 97c09a934c9fa010353773e18d97391a31b60b87 Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Tue, 6 May 2025 13:39:30 +0800 Subject: [PATCH 05/15] update doc for add index --- sql-statements/sql-statement-import-into.md | 2 +- system-variables.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 3439bd455c400..ae2f12474fb29 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -335,7 +335,7 @@ IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-acces Importing data may impact the performance of foreground workloads. To mitigate this impact, it is recommended to configure `MAX_WRITE_SPEED` as follows: -1. Import a small dataset without speed restrictions. And you can monitor the import speed in Grafana: TiDB -> Import Into -> Total encode/deliver/import-kv speed -> Import KV. +1. Import a small dataset without speed restrictions. And you can monitor the import speed in Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. 2. Use this import speed to determine the upper limit of `MAX_WRITE_SPEED` with this formula: - (Import Speed) × (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) 3. Set `MAX_WRITE_SPEED` to a lower value than the upper limit. For example, reduce the result from Step 2 by 4–8X to reduce the impact on workload performance. diff --git a/system-variables.md b/system-variables.md index cf50d086fb1f4..cd8210c89a7ab 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. 
- When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. You can refer to the document of [IMPORT INTO](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) for how to set this variable. ### tidb_ddl_reorg_worker_cnt From aa5f465d2b8bbf2bf609c3c4100d809e92a55c8e Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Wed, 7 May 2025 14:46:46 +0800 Subject: [PATCH 06/15] refine description --- sql-statements/sql-statement-import-into.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index ae2f12474fb29..e17fe9db4a612 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -333,12 +333,12 @@ To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL s IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` -Importing data may impact the performance of foreground workloads. To mitigate this impact, it is recommended to configure `MAX_WRITE_SPEED` as follows: +Importing data may impact the performance of foreground workloads. To mitigate this, it is recommended to configure `MAX_WRITE_SPEED` as follows: -1. Import a small dataset without speed restrictions. And you can monitor the import speed in Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. -2. Use this import speed to determine the upper limit of `MAX_WRITE_SPEED` with this formula: - - (Import Speed) × (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) -3. Set `MAX_WRITE_SPEED` to a lower value than the upper limit. For example, reduce the result from Step 2 by 4–8X to reduce the impact on workload performance. +1. Import a small dataset with no speed restrictions. And you can estimate the average import speed via Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. +2. Determine the upper limit of `MAX_WRITE_SPEED` using the speed from step 1 with this formula: + - (Import Speed) x (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) +3. Set `MAX_WRITE_SPEED` to a lower value, for example, reduce the speed by 4–8X to mitigate the impact on workload performance. ## `IMPORT INTO ... 
FROM SELECT` usage From ddbc7e3b1e583366bd025fb2e19258c1d48a588b Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Thu, 8 May 2025 13:56:07 +0800 Subject: [PATCH 07/15] refine description --- sql-statements/sql-statement-import-into.md | 12 +++++++----- system-variables.md | 2 +- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index e17fe9db4a612..96d38a25e11cc 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -327,18 +327,20 @@ IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql'; #### Limit the write speed to TiKV -To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL statement: +Importing data may impact the performance of foreground workloads. In such scenario, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`. + +For example, the following SQL statement limits the write speed to a TiKV node to 10 MiB/s for each TiDB node: ```sql IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` -Importing data may impact the performance of foreground workloads. To mitigate this, it is recommended to configure `MAX_WRITE_SPEED` as follows: +To mitigate such impact, you can configure `MAX_WRITE_SPEED` as follows: -1. Import a small dataset with no speed restrictions. And you can estimate the average import speed via Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. -2. Determine the upper limit of `MAX_WRITE_SPEED` using the speed from step 1 with this formula: +1. Import a small dataset with unlimited speed. And you can monitor the average import speed through Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. +2. Determine the upper limit of `MAX_WRITE_SPEED` using this formula: - (Import Speed) x (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) -3. Set `MAX_WRITE_SPEED` to a lower value, for example, reduce the speed by 4–8X to mitigate the impact on workload performance. +3. Set `MAX_WRITE_SPEED` to a lower value than the calculated to ensure workload performance, for example, 4-8× lower. ## `IMPORT INTO ... FROM SELECT` usage diff --git a/system-variables.md b/system-variables.md index cd8210c89a7ab..8d9d946603e6b 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. You can refer to the document of [IMPORT INTO](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) for how to set this variable. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. 
For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. For instructions on configuring this variable, you can refer to the [limit-the-write-speed-to-tikv](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) section of IMPORT INTO documentation. ### tidb_ddl_reorg_worker_cnt From dbb4e050be324d112f55a7982a8e3780fe788c84 Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Thu, 8 May 2025 14:25:56 +0800 Subject: [PATCH 08/15] refine description --- sql-statements/sql-statement-import-into.md | 2 +- system-variables.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 96d38a25e11cc..1b9da64002566 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -335,7 +335,7 @@ For example, the following SQL statement limits the write speed to a TiKV node t IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` -To mitigate such impact, you can configure `MAX_WRITE_SPEED` as follows: +If you are importing data with DXF and global sort enabled, you can configure `MAX_WRITE_SPEED` as follows to mitigate the impact: 1. Import a small dataset with unlimited speed. And you can monitor the average import speed through Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. 2. Determine the upper limit of `MAX_WRITE_SPEED` using this formula: diff --git a/system-variables.md b/system-variables.md index 8d9d946603e6b..c7e95ef62ad5f 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. For instructions on configuring this variable, you can refer to the [limit-the-write-speed-to-tikv](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) section of IMPORT INTO documentation. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. For instructions on configuring this variable, you can refer to the [limit-the-write-speed-to-tikv](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) section of IMPORT INTO documentation. The only difference is that you should monitor the speed through Grafana > TiDB > DDL > Add Index Backfill Import Speed. 
### tidb_ddl_reorg_worker_cnt From 4e4fef8f2fb3ea5b99c144166225ea8b004bd9a1 Mon Sep 17 00:00:00 2001 From: Ruihao Chen Date: Fri, 9 May 2025 14:08:13 +0800 Subject: [PATCH 09/15] Apply suggestions from code review Co-authored-by: D3Hunter --- sql-statements/sql-statement-admin-alter-ddl.md | 2 +- sql-statements/sql-statement-import-into.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql-statements/sql-statement-admin-alter-ddl.md b/sql-statements/sql-statement-admin-alter-ddl.md index 2e1c98aba5db3..158f861d12e41 100644 --- a/sql-statements/sql-statement-admin-alter-ddl.md +++ b/sql-statements/sql-statement-admin-alter-ddl.md @@ -23,7 +23,7 @@ The following are the supported parameters for different DDL jobs and their corr - `ADD INDEX`: - `THREAD`: the concurrency of the DDL job. The initial value is set by `tidb_ddl_reorg_worker_cnt`. - `BATCH_SIZE`: the batch size. The initial value is set by [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size). - - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into TiKV on each TiDB node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). + - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into each TiKV on each TiDB node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). Currently, the preceding parameters only work for `ADD INDEX` jobs that are submitted and running after [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is disabled. diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 1b9da64002566..be0b4273bc333 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. 
If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. 
| From 993fe27dc5029565651b5a5add302309b2ebc907 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 4 Jul 2025 16:29:14 +0800 Subject: [PATCH 10/15] Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- sql-statements/sql-statement-admin-alter-ddl.md | 2 +- sql-statements/sql-statement-import-into.md | 14 +++++++------- system-variables.md | 2 +- tidb-lightning/tidb-lightning-configuration.md | 2 +- .../tidb-lightning-physical-import-mode-usage.md | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/sql-statements/sql-statement-admin-alter-ddl.md b/sql-statements/sql-statement-admin-alter-ddl.md index 158f861d12e41..56e88018e115f 100644 --- a/sql-statements/sql-statement-admin-alter-ddl.md +++ b/sql-statements/sql-statement-admin-alter-ddl.md @@ -23,7 +23,7 @@ The following are the supported parameters for different DDL jobs and their corr - `ADD INDEX`: - `THREAD`: the concurrency of the DDL job. The initial value is set by `tidb_ddl_reorg_worker_cnt`. - `BATCH_SIZE`: the batch size. The initial value is set by [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size). - - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into each TiKV on each TiDB node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). + - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records from each TiDB node to each TiKV node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). Currently, the preceding parameters only work for `ADD INDEX` jobs that are submitted and running after [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is disabled. diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index be0b4273bc333..ee80bc46bed62 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. 
To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. | @@ -329,18 +329,18 @@ IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql'; Importing data may impact the performance of foreground workloads. In such scenario, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`. 
-For example, the following SQL statement limits the write speed to a TiKV node to 10 MiB/s for each TiDB node: +For example, the following SQL statement limits the write speed from each TiDB node to each TiKV node to 10 MiB/s: ```sql IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` -If you are importing data with DXF and global sort enabled, you can configure `MAX_WRITE_SPEED` as follows to mitigate the impact: +If you are importing data with DXF and Global Sort enabled, you can configure `MAX_WRITE_SPEED` as follows to mitigate the impact: -1. Import a small dataset with unlimited speed. And you can monitor the average import speed through Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. -2. Determine the upper limit of `MAX_WRITE_SPEED` using this formula: - - (Import Speed) x (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) -3. Set `MAX_WRITE_SPEED` to a lower value than the calculated to ensure workload performance, for example, 4-8× lower. +1. Import a small dataset with unlimited speed. You can monitor the average import speed through Grafana: **TiDB** > **Import Into** > **Total encode/deliver/import-kv speed** > **Import KV**. +2. Calculate the upper limit of `MAX_WRITE_SPEED` using the following formula: + - `MAX_WRITE_SPEED` = (Import speed) x (Number of Replicas) / (Number of TiDB nodes) / min(Number of TiKV nodes, THREAD) +3. Set `MAX_WRITE_SPEED` to a value lower than the calculated result to ensure workload performance, for example, 4 to 8 times lower. ## `IMPORT INTO ... FROM SELECT` usage diff --git a/system-variables.md b/system-variables.md index c7e95ef62ad5f..e82bf7f4e8c5c 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1746,7 +1746,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB`' represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second. -- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node seperately. For example, if you add index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. For instructions on configuring this variable, you can refer to the [limit-the-write-speed-to-tikv](https://docs.pingcap.com/tidb/stable/sql-statement-import-into/#limit-the-write-speed-to-tikv) section of IMPORT INTO documentation. The only difference is that you should monitor the speed through Grafana > TiDB > DDL > Add Index Backfill Import Speed. +- When the Distributed eXecution Framework (DXF) is enabled, this write limit applies to each TiDB node separately. For example, if you add an index with 4 TiDB nodes, setting this variable to `64MiB` means the maximum write speed to one TiKV is `256MiB/s`. For more information on configuring this variable, see [Limit the write speed to TiKV](/sql-statements/sql-statement-import-into.md#limit-the-write-speed-to-tikv). The only difference is that you should monitor the speed through Grafana: **TiDB** > **DDL** > **Add Index Backfill Import Speed**. 
### tidb_ddl_reorg_worker_cnt diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 9c91d360a0209..2548463468efc 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -280,7 +280,7 @@ The `security` section specifies certificates and keys for TLS connections withi #### `store-write-bwlimit` -- Limits the bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. +- Limits the total write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. - Default value: `0`, which means no limit. #### `disk-quota` diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 958c99ac38b97..c9a76fbfce8a5 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -206,7 +206,7 @@ By default, TiDB Lightning pauses the cluster scheduling for the minimum range p ```toml [tikv-importer] -# Limits the bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. +# Limits the total write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. store-write-bwlimit = "128MiB" [tidb] From eba13cc6f908f42e4425ec4a8703fbc17ca87d80 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 4 Jul 2025 16:31:32 +0800 Subject: [PATCH 11/15] Apply suggestions from code review --- tidb-lightning/tidb-lightning-configuration.md | 2 +- tidb-lightning/tidb-lightning-physical-import-mode-usage.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 2548463468efc..3773f6a67a3f0 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -280,7 +280,7 @@ The `security` section specifies certificates and keys for TLS connections withi #### `store-write-bwlimit` -- Limits the total write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. +- Limits the write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. - Default value: `0`, which means no limit. #### `disk-quota` diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index c9a76fbfce8a5..61826a551301c 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -57,7 +57,7 @@ duplicate-resolution = 'none' # The directory of local KV sorting. sorted-kv-dir = "./some-dir" -# Limits the bandwidth to write data into TiKV for each TiDB Lightning instance +# Limits the write bandwidth from each TiDB Lightning instance to TiKV # in the physical import mode. 
# store-write-bwlimit = "128MiB" From e06950411521cc8cf67d777ad115b5fe96daee5b Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 4 Jul 2025 16:31:48 +0800 Subject: [PATCH 12/15] Update tidb-lightning/tidb-lightning-physical-import-mode-usage.md --- tidb-lightning/tidb-lightning-physical-import-mode-usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 61826a551301c..330d96634a6ee 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -206,7 +206,7 @@ By default, TiDB Lightning pauses the cluster scheduling for the minimum range p ```toml [tikv-importer] -# Limits the total write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. +# Limits the write bandwidth from each TiDB Lightning instance to TiKV in the physical import mode. store-write-bwlimit = "128MiB" [tidb] From 6fd45e891ce9874a7c69c551d6a49d39f8583770 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 4 Jul 2025 16:41:21 +0800 Subject: [PATCH 13/15] Apply suggestions from code review --- sql-statements/sql-statement-import-into.md | 6 +++--- system-variables.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index ee80bc46bed62..85af94d094a70 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV. 
+| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV node. |
 | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. |
 | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. |
 | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. |
@@ -327,7 +327,7 @@ IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql';
 
 #### Limit the write speed to TiKV
 
-Importing data may impact the performance of foreground workloads. In such scenario, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`.
+Importing data might affect the performance of application workloads. In such scenario, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`.
 
 For example, the following SQL statement limits the write speed from each TiDB node to each TiKV node to 10 MiB/s:
 
@@ -340,7 +340,7 @@ If you are importing data with DXF and Global Sort enabled, you can configure `
 
 1. Import a small dataset with unlimited speed. You can monitor the average import speed through Grafana: **TiDB** > **Import Into** > **Total encode/deliver/import-kv speed** > **Import KV**.
 2. Calculate the upper limit of `MAX_WRITE_SPEED` using the following formula:
     - `MAX_WRITE_SPEED` = (Import speed) x (Number of Replicas) / (Number of TiDB nodes) / min(Number of TiKV nodes, THREAD)
-3. Set `MAX_WRITE_SPEED` to a value lower than the calculated result to ensure workload performance, for example, 4 to 8 times lower.
+3. Set `MAX_WRITE_SPEED` to a value lower than the calculated result (for example, 4 to 8 times lower) to ensure workload performance.
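To make steps 2 and 3 concrete, the following sketch works through the formula with assumed numbers that do not come from this patch: a measured import speed of 400 MiB/s, 3 replicas, 2 TiDB nodes, 6 TiKV nodes, and `THREAD=8`. The table name and S3 path are placeholders, and the `WITH` clause simply combines the `THREAD` and `MAX_WRITE_SPEED` options documented in the table above.

```sql
-- Worked example with assumed numbers (not taken from this patch):
--   min(Number of TiKV nodes, THREAD) = min(6, 8) = 6
--   Upper limit = 400 MiB/s x 3 / 2 / 6 = 100 MiB/s
-- Setting the limit 4 to 8 times lower than 100 MiB/s gives roughly
-- 12 to 25 MiB/s, so a value such as '16MiB' is a reasonable starting point:
IMPORT INTO t FROM 's3://bucket/path/to/file.csv?access-key=XXX&secret-access-key=XXX'
    WITH THREAD=8, MAX_WRITE_SPEED='16MiB';
```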
 
 ## `IMPORT INTO ... FROM SELECT` usage

diff --git a/system-variables.md b/system-variables.md
index e82bf7f4e8c5c..056515ee769ae 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -1741,7 +1741,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1;
 - Type: String
 - Default value: `0`
 - Range: `[0, 1PiB]`
-- This variable limits the write bandwidth of each TiDB node to TiKV and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads.
+- This variable limits the write bandwidth of each TiDB node to TiKV node and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads.
 - The default value `0` means no write bandwidth limit.
 - You can specify the value of this variable either with a unit or without a unit.
 - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second.

From f61f2677c8d8e234678c98a7db6ae17afa7a544e Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Fri, 4 Jul 2025 16:43:07 +0800
Subject: [PATCH 14/15] Update sql-statements/sql-statement-import-into.md

---
 sql-statements/sql-statement-import-into.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md
index 85af94d094a70..a50502e7e8be7 100644
--- a/sql-statements/sql-statement-import-into.md
+++ b/sql-statements/sql-statement-import-into.md
@@ -337,7 +337,7 @@ IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-acces
 
 If you are importing data with DXF and Global Sort enabled, you can configure `MAX_WRITE_SPEED` as follows to mitigate the impact:
 
-1. Import a small dataset with unlimited speed. You can monitor the average import speed through Grafana: **TiDB** > **Import Into** > **Total encode/deliver/import-kv speed** > **Import KV**.
+1. Import a small dataset with unlimited speed, and monitor the average import speed through Grafana: **TiDB** > **Import Into** > **Total encode/deliver/import-kv speed** > **Import KV**.
 2. Calculate the upper limit of `MAX_WRITE_SPEED` using the following formula:
     - `MAX_WRITE_SPEED` = (Import speed) x (Number of Replicas) / (Number of TiDB nodes) / min(Number of TiKV nodes, THREAD)
 3. Set `MAX_WRITE_SPEED` to a value lower than the calculated result (for example, 4 to 8 times lower) to ensure workload performance.
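The `system-variables.md` hunk above notes that [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850) accepts a value either with a unit or as a plain number of bytes per second. The following is a minimal sketch, assuming the variable can be set with `SET GLOBAL`; per that hunk, `67108864` and `64MiB` describe the same limit.

```sql
-- A minimal sketch (assuming the variable can be set with SET GLOBAL).
-- According to the description above, both statements limit the per-TiDB-node
-- write bandwidth for index creation to 64 MiB per second:
SET GLOBAL tidb_ddl_reorg_max_write_speed = '64MiB';
SET GLOBAL tidb_ddl_reorg_max_write_speed = 67108864;

-- The default value 0 removes the limit again:
SET GLOBAL tidb_ddl_reorg_max_write_speed = 0;
```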
From e16b2e1e4fd7743e48931b27447e084b313c22fd Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Fri, 4 Jul 2025 16:51:58 +0800
Subject: [PATCH 15/15] minor wording updates

---
 sql-statements/sql-statement-import-into.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md
index a50502e7e8be7..96a184565a070 100644
--- a/sql-statements/sql-statement-import-into.md
+++ b/sql-statements/sql-statement-import-into.md
@@ -151,7 +151,7 @@ The supported options are described as follows:
 | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. |
 | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. |
 | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. |
-| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the write speed to 10 MiB/s for each TiKV node. |
+| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed of each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the total write speed to each TiKV node to 10 MiB/s. |
 | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. |
 | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. |
 | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. |
@@ -327,7 +327,7 @@ IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql';
 
 #### Limit the write speed to TiKV
 
-Importing data might affect the performance of application workloads. In such scenario, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`.
+Importing data might affect the performance of application workloads. In such cases, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`.
 
 For example, the following SQL statement limits the write speed from each TiDB node to each TiKV node to 10 MiB/s:
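The statement that this context line refers to lies outside the hunk, so it does not appear in the patch. Below is a minimal sketch of such a statement, assuming a placeholder table name and file path and using the `MAX_WRITE_SPEED` option documented in the table above.

```sql
-- Limits the write speed from each TiDB node to each TiKV node to 10 MiB/s;
-- without this option, there is no speed limit.
IMPORT INTO t FROM '/path/to/file.csv' WITH MAX_WRITE_SPEED='10MiB';
```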