From ecf77c00448606538cb028367b68463c776ff0eb Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Sat, 14 Jun 2025 03:01:29 +0300
Subject: [PATCH 1/7] Add proposal for tenant limits API

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 73 ++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 docs/proposals/limits-api.md

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
new file mode 100644
index 00000000000..3471b3b2064
--- /dev/null
+++ b/docs/proposals/limits-api.md
@@ -0,0 +1,73 @@
+---
+title: "Limits API"
+linkTitle: "Limits API"
+weight: 1
+slug: limits-api
+---
+
+- Author: Bogdan Stancu
+- Date: June 2025
+- Status: Proposed
+
+## Overview
+
+This proposal outlines the design for a new API endpoint that will allow users to modify their current limits in Cortex. Currently, limits can only be changed by administrators modifying the runtime configuration file and waiting for it to be reloaded.
+
+## Problem
+
+Currently, when users need limit adjustments, they must:
+1. Manually editing the runtime configuration file
+2. Coordinating with users to verify the changes
+3. Potentially repeating this process multiple times to find the right balance
+
+This manual process is time-consuming, error-prone, and doesn't scale well with a large number of users. By offering a self-service API, users can adjust their own limits within predefined boundaries, reducing the administrative overhead and improving the user experience.
+
+## Proposed API Design
+
+### Endpoints
+
+#### 1. GET /api/v1/limits/{tenant_id}
+Returns the current limits configuration for a specific tenant.
+
+Response format:
+```json
+{
+  "ingestion_rate": 10000,
+  "ingestion_burst_size": 20000,
+  "max_global_series_per_user": 1000000,
+  "max_global_series_per_metric": 200000,
+  ...
+}
+```
+
+#### 2. PUT /api/v1/limits/{tenant_id}
+Updates limits for a specific tenant. The request body should contain only the limits that need to be updated.
+
+Request body:
+```json
+{
+  "ingestion_rate": 10000,
+  "max_series_per_metric": 100000
+}
+```
+
+#### 3. DELETE /api/v1/limits/{tenant_id}
+Removes tenant-specific limits, reverting to default limits.
+
+### Implementation Details
+
+1. The API will be integrated into the existing Cortex components to:
+   - Read the current runtime config from the configured storage backend
+   - Apply changes to the in-memory configuration
+   - Persist changes back to the storage backend
+   - Trigger a reload of the runtime config
+
+2. Security:
+   - The API will require admin-level authentication
+   - Rate limiting will be implemented to prevent abuse
+   - Changes will be validated before being applied
+
+3. Error Handling:
+   - Invalid limit values will return 400 Bad Request
+   - Storage backend errors will return 500 Internal Server Error
+

From 2bdb6e56d961ba1e4f0c45fa63fab60e48eb8831 Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Tue, 17 Jun 2025 16:38:09 +0300
Subject: [PATCH 2/7] Change endpoints

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
index 3471b3b2064..cf88e2fcf4f 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/limits-api.md
@@ -26,7 +26,7 @@ This manual process is time-consuming, error-prone, and doesn't scale well with
 
 ### Endpoints
 
-#### 1. GET /api/v1/limits/{tenant_id}
+#### 1. GET /api/v1/user-limits
 Returns the current limits configuration for a specific tenant.
 
 Response format:
@@ -40,7 +40,7 @@ Response format:
 }
 ```
 
-#### 2. PUT /api/v1/limits/{tenant_id}
+#### 2. PUT /api/v1/user-limits
 Updates limits for a specific tenant. The request body should contain only the limits that need to be updated.
 
 Request body:
@@ -51,7 +51,7 @@ Request body:
 }
 ```
 
-#### 3. DELETE /api/v1/limits/{tenant_id}
+#### 3. DELETE /api/v1/user-limits
 Removes tenant-specific limits, reverting to default limits.
 
 ### Implementation Details

From d79f421f76d91f37eb6cdb943897b826dff9ce79 Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Tue, 17 Jun 2025 16:39:55 +0300
Subject: [PATCH 3/7] suggestions

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
index cf88e2fcf4f..03583b32647 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/limits-api.md
@@ -56,14 +56,11 @@ Removes tenant-specific limits, reverting to default limits.
 
 ### Implementation Details
 
-1. The API will be integrated into the existing Cortex components to:
+1. The API will be integrated into the cortex-overrides component to:
    - Read the current runtime config from the configured storage backend
-   - Apply changes to the in-memory configuration
    - Persist changes back to the storage backend
-   - Trigger a reload of the runtime config
 
 2. Security:
-   - The API will require admin-level authentication
    - Rate limiting will be implemented to prevent abuse
    - Changes will be validated before being applied
 

From 4f1216335ed646d3bfe711e9d5ae578968c2d46b Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Tue, 24 Jun 2025 20:15:36 +0300
Subject: [PATCH 4/7] support only block storage

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
index 03583b32647..25cf9bd95cc 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/limits-api.md
@@ -59,6 +59,7 @@ Removes tenant-specific limits, reverting to default limits.
 1. The API will be integrated into the cortex-overrides component to:
    - Read the current runtime config from the configured storage backend
    - Persist changes back to the storage backend
+   - The API will only work with configurations stored in block storage backends.
 
 2. Security:
    - Rate limiting will be implemented to prevent abuse

From 1e9f61cdefeb4296f04c94767745053e31cde446 Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Mon, 30 Jun 2025 12:33:57 +0300
Subject: [PATCH 5/7] overrides not limits

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 41 +++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
index 25cf9bd95cc..bbc16950146 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/limits-api.md
@@ -1,8 +1,8 @@
 ---
-title: "Limits API"
-linkTitle: "Limits API"
+title: "User Overrides API"
+linkTitle: "User Overrides API"
 weight: 1
-slug: limits-api
+slug: overrides-api
 ---
 
 - Author: Bogdan Stancu
@@ -11,7 +11,7 @@ slug: limits-api
 
 ## Overview
 
-This proposal outlines the design for a new API endpoint that will allow users to modify their current limits in Cortex. Currently, limits can only be changed by administrators modifying the runtime configuration file and waiting for it to be reloaded.
+This proposal outlines the design for a new API endpoint that will allow users to modify their current limits in Cortex. Currently, overrides can only be changed by administrators modifying the runtime configuration file and waiting for it to be reloaded.
 
 ## Problem
 
@@ -26,8 +26,8 @@ This manual process is time-consuming, error-prone, and doesn't scale well with
 
 ### Endpoints
 
-#### 1. GET /api/v1/user-limits
-Returns the current limits configuration for a specific tenant.
+#### 1. GET /api/v1/user-overrides
+Returns the current overrides configuration for a specific tenant.
 
 Response format:
 ```json
@@ -40,8 +40,8 @@ Response format:
 }
 ```
 
-#### 2. PUT /api/v1/user-limits
-Updates limits for a specific tenant. The request body should contain only the limits that need to be updated.
+#### 2. PUT /api/v1/user-overrides
+Updates overrides for a specific tenant. The request body should contain only the overrides that need to be updated.
 
 Request body:
 ```json
@@ -51,8 +51,8 @@ Request body:
 }
 ```
 
-#### 3. DELETE /api/v1/user-limits
-Removes tenant-specific limits, reverting to default limits.
+#### 3. DELETE /api/v1/user-overrides
+Removes tenant-specific overrides, reverting to default overrides.
 
 ### Implementation Details
 
@@ -64,6 +64,27 @@ Removes tenant-specific limits, reverting to default limits.
 2. Security:
    - Rate limiting will be implemented to prevent abuse
    - Changes will be validated before being applied
+   - A hard limit configuration will be implemented
+   Hard limits will not be changable through the API
+   Example:
+```yaml
+   # file: runtime.yaml
+   # In this example, we're overriding ingestion limits for a single tenant.
+   overrides:
+     "user1":
+       ingestion_burst_size: 350000
+       ingestion_rate: 350000
+       max_global_series_per_metric: 300000
+       max_global_series_per_user: 300000
+       max_series_per_metric: 0
+       max_series_per_user: 0
+       max_samples_per_query: 100000
+       max_series_per_query: 100000
+   configurable-overrides: # still not sure about the naming for this section
+     "user1":
+       ingestion_rate: 700000
+       max_global_series_per_user: 700000
+```
 
 3. Error Handling:
    - Invalid limit values will return 400 Bad Request

From bd2ca17239591129600829c28ac0e3ae86d591c9 Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Mon, 30 Jun 2025 12:37:41 +0300
Subject: [PATCH 6/7] grammar

Signed-off-by: Bogdan Stancu
---
 docs/proposals/limits-api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/limits-api.md
index bbc16950146..30d9dc619a0 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/limits-api.md
@@ -17,7 +17,7 @@ This proposal outlines the design for a new API endpoint that will allow users t
 
 Currently, when users need limit adjustments, they must:
 1. Manually editing the runtime configuration file
-2. Coordinating with users to verify the changes
+2. Coordinate with users to verify the changes
 3. Potentially repeating this process multiple times to find the right balance
 
 This manual process is time-consuming, error-prone, and doesn't scale well with a large number of users. By offering a self-service API, users can adjust their own limits within predefined boundaries, reducing the administrative overhead and improving the user experience.

From e7899853f7aee1c2d2ed9d449602daad9a4bf168 Mon Sep 17 00:00:00 2001
From: Bogdan Stancu
Date: Sat, 5 Jul 2025 00:49:10 +0300
Subject: [PATCH 7/7] move hard limits to open questions

Signed-off-by: Bogdan Stancu
---
 .../{limits-api.md => user-overrides-api.md} | 27 +++++--------------
 1 file changed, 6 insertions(+), 21 deletions(-)
 rename docs/proposals/{limits-api.md => user-overrides-api.md} (75%)

diff --git a/docs/proposals/limits-api.md b/docs/proposals/user-overrides-api.md
similarity index 75%
rename from docs/proposals/limits-api.md
rename to docs/proposals/user-overrides-api.md
index 30d9dc619a0..3e89d45ddc2 100644
--- a/docs/proposals/limits-api.md
+++ b/docs/proposals/user-overrides-api.md
@@ -64,29 +64,14 @@ Removes tenant-specific overrides, reverting to default overrides.
 2. Security:
    - Rate limiting will be implemented to prevent abuse
    - Changes will be validated before being applied
-   - A hard limit configuration will be implemented
-   Hard limits will not be changable through the API
-   Example:
-```yaml
-   # file: runtime.yaml
-   # In this example, we're overriding ingestion limits for a single tenant.
-   overrides:
-     "user1":
-       ingestion_burst_size: 350000
-       ingestion_rate: 350000
-       max_global_series_per_metric: 300000
-       max_global_series_per_user: 300000
-       max_series_per_metric: 0
-       max_series_per_user: 0
-       max_samples_per_query: 100000
-       max_series_per_query: 100000
-   configurable-overrides: # still not sure about the naming for this section
-     "user1":
-       ingestion_rate: 700000
-       max_global_series_per_user: 700000
-```
+
 
 3. Error Handling:
    - Invalid limit values will return 400 Bad Request
    - Storage backend errors will return 500 Internal Server Error
 
+### Open Questions:
+ - How do we implement a hard-limit configuration to avoid users
+   setting unreasonable limits?
+ - What set of overrides can be configurable through this API?
+   Limits like `shard_size` should only be modified by the admin.
\ No newline at end of file
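
Review note: the PUT and DELETE semantics described in the final revision of the proposal could be sketched roughly as below. This is only an illustration under stated assumptions, not the Cortex implementation: the function names `put_overrides` and `delete_overrides`, the `CONFIGURABLE_OVERRIDES` allow-list, and the "non-negative integer" validation rule are all hypothetical. The proposal itself leaves the set of configurable overrides and the hard-limit mechanism as open questions.

```python
from typing import Dict, Tuple

# Hypothetical allow-list. The proposal leaves the real set as an open
# question and only says limits like `shard_size` should stay admin-only.
CONFIGURABLE_OVERRIDES = {
    "ingestion_rate",
    "ingestion_burst_size",
    "max_global_series_per_user",
    "max_global_series_per_metric",
    "max_series_per_metric",
}


def put_overrides(
    current: Dict[str, int], body: Dict[str, object]
) -> Tuple[int, Dict[str, int]]:
    """Merge a partial PUT body into a tenant's stored overrides.

    Returns (HTTP status, resulting overrides). On a validation error the
    stored overrides are returned unchanged together with status 400.
    """
    for name, value in body.items():
        # Assumed rule: values are non-negative integers. bool is a
        # subclass of int in Python, so it is rejected explicitly.
        valid_value = (
            isinstance(value, int) and not isinstance(value, bool) and value >= 0
        )
        if name not in CONFIGURABLE_OVERRIDES or not valid_value:
            return 400, dict(current)

    # Only the keys present in the body change; everything else is kept.
    merged = dict(current)
    merged.update(body)
    return 200, merged


def delete_overrides() -> Dict[str, int]:
    """DELETE drops every tenant-specific override, so defaults apply again."""
    return {}
```

For example, calling `put_overrides({"ingestion_rate": 10000, "ingestion_burst_size": 20000}, {"ingestion_rate": 350000})` changes only `ingestion_rate`, while a body containing `shard_size` is rejected with 400 and leaves the stored overrides untouched. Persisting the result to the storage backend, and mapping backend failures to 500, is deliberately left out of the sketch.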