Monitor visit marker #67

Open
wants to merge 1 commit into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## master
* [ENHANCEMENT] Add bigger tenants and configure default compactor tenant shards
* [ENHANCEMENT] Add alert `CortexCompactorWriteVisitMarkerIsFailing` to monitor compactors

## 1.17.1 / 2024-10-23
* [CHANGE] Use cortex v1.17.1
16 changes: 16 additions & 0 deletions cortex-mixin/alerts/compactor.libsonnet
@@ -102,6 +102,22 @@
          ||| % $._config,
        },
      },
      {
        // Alert if compactors are not able to update the visit marker.
        alert: 'CortexCompactorWriteVisitMarkerIsFailing',
        'for': '2h',
        expr: |||
          sum(increase(cortex_compactor_block_visit_marker_write_failed{job=~".+/%(compactor)s"}[2h])) > 0
        ||| % $._config.job_names,
        labels: {
          severity: 'critical'
        },
        annotations: {
          message: |||
            Cortex compactors are not able to update the visit marker; check the compactor logs to see what is happening.
          |||
        }
      }
    ],
  },
],
11 changes: 11 additions & 0 deletions cortex-mixin/docs/playbooks.md
@@ -379,6 +379,17 @@ How to **investigate**:
- Ensure ingesters are successfully shipping blocks to the storage
- Look for any error in the compactor logs

### CortexCompactorWriteVisitMarkerIsFailing

Only applies to compactors when shuffle sharding is enabled.
This alert fires if the compactor is not able to update the visit marker (across all tenants).
The marker file is a very small JSON file, so updating it should rarely fail.

How to **investigate**:
- Check the compactor logs; they should show the exact reason for the failure
- If you see `context canceled` or other timeout errors in the logs,
  consider increasing `-compactor.compaction-visit-marker-timeout` and `-compactor.compaction-visit-marker-file-update-interval`
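
To narrow down which compactor is failing before reading its logs, a query along the following lines may help. It reuses the metric from the alert expression; the `pod` label is an assumption about how the deployment labels its series and may need to be replaced (for example with `instance`), and a `job` matcher like the one in the alert can be added if other jobs export the same metric.

```
sum by (pod) (increase(cortex_compactor_block_visit_marker_write_failed[2h])) > 0
```

Any series returned points at a compactor that failed to write the visit marker within the last 2 hours.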

### CortexCompactorHasNotSuccessfullyRunCompaction

This alert fires if the compactor is not able to successfully compact all discovered compactable blocks (across all tenants).