Ring state will be inconsistent between memory and consul after a CAS error #3154

Open
@bboreham

The change in memory state is made before updating Consul, and no attempt is made to revert the former if the latter fails:

i.setState(state)
return i.updateConsul(ctx)

I noticed this because I got this log message:

level=warn ts=2020-09-09T19:59:32.324235593Z caller=grpc_logging.go:55 duration=15.010918473s method=/cortex.Ingester/TransferChunks err="Transfer: ChangeState: failed to CAS collectors/ring" msg="gRPC\n"

That's coming from here:

if err := i.lifecycler.ChangeState(ctx, ring.ACTIVE); err != nil {

The defer in that function should then log "TransferChunks failed" and go back to PENDING state, but I don't see that log, which is explained by this line checking the in-memory state:

if i.lifecycler.GetState() == ring.ACTIVE {
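
The interaction can be reproduced in a toy model. This is a sketch under assumptions, not the ingester code: `fakeLifecycler` and `transfer` are hypothetical names, and `ChangeState` is hard-wired to update memory and then fail, as described above.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical stand-in for the ring's instance state.
type State int

const (
	PENDING State = iota
	ACTIVE
)

// fakeLifecycler mimics the reported behaviour: memory is updated first
// and left untouched when the store update fails.
type fakeLifecycler struct{ state State }

func (l *fakeLifecycler) GetState() State { return l.state }

func (l *fakeLifecycler) ChangeState(s State) error {
	l.state = s
	return errors.New("failed to CAS collectors/ring")
}

// transfer returns the error from ChangeState and reports whether the
// deferred cleanup ran.
func transfer(l *fakeLifecycler) (cleanedUp bool, err error) {
	defer func() {
		if l.GetState() == ACTIVE {
			return // looks like success, so the cleanup is skipped
		}
		cleanedUp = true
		l.state = PENDING
	}()
	if err = l.ChangeState(ACTIVE); err != nil {
		return false, fmt.Errorf("Transfer: ChangeState: %w", err)
	}
	return false, nil
}

func main() {
	l := &fakeLifecycler{state: PENDING}
	cleanedUp, err := transfer(l)
	fmt.Println(cleanedUp, err, l.GetState() == ACTIVE)
}
```

In this model the function returns an error, yet the deferred cleanup never runs and the in-memory state stays ACTIVE.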

(Also odd: the metrics show that it did go to ACTIVE state.)
