
ZOOKEEPER-4882: Fix data loss after rejoin and restart of a node experienced temporary disk error #2268


Open
wants to merge 1 commit into
base: master

@kezhuw (Member) commented Jun 15, 2025

The cause is multifold:

  1. The leader commits a proposal once a quorum has acked it.
  2. A proposal can be committed in a node's memory even if it has not been written to that node's disk.
  3. After a disk error, the txn log can lag behind the in-memory database.

As a result, a node that experienced a temporary disk error will have a hole in its txn log after it rejoins. Once it is restarted, the data in that hole is lost.
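
To make the failure mode concrete, here is a deliberately simplified toy model, not ZooKeeper code. The class `TxnLogHoleSketch` and its fields are hypothetical names invented for illustration, and it assumes that a restart rebuilds the in-memory state from the txn log alone (snapshots are ignored).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model only: none of these names come from the ZooKeeper codebase. */
public class TxnLogHoleSketch {
    final TreeSet<Long> memoryDb = new TreeSet<>(); // zxids applied in memory
    final List<Long> txnLog = new ArrayList<>();    // zxids persisted to disk
    boolean diskFailing = false;                    // simulated disk error

    /**
     * The leader commits once a quorum has acked, so this node applies the
     * txn in memory whether or not its own log write succeeded.
     */
    void commit(long zxid) {
        if (!diskFailing) {
            txnLog.add(zxid);
        }
        memoryDb.add(zxid);
    }

    /** Assumption: a restart rebuilds the in-memory state from the log only. */
    void restart() {
        memoryDb.clear();
        memoryDb.addAll(txnLog);
    }

    public static void main(String[] args) {
        TxnLogHoleSketch node = new TxnLogHoleSketch();
        node.commit(1);
        node.diskFailing = true;   // temporary disk error begins
        node.commit(2);            // applied in memory, never logged
        node.diskFailing = false;  // disk recovers, node keeps serving
        node.commit(3);
        System.out.println("memory before restart: " + node.memoryDb); // [1, 2, 3]
        System.out.println("log before restart: " + node.txnLog);      // [1, 3]
        node.restart();
        System.out.println("memory after restart: " + node.memoryDb);  // [1, 3]
    }
}
```

Txn 2 is visible until the restart and silently gone afterwards, which is the data loss described above.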

This commit reports the lag as an error so that the on-disk database is reloaded into memory. That way, the node will not be able to become leader and will instead sync the missing txns from the leader.
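
The idea can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: `LagCheckSketch`, `zxidForRejoin`, and `syncFrom` are hypothetical names, and "reloading the disk database" is reduced to resetting the in-memory zxid to the last logged one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: names are hypothetical, not ZooKeeper's API. */
public class LagCheckSketch {
    final List<Long> txnLog = new ArrayList<>(); // persisted zxids, ascending
    long lastProcessedZxid = 0L;                 // highest zxid applied in memory

    long lastLoggedZxid() {
        return txnLog.isEmpty() ? 0L : txnLog.get(txnLog.size() - 1);
    }

    /**
     * If memory is ahead of the log, fall back to the on-disk state.
     * Returns the zxid this node would advertise when rejoining.
     */
    long zxidForRejoin() {
        if (lastProcessedZxid > lastLoggedZxid()) {
            lastProcessedZxid = lastLoggedZxid(); // stand-in for a reload from disk
        }
        return lastProcessedZxid;
    }

    /** Sync from the leader: log and apply every txn above our own zxid. */
    void syncFrom(List<Long> leaderCommitted) {
        long from = zxidForRejoin();
        for (long zxid : leaderCommitted) {
            if (zxid > from) {
                txnLog.add(zxid);
                lastProcessedZxid = zxid;
            }
        }
    }

    public static void main(String[] args) {
        LagCheckSketch node = new LagCheckSketch();
        node.txnLog.add(1L);         // txn 1 reached the disk
        node.lastProcessedZxid = 2L; // txn 2 was applied in memory only
        node.syncFrom(Arrays.asList(1L, 2L, 3L));
        System.out.println(node.txnLog); // [1, 2, 3]
    }
}
```

Because the node advertises zxid 1 rather than 2 in this model, it is behind its peers and receives txn 2 again during the sync, so the log ends up without a hole.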

Refs: ZOOKEEPER-4882, ZOOKEEPER-4925

@kezhuw force-pushed the ZOOKEEPER-4882-fix-data-loss-from-node-experienced-temporary-disk-error branch 2 times, most recently from 4b44db5 to 0ac0f3f, on June 15, 2025 04:04
@kezhuw force-pushed the ZOOKEEPER-4882-fix-data-loss-from-node-experienced-temporary-disk-error branch from 0ac0f3f to 2a6a6d3 on June 26, 2025 00:44