Backport some bug fixes from the 3.x branch to the 2.x branch #1277
base: master
@szetszwo Looking forward to your assistance.
@ilixiaocui, sure, we could backport bug fixes to branch-2. Would you consider upgrading to the recent release 3.2.0?
Much appreciated!
@ilixiaocui, could you select a list of commits you would like to backport? I could merge them into branch-2.
The bugs triggered in our production environment are related to the following two issues. The corresponding commit IDs are based on the
RATIS-2140 related
RATIS-2208 related
Thank you again for your assistance!
@szetszwo In addition, the ratis-3.0.0 release notes summarize many bug fixes from the 2.x series. Would you consider backporting these fixes to the 2.x branch as well? The corresponding commit IDs are based on the
RATIS-2116: 6390a28
@ilixiaocui, I tried merging the list, but some of the commits (the ones commented out below) have serious conflicts. Let me see how to resolve them.
git cherry-pick 5c47d3b4cafffa8e2bc21276f302d70efbbed5a9 #RATIS-1858. Follower keeps logging first election timeout. (#894)
git cherry-pick 95b51e512ffa3d0798607b82f8b474649413f2bd #RATIS-1705. Fix metrics leak (#744)
git cherry-pick a6719dc63eb90cc6bdc622a0824101945e746475 #RATIS-1873
git cherry-pick a483bd4bf015b5b368215e0d622ff43ed317b0c7 #RATIS-1884. Fix retry cache warning condition (#915)
git cherry-pick b8ce6d1f6ea37ed3ff9f6e888d2357fe48490567 #RATIS-1883. Next Index should be always larger than Match Index in GrpcLogAppender (#914)
git cherry-pick 05f39221102abc00b2934e279da872d06f6a1811 #RATIS-1887. Gap between segement log (#919)
git cherry-pick be28b3907f4fee8957fb2824770e4925364d0a8f #RATIS-1890. SegmentedRaftLogCache#shouldEvict should only iterate over closed segments once (#921)
git cherry-pick 0e136f39123dc65a07a41c7146ea0e91f0fe1fa7 #RATIS-1893. In SegmentedRaftLogCache, start a daemon thread to checkAndEvictCache. (#924)
git cherry-pick d461a01a53e7e130f0ec4143e75b316012137b62 #RATIS-1895. IllegalStateException: Failed to updateIncreasingly for nextIndex. (#926)
git cherry-pick 8a74dc256c875b46025e24d1d9c9de8e8379a53c #RATIS-1886
git cherry-pick 4c8ef9db16e32d13a1eb07fce12a7563b830a2da #RATIS-1902. The snapshot index is set incorrectly in InstallSnapshotReplyProto. (#933)
git cherry-pick b7ffa1ba1e3e7cecd9ea687f72425c2ffd5b1c34 #RATIS-1909. Fix Decreasing Next Index When GrpcLogAppender Reset Client. (#939)
git cherry-pick 5a8519ee6cc40abb999d07154c4c2d12320c2da1 #RATIS-1920. NPE in AppendLogResponseHandler. (#952)
git cherry-pick 7015ba2f274394697dffec417b43374656077d88 #RATIS-1916. OrderAsync does not call handReply. (#948)
# git cherry-pick 22cbefa2c11c3471d2f763ccb4251806ed3529f5 #RATIS-872. Invalidate replied calls in retry cache. (#942)
git cherry-pick c35f769f513609d808ab1cc91c5323d9ff30f636 #RATIS-1912. Fix infinity election when perform membership change. (#954)
git cherry-pick 95352591005a1bf867f9aac9f9c0b337741181e3 #RATIS-1804. Change the default number of outstanding append entires. (#838)
git cherry-pick 1b05bfcc76e4f3007d389dc52ee0305b9fff8e41 #RATIS-1928. Join the LogAppenders when closing the server. (#959)
# git cherry-pick 6390a28bdf1d2c454d49a11dca117e5bbc482f54 #RATIS-2116. Fix the issue where RaftServerImpl.appendEntries may be blocked indefinitely (#1116)
git cherry-pick 2e7cb458ca6a10b4c38cafca7e8eee8a8e7fcef1 #RATIS-2140. Thread wait when installing snapshot. (#1137)
# git cherry-pick 2c4e354f133a44b971837ea33b5f89d62302cb63 #RATIS-2232. Improve log for debugging on RaftLog / TransactionManager (#1203)
git cherry-pick 337df17c7ea27fbaac9f5f82f8557dc815830d7c #RATIS-2234. Remove lock race between heartbeat and append log channels (#1205)
git cherry-pick cf893f64906df82908fcc43aed2d575e52f7a174 #RATIS-2233. make NOPROGRESS timeout configurable (#1204)
# git cherry-pick 17ca6f41d0a577de2ecb452368c1a38b0c63d8b7 #RATIS-2235. Allow only one thread to perform appendLog (#1206)
# git cherry-pick 5d3476f27650c13e94d6bbe5ccbfbc7ca4712eea #RATIS-2242. change consistency criteria of heartbeat during appendLog (#1215)
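For anyone repeating this backport locally, a list like the one above can be applied in order with a small shell function that stops at the first conflicting commit instead of leaving a cherry-pick half-applied. This is a minimal sketch, not part of the Ratis tooling: the function name `backport_commits` is hypothetical, and it assumes the target branch (e.g. branch-2) is already checked out with a clean working tree. It uses `git cherry-pick -x` so each backported commit records the hash it came from.

```shell
# Hypothetical helper (not Ratis tooling): cherry-pick the given commits, in
# order, onto the currently checked-out branch. On the first failure it
# aborts that cherry-pick, so the branch stays at the last successful commit,
# prints the offending hash to stderr, and returns 1.
backport_commits() {
  local c
  for c in "$@"; do
    # -x appends "(cherry picked from commit <hash>)" to the commit message.
    if ! git cherry-pick -x "$c" >/dev/null 2>&1; then
      echo "conflict: $c" >&2
      git cherry-pick --abort >/dev/null 2>&1
      return 1
    fi
    echo "applied: $c"
  done
}
```

For example, `backport_commits 5c47d3b4cafffa8e2bc21276f302d70efbbed5a9 95b51e512ffa3d0798607b82f8b474649413f2bd` applies the first two commits from the list. Stopping at the first failure keeps the commit order intact, which matters because later fixes in the list may build on earlier ones.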
Appreciate it again.
@ilixiaocui, sorry that I was not able to check the conflicts. I should be able to check them sometime next week. In the meantime, please see if you can find the commits they depend on, which would help resolve the conflicts. If you have a tight deadline, please feel free to share it. I will try my best to accommodate it.
Thanks again for your reply! Could you please help cherry-pick the commits for these two issues, which we have already hit in production?
RATIS-2140 related
RATIS-2208 related
The other issues have not been directly encountered in our production environment. There is no urgency on timing; one or two weeks is completely fine.
@ilixiaocui, the first commit and the last two commits have serious conflicts. We need to find out which commits they depend on.
# git cherry-pick 2c4e354f133a44b971837ea33b5f89d62302cb63 #RATIS-2232. Improve log for debugging on RaftLog / TransactionManager (#1203)
git cherry-pick 337df17c7ea27fbaac9f5f82f8557dc815830d7c #RATIS-2234. Remove lock race between heartbeat and append log channels (#1205)
git cherry-pick cf893f64906df82908fcc43aed2d575e52f7a174 #RATIS-2233. make NOPROGRESS timeout configurable (#1204)
# git cherry-pick 17ca6f41d0a577de2ecb452368c1a38b0c63d8b7 #RATIS-2235. Allow only one thread to perform appendLog (#1206)
# git cherry-pick 5d3476f27650c13e94d6bbe5ccbfbc7ca4712eea #RATIS-2242. change consistency criteria of heartbeat during appendLog (#1215)
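One heuristic for finding the commits a conflicting change depends on is to list the earlier commits in its history that touched the same files and are not reachable from the target branch. This is a minimal sketch under stated assumptions: the function name `candidate_deps` is hypothetical, it assumes the target branch (e.g. branch-2) exists locally, and it only looks at file overlap. Because cherry-picked commits get new hashes on branch-2, the output can also include changes that were already backported, so treat it as a list of candidates to inspect, not an exact dependency set.

```shell
# Hypothetical helper: print "<hash> <subject>" for every commit that
#   - is an ancestor of <commit> (excluding <commit> itself),
#   - is not reachable from <target-branch>, and
#   - touched at least one file that <commit> changes.
# Usage: candidate_deps <commit> <target-branch>
candidate_deps() {
  local commit="$1" target="$2"
  # Files changed by the commit, NUL-separated so odd file names survive.
  git diff-tree --no-commit-id --name-only -r -z --root "$commit" |
    # Oldest first, i.e. the order in which they would be cherry-picked.
    xargs -0 git log --reverse --format='%H %s' "$target..$commit^" --
}
```

For example, `candidate_deps 2c4e354f133a44b971837ea33b5f89d62302cb63 branch-2` would list the earlier master commits touching the files changed by RATIS-2232. A dependency can also live in files the commit does not touch (e.g. a new helper method it calls), so the compiler errors after a trial cherry-pick remain the final check.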
What changes were proposed in this pull request?
We are currently running Ratis 2.4.0 in production at significant scale, where we have observed two recurring issues related to snapshot installation, consistent with existing community reports (RATIS-2140, RATIS-2208).
Would it be possible, at your convenience, to consider backporting the associated fixes to the 2.x maintenance branch? Such an effort would greatly assist our team in planning a stable production upgrade path while continuing to leverage this foundational version.
We sincerely appreciate your guidance on this matter and remain grateful for the community's ongoing stewardship of Ratis.
What is the link to the Apache JIRA?
RATIS-2140
RATIS-2208
How was this patch tested?