Online Indexer: replace the synchronized runner with a heartbeat #3530
base: main
Branch updated from 89d0263 to f318f65.
Branch updated from a680857 to 83830c6.
I didn't get a chance to look at all of the code; we can discuss some of it offline.
@@ -180,7 +178,6 @@ <M extends Message> void singleRebuild(
if (!safeBuild) {
What does safeBuild mean here, in this context? Do we still need to prevent concurrency?
As far as I can see, the only things left under non-safeBuild are clearing the index, marking it as write-only, and setting constraints that, IIUIC, should not happen after the previous clearing.
After so many changes to this code, maybe it is time to redesign (and possibly simplify) singleRebuild. Should that be a separate PR?
Changing and cleaning up safeBuild should not be part of this PR.
...st/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexerBuildIndexTest.java (outdated, resolved)
...ain/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexOperationConfig.java (resolved)
...ain/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexOperationConfig.java (resolved)
...ain/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexOperationConfig.java (outdated, resolved)
...ore/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingHeartbeat.java (resolved)
...st/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexingHeartbeatTest.java (resolved)
...yer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingBase.java (outdated, resolved)
...st/java/com/apple/foundationdb/record/provider/foundationdb/OnlineIndexingHeartbeatTest.java (resolved)
@@ -1237,7 +1267,7 @@ public static class Builder {
private DesiredAction ifReadable = DesiredAction.CONTINUE;
private boolean doAllowUniquePendingState = false;
private Set<TakeoverTypes> allowedTakeoverSet = null;
-private long checkIndexingStampFrequency = 60_000;
+private long checkIndexingStampFrequency = 10_000;
Why check the type stamp more often?
To align the default stamp & heartbeat checking with the default lease time.
Wouldn't you want the lease time to be longer than the heartbeat interval?
If you only update the heartbeat every 10 seconds, and you consider it too old after 10 seconds, it doesn't take much clock skew to have indexers stepping over each other.
I have changed the default to check heartbeats and the type stamp every 5 seconds. I would consider changing the default lease time to longer than 10 seconds, but that should probably be done in a separate PR.
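To make the clock-skew concern concrete, here is a minimal sketch, assuming a peer's heartbeat counts as stale once its age, measured on the reader's clock, exceeds the lease. The class and method names (HeartbeatTiming, isStale, minimumSafeLease) are hypothetical and not part of the Record Layer API; commit latency is ignored for simplicity.

```java
import java.time.Duration;

/** Hypothetical helper (not a Record Layer API) relating lease length to heartbeat timing. */
final class HeartbeatTiming {
    private HeartbeatTiming() {
    }

    /**
     * A peer session is presumed dead once its last heartbeat, as seen by the
     * reader's clock, is older than the lease.
     */
    static boolean isStale(long lastHeartbeatMillis, long nowMillis, Duration lease) {
        return nowMillis - lastHeartbeatMillis > lease.toMillis();
    }

    /**
     * Smallest lease that keeps a live writer from looking stale: one full update
     * interval plus the worst-case clock skew between writer and reader.
     * (Commit latency would add to this as well; it is ignored in this sketch.)
     */
    static Duration minimumSafeLease(Duration heartbeatInterval, Duration maxClockSkew) {
        return heartbeatInterval.plus(maxClockSkew);
    }
}
```

With a 10-second update interval and a 10-second lease, any positive skew makes minimumSafeLease exceed the lease, which is the reviewer's point; updating every 5 seconds against a 10-second lease tolerates up to about 5 seconds of skew.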
Branch updated from 4cf86f4 to 97d02b6.
There were also two comments left from my previous review that I want to follow up on. I resolved everything else in that review, or answered it in what I hope is a definitive way, so it should be easy to go back through.
...r-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/FDBRecordStore.java (resolved)
...yer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingBase.java (outdated, resolved)
...yer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingBase.java (outdated, resolved)
...yer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingBase.java (outdated, resolved)
...yer-core/src/main/java/com/apple/foundationdb/record/provider/foundationdb/IndexingBase.java (outdated, resolved)
List<Index> indexes = new ArrayList<>();
indexes.add(new Index("indexA", field("num_value_2"), EmptyKeyExpression.EMPTY, IndexTypes.VALUE, IndexOptions.UNIQUE_OPTIONS));
indexes.add(new Index("indexB", field("num_value_3_indexed"), IndexTypes.VALUE));
FDBRecordStoreTestBase.RecordMetaDataHook hook = allIndexesHook(indexes);
It's a matter of how the store is created. When you create an FDBRecordStore, it is passed a SubspaceProvider, which could present something that is not a tuple. Looking briefly at the amount of testing infrastructure you'd have to rework, I'm not confident that nothing else would break.
I think it would be valuable to have, but I don't think it's worth the effort, and we might be better off focusing on changing FDBRecordStore to take only a KeySpacePath.
After that, I don't think there would be any way to do anything that is not a Tuple.
    startSemaphore.acquire();
    Thread.sleep(100);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
You should always re-interrupt the current thread when catching InterruptedException.
√
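The reviewer's advice follows the standard Java idiom: catching InterruptedException clears the thread's interrupt flag, so the handler should set it again before rethrowing, letting callers further up the stack observe the interruption. A minimal sketch around the test's startSemaphore.acquire() / Thread.sleep(100) pattern; InterruptHandling and acquireThenPause are hypothetical names.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical helper mirroring the test snippet under review. */
final class InterruptHandling {
    private InterruptHandling() {
    }

    /** Waits for a permit, then pauses briefly. */
    static void acquireThenPause(Semaphore startSemaphore, long pauseMillis) {
        try {
            startSemaphore.acquire();
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            // Throwing InterruptedException cleared the interrupt flag; restore it
            // so code further up the stack can still see that an interrupt happened.
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```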
try (FDBRecordContext context = openContext()) {
    final Map<UUID, IndexBuildProto.IndexBuildHeartbeat> heartbeats = IndexingHeartbeat.getIndexingHeartbeats(recordStore, indexes.get(0), 0).join();
    heartbeatsQueries.add(heartbeats);
    context.commit();
You don't need to commit here, you're just doing a read.
I get warnings without it. I don't think creating a context without committing it passes a PRB now.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexingHeartbeat {
We discussed this offline and decided on a balanced approach: changing the method to info, so that the on-disk format is future-proofed for being made more generic if we want to use it elsewhere. Some substantial changes to the code would still be needed; in particular, it would need to take a subspace instead of a store and an index.
// clear, partial
int countDeleted =
        IndexingHeartbeat.clearIndexingHeartbeats(recordStore, index, 0, 7).join();
This is clearing 7 entries, regardless of their last heartbeat, right?
Yes. Max age to delete is 0.
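To illustrate the semantics confirmed above (a maximum age of 0 makes every entry eligible, so the call removes up to the limit of 7 regardless of each session's last heartbeat), here is an in-memory sketch. HeartbeatCleanup, Heartbeat, and clearOlderThan are hypothetical stand-ins, not the signature of IndexingHeartbeat.clearIndexingHeartbeats, which works against the record store; the ">= maxAge" comparison is an assumption.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory model of clearing heartbeats by age, with a cap on deletions. */
final class HeartbeatCleanup {
    private HeartbeatCleanup() {
    }

    /** Value layout from the PR description: [indexing-type, genesis time, heartbeat time]. */
    static final class Heartbeat {
        final String indexingType;
        final long genesisMillis;
        final long heartbeatMillis;

        Heartbeat(String indexingType, long genesisMillis, long heartbeatMillis) {
            this.indexingType = indexingType;
            this.genesisMillis = genesisMillis;
            this.heartbeatMillis = heartbeatMillis;
        }
    }

    /**
     * Removes at most {@code limit} entries whose last heartbeat is at least
     * {@code maxAgeMillis} old and returns the number removed. With
     * {@code maxAgeMillis == 0} every entry qualifies (assuming no heartbeat
     * time lies in the future), so only the limit matters.
     */
    static int clearOlderThan(Map<UUID, Heartbeat> heartbeats, long nowMillis,
                              long maxAgeMillis, int limit) {
        int deleted = 0;
        Iterator<Map.Entry<UUID, Heartbeat>> it = heartbeats.entrySet().iterator();
        while (it.hasNext() && deleted < limit) {
            Heartbeat heartbeat = it.next().getValue();
            if (nowMillis - heartbeat.heartbeatMillis >= maxAgeMillis) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }
}
```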
Branch updated from 97d02b6 to 149b070.
Branch updated from 86ad032 to 2a9f7e5.
Each indexing session, for each index, will create a key-value heartbeat of the format:
[prefix, xid] -> [indexing-type, genesis time, heartbeat time]
Indexing sessions that are expected to be exclusive will throw an exception if another active session exists.
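A minimal in-memory sketch of this scheme, assuming one heartbeat per session keyed by a random session id (standing in for xid) and a caller-supplied lease for deciding which sessions are still active. HeartbeatRegistry, beat, and checkExclusive are hypothetical names; the actual change persists these entries as FoundationDB key-values rather than in a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory model of per-session heartbeats with an exclusivity check. */
final class HeartbeatRegistry {
    /** Value layout: [indexing-type, genesis time, heartbeat time]. */
    static final class Heartbeat {
        final String indexingType;
        final long genesisMillis;
        long heartbeatMillis;

        Heartbeat(String indexingType, long genesisMillis) {
            this.indexingType = indexingType;
            this.genesisMillis = genesisMillis;
            this.heartbeatMillis = genesisMillis;
        }
    }

    // Key: session id, standing in for xid under the shared prefix.
    private final Map<UUID, Heartbeat> heartbeats = new HashMap<>();
    private final long leaseMillis;

    HeartbeatRegistry(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Creates the session's entry on first call (recording genesis), then refreshes its heartbeat time. */
    void beat(UUID sessionId, String indexingType, long nowMillis) {
        Heartbeat heartbeat = heartbeats.computeIfAbsent(
                sessionId, id -> new Heartbeat(indexingType, nowMillis));
        heartbeat.heartbeatMillis = nowMillis;
    }

    /** An exclusive session fails if any other session's heartbeat is still within the lease. */
    void checkExclusive(UUID sessionId, long nowMillis) {
        for (Map.Entry<UUID, Heartbeat> e : heartbeats.entrySet()) {
            boolean other = !e.getKey().equals(sessionId);
            boolean active = nowMillis - e.getValue().heartbeatMillis < leaseMillis;
            if (other && active) {
                throw new IllegalStateException("another active indexing session exists: " + e.getKey());
            }
        }
    }
}
```

Sessions whose heartbeat has aged past the lease are simply ignored by the check, so a crashed indexer stops blocking new exclusive sessions once its lease expires.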
Motivation:
1. Represent the heartbeat in every index during multi-target indexing (currently, only the master index has a sync lock)
2. Keep a heartbeat during mutual indexing, which can allow better automatic decision making
3. Decide about exclusiveness according to the indexing method (currently, it is user input)
With this change, the equivalent of a sync lock will be determined by the indexing type and cannot be set by users. The index configuration function setUseSynchronizedSession will have no effect on the indexing process.
During a gradual code upgrade on multiple servers, there may be a situation where one server is indexing with a synchronized session lock while another server builds the same index with an exclusive heartbeat "lock". If that happens:
a) There will be no more than two concurrent active sessions
b) The indexing sessions will conflict with each other until one of the indexers gives up. While this may not be optimal, the resulting index will be valid.
Resolve #3529