Remote entity reservation v6 #18525

base: main
Conversation
crates/bevy_ecs/src/entity/mod.rs
Outdated
```rust
source: Arc<RemoteEntitiesInner>,
}

impl Future for RemoteReservedEntities {
```
If I'm reading this right, the manual `Future` impl and the `ConcurrentQueue<Waker>` are here to wake up readers when new entities are pushed to the queue. Is that right?

I think that's essentially what a Channel does over a Queue, and you can simplify the code if you switch to using a Channel. `async_channel` seems to already be in our dependency tree and `no_std` compatible.
If you're able to lock down `remote_batch_size` at creation time, then I think you can handle both `requests` and `reserved` with a single `async_channel::bounded(remote_batch_size)` by having `fn fulfill()` keep calling `try_send()` until it fails because the channel is full. You could even use `Receiver<Entity>` as the public type and skip making a `RemoteEntities` wrapper.
(`try_send()` looks like it just does some `Relaxed` and `Acquire` loads on failure, so it should be almost free. `is_full()` and `len()` seem to do `SeqCst` loads, which I believe do a full fence and are expensive, so you wouldn't want to use those.)
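As a concrete illustration of this suggested shape, here's a minimal sketch (the function signature and surrounding API are assumptions, not the PR's actual code):

```rust
use async_channel::Sender;
use bevy_ecs::entity::{Entities, Entity};

// Refill the bounded channel until `try_send` reports that it's full.
fn fulfill(sender: &Sender<Entity>, entities: &Entities) {
    loop {
        let entity = entities.reserve_entity();
        // Fails once the channel holds `remote_batch_size` entities (or has
        // been closed). A real implementation would need to return the
        // rejected entity to the allocator rather than leak the reservation.
        if sender.try_send(entity).is_err() {
            break;
        }
    }
}
```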
> If I'm reading this right, the manual `Future` impl and the `ConcurrentQueue<Waker>` are here to wake up readers when new entities are pushed to the queue. Is that right?

Yup. That's right.
> I think that's essentially what a Channel does over a Queue, and you can simplify the code if you switch to using a Channel. `async_channel` seems to already be in our dependency tree and `no_std` compatible.

Yeah, I wanted to use a channel, but I couldn't find one that was `no_std`. `async_channel` looks like it will work. I'll try that out in a bit.
> If you're able to lock down `remote_batch_size` at creation time, then I think you can handle both `requests` and `reserved` with a single `async_channel::bounded(remote_batch_size)` by having `fn fulfill()` keep calling `try_send()` until it fails because the channel is full.

That is my intention, but I'm not sure about the "if you're able to lock down `remote_batch_size` at creation time" part. If that's a possibility, though, we're on the same page.
> You could even use `Receiver<Entity>` as the public type and skip making a `RemoteEntities` wrapper.

Technically. But I can see the backend for this changing, so I'd like to keep the wrapper for now.
> (`try_send()` looks like it just does some `Relaxed` and `Acquire` loads on failure, so it should be almost free. `is_full()` and `len()` seem to do `SeqCst` loads, which I believe do a full fence and are expensive, so you wouldn't want to use those.)

Yeah, this is a good end game. I didn't do that because `reserve_entities` needs a definite amount ahead of time. As you point out, we could do `alloc` for now, but that would prevent it from being flushed. Plus, I don't know if we'll always have `&mut` access to `Entities`. I mean, probably, but we may need to `try_fulfill` more often than we `flush`. Maybe. I'll try some stuff out. We can always make a `&Entities` version later.
crates/bevy_ecs/src/entity/mod.rs
Outdated
```rust
impl RemoteEntitiesInner {
    fn fulfill(&self, entities: &Entities, batch_size: NonZero<u32>) {
        self.is_waiting.store(0, Ordering::Relaxed);
```
This reserves exactly `batch_size` entities per `flush`, regardless of how many requests there were. If we make enough async requests, it could take a while for them all to get fulfilled!

Rather than counting waiters, it might make sense to have a `request_count`, increment it before popping, and do `reserve_entities(request_count.swap(0, Relaxed))`. If you pre-fill the queue with `batch_size` then that will refill the original buffer every time.

(My proposal above has exactly the same problem, so maybe we can't use `async_channel::bounded`.)
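A minimal sketch of this suggestion (field and method names are assumed for illustration, not taken from the PR): each async reservation increments `request_count` before popping, and each flush reserves exactly the number requested since the last flush, keeping the pre-filled buffer at a constant size.

```rust
use bevy_ecs::entity::Entities;
use core::sync::atomic::{AtomicU32, Ordering};

struct RemoteEntitiesInner {
    // Incremented by each remote reservation before it pops from the queue.
    request_count: AtomicU32,
    // (queue of pre-reserved entities, pending wakers, etc. elided)
}

impl RemoteEntitiesInner {
    fn fulfill(&self, entities: &Entities) {
        // Reserve exactly as many entities as were requested since the last
        // flush, so a queue pre-filled with `batch_size` stays at that size.
        let requested = self.request_count.swap(0, Ordering::Relaxed);
        for entity in entities.reserve_entities(requested) {
            // push `entity` onto the shared queue and wake a pending waiter
            let _ = entity; // (queue elided in this sketch)
        }
    }
}
```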
> This reserves exactly `batch_size` entities per `flush`, regardless of how many requests there were. If we make enough async requests, it could take a while for them all to get fulfilled!

Yeah, it's not ideal. I don't anticipate this being a real issue (with a good `batch_size`), but it's bugging me too.
> Rather than counting waiters, it might make sense to have a `request_count`, increment it before popping, and do `reserve_entities(request_count.swap(0, Relaxed))`. If you pre-fill the queue with `batch_size` then that will refill the original buffer every time.

That's an excellent idea! Brilliant, actually. It would keep the number reserved stable, and no reservation would be waiting for more than a single flush. I'll need to split `reserve_entities(request_count.swap(0, Relaxed))` up to help with branch prediction, but I really like this idea.
> (My proposal above has exactly the same problem, so maybe we can't use `async_channel::bounded`.)

Pros and cons. It might be possible to do both: a fast bounded channel for standard operations, and if it runs out, we increment `request_count` and fulfill on an unbounded one. That might be slightly over-engineered, but it would work. I'll hold off for now, but I'll keep this in mind for a follow-up.
crates/bevy_ecs/src/entity/mod.rs
Outdated
```rust
fn fulfill(&self, entities: &Entities, batch_size: NonZero<u32>) {
    self.is_waiting.store(0, Ordering::Relaxed);
    if self.reserved.is_empty() {
        for reserved in entities.reserve_entities(batch_size.get()) {
```
Since you have `&mut Entities` when calling `try_fulfill()`, you can use `alloc()` instead of `reserve_entities()` to avoid all the atomic operations. (You'd have to move the `try_fulfill()` call to the end of `flush()`, of course.)
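A minimal sketch of that `&mut` fast path, assuming `Entities::alloc` and a `ConcurrentQueue<Entity>` backing store (the function shape is hypothetical):

```rust
use bevy_ecs::entity::{Entities, Entity};
use concurrent_queue::ConcurrentQueue;

fn try_fulfill(entities: &mut Entities, needed: u32, queue: &ConcurrentQueue<Entity>) {
    for _ in 0..needed {
        // With exclusive access, `alloc` hands out already-valid entities and
        // skips the atomic reservation machinery entirely. The tradeoff (no
        // flush for these entities) is discussed in the reply below.
        let _ = queue.push(entities.alloc());
    }
}
```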
`alloc` prevents them from being flushed. Maybe we just let the asset system make them valid, but I'd kind of like to hand the asset server a blank entity rather than an invalid one. I have an idea here though. I'll get back to you.
Thanks for the review @chescock. Brilliant as always!
```rust
/// This is the number of entities we keep ready for remote reservations via [`RemoteEntities::reserve_entity`].
/// A value too high can cause excess memory to be used, but a value too low can cause additional waiting.
pub entities_hot_for_remote: u32,
remote: Arc<RemoteEntitiesInner>,
```
Nit: Could we just make this store a `RemoteEntities`? That way we don't have to worry about constructing one in `get_remote` and can just blindly clone. It also means that if we change the details of `RemoteEntities`, we don't have to worry about how we construct it each time.
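For reference, a sketch of what that nit would look like (the field layout is assumed):

```rust
impl Entities {
    // Storing a ready-made `RemoteEntities` makes this a plain clone.
    pub fn get_remote(&self) -> RemoteEntities {
        self.remote.clone()
    }
}
```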
We could do this, and if v6 doesn't end up evolving much, I'm on board with it. However, many of the ways we can improve this involve caching some atomic results in a non-atomic that needs `&mut`. More specifically, we'd need `RemoteEntities` to have additional per-instance state in addition to the shared state in `Arc<RemoteEntitiesInner>`. But `Entities` isn't a remote instance, so it doesn't make sense to include that per-instance state in `Entities`.

I used this a lot in v4. If v6 is merged, I'll start slowly moving concepts from v4 into it to try to improve performance. If none of those changes are merged, I can follow up by simplifying this per your suggestion. It's preemptive future-proofing, if that makes sense.
crates/bevy_ecs/src/entity/mod.rs
Outdated
```rust
    keep_hot: u32,
) {
    let to_fulfill = entities.remote.recent_requests.swap(0, Ordering::Relaxed);
    let current_hot = entities.remote.keep_hot.load(Ordering::Relaxed);
```
I don't understand why `entities.remote.keep_hot` needs to be an atomic. Could we just move it into `Entities` and then handle it with regular `u32`s?

Similarly, is `keep_hot` the correct name for this? Maybe we should call it something like `in_channel`? (They may have already been claimed by `recent_requests`, but I think it's close enough?)
> I don't understand why `entities.remote.keep_hot` needs to be an atomic. Could we just move it into `Entities` and then handle it with regular `u32`s?

For now, it doesn't. However, the extra two atomic ops are nothing compared to the roughly ten atomic ops per entity to push onto the queue. (This is something I think a custom queue implementation can do much better; I did this for v4.)

Given that it's not a performance concern, I like to keep the state about remote entities on the type itself rather than somewhere on `Entities`. We can change this later, but it could still come in handy if we want to do fulfillment with only `&Entities` in the future (maybe).

> Similarly, is `keep_hot` the correct name for this? Maybe we should call it something like `in_channel`?

That's a more precise name, so I changed the internals to use it. But I've kept the field we expose to users as "hot", just because the channel detail might change later.
> For now, it doesn't. However, the extra two atomic ops are nothing compared to the roughly ten atomic ops per entity to push onto the queue. (This is something I think a custom queue implementation can do much better; I did this for v4.)
>
> Given that it's not a performance concern, I like to keep the state about remote entities on the type itself rather than somewhere on `Entities`. We can change this later, but it could still come in handy if we want to do fulfillment with only `&Entities` in the future (maybe).

Another way to approach this is to have separate structs for each side, like `Sender` and `Receiver` on channels. That keeps the remote reservation logic together, but still keeps the provider and client data separate. It would also remove the need to `close()` the channel explicitly, since it closes automatically when the last `Sender` is dropped.

I think the issue with using an atomic here is that an ordinary load followed by a store looks like a race condition, while if you had `&mut` it would be obvious that nothing could change it in between. Performance shouldn't be an issue; relaxed load and store calls like this are no more expensive than non-atomic ones.
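A sketch of that split, assuming `async_channel` as the backing channel (type names are illustrative, not the PR's):

```rust
use async_channel::{Receiver, Sender};
use bevy_ecs::entity::Entity;

// The world owns the provider half; dropping it closes the channel
// automatically, so no explicit `close()` is needed.
struct RemoteEntityProvider {
    sender: Sender<Entity>,
}

// Remote tasks clone the client half freely.
#[derive(Clone)]
struct RemoteEntities {
    receiver: Receiver<Entity>,
}

impl RemoteEntities {
    async fn reserve_entity(&self) -> Option<Entity> {
        // `recv` returns an error once every `Sender` is dropped.
        self.receiver.recv().await.ok()
    }
}
```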
Looks good! I really like how small this wound up being!

I left a few more nits, but nothing blocking (assuming `async_channel` gets updated, which seems promising).
```rust
self.meta.clear();
self.pending.clear();
*self.free_cursor.get_mut() = 0;
self.remote.close();
```
This will disconnect any outstanding `RemoteEntities` values so that they start failing to reserve. Is that really what we want to do in this case? It might be less disruptive to simply drain the queue.

Although it probably doesn't matter, since nobody will call `clear()` in a real application.
I prefer this behavior. What if a clear happens while an asset is loading? Now we have to make sure every included entity is valid instead of just making sure one is. Right now, the minute it clears, all current remote reservers fail, which I think is a more transparent way of handling the error.

(But like you said, it doesn't really matter.)
crates/bevy_ecs/src/entity/mod.rs
Outdated
```rust
let to_fulfill = entities.remote.recent_requests.swap(0, Ordering::Relaxed);
let current_in_channel = entities.remote.in_channel.load(Ordering::Relaxed);
// should_reserve = to_fulfill + (in_channel - current_in_channel)
let should_reserve = (to_fulfill + in_channel).saturating_sub(current_in_channel);
// new_in_channel = current_in_channel + (should_reserve - to_fulfill)
let new_in_channel = (current_in_channel + should_reserve).saturating_sub(to_fulfill);
```
I think the second subtraction can never saturate, and could be an ordinary `-`. `new_in_channel` will be more than `in_channel` if the first subtraction saturates, and equal to `in_channel` if it doesn't, but it can never be less than `in_channel`, so it can never be less than zero.

(It's possible that this would be more clear if `should_reserve` were calculated using `checked_sub` and an `if`, especially since we never need to allocate entities if it overflows.)
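For concreteness, a sketch of that `checked_sub` formulation, using the same variables as the snippet under review:

```rust
if let Some(should_reserve) = (to_fulfill + in_channel).checked_sub(current_in_channel) {
    // Per the argument above, this can never underflow, so plain `-` is safe.
    let new_in_channel = (current_in_channel + should_reserve) - to_fulfill;
    // ...reserve `should_reserve` entities and record `new_in_channel`
}
// A `None` result means the channel already holds a surplus, so no entities
// need to be reserved (though the counter bookkeeping would still need
// updating in that branch).
```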
> I think the second subtraction can never saturate, and could be an ordinary `-`. `new_in_channel` will be more than `in_channel` if the first subtraction saturates, and equal to `in_channel` if it doesn't, but it can never be less than `in_channel`, so it can never be less than zero.

I agree. I thought this was the case when I wrote it, but I was too lazy to do the proof, lol. But I also like the redundancy.
> (It's possible that this would be more clear if `should_reserve` were calculated using `checked_sub` and an `if`, especially since we never need to allocate entities if it overflows.)

I agreed with this when I read it, but when I tried it, it felt more confusing.
I'm not sure why this keeps happening, but GitHub doesn't let me reply inline here.

I can see the appeal, but I opted not to do this for a few reasons: 1) we might move away from the channel in the near future; 2) we don't necessarily want that closing behavior; and 3) we may want to be able to push to the channel apart from `Entities` (for example: I reserved more entities than I need, so I'll push them back instead of flushing and freeing them).
Agreed. Hopefully this is more clear now that fulfillment is in.
```rust
let in_channel = self.entities_hot_for_remote;
let mut init_allocated = init_allocated;
let to_fulfill = self.remote.recent_requests.swap(0, Ordering::Relaxed);
let current_in_channel = self.remote.in_channel.load(Ordering::Relaxed);
```
Another option here is to store the (signed) difference between `in_channel` and `recent_requests`. Then you only have one value to update! You can use the existing `IdCursor` type for signed numbers. This would reduce to something like:

```rust
if self.remote.net_in_channel.load(Ordering::Relaxed) < self.entities_hot_for_remote {
    // ...
    let old_net_in_channel = self.remote.net_in_channel
        .fetch_max(self.entities_hot_for_remote, Ordering::Relaxed);
    let to_fulfill = self.entities_hot_for_remote - old_net_in_channel;
    if to_fulfill > 0 {
```
I suppose we could, but I'm not sure what the motivation would be. IIUC, it's the same number of atomic ops and the same amount of data stored. Actually, it's 8 bytes instead of 4, but still. I'm not opposed to it, though, if there's a reason to do this that I'm just missing.
Oh, I was just trying to simplify the `saturating_sub` arithmetic by having one counter to modify instead of two.
At this point I'm fine with the code itself. Whether we take this route is another discussion. IOW, if we decide this is the PR we want to get merged, I think my review is valid, but that doesn't mean this is a "vote" for this PR :P My ideal (and I think yours as well) is fully parallelized entity reservation. But this is a good second.
Heads up that the async channel issue is fixed now. See here. I don't know when the next release will be, but I can fix CI here when it's out. (Unless we do v9, which I tend to prefer.)
Objective
Like the other 5 versions, the goal is to allow entity reservation from any thread asynchronously. This is primarily important for assets as entities. Fixes #18003.
Solution
This differs from #18440 (v5) by reserving entities one by one instead of in bulk. That will probably have worse performance, but it's definitely passable, especially given that async reservation is a relatively slow path. In fact, with a high batch size, this may even be faster than v5. See the v5 benchmarks for the impact on the rest of the ECS, as the changes there are almost identical.
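For context, a sketch of what remote reservation could look like from another thread under this design, using the `get_remote` and `reserve_entity` names that appear in the review (the exact signatures and return types are assumptions):

```rust
// Clone a handle on the main world, then reserve from any thread or task.
let remote = world.entities().get_remote();
std::thread::spawn(move || {
    futures_lite::future::block_on(async {
        // Reservation fails once `Entities` is cleared and the channel closes.
        if let Some(entity) = remote.reserve_entity().await {
            // hand `entity` to the asset system to populate later
        }
    });
});
```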
Testing
The test from v5 has been ported over.
Questions for review
If we are ok with not exposing the batch size for remote, we could make this use a bounded queue, which would improve performance of remote reservation.