Conversation

ElliottjPierce
Contributor

Objective

Like the other five versions, the goal is to allow entity reservation from any thread asynchronously. This is primarily important for assets as entities. Fixes #18003.

Solution

This differs from #18440 (v5) by reserving entities one by one instead of in bulk. This will probably have worse performance, but it's definitely passable, especially given that async reservation is a relatively slow path. In fact, with a high batch size, this may even be faster than v5. See the v5 benchmarks for the impact on the rest of the ECS, as the changes there are almost identical.

Testing

The test from v5 has been ported over.

Questions for review

If we are ok with not exposing the batch size for remote, we could make this use a bounded queue, which would improve the performance of remote reservation.

@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events X-Controversial There is active debate or serious implications around merging this PR labels Mar 25, 2025
@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Mar 25, 2025
@alice-i-cecile alice-i-cecile added the S-Needs-Review Needs reviewer attention (from anyone!) to move forward label Mar 25, 2025
@alice-i-cecile alice-i-cecile modified the milestones: 0.16, 0.17 Mar 25, 2025
@andriyDev andriyDev self-requested a review March 25, 2025 18:18
source: Arc<RemoteEntitiesInner>,
}

impl Future for RemoteReservedEntities {
Contributor

If I'm reading this right, the manual Future impl and the ConcurrentQueue<Waker> are here to wake up readers when new entities are pushed to the queue. Is that right?

I think that's essentially what a Channel does over a Queue, and you can simplify the code if you switch to using a Channel. async_channel seems to already be in our dependency tree and no_std compatible.

If you're able to lock down remote_batch_size at creation time, then I think you can handle both requests and reserved with a single async_channel::bounded(remote_batch_size) by having fn fulfill() keep calling try_send() until it fails because the channel is full. You could even use Receiver<Entity> as the public type and skip making a RemoteEntities wrapper.

(try_send() looks like it just does some Relaxed and Acquire loads on failure, so it should be almost free. is_full() and len() seem to do SeqCst loads, which I believe do a full fence and are expensive, so you wouldn't want to use those.)
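
A minimal sketch of that suggestion, assuming a hypothetical Entity stand-in and a caller-supplied reservation closure (the real Entities API differs):

use async_channel::{bounded, Sender, TrySendError};

#[derive(Clone, Copy, Debug)]
struct Entity(u64); // hypothetical stand-in for bevy_ecs's Entity

/// Top up the bounded channel with freshly reserved entities until it is full.
fn fulfill(sender: &Sender<Entity>, mut reserve_one: impl FnMut() -> Entity) {
    loop {
        match sender.try_send(reserve_one()) {
            Ok(()) => {}
            // The channel already holds remote_batch_size entities; a real
            // implementation would return the rejected entity to the allocator.
            Err(TrySendError::Full(_rejected)) => break,
            Err(TrySendError::Closed(_rejected)) => break, // all receivers dropped
        }
    }
}

fn main() {
    let (tx, rx) = bounded::<Entity>(4); // remote_batch_size = 4
    let mut next = 0u64;
    fulfill(&tx, || { next += 1; Entity(next) });
    // Remote callers would `rx.recv().await`; here we just drain synchronously.
    while let Ok(entity) = rx.try_recv() {
        println!("reserved {entity:?}");
    }
}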

Contributor Author

If I'm reading this right, the manual Future impl and the ConcurrentQueue<Waker> are here to wake up readers when new entities are pushed to the queue. Is that right?

Yup. That's right.

I think that's essentially what a Channel does over a Queue, and you can simplify the code if you switch to using a Channel. async_channel seems to already be in our dependency tree and no_std compatible.

Yeah, I wanted to use a channel, but I couldn't find one that was no_std. async_channel looks like it will work. I'll try that out in a bit.

If you're able to lock down remote_batch_size at creation time, then I think you can handle both requests and reserved with a single async_channel::bounded(remote_batch_size) by having fn fulfill() keep calling try_send() until it fails because the channel is full.

That is my intention, but I'm not sure about the "If you're able to lock down remote_batch_size at creation time" part. But if that's a possibility, we're on the same page.

You could even use Receiver<Entity> as the public type and skip making a RemoteEntities wrapper.

Technically. But I see the backend for this changing potentially. So I'd like to keep the wrapper for now.

(try_send() looks like it just does some Relaxed and Acquire loads on failure, so it should be almost free. is_full() and len() seem to do SeqCst loads, which I believe do a full fence and are expensive, so you wouldn't want to use those.)

Yeah. This is a good end game. I didn't do that because reserve_entities needs a definite amount ahead of time. As you point out, we could do alloc for now, but that would prevent it from being flushed. Plus, IDK if we'll always have &mut access to Entities. I mean, probably, but we may need to try_fulfill more often than we flush. Maybe. I'll try some stuff out. We can always make a &Entities version later.


impl RemoteEntitiesInner {
    fn fulfill(&self, entities: &Entities, batch_size: NonZero<u32>) {
        self.is_waiting.store(0, Ordering::Relaxed);
Contributor

This reserves exactly batch_size entities per flush, regardless of how many requests there were. If we make enough async requests, it could take a while for them all to get fulfilled!

Rather than counting waiters, it might make sense to have a request_count, increment it before popping, and do reserve_entities(request_count.swap(0, Relaxed)). If you pre-fill the queue with batch_size then that will refill the original buffer every time.

(My proposal above has exactly the same problem, so maybe we can't use async_channel::bounded.)
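
Roughly what that could look like, as a sketch with hypothetical names (the real type also holds the queue and wakers):

use std::sync::atomic::{AtomicU32, Ordering};

struct RemoteEntitiesInner {
    request_count: AtomicU32,
    // ... queue of pre-reserved entities, wakers, etc.
}

impl RemoteEntitiesInner {
    /// Called by a remote reserver just before popping from the queue.
    fn note_request(&self) {
        self.request_count.fetch_add(1, Ordering::Relaxed);
    }

    /// Called on flush: reserve exactly as many entities as were requested
    /// since the last flush, restoring the pre-filled batch_size buffer.
    fn fulfill(&self, mut reserve_entities: impl FnMut(u32)) {
        let requested = self.request_count.swap(0, Ordering::Relaxed);
        if requested > 0 {
            reserve_entities(requested);
        }
    }
}

fn main() {
    let inner = RemoteEntitiesInner { request_count: AtomicU32::new(0) };
    inner.note_request();
    inner.note_request();
    inner.fulfill(|n| println!("reserving {n} entities to refill the queue")); // prints 2
}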

Contributor Author

This reserves exactly batch_size entities per flush, regardless of how many requests there were. If we make enough async requests, it could take a while for them all to get fulfilled!

Yeah. It's not ideal. I don't anticipate this being a real issue though (with a good batch_size), but it's bugging me too.

Rather than counting waiters, it might make sense to have a request_count, increment it before popping, and do reserve_entities(request_count.swap(0, Relaxed)). If you pre-fill the queue with batch_size then that will refill the original buffer every time.

That's an excellent idea! Brilliant actually. It would keep the number reserved stable, and no reserve would be waiting for more than a single flush. I'll need to split reserve_entities(request_count.swap(0, Relaxed)) up to help with branch prediction, but I really like this idea.

(My proposal above has exactly the same problem, so maybe we can't use async_channel::bounded.)

Pros and cons. Might be possible to do both: a fast bounded channel for standard operations, but if it runs out, we increment request_count and fulfill on an unbounded one. That might be slightly over-engineered, but it would work. I'll hold off for now, but I'll keep this in mind for a follow up.

fn fulfill(&self, entities: &Entities, batch_size: NonZero<u32>) {
    self.is_waiting.store(0, Ordering::Relaxed);
    if self.reserved.is_empty() {
        for reserved in entities.reserve_entities(batch_size.get()) {
Contributor

Since you have &mut Entities when calling try_fulfill(), you can use alloc() instead of reserve_entities() to avoid all the atomic operations. (You'd have to move the try_fulfill() call to the end of flush(), of course.)

Contributor Author

alloc prevents them from being flushed. Maybe we just let the asset system make them valid, but I'd kinda like to hand the asset server a blank entity rather than an invalid one. I have an idea here though. I'll get back to you.

@ElliottjPierce ElliottjPierce left a comment

Thanks for the review @chescock. Brilliant as always!

/// This is the number of entities we keep ready for remote reservations via [`RemoteEntities::reserve_entity`].
/// A value too high can cause excess memory to be used, but a value too low can cause additional waiting.
pub entities_hot_for_remote: u32,
remote: Arc<RemoteEntitiesInner>,
Contributor

Nit: Could we just make this store a RemoteEntities? That way we don't have to worry about constructing one in get_remote and can just blindly clone. Also means if we change the details of RemoteEntities we don't have to worry about how we construct it each time.

Contributor Author

We could do this, and if v6 doesn't end up evolving much, I'm on board with it. However, many of the ways we can improve this involve caching some atomic results in a non-atomic field, which needs &mut. More specifically, we'd need RemoteEntities to have additional per-instance state in addition to the shared state in Arc<RemoteEntitiesInner>. But Entities isn't a remote instance, so it doesn't make sense to include that per-instance state in Entities.

I used this a lot in v4. If v6 is merged, I'll start slowly moving concepts from v4 into it to try to improve performance. If none of those changes are merged, then I can follow up by simplifying this per your suggestion. It's preemptive future-proofing, if that makes sense.

    keep_hot: u32,
) {
    let to_fulfill = entities.remote.recent_requests.swap(0, Ordering::Relaxed);
    let current_hot = entities.remote.keep_hot.load(Ordering::Relaxed);
Contributor

I don't understand why entities.remote.keep_hot needs to be an Atomic. Could we just move it into Entities and then handle it with just regular u32s?

Similarly, is keep_hot the correct name for this? Maybe we should call it something like in_channel? (They may have already been claimed by recent_requests, but I think it's close enough?)

Contributor Author

I don't understand why entities.remote.keep_hot needs to be an Atomic. Could we just move it into Entities and then handle it with just regular u32s?

For now, it doesn't. However, the extra 2 atomic ops are nothing compared to the roughly 10 atomic ops per entity to push onto the queue. (This is something I think a custom queue implementation can do much better. I did this for v4.)

Given that it's not a performance concern, I like to keep the state about remote entities on the type itself rather than somewhere on Entities. We can change this later, but this could still come in handy if we want to do fulfillment with only &Entities in the future (maybe).

Similarly is keep_hot the correct name for this? Maybe we should call it like in_channel or something? (they may have already been claimed by recent_request but I think it's close enough?)

That's a more precise name, so I changed the internals to use that. But the field we expose to users I've kept as "hot" just because the channel detail might change later.

Contributor

For now, it doesn't. However, the extra 2 atomic ops are nothing compared to the like 10 atomic ops per entity to push onto the queue. (This is something I think a custom queue implementation can do much better. I did this for v4).

Given that it's not a performance concern, I like to keep the state about remote entities on the type itself rather than somewhere on Entities. We can change this later, but this could still come in handy if we want to do fulfillment with only &Entities in the future (maybe).

Another way to approach this is to have separate structs for each side, like Sender and Receiver on channels. That keeps the remote reservation logic together, but still keeps the provider and client data separate. That would also remove the need to close() the channel explicitly, since it automatically closes when the last Sender is dropped.

I think the issue with using an atomic here is that the ordinary load followed by a store looks like a race condition, while if you had &mut then it would be obvious that nothing could change it in between. Performance shouldn't be an issue; relaxed load and store calls like this are no more expensive than non-atomic ones.
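
For concreteness, a sketch of that split, with hypothetical names and async_channel standing in for the real backing store:

use async_channel::{bounded, Receiver, Sender};

#[derive(Clone, Copy, Debug)]
struct Entity(u64); // hypothetical stand-in

/// Provider side, owned by Entities. Dropping it closes the channel,
/// so remote handles start failing without an explicit close().
struct RemoteEntitiesProvider {
    sender: Sender<Entity>,
}

/// Client side, cloned freely onto other threads.
#[derive(Clone)]
struct RemoteEntities {
    receiver: Receiver<Entity>,
}

fn remote_pair(batch_size: usize) -> (RemoteEntitiesProvider, RemoteEntities) {
    let (sender, receiver) = bounded(batch_size);
    (RemoteEntitiesProvider { sender }, RemoteEntities { receiver })
}

fn main() {
    let (provider, remote) = remote_pair(16);
    provider.sender.try_send(Entity(1)).unwrap();
    assert!(remote.receiver.try_recv().is_ok());
    drop(provider); // the channel closes automatically
    assert!(remote.receiver.try_recv().is_err());
}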

@chescock chescock left a comment

Looks good! I really like how small this wound up being!

I left a few more nits, but nothing blocking (assuming async_channel gets updated, which seems promising).

self.meta.clear();
self.pending.clear();
*self.free_cursor.get_mut() = 0;
self.remote.close();
Contributor

This will disconnect any outstanding RemoteEntities values so that they start failing to reserve. Is that really what we want to do in this case? It might be less disruptive to simply drain the queue.

Although it probably doesn't matter, since nobody will call clear() on a real application.

Contributor Author

I prefer this behavior. What if a clear happens while an asset is loading? Now we have to make sure every single included entity is valid instead of just making sure one is. Right now, the minute it clears, all current remote reservers fail, which I think is a more transparent way of error handling.

(But like you said, it doesn't really matter.)

let to_fulfill = entities.remote.recent_requests.swap(0, Ordering::Relaxed);
let current_in_channel = entities.remote.in_channel.load(Ordering::Relaxed);
let should_reserve = (to_fulfill + in_channel).saturating_sub(current_in_channel); // should_reserve = to_fulfill + (in_channel - current_in_channel)
let new_in_channel = (current_in_channel + should_reserve).saturating_sub(to_fulfill); // new_in_channel = current_in_channel + (should_reserve - to_fulfill).
Contributor

I think the second subtraction can never saturate, and could be an ordinary -. new_in_channel will be more than in_channel if the first subtraction saturates, and equal to in_channel if it doesn't, but it can never be less than in_channel, so it can never be less than zero.

(It's possible that this would be more clear if should_reserve were calculated using checked_sub and an if, especially since we never need to allocate entities if it overflows.)

Contributor Author

I think the second subtraction can never saturate, and could be an ordinary -. new_in_channel will be more than in_channel if the first subtraction saturates, and equal to in_channel if it doesn't, but it can never be less than in_channel, so it can never be less than zero.

I agree. I thought this was the case when I made it, but I was too lazy to do the proof lol. But I also like the redundancy.

(It's possible that this would be more clear if should_reserve were calculated using checked_sub and an if, especially since we never need to allocate entities if it overflows.)

I agreed with this when I read it, but when I tried it, it felt more confusing.
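
Spelling that argument out (ignoring the u32 overflow case mentioned above), with t = to_fulfill, h = in_channel (the target), and c = current_in_channel:

should_reserve  s = max(0, t + h - c)   // the first saturating_sub
new_in_channel    = (c + s) - t         // the second subtraction

If c <= t + h, then s = t + h - c, so (c + s) - t = h, which is >= 0.
If c > t + h, then s = 0, so (c + s) - t = c - t > h >= 0.

Either way new_in_channel is at least in_channel, so the second subtraction never goes below zero.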

@ElliottjPierce
Contributor Author

[Screenshot 2025-03-27 at 1:07 PM]

I'm not sure why this keeps happening, where GitHub doesn't let me reply.

Another way to approach this is to have separate structs for each side, like Sender and Receiver on channels. That keeps the remote reservation logic together, but still keeps the provider and client data separate. That would also remove the need to close() the channel explicitly, since it automatically closes when the last Sender is dropped.

I can see the appeal here, but I opted not to do this for a few reasons: 1) we might move away from the channel in the near future; 2) we don't necessarily want that closing behavior; and 3) we may want to be able to push to the channel apart from Entities. (Ex: I reserved more entities than I need, so I'll push them instead of flushing and freeing.)

I think the issue with using an atomic here is that the ordinary load followed by a store looks like a race condition, while if you had &mut then it would be obvious that nothing could change it in between. Performance shouldn't be an issue; relaxed load and store calls like this are no more expensive than non-atomic ones.

Agreed. Hopefully this is more clear now that fulfillment is in &mut Entities.

let in_channel = self.entities_hot_for_remote;
let mut init_allocated = init_allocated;
let to_fulfill = self.remote.recent_requests.swap(0, Ordering::Relaxed);
let current_in_channel = self.remote.in_channel.load(Ordering::Relaxed);
Contributor

Another option here is to store the (signed) difference between in_channel and recent_requests. Then you only have one value to update! You can use the existing IdCursor type for signed numbers. This would reduce to something like:

if self.remote.net_in_channel.load(Ordering::Relaxed) < self.entities_hot_for_remote {
    // ...
    let old_net_in_channel = self.remote.net_in_channel
        .fetch_max(self.entities_hot_for_remote, Ordering::Relaxed);
    let to_fulfill = self.entities_hot_for_remote - old_net_in_channel;
    if to_fulfill > 0 {
        // ...
    }
}
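
A self-contained rendering of that idea, with AtomicI64 standing in for IdCursor and hypothetical names:

use std::sync::atomic::{AtomicI64, Ordering};

struct Remote {
    /// Entities in the channel minus outstanding requests; requests
    /// decrement it, and fulfillment raises it back up to the target.
    net_in_channel: AtomicI64,
}

impl Remote {
    /// Called by a remote reserver before popping from the queue.
    fn note_request(&self) {
        self.net_in_channel.fetch_sub(1, Ordering::Relaxed);
    }

    /// Returns how many entities a flush should reserve to restore `target`.
    fn to_fulfill(&self, target: i64) -> i64 {
        let old = self.net_in_channel.fetch_max(target, Ordering::Relaxed);
        (target - old).max(0)
    }
}

fn main() {
    let remote = Remote { net_in_channel: AtomicI64::new(8) }; // pre-filled, target = 8
    for _ in 0..3 {
        remote.note_request();
    }
    assert_eq!(remote.to_fulfill(8), 3); // three requests since the last flush
    assert_eq!(remote.to_fulfill(8), 0); // nothing outstanding now
}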

Contributor Author

I suppose we could, but I'm not sure what the motivation would be. IIUC, it's the same number of atomic ops and the same amount of data stored. Actually, it's 8 bytes instead of 4, but still. I'm not opposed to it though, if there's a reason to do this that I'm just missing.

Contributor

Oh, I was just trying to simplify the saturating_sub arithmetic by having one counter to modify instead of two.

@andriyDev andriyDev left a comment

At this point I'm fine with the code itself. Whether we take this route is another discussion. IOW, if we decide this is the PR we want to get merged, I think my review is valid, but that doesn't mean this is a "vote" for this PR :P My ideal (and I think yours as well) is fully parallelized entity reservation. But this is a good second.

@ElliottjPierce
Contributor Author

Heads up that the async channel issue is fixed now. See here. IDK when the next release will be, but I can fix CI here when it's out. (Unless we do v9, which I tend to prefer.)

@atlv24 atlv24 modified the milestones: 0.17, 0.18 Jul 8, 2025