Gossipsub: Partial Message Extension #685
As mentioned in our call and prior discussions, this is what we've described in the past as bitmap-based IHAVE/IWANT. We have already discussed this a few times as the way forward, but haven't gotten around to specifying it. Hence, I also started preparing a draft spec for it a few weeks ago. You can find my draft here: It is a bit unfinished, hence I didn't push it here on the repo yet :-) In general I think it is quite aligned with what you propose here. Let's see how we can merge our views into one generic spec. As I say in my draft above, a structured message ID is used to identify a segment of a larger message. In another view, we can see these as messages in a larger batch of messages. One can easily see that the two are almost exactly the same.
Partial message validation is a crucial aspect of this; in fact, we can do fast forwarding in FullDAS because we can validate every single message segment individually. I would dedicate a section to this in the spec (trying to write it up).
@MarcoPolo proposed improvements to the partial message spec here. In my version I had both partialIHAVE and partialIWANT. I think I see your point in focusing on the IWANT part, as it might allow us to delegate all the logic to the upper layer, similar to how you can do partial requests in other protocols where you need pieces only (e.g. HTTP Accept-Ranges for resume, and similar use cases). We still need the other node to send the IHAVE (the normal, full IHAVE), and we would still need the logic in our node not to fetch the whole message, so either a hook before IWANT or some call to say "for this I already have something, please send a PartialIWANT with this extra metadata". If instead we also define the Partial IHAVE, we can do more. In the FullDAS implementation I'm doing these partial IHAVEs to some extent already, because I'm segmenting the column into cell-level messages. We can also give a bitmap encoding to these, as was proposed there. Regarding keeping the metadata opaque, I think it is an interesting idea, but it seems to me it requires a more complex API. At the end of the day we do need a protocol layer that is aware of the underlying segment structure. We can try to layer this into a structure-aware and a structure-unaware part, but I feel this will lead to inefficiencies at some point.
That sounds great to me!
Agreed. The other use case is faster verification of complete messages. Imagine the Fusaka DAS use case where we are only missing a single cell: we don't need to verify the full column again, just the missing cell. On fast forwarding, there are a couple of different strategies we could use here, but one that stands out to me follows this simple rule:
The underlying idea is that, as the mempool works today, if you have a cell locally odds are high your peer has it as well (again from your very helpful post: https://ethresear.ch/t/is-data-available-in-the-el-mempool/22329). As we change the mempool, we'll change this strategy as well.
This is hard because we don't have a way to link the GroupID to the full message ID present in the IHAVE. One approach is to go all in with partialIHAVEs and use that to tell a peer that we have all of the parts of this group, and never send an IHAVE if a peer supports the partial message extension.
I think keeping the metadata opaque and completely application defined is the correct approach. It lets the application define the optimal encoding of the missing parts. For the DAS use case bitmaps are very good, but I could also imagine a use case where ranges are desired, or maybe even something like a rateless IBLT.
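To make the bitmap option concrete, here is a minimal sketch of how an application might encode its missing parts as opaque metadata and how a responder might use it. The function names, the bit ordering (least significant bit first within each byte), and the "bit set means missing" convention are assumptions for illustration only; none of them come from the spec.

```python
from typing import Dict, List, Set


def encode_missing(have: Set[int], total: int) -> bytes:
    """Build a bitmap where bit i is set when part i is still missing.

    Hypothetical encoding: LSB-first within each byte.
    """
    bitmap = bytearray((total + 7) // 8)
    for i in range(total):
        if i not in have:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def decode_missing(bitmap: bytes, total: int) -> List[int]:
    """Return the indices of the parts the sender of the bitmap is missing."""
    if len(bitmap) != (total + 7) // 8:
        raise ValueError("bitmap length does not match the number of parts")
    return [i for i in range(total) if (bitmap[i // 8] >> (i % 8)) & 1]


def parts_to_send(local: Dict[int, bytes], bitmap: bytes, total: int) -> Dict[int, bytes]:
    """Pick the locally held parts that the requesting peer is missing."""
    return {i: local[i] for i in decode_missing(bitmap, total) if i in local}
```

Because the pubsub layer would treat the bitmap as opaque bytes, an application could swap this for a range list or another encoding without any change at the protocol level.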
What inefficiencies do you have in mind here?
Could this end up increasing the bandwidth because the full message is already in flight by the time you send all the partial control messages? Similar situation to IDONTWANT slightly increasing the bandwidth instead of significantly reducing it.
If implemented poorly, yes. But consider this implementation: Assume your peer supports the partial message extension. Assume we are talking about the data column sidecar case.
This should be strictly better because:
We could even be lazier, and rely on the peer requesting data rather than lazy pushing data (save 1/2 RTT). We can do this because the peer can request the cell without knowing the hash of the cell. Contrast this with a "normal" gossipsub message where you can only request a message once you know the hash of the message.
Signed-off-by: Csaba Kiraly <[email protected]>
### PartialIWANT

Partial IWants signal to a receiver that the sending peer only wants a part of
Suggested change:
- Partial IWants signal to a receiver that the sending peer only wants a part of
+ `PartialIWANT`s signal to a receiver that the sending peer only wants a part of
pubsub/gossipsub/partial-messages.md
Outdated
The main motivation for this extension is optimizing Ethereum's Data
Availability Sampling (DAS) protocol. In Ethereum's upcoming fork, Fusaka,
custodied data is laid out in a matrix, where the rows represent user data
(called blobs), and the columns represent a slice across all blobs (each blob
Suggested change:
- (called blobs), and the columns represent a slice across all blobs (each blob
+ (called blobs), and the columns represent a slice across all blobs included in the block (each blob
pubsub/gossipsub/partial-messages.md
Outdated
message PartialMessage {
  optional bytes topicID = 1;
  optional bytes data = 2;
}
Would the partial message benefit from some contextual metadata to indicate which parts are actually being delivered? Or do we expect the data to encode this (e.g. with null elements inside a list, if that's the shape)? Or does the `PartialMessage` refer to a single part, and we expect the peer to stream N instances, one per hit?
Also: this schema only allows for one in-flight partial request per peer per topic. I guess that's not a problem and we (Ethereum) are fine with that limitation?
> Would the partial message benefit from some contextual metadata

I've gone back and forth. In this version I settled on having this be encoded as part of the data. I'll try implementing it and see if I feel differently after some implementation experience.
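As an illustration of what "encoded as part of the data" could look like, the sketch below packs several parts, each tagged with its index, into a single `data` payload. The wire layout (2-byte big-endian index, 4-byte big-endian length, then the payload) and the function names are hypothetical; the spec leaves this encoding to the application.

```python
import struct
from typing import Dict

_HEADER = struct.Struct(">HI")  # hypothetical layout: part index, payload length


def pack_parts(parts: Dict[int, bytes]) -> bytes:
    """Serialize index-tagged parts into one opaque byte string."""
    out = bytearray()
    for index in sorted(parts):
        payload = parts[index]
        out += _HEADER.pack(index, len(payload))
        out += payload
    return bytes(out)


def unpack_parts(data: bytes) -> Dict[int, bytes]:
    """Parse the byte string back into a mapping of part index to payload."""
    parts: Dict[int, bytes] = {}
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError("truncated part header")
        index, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated part payload")
        parts[index] = data[offset:offset + length]
        offset += length
    return parts
```

With a self-describing payload like this, the receiver can tell which parts were delivered without any extra field in `PartialMessage`, at the cost of a few header bytes per part.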
pubsub/gossipsub/partial-messages.md
Outdated
application defined. Here are a list of operations an application is expected to
provide to Gossipsub to enable partial message delivery.

1. Split a full message into a group of partial messages.
We may not have a full message, but still benefit from splitting what we have into pieces that can be transmitted as parts, right? (e.g. the common case for PeerDAS columns)
Also, is the term "partial message" being used ambiguously in this spec? Here it refers to parts, but below it might refer to a collection of parts in response to a PARTIALIWANT?
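The point about splitting an incomplete message can be sketched as follows. This is only an assumed application-side model, where a column is a list of cells and `None` marks a cell we do not hold yet; the helper names are hypothetical and not taken from the spec.

```python
from typing import Dict, List, Optional


def split_available(cells: List[Optional[bytes]]) -> Dict[int, bytes]:
    """Return the parts we can offer, even if the full message is incomplete."""
    return {i: cell for i, cell in enumerate(cells) if cell is not None}


def merge_parts(cells: List[Optional[bytes]], parts: Dict[int, bytes]) -> List[Optional[bytes]]:
    """Fill in missing cells from received parts without overwriting held ones."""
    merged = list(cells)
    for index, payload in parts.items():
        if not 0 <= index < len(merged):
            raise ValueError(f"part index {index} out of range")
        if merged[index] is None:
            merged[index] = payload
    return merged


def is_complete(cells: List[Optional[bytes]]) -> bool:
    """True once every cell of the column is present."""
    return all(cell is not None for cell in cells)
```

In this model a node holding only some cells can still serve them, and only needs to validate the cells that `merge_parts` actually fills in.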
usage of "message parts" and "partial message".
common to send two partialX messages for the same group, this lets us avoid copying the topicID/groupID multiple times
All the info is on the spec.