refactor: rewrite record account layout and API #6
Now we make use of tag-based (de)serialization, which allows us to evolve the account schema with various new field types while preserving backwards compatibility.
Overall I think this is a cool effort, and I see a case for tag-based de/serialization for the registry, but the more I read the implementation, the more it reminds me of protobuf. That resemblance also highlights some parts missing from the implementation.
The important missing part can be seen in this quote from the protobuf encoding spec:
When a message is encoded, each key-value pair is turned into a record consisting of the field number, a wire type and a payload. The wire type tells the parser how big the payload after it is. This allows old parsers to skip over new fields they don’t understand. This type of scheme is sometimes called Tag-Length-Value, or TLV.
Assume the following example. For the sake of simplicity I will use the proto DSL.
message Origin {
string country = 1;
string city = 2;
}
message Person {
int32 age = 1;
Origin origin = 2;
string name = 3;
}
Now we update our registry contract, and Origin becomes:
message Origin {
string country = 1;
string city = 2;
string hood = 3;
}
The issue here is that an older client, once it encounters the tag Origin::hood that is unknown to it, can't advance by the proper number of bytes, since it doesn't know the size/length of hood. And if that isn't handled, then Person::name will contain the hood value, since their tags and types are the same.
The above assumes that we skip unknown tags, which we should; otherwise there's no backward compatibility anyway.
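To make the skipping point concrete, here is a small sketch of Tag-Length-Value decoding. The byte layout (`[tag: u8][len: u16 LE][payload]`) is hypothetical and much simpler than the real protobuf wire format; it only illustrates why a length prefix lets an old parser step over fields it doesn't know.

```rust
// Sketch of TLV decoding: because every payload is preceded by its
// length, a parser can skip tags it does not recognize and still
// stay aligned with the byte stream.
fn collect_known_fields(mut buf: &[u8], known_tags: &[u8]) -> Vec<(u8, Vec<u8>)> {
    let mut fields = Vec::new();
    while buf.len() >= 3 {
        let tag = buf[0];
        let len = u16::from_le_bytes([buf[1], buf[2]]) as usize;
        let Some(payload) = buf.get(3..3 + len) else { break };
        if known_tags.contains(&tag) {
            fields.push((tag, payload.to_vec()));
        }
        // The length prefix lets us advance past unknown tags correctly.
        buf = &buf[3 + len..];
    }
    fields
}

fn main() {
    // country (tag 1), hood (tag 3, unknown to an old client), name (tag 2)
    let mut buf = vec![1u8, 2, 0, b'U', b'S'];
    buf.extend([3u8, 4, 0]);
    buf.extend(b"hood");
    buf.extend([2u8, 5, 0]);
    buf.extend(b"alice");
    // A client that only knows tags 1 and 2 still decodes `name` correctly.
    let fields = collect_known_fields(&buf, &[1, 2]);
    println!("{fields:?}");
}
```

Without the length prefix, the same old client would have no way to know where `hood` ends, and would misread the following bytes as `name`.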
If we really want to go with tag-based serialization, then taking the above into account, our protocol would end up pretty much identical to protobuf, hence I would just suggest using it, since it's a well-documented and battle-tested solution.
We don't even need to introduce .proto files, since we could just use prost syntax to describe our types:
#[derive(Clone, PartialEq, prost::Message)]
pub struct ValidatorInfo {
    #[prost(message, tag = "1")]
    pub identity: Option<PublicKey>,
    #[prost(message, tag = "2")]
    pub addr: Option<Url>,
    #[prost(uint32, tag = "3")]
    pub block_time_ms: u32,
    // ...
}
slice[..2].copy_from_slice(&(self.0.len() as u16).to_le_bytes());
slice[2..2 + self.0.len()].copy_from_slice(self.0.as_ref());
}
fn deserialize(slice: &[u8]) -> Option<Self> {
I would suggest using Result. Option doesn't carry enough information about the point at which deserialization broke down.
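For illustration, a minimal error type a `Result`-based `deserialize` could return. The variant and function names here are hypothetical, not from the PR; the u16 length prefix mirrors the layout shown in the diff above.

```rust
// Hypothetical error type for a fallible deserializer; variants are
// illustrative of the kinds of failures Option cannot distinguish.
#[derive(Debug, PartialEq)]
enum DeserializeError {
    LengthMismatch { expected: usize, got: usize },
    InvalidUtf8,
}

// Read a u16-length-prefixed payload, as in the PR's byte layout.
fn read_len_prefixed(slice: &[u8]) -> Result<&[u8], DeserializeError> {
    if slice.len() < 2 {
        return Err(DeserializeError::LengthMismatch { expected: 2, got: slice.len() });
    }
    let len = u16::from_le_bytes([slice[0], slice[1]]) as usize;
    slice
        .get(2..2 + len)
        .ok_or(DeserializeError::LengthMismatch { expected: 2 + len, got: slice.len() })
}

fn main() {
    println!("{:?}", read_len_prefixed(&[3, 0, b'a', b'b', b'c'])); // Ok
    println!("{:?}", read_len_prefixed(&[9, 0, b'a']));             // length mismatch
}
```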
Well, there's basically one point where deserialization might go wrong: a length mismatch, and that's it. Creating an error type just for that is redundant. I've added extra logging for debug purposes anyway.
There could be various errors associated with the data format. Say, wrong encodings, or a SocketAddrV4 that was sent as a string and not as its memory representation, DateTime, and various others.
Let's change the signature once we have those types; for now it doesn't affect anything, imho.
fn size(&self) -> usize {
    std::mem::size_of::<Self>()
}
fn serialize(&self, slice: &mut [u8]);
I would suggest having serialize return Result as well, since the data format may have restrictions on a particular data type. Say, as was discussed, there should be a restriction of 150 on the addr field size.
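A sketch of what a fallible `serialize` could look like with such a restriction. The function name, error type, and the exact cap are illustrative assumptions, not the PR's API:

```rust
// Hypothetical: enforce a per-field size restriction at write time,
// returning an error instead of silently truncating or panicking.
const MAX_ADDR_LEN: usize = 150; // cap discussed in the review

#[derive(Debug, PartialEq)]
enum SerializeError {
    FieldTooLong { max: usize, got: usize },
}

fn serialize_addr(addr: &str, out: &mut Vec<u8>) -> Result<(), SerializeError> {
    if addr.len() > MAX_ADDR_LEN {
        return Err(SerializeError::FieldTooLong { max: MAX_ADDR_LEN, got: addr.len() });
    }
    // u16 length prefix followed by the raw bytes, as in the PR's layout.
    out.extend_from_slice(&(addr.len() as u16).to_le_bytes());
    out.extend_from_slice(addr.as_bytes());
    Ok(())
}

fn main() {
    let mut out = Vec::new();
    serialize_addr("validator.example:8080", &mut out).unwrap();
    println!("{} bytes written", out.len());
    assert!(serialize_addr(&"x".repeat(151), &mut Vec::new()).is_err());
}
```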
With the tag-based approach and Solana's built-in incentive to keep accounts small, we can allow any length for fields. If you are willing to pay for huge accounts, so be it, we won't stop you.
Generally there could be various issues with encoding during serialization; for example, a filesystem Path may refuse to serialize itself if it contains invalid UTF-8 data.
But we are working with raw bytes anyway. You can send us an invalid URL, for example, and we will happily store it for you, because when clients try to get that URL from the bytes, something like from_utf8 will fail and they won't use your endpoint, so it's your loss.
All in all you have a point, but for now let's not introduce extra complexity for future hypothetical issues which we might never encounter.
src/state/record.rs (Outdated)
macro_rules! extract {
    ($field: ident) => {
        builder.$field.ok_or(ProgramError::InvalidAccountData)?
This doesn't allow for compatibility with an older client. It will return an error if a field is missing, while it should choose a default value.
The idea is that new fields are always serialized after old ones, i.e. if we add a new field it will be laid out at the end of the byte string, so older clients can stop deserializing once they hit an unknown field, as everything afterwards is new stuff; hence the break statement in the builder loop: https://github.com/magicblock-labs/magic-domain-program/pull/6/files#diff-4ee356d91730bc0561e18e949120c3eb25918ed5b7bf1bc89e2f7ed4c0f13503R119
Basically, by the time you encounter an unserializable field you already have all the fields you know of and can proceed with your older logic.
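The append-only scheme described here can be sketched roughly as follows. The tags, field sizes, and names are hypothetical simplifications of the PR's code, not its actual definitions:

```rust
// Sketch of the append-only tag scheme: each known tag has a known
// encoding, new tags are only ever appended, and an older client stops
// at the first tag it does not recognize.
const TAG_IDENTITY: u8 = 1; // fixed 32-byte key (assumed size)
const TAG_ADDR: u8 = 2;     // u16-length-prefixed bytes

#[derive(Default, Debug)]
struct RecordBuilder {
    identity: Option<[u8; 32]>,
    addr: Option<Vec<u8>>,
}

fn populate(buf: &[u8]) -> RecordBuilder {
    let mut builder = RecordBuilder::default();
    let mut pos = 0;
    while pos < buf.len() {
        match buf[pos] {
            TAG_IDENTITY if buf.len() - pos >= 1 + 32 => {
                builder.identity = buf[pos + 1..pos + 33].try_into().ok();
                pos += 33;
            }
            TAG_ADDR if buf.len() - pos >= 1 + 2 => {
                let len = u16::from_le_bytes([buf[pos + 1], buf[pos + 2]]) as usize;
                let Some(payload) = buf.get(pos + 3..pos + 3 + len) else { break };
                builder.addr = Some(payload.to_vec());
                pos += 3 + len;
            }
            // Unknown tag: every field after this one was introduced later
            // than this client, and all known fields were already read.
            _ => break,
        }
    }
    builder
}

fn main() {
    let mut buf = vec![TAG_IDENTITY];
    buf.extend([7u8; 32]);
    buf.extend([TAG_ADDR, 3, 0]);
    buf.extend(b"url");
    buf.extend([9u8, 1, 2, 3]); // tag unknown to this client: stop here
    println!("{:?}", populate(&buf));
}
```

Note that this only works as long as unknown tags never appear before known ones, which is exactly the constraint the reviewer is probing.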
We could implement it this way, but notice that we call RecordBuilder::populate(buffer, true) everywhere, hence we throw an error.
Basically, by the time you encounter an unserializable field you already have all the fields you know of and can proceed with your older logic.
Here I would refer to this part of my review
The issue here is that an older client, once it encounters the tag Origin::hood that is unknown to it, can't advance by the proper number of bytes, since it doesn't know the size/length of hood. And if that isn't handled, then Person::name will contain the hood value, since their tags and types are the same.
Well, that's a good point, but I kind of doubt that we will be using composite types here.
tags::ADDR_TAG => {
    builder.addr = deserializer.read_field();
}
_ if error => return Err(ProgramError::InvalidAccountData),
Since the deserialize functions use error = true, an older client, once it sees a new tag unknown to it, will return an error, while it should just ignore it.
Older clients can also choose not to use error = true; it's up to them whether they want to error or not. I'll add clarifying doc comments.
Well, the issue here is that they would have to copy-paste the implementation to change the flag, since for now it is hardcoded on the contract side. This would also be quite painful to version and maintain.
builder.addr = deserializer.read_field();
}
_ if error => return Err(ProgramError::InvalidAccountData),
_ => break,
Suggested change: `_ => break,` → `_ => continue,`
In tag-based deserialization such fields are usually ignored.
In order to ignore them we would have to, as you pointed out, use tag-length encoding, which adds extra complexity and space requirements. Instead we lay out the fields in strict chronological order of their introduction; thus once you encounter an unknown tag, you are done, since there's nothing beyond it that you know of, and most likely (considering you are using our SDK) you already have all the fields that your code is aware of.
But this doesn't allow for complex types at all. Assume:
struct ValidatorInfo {
    // ... some fields
    pub region: Region,
    pub block_time_ms: u32,
}
struct Region {
    pub continent: Continent,
}
enum Continent {
    ...
}
but then we decide to match validators based on country:
struct Region {
    pub continent: Continent,
    pub country: String,
}
The above will break everything
I'd like to avoid reliance on proto integration at all costs. First, we only have one type which we serialize; unless we have 10 more, the usage of schema-based serde is unjustified. Second, we are talking about a smart contract, and we should keep it as nimble as possible, avoiding any extra code bloat and runtime cost increases.
That's why I proposed that we could just use prost; we wouldn't use protoc at all, since we don't need compilation to anything other than Rust.
But the idea of the PR implies that our types and
Not sure what you mean here. The proposed approach uses the same tags as protobuf does.
both
Anyway I think here we could try to play with
Interesting related post. IMHO we should stick with serializers commonly used in Solana programs: either borsh or a more efficient one that we already use in the delegation program. We should use one or two of those options across all of our programs, so developers onboarding don't have to learn a different serializer each time. Afaik prost is just a library to generate Rust code for protobuf types, and if we don't want to use protobuf we shouldn't use prost either. I have never seen protobuf used inside a Solana program; the benchmark linked above does include a flatbuffers case, but I have never seen a real-world use of that in a program either.
Also, to handle types growing or changing, we can nest them in enums: for instance, have a V1 with specific fields, a V2 with added fields, a V3, and so on. The discriminator of that enum will then automatically figure out how to deserialize a specific account. This is how it is commonly done on Solana; I've seen it for instance at Metaplex. The nice thing is that deserialization automatically picks the right version, and in the rest of the code you just match on the enum to get the correct fields out of whichever version of the account you are using. Essentially the enum discriminator functions as a tag in that case.
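The versioned-enum pattern can be sketched without any serde crate: the first byte acts as the discriminator and selects the layout. The versions, fields, and manual decoding below are illustrative assumptions; in practice borsh would derive this:

```rust
// Sketch of versioned accounts: the leading discriminator byte picks
// the layout, so old and new accounts can coexist in one enum.
#[derive(Debug, PartialEq)]
enum ValidatorInfo {
    V1 { block_time_ms: u16 },
    V2 { block_time_ms: u16, fee_bps: u16 },
}

fn read_u16(buf: &[u8], at: usize) -> Option<u16> {
    Some(u16::from_le_bytes(buf.get(at..at + 2)?.try_into().ok()?))
}

fn deserialize(buf: &[u8]) -> Option<ValidatorInfo> {
    match *buf.first()? {
        0 => Some(ValidatorInfo::V1 { block_time_ms: read_u16(buf, 1)? }),
        1 => Some(ValidatorInfo::V2 {
            block_time_ms: read_u16(buf, 1)?,
            fee_bps: read_u16(buf, 3)?,
        }),
        // An older client has no arm for a newer discriminator and fails,
        // which is the compatibility trade-off discussed in this thread.
        _ => None,
    }
}

fn main() {
    println!("{:?}", deserialize(&[0, 0x90, 0x01]));       // V1, 400 ms
    println!("{:?}", deserialize(&[1, 0x90, 0x01, 5, 0])); // V2
    println!("{:?}", deserialize(&[2]));                   // unknown version
}
```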
It's not backwards compatible: if you introduce V3, for example, while your clients' code is still stuck on V2, then there's no way they can deserialize the new version of your enum.
The idea is that you update the client as well whenever a new version of the account is added, or a new version of the program is released in general. That is very common, it is how things are normally done, and it is usually not a problem. It is a much simpler and more robust solution than the tagging approach with serde, in my opinion.
But that implies you'll be forcing the client to upgrade, and if they don't (because they weren't following the newsletter, for example), then their code will suddenly stop working, which is not ideal.