Description
The canonicalization spec has an issue. It requires encoding pointers in pre-order, but doesn't require the same for data fields in an effort to avoid requiring a schema and reflection.
Because segment arenas are append-only allocators in the implementation, "cruft" data may be present in the slice from temporary operations but otherwise unattached to the root struct (i.e. orphaned). Obviously, this library does not implement orphans or adoption. Due to this line of code, the segment is opaquely copied along with the "cruft" data, which should not be part of a canonical security message. Furthermore, the data elements may not be in pre-order layout.
For a canonical security message to be truly useful, all fields should be in a defined order, including non-pointer data fields. Two individuals should be able to produce the exact same semantic message independently, using different libraries, languages, and implementations, and have their canonical form (and ensuing cryptographic hash) match. Verifying a raw payload is pointless when the spec has already chosen to rearrange the pointers but does not apply the same ordering to the data fields. Furthermore, it's more powerful to cryptographically verify a network object than a raw message, which is what Cap'n Proto is all about. A solid example is serializing and signing RPC persistent capabilities encoded as a struct.
Proposed changes:
- The canonical encoding spec should be amended to require (perhaps as an alternative) that data fields are encoded in pre-order according to the schema; obviously this now adds the requirement of a schema and reflection since unlike pointers they don't have a pre-existing pointer table. @kentonv
- The `func Canonicalize(s Struct) ([]byte, error)` method should be improved to `func Canonicalize(s Struct) (Struct, error)`. It's a more useful API if users don't have to re-read a message from `[]byte`; e.g. sending a struct via an RPC interface in canonical form, alongside a cryptographic signature that verifies it. It could also add an optional `typeId ...uint64` parameter to enable reflection (the alternative encoding mode suggested previously).
- `capnp.Struct` should add an exported `Canonical() bool` accessor, indicating that the struct uses a single segment in the expected form. This makes it useful to embed in larger multi-segment messages (e.g. an RPC call).
  - It would thus be useful to also add a `capnp.Struct.Canonicalize()` method.
- The indicated code should be changed to use reflection instead of `copy`. Copying the fields over one-by-one in pre-order also leaves any orphaned data in the source segment behind. The extra overhead from not using `copy` is insignificant in the context of performing cryptographic operations anyway.
As a side note, I've noticed an undesirable bug(?) in the RPC implementation: messages with "cruft" data in their segment are sent on the network despite the data's orphaned status. This requires great care on the library user's part to never 'build' messages directly in the target segment if they ever perform temporary operations. Users are then likely to end up implementing their own 'canonical form' by writing their code in a very specific, idiomatic style to avoid this. I personally ran into this writing unit test tables, where it took a hundred lines of code to build a message, and then I used `capnp.Struct.CopyFrom()` for subsequent message test cases where I altered a single field; the RPC data sent included (potentially large) payloads of the previous data, copied and orphaned from the original message. The implication is that `capnp.Struct.CopyFrom` should perhaps, as an option, copy in the canonical fashion proposed above. We could add a `canonical ...bool` parameter to the method to implement this without breaking backwards compatibility (which this library, in its 'alpha' state, does reserve the right to do).
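The variadic-flag idea can be sketched as follows. The `msg` type and its `CopyFrom` method are hypothetical toys, not `capnp.Struct`; the point is only that existing call sites without the extra argument keep compiling.

```go
package main

import "fmt"

// msg is a toy message: a segment plus the offset and length of the live
// payload. It is an assumption for illustration, not the library's type.
type msg struct {
	seg    []byte
	off, n int
}

// CopyFrom copies src into m. With no flag (or false) it copies the whole
// segment, orphaned bytes included. With canonical set to true it copies
// only the bytes reachable from the root.
func (m *msg) CopyFrom(src *msg, canonical ...bool) {
	if len(canonical) > 0 && canonical[0] {
		m.seg = append([]byte(nil), src.seg[src.off:src.off+src.n]...)
		m.off, m.n = 0, src.n
		return
	}
	m.seg = append([]byte(nil), src.seg...)
	m.off, m.n = src.off, src.n
}

func main() {
	// "temp" is orphaned; the live payload is "final" at offset 4.
	src := &msg{seg: []byte("tempfinal"), off: 4, n: 5}

	var legacy, canon msg
	legacy.CopyFrom(src)      // existing call sites are unchanged
	canon.CopyFrom(src, true) // new opt-in behaviour

	fmt.Printf("%q %q\n", legacy.seg, canon.seg)
}
```

One caveat: adding a variadic parameter keeps ordinary calls source-compatible, but it does change the method's type, so code that assigns the method to a typed function variable or relies on it to satisfy an interface would still need updating.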
I'm essentially suggesting that canonical form should be a Merkle tree; i.e. blockchain. I've implemented a blockchain using Cap'n Proto as the `mmap` ledger, and had to implement my own `Canonicalize` using reflection. ;)
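As a rough illustration of the Merkle-tree idea, the sketch below hashes a toy struct tree so that each node's digest covers its data words and its children's digests in pointer order. The `node` type and `merkle` function are hypothetical; this is not the encoding used by Cap'n Proto or by my ledger, just one way to get a digest that depends only on the logical content.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// node is a toy struct value: data words plus child structs.
type node struct {
	data     []uint64
	children []*node
}

// merkle returns SHA-256 over the node's data count, child count, data
// words (little-endian), and then each child's digest in pointer order.
func merkle(n *node) [32]byte {
	h := sha256.New()
	var w [8]byte
	put := func(v uint64) {
		binary.LittleEndian.PutUint64(w[:], v)
		h.Write(w[:])
	}
	put(uint64(len(n.data)))
	put(uint64(len(n.children)))
	for _, d := range n.data {
		put(d)
	}
	for _, c := range n.children {
		sum := merkle(c)
		h.Write(sum[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// sample builds a small tree; leaf sets the first child's data word.
func sample(leaf uint64) *node {
	return &node{
		data:     []uint64{1},
		children: []*node{{data: []uint64{leaf}}, {data: []uint64{3}}},
	}
}

func main() {
	fmt.Println(merkle(sample(2)) == merkle(sample(2))) // independently built, same content
	fmt.Println(merkle(sample(2)) == merkle(sample(9))) // one leaf differs
}
```

Two independently built trees with the same content give the same root digest, and changing a single leaf changes it, regardless of how either tree was laid out in memory.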