Canonical encoding spec issue #581

@araxus

The canonicalization spec has an issue. It requires encoding pointers in pre-order, but doesn't require the same for data fields in an effort to avoid requiring a schema and reflection.

Because segment arenas are append-only allocators in the implementation, "cruft" data may be present in the slice from temporary operations but otherwise unattached to the root struct (i.e. orphaned). Obviously, this library does not implement orphans or adoption. Due to this line of code, the segment is opaquely copied along with the "cruft" data, which should not be part of a canonical security message. Furthermore, the data elements may not be in pre-order layout.
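To make the "cruft" problem concrete, here is a minimal toy model. It does not use the capnp package at all: `arena`, `message`, `setText`, `opaqueCopy` and `reachableCopy` are hypothetical names invented for this sketch, and the single offset/length "pointer" is a deliberate simplification of a real segment layout. It only shows why copying an append-only segment verbatim drags orphaned bytes along.

```go
package main

import (
	"bytes"
	"fmt"
)

// arena is a toy stand-in for an append-only segment: allocations only
// ever grow the buffer, and nothing is reclaimed.
type arena struct {
	buf []byte
}

// alloc appends data to the segment and returns the offset it was stored at.
func (a *arena) alloc(data []byte) int {
	off := len(a.buf)
	a.buf = append(a.buf, data...)
	return off
}

// message is a toy root holding one "pointer": an offset and size into
// the arena. (Hypothetical type, not part of any real library.)
type message struct {
	seg  *arena
	off  int
	size int
}

// setText points the root at a freshly allocated copy of s. Whatever the
// root pointed to before stays in the segment, unreachable (orphaned).
func (m *message) setText(s string) {
	m.off = m.seg.alloc([]byte(s))
	m.size = len(s)
}

// opaqueCopy mimics copying the whole segment verbatim, orphans included.
func (m *message) opaqueCopy() []byte {
	return append([]byte(nil), m.seg.buf...)
}

// reachableCopy copies only the bytes reachable from the root.
func (m *message) reachableCopy() []byte {
	return append([]byte(nil), m.seg.buf[m.off:m.off+m.size]...)
}

func main() {
	m := &message{seg: &arena{}}
	m.setText("temporary value") // overwritten below, so it becomes cruft
	m.setText("final")

	fmt.Printf("opaque:    %q\n", m.opaqueCopy())
	fmt.Printf("reachable: %q\n", m.reachableCopy())
	fmt.Println("opaque copy leaks orphan:",
		bytes.Contains(m.opaqueCopy(), []byte("temporary")))
}
```

In this model the two copies describe the same semantic message but differ byte for byte, so any hash or signature over the opaque copy depends on the history of temporary operations rather than on the message content.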

For a canonical security message to be truly useful, all fields should be in a defined order, including non-pointer data fields. Two individuals should be able to produce the exact same semantic message independently, using different libraries, languages, and implementations, and have their canonical form (and ensuing cryptographic hash) match. Verifying a raw payload is pointless when the spec has already chosen to rearrange the pointers but does not apply the same treatment to the data fields. Furthermore, it's more powerful to cryptographically verify a network object than a raw message, which is what Cap'n Proto is all about. A solid example is serializing and signing RPC persistent capabilities encoded as a struct.

Proposed changes:

  1. The canonical encoding spec should be amended to require (perhaps as an alternative) that data fields are encoded in pre-order according to the schema; obviously this now adds the requirement of a schema and reflection since unlike pointers they don't have a pre-existing pointer table. @kentonv
  2. The func Canonicalize(s Struct) ([]byte, error) method should be improved to func Canonicalize(s Struct) (Struct, error). It's a more useful API if users don't have to re-read a message from []byte; e.g. sending a struct via RPC interface in canonical form, alongside a cryptographic signature that verifies it. It could also add an optional typeId ...uint64 parameter to enable reflection (the alternative encoding mode suggested previously).
  3. capnp.Struct should add an exported Canonical() bool accessor, indicating that the struct uses a single segment in the expected form. This makes it useful to embed in larger multi-segment messages (e.g. an RPC call).
  4. It would thus be useful to also add a capnp.Struct.Canonicalize() method.
  5. The indicated code should be changed to use reflection instead of copy. Copying the fields over one-by-one in pre-order also leaves any orphaned data in the source segment behind. The extra overhead from not using copy is insignificant in the context of performing cryptographic operations anyways.
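As a rough illustration of item 5, the sketch below walks a value field by field, in declaration order, instead of copying memory. It is only an analogy: it uses Go's `reflect` package on plain Go structs as a stand-in for schema reflection, and `Token`, `Inner`, `encode` and `canonicalBytes` are hypothetical names, not part of capnp or of the canonical encoding spec. The byte layout (little-endian integers, length-prefixed strings) is likewise an assumption chosen for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"reflect"
)

// Inner and Token are hypothetical message types used only for this sketch.
type Inner struct {
	Label string
	Count uint64
}

type Token struct {
	ID    uint64
	Owner string
	Body  Inner
}

// encode appends v to out by visiting fields in declaration order and
// descending into nested structs as they are encountered (pre-order).
func encode(v reflect.Value, out []byte) ([]byte, error) {
	switch v.Kind() {
	case reflect.Uint64:
		return binary.LittleEndian.AppendUint64(out, v.Uint()), nil
	case reflect.String:
		// Length prefix keeps adjacent strings unambiguous.
		out = binary.LittleEndian.AppendUint64(out, uint64(v.Len()))
		return append(out, v.String()...), nil
	case reflect.Struct:
		var err error
		for i := 0; i < v.NumField(); i++ {
			if out, err = encode(v.Field(i), out); err != nil {
				return nil, err
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported kind %s", v.Kind())
	}
}

// canonicalBytes returns the field-by-field encoding of v.
func canonicalBytes(v any) ([]byte, error) {
	return encode(reflect.ValueOf(v), nil)
}

func main() {
	// Built in one shot.
	a := Token{ID: 7, Owner: "alice", Body: Inner{Label: "cap", Count: 3}}

	// Built piecemeal, in a different order, with a temporary value that
	// is later overwritten.
	var b Token
	b.Body.Count = 3
	b.Owner = "temporary"
	b.Body.Label = "cap"
	b.Owner = "alice"
	b.ID = 7

	ba, _ := canonicalBytes(a)
	bb, _ := canonicalBytes(b)
	fmt.Println("hashes match:", sha256.Sum256(ba) == sha256.Sum256(bb))
}
```

Because the output depends only on the field values and the declared order, the construction history of each value does not affect the bytes that get hashed, which is the property the proposal is after.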

As a side note, I've noticed an undesirable bug(?) in the RPC implementation: messages with "cruft" data in their segment are sent on the network even though that data is orphaned. Library users must therefore take great care never to build messages directly in the target segment if they perform any temporary operations, and they are likely to end up implementing their own 'canonical form' by writing code in a very specific, idiomatic style to avoid this. I personally ran into this writing unit test tables, where it took a hundred lines of code to build a message, and then I used capnp.Struct.CopyFrom() for subsequent test cases where I altered a single field; the RPC data sent included the (potentially large) payloads of the previous data, copied and orphaned from the original message. The implication is that capnp.Struct.CopyFrom should perhaps offer, as an option, copying in the canonical fashion proposed above. We could add a canonical ...bool parameter to the method to implement this without breaking backwards compatibility (which this library, in its 'alpha' state, does reserve the right to do).

I'm essentially suggesting that canonical form should be a Merkle tree; i.e. blockchain. I've implemented a blockchain using Cap'n Proto as the mmap ledger, and had to implement my own Canonicalize using reflection. ;)
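The Merkle tree idea can be sketched in a few lines. This is a generic illustration, not the canonical encoding itself and not the blockchain mentioned above: `node` and `hash` are hypothetical names, each node's digest is assumed to be SHA-256 over its length-prefixed data section followed by its children's digests in order.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// node is a toy struct with a data section and an ordered pointer section.
type node struct {
	data     []byte
	children []*node
}

// hash returns SHA-256(len(data) || data || hash(child 0) || hash(child 1) ...).
// Child digests have a fixed size, so only the data needs a length prefix.
func (n *node) hash() [32]byte {
	h := sha256.New()
	var size [8]byte
	binary.LittleEndian.PutUint64(size[:], uint64(len(n.data)))
	h.Write(size[:])
	h.Write(n.data)
	for _, c := range n.children {
		ch := c.hash()
		h.Write(ch[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// build constructs the same logical tree, with a configurable leaf payload.
func build(leaf string) *node {
	return &node{
		data: []byte("root"),
		children: []*node{
			{data: []byte("left")},
			{data: []byte("right"), children: []*node{{data: []byte(leaf)}}},
		},
	}
}

func main() {
	a, b, c := build("leaf"), build("leaf"), build("LEAF")
	fmt.Println("same content, same root hash:     ", a.hash() == b.hash())
	fmt.Println("changed leaf, different root hash:", a.hash() != c.hash())
}
```

With this structure a sub-struct can be verified on its own by its digest, and a change anywhere below a node changes that node's digest, which is what makes it attractive for signing individual objects rather than whole raw messages.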
