From 7218c5029b9cb873445e64f9391ffcdd2df5ad4e Mon Sep 17 00:00:00 2001 From: Abraham Egnor Date: Thu, 7 Aug 2025 15:03:04 +0100 Subject: [PATCH 1/3] RUST-2148 Migration guide for bson crate 3.0 --- migration-3.0.md | 59 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 migration-3.0.md diff --git a/migration-3.0.md b/migration-3.0.md new file mode 100644 index 00000000..4a93a2d6 --- /dev/null +++ b/migration-3.0.md @@ -0,0 +1,59 @@ +# Migrating from 2.x to 3.0 +3.0 updates several APIs in backwards-incompatible ways; in most cases these changes should require only minor updates in application code. + +## Unified error hierarchy +In 2.x, many crate submodules had their own `Error` types, with inconsistent conversions between those types. In 3.0, the crate defines a single `bson::error::Error` type, with fields for values common across errors like message or associated key, and a `kind` enum that provides granular root cause information. + +## `&CStr` +The [bson spec](https://bsonspec.org/spec.html) describes a "cstring" type as UTF-8, with the exception that it cannot contain byte `0` (which is otherwise a valid UTF8 byte value). This type is used in bson for the keys of documents and for the pattern and options of regular expressions; all other string values include a length header and allow full UTF8 values, including `0` bytes. + +In 2.x, attempting to use `rawdoc!` or `RawDocumentBuf::append` with a key or regular expression containing a `0` byte will panic. + +3.0 introduces the `&CStr`/`CString` types to represent the "cstring" type as described in the spec; these types are analogous to `&str` and `String` but validate on construction that no `0` byte is contained. The `cstr!` macro will construct a literal `&``static CStr` from a `&``static str` with compile-time validation, and `TryFrom` impls are provided for run-time validation. 
+ +In 2.x: +```rust +let mut computed_key = "foo".to_owned(); +computed_key.push_str("bar"); +let mut doc_buf = rawdoc! { + "hello": "world", + computed_key: 42, + "regex": Regex { + pattern: "needle".to_owned(), + options: "".to_owned(), + }, +}; +doc_buf.append("a key", "a value"); +``` + +In 3.x: +```rust +let mut computed_key = "foo".to_owned(); +computed_key.push_str("bar"); +// Non-static values need to be checked at runtime +let computed_key = CString::try_from(computed_key)?; +let doc_buf = rawdoc! { + // String literal keys are implicitly checked at compile-time. + "hello": "world", + computed_key: 42, + "regex": Regex { + // `&CStr` implements many common traits like `ToOwned` + pattern: cstr!("needle").to_owned(), + options: cstr!("").to_owned(), + } +}; +``` + +## Conversions + +In 2.x, conversions between raw types (`RawDocumentBuf`, `RawBson`, `RawArray`) and their associated reference types and equivalent core types (`Document`, `Bson`, `Array`) were via a mix of standard traits, ad-hoc functions, and in some cases not present at all. In 3.0, all appropriate conversions are available, and all are via standard library traits. + +## `append` and `append_ref` + +In 2.x, `RawDocumentBuf` provided both `append` and `append_ref` for appending owned or borrowed values respectively. In 3.x, `append` can accept both and `append_ref` has been removed. + +## Clarifying encoding vs serialization + +In 2.x, the API documentation and naming frequently conflated _encoding_ (directly converting Rust BSON values into BSON bytes) with _serialization_ (converting arbitrary Rust structs, including BSON values, to arbitrary formats, including BSON bytes, via the `serde` crate), and likewise for decoding vs deserialization. This was a persistent footgun for crate users, who could easily end up using `serde` functionality when encoding or decoding would have been simpler and more efficient. 
+ +In 3.x, use of `serde` is now an optional feature, disabled by default; additionally, the functions for serialization and deserialization now have `serialize_to_` or `deserialize_from_` prefixes to make the distinction obvious at point of use. \ No newline at end of file From 53b20c7534daac5fb4f63962a2c932b7f799b36a Mon Sep 17 00:00:00 2001 From: Abraham Egnor Date: Fri, 8 Aug 2025 10:52:16 +0100 Subject: [PATCH 2/3] moar --- migration-3.0.md | 40 ++++++++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration-3.0.md b/migration-3.0.md index 4a93a2d6..8772ed91 100644 --- a/migration-3.0.md +++ b/migration-3.0.md @@ -5,11 +5,11 @@ In 2.x, many crate submodules had their own `Error` types, with inconsistent conversions between those types. In 3.0, the crate defines a single `bson::error::Error` type, with fields for values common across errors like message or associated key, and a `kind` enum that provides granular root cause information. ## `&CStr` -The [bson spec](https://bsonspec.org/spec.html) describes a "cstring" type as UTF-8, with the exception that it cannot contain byte `0` (which is otherwise a valid UTF8 byte value). This type is used in bson for the keys of documents and for the pattern and options of regular expressions; all other string values include a length header and allow full UTF8 values, including `0` bytes. +The [bson spec](https://bsonspec.org/spec.html) describes a "cstring" type as UTF-8, with the exception that it cannot contain byte `0` (which is otherwise a valid UTF8 byte value). This type is used in BSON for the keys of documents and for the pattern and options of regular expressions. -In 2.x, attempting to use `rawdoc!` or `RawDocumentBuf::append` with a key or regular expression containing a `0` byte will panic. +In 2.x, attempting to use `rawdoc!` or `RawDocumentBuf::append` with a key or regular expression containing a `0` byte would panic. 
-3.0 introduces the `&CStr`/`CString` types to represent the "cstring" type as described in the spec; these types are analogous to `&str` and `String` but validate on construction that no `0` byte is contained. The `cstr!` macro will construct a literal `&``static CStr` from a `&``static str` with compile-time validation, and `TryFrom` impls are provided for run-time validation. +3.0 introduces the `&CStr`/`CString` types to represent the "cstring" type as described in the spec; these types are analogous to `&str` and `String` but validate on construction that no `0` byte is contained. The `cstr!` macro will construct a literal `&'static CStr` from a `&'static str` with compile-time validation, and `TryFrom` impls are provided for run-time validation. In 2.x: ```rust @@ -26,7 +26,7 @@ let mut doc_buf = rawdoc! { doc_buf.append("a key", "a value"); ``` -In 3.x: +In 3.0: ```rust let mut computed_key = "foo".to_owned(); computed_key.push_str("bar"); @@ -45,15 +45,35 @@ let doc_buf = rawdoc! { ``` ## Conversions - In 2.x, conversions between raw types (`RawDocumentBuf`, `RawBson`, `RawArray`) and their associated reference types and equivalent core types (`Document`, `Bson`, `Array`) were via a mix of standard traits, ad-hoc functions, and in some cases not present at all. In 3.0, all appropriate conversions are available, and all are via standard library traits. -## `append` and `append_ref` +## Clarifying encoding vs serialization +In 2.x, the API documentation, structure, and naming frequently conflated _encoding_ (directly converting Rust BSON values into BSON bytes) with _serialization_ (converting arbitrary Rust structs, including BSON values, to arbitrary formats, including BSON bytes, via the `serde` crate), and likewise for decoding vs deserialization. This was a persistent footgun for crate users, who could easily end up using `serde` functionality when encoding or decoding would have been simpler and more efficient. 
+ +In 3.0, use of `serde` is now an optional feature, disabled by default; additionally, the functions for serialization and deserialization now have `serialize_to_` or `deserialize_from_` prefixes to make the distinction obvious at point of use. -In 2.x, `RawDocumentBuf` provided both `append` and `append_ref` for appending owned or borrowed values respectively. In 3.x, `append` can accept both and `append_ref` has been removed. +## Documenting supported `serde` formats +The `serde` data model allows a high degree of flexibility in how data types represent themselves, and how data formats will parse and reconstruct that representation. This flexibility comes with the downside that not all values will produce the same values when serialized and deserialized with a given format. Because of that, for 3.0 we have clarified our compatibility policy: -## Clarifying encoding vs serialization +The implementations of `Serialize` and `Deserialize` for BSON value types are tested with the `serde` \[de\]serializers provided by this crate and by the `serde_json` crate. Compatibility with formats provided by other crates is not guaranteed and the data produced by serializing BSON values to other formats may change when this crate is updated. + +## Lossy UTF8 text decoding +BSON text is required to be UTF8 encoded. However, in various real-world circumstances, text strings may be truncated or contain invalid character sequences; in those circumstances, it's sometimes appropriate to use _lossy text decoding_, where invalid sequences are replaced with the Unicode replacement character. 
+ +In 2.x, this was only available for deserialization, not decoding, and had multiple overlapping APIs: +* the `Utf8LossyDeserialization` wrapper type that would cause the BSON binary deserializer to use lossy string decoding for the wrapped type, +* `from_slice_utf8_lossy` / `from_reader_utf8_lossy`, functions to deserialize arbitrary types from BSON with lossy string decoding +* `Document::from_reader_utf8_lossy`, deserializing a `Document` from a byte stream with lossy string decoding + +In 3.0, this API has been updated to be simpler and to cover both decoding and deserialization: +* the `Utf8Lossy` wrapper type provides the same functionality as `Utf8LossyDeserialization` from 2.x +* a `RawDocument` can be decoded into a `Utf8Lossy` via `TryFrom` +* the `RawElement::value_utf8_lossy` allows low-level element-by-element lossy text decoding -In 2.x, the API documentation and naming frequently conflated _encoding_ (directly converting Rust BSON values into BSON bytes) with _serialization_ (converting arbitrary Rust structs, including BSON values, to arbitrary formats, including BSON bytes, via the `serde` crate), and likewise for decoding vs deserialization. This was a persistent footgun for crate users, who could easily end up using `serde` functionality when encoding or decoding would have been simpler and more efficient. +## Serde helpers +The BSON crate provides a number of helper functions to allow \[de\]serializing common types like `ObjectId` or `DateTime` in other useful formats. For 3.0, these have been updated to work with the `serde_as` annotation provided by the `serde_with` crate; this allows substantially more flexibility and composition of the annotated field types. -In 3.x, use of `serde` is now an optional feature, disabled by default; additionally, the functions for serialization and deserialization now have `serialize_to_` or `deserialize_from_` prefixes to make the distinction obvious at point of use. 
\ No newline at end of file +## Smaller changes +Finally, we made a few small changes for API consistency: +* `append` and `append_ref` have been merged; `append` now accepts both owned and borrowed values, +* `RawElement::len` has been renamed to `RawElement::size` to better reflect its purpose. From 819b98a89c5d67ed40d8eec3621d3fba144bbe41 Mon Sep 17 00:00:00 2001 From: Abraham Egnor Date: Fri, 8 Aug 2025 11:00:53 +0100 Subject: [PATCH 3/3] self-review updates --- migration-3.0.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/migration-3.0.md b/migration-3.0.md index 8772ed91..e39062c7 100644 --- a/migration-3.0.md +++ b/migration-3.0.md @@ -33,7 +33,7 @@ computed_key.push_str("bar"); // Non-static values need to be checked at runtime let computed_key = CString::try_from(computed_key)?; -let doc_buf = rawdoc! { - // String literal keys are implicitly checked at compile-time. +let mut doc_buf = rawdoc! { + // String literal keys in `rawdoc!` are implicitly checked at compile-time. "hello": "world", computed_key: 42, "regex": Regex { @@ -42,6 +42,8 @@ let doc_buf = rawdoc! { options: cstr!("").to_owned(), } }; +// `append` keys must now be a `&CStr` or `CString`. +doc_buf.append(cstr!("a key"), "a value"); ``` ## Conversions @@ -53,9 +55,9 @@ In 2.x, the API documentation, structure, and naming frequently conflated _encod In 3.0, use of `serde` is now an optional feature, disabled by default; additionally, the functions for serialization and deserialization now have `serialize_to_` or `deserialize_from_` prefixes to make the distinction obvious at point of use. ## Documenting supported `serde` formats -The `serde` data model allows a high degree of flexibility in how data types represent themselves, and how data formats will parse and reconstruct that representation. This flexibility comes with the downside that not all values will produce the same values when serialized and deserialized with a given format. 
Because of that, for 3.0 we have clarified our compatibility policy: +The `serde` data model allows a high degree of flexibility in how data types represent themselves, and how data formats will parse and reconstruct that representation. This flexibility comes with the downside that not all data types will produce the same values when serialized and deserialized with a given format. Because of that, for 3.0 we have clarified our compatibility policy: -The implementations of `Serialize` and `Deserialize` for BSON value types are tested with the `serde` \[de\]serializers provided by this crate and by the `serde_json` crate. Compatibility with formats provided by other crates is not guaranteed and the data produced by serializing BSON values to other formats may change when this crate is updated. +> The implementations of `Serialize` and `Deserialize` for BSON value types are tested with the `serde` \[de\]serializers provided by this crate and by the `serde_json` crate. Compatibility with formats provided by other crates is not guaranteed and the data produced by serializing BSON values to other formats may change when this crate is updated. ## Lossy UTF8 text decoding BSON text is required to be UTF8 encoded. However, in various real-world circumstances, text strings may be truncated or contain invalid character sequences; in those circumstances, it's sometimes appropriate to use _lossy text decoding_, where invalid sequences are replaced with the Unicode replacement character.
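The lossy text decoding described in the guide's "Lossy UTF8 text decoding" section can be illustrated with the Rust standard library alone. The sketch below is not the crate's `Utf8Lossy` API; `decode_lossy` is a hypothetical helper name, and it only shows the general behaviour of replacing invalid sequences with U+FFFD via `String::from_utf8_lossy`.

```rust
/// Hypothetical helper (not part of the bson crate): decode bytes as UTF-8,
/// replacing each invalid sequence with the Unicode replacement character.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // "hé" truncated in the middle of the two-byte encoding of 'é' (0xC3 0xA9).
    let truncated = [b'h', 0xC3];

    // Strict decoding rejects the truncated input.
    assert!(std::str::from_utf8(&truncated).is_err());

    // Lossy decoding keeps the valid prefix and substitutes U+FFFD.
    let text = decode_lossy(&truncated);
    assert_eq!(text, "h\u{FFFD}");
    println!("{text}");
}
```

Strict decoding remains the safer default; lossy decoding trades fidelity for the ability to read damaged text at all, so it is usually best applied only where the guide says it is appropriate.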