
Conversation

Contributor

@JanHyka JanHyka commented Aug 16, 2025

Issue #, if available:
#273 Reduce Allocations in Envelope Serialization

Description of changes:

  • a new implementation of IEnvelopeSerializer was introduced: EnvelopeSerializerUtf8JsonWriter
  • its serialization method was rewritten using an optimized approach based on Utf8JsonWriter
  • a new implementation of IMessageSerializer was introduced: MessageSerializerUtf8JsonWriter
  • the necessary interface changes were added via the IMessageSerializerUtf8JsonWriter interface
  • MessageBusBuilder was extended with an EnableExperimentalFeatures() method - when called, the new envelope/message serializers are used (see the opt-in sketch after this list)
  • SerializationOptions was extended with bool CleanRentedBuffers (true by default). Keeping uncleared buffers in memory makes serialization slightly faster at the cost of a security risk (the suggestion here is to drop the option and always clean)
  • unit tests were added to cover the changed and newly added code
  • the AWS.Messaging.Benchmarks.Serialization project was added for convenience (it can be kept or dropped based on preference)
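
For illustration, here is a minimal opt-in sketch. Only EnableExperimentalFeatures() and CleanRentedBuffers come from this PR; the surrounding wiring (AddAWSMessageBus, AddSQSPublisher, ConfigureSerializationOptions, the OrderCreated type and the queue URL) is illustrative and may not match the final API exactly:

```csharp
// Hedged sketch: shows where the new opt-in sits in typical AWS.Messaging wiring.
services.AddAWSMessageBus(builder =>
{
    // Illustrative publisher registration and queue URL.
    builder.AddSQSPublisher<OrderCreated>("https://sqs.us-east-1.amazonaws.com/123456789012/orders");

    // Switches the bus to EnvelopeSerializerUtf8JsonWriter / MessageSerializerUtf8JsonWriter.
    builder.EnableExperimentalFeatures();

    builder.ConfigureSerializationOptions(options =>
    {
        // Default: pooled buffers are wiped after each use. Setting this to false
        // trades that safety for a small throughput gain (see the discussion below).
        options.CleanRentedBuffers = true;
    });
});
```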

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Benchmark results:


BenchmarkDotNet v0.13.10, Windows 11 (10.0.26100.4652)
Unknown processor
.NET SDK 10.0.100-preview.2.25164.34
  [Host]     : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2


| Method | ItemCount | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| StandardSerializer | 1 | 1,047.7 ns | 9.08 ns | 8.49 ns | 1,047.1 ns | 1.00 | 0.00 | 0.1240 | - | - | 2368 B | 1.00 |
| StandardSerializerWithJsonContext | 1 | 941.7 ns | 2.85 ns | 2.67 ns | 942.0 ns | 0.90 | 0.01 | 0.1087 | - | - | 2056 B | 0.87 |
| JsonWriterSerializer | 1 | 458.4 ns | 2.27 ns | 2.12 ns | 458.6 ns | 0.44 | 0.00 | 0.0548 | - | - | 1040 B | 0.44 |
| JsonWriterSerializerWithJsonContext | 1 | 357.0 ns | 1.71 ns | 1.60 ns | 357.4 ns | 0.34 | 0.00 | 0.0386 | - | - | 728 B | 0.31 |
| JsonWriterSerializerWithJsonContextUnsafe | 1 | 276.4 ns | 0.64 ns | 0.57 ns | 276.3 ns | 0.26 | 0.00 | 0.0386 | - | - | 728 B | 0.31 |
| StandardSerializer | 10 | 3,162.9 ns | 19.17 ns | 17.93 ns | 3,168.6 ns | 1.00 | 0.00 | 0.2861 | - | - | 5432 B | 1.00 |
| StandardSerializerWithJsonContext | 10 | 2,617.0 ns | 6.00 ns | 5.61 ns | 2,616.1 ns | 0.83 | 0.01 | 0.2708 | - | - | 5120 B | 0.94 |
| JsonWriterSerializer | 10 | 1,112.1 ns | 5.10 ns | 4.26 ns | 1,112.6 ns | 0.35 | 0.00 | 0.1011 | - | - | 1920 B | 0.35 |
| JsonWriterSerializerWithJsonContext | 10 | 696.2 ns | 2.06 ns | 1.92 ns | 695.8 ns | 0.22 | 0.00 | 0.0849 | - | - | 1608 B | 0.30 |
| JsonWriterSerializerWithJsonContextUnsafe | 10 | 598.4 ns | 4.37 ns | 4.09 ns | 599.8 ns | 0.19 | 0.00 | 0.0849 | - | - | 1608 B | 0.30 |
| StandardSerializer | 100 | 27,274.1 ns | 491.14 ns | 820.58 ns | 27,431.5 ns | 1.00 | 0.00 | 1.9531 | 0.1526 | - | 37032 B | 1.00 |
| StandardSerializerWithJsonContext | 100 | 19,778.3 ns | 389.58 ns | 702.49 ns | 19,639.0 ns | 0.73 | 0.04 | 1.9226 | 0.1526 | - | 36720 B | 0.99 |
| JsonWriterSerializer | 100 | 7,880.9 ns | 157.43 ns | 303.32 ns | 7,744.5 ns | 0.29 | 0.01 | 0.5798 | - | - | 11104 B | 0.30 |
| JsonWriterSerializerWithJsonContext | 100 | 4,023.4 ns | 80.33 ns | 222.61 ns | 3,955.4 ns | 0.15 | 0.01 | 0.5722 | - | - | 10792 B | 0.29 |
| JsonWriterSerializerWithJsonContextUnsafe | 100 | 3,617.6 ns | 35.00 ns | 32.74 ns | 3,600.0 ns | 0.13 | 0.01 | 0.5722 | - | - | 10792 B | 0.29 |
| StandardSerializer | 1000 | 274,182.0 ns | 752.42 ns | 667.00 ns | 274,142.0 ns | 1.00 | 0.00 | 96.6797 | 96.6797 | 96.6797 | 361993 B | 1.00 |
| StandardSerializerWithJsonContext | 1000 | 238,880.0 ns | 441.12 ns | 391.04 ns | 238,938.5 ns | 0.87 | 0.00 | 96.6797 | 96.6797 | 96.6797 | 361681 B | 1.00 |
| JsonWriterSerializer | 1000 | 101,512.4 ns | 260.47 ns | 243.65 ns | 101,461.8 ns | 0.37 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106538 B | 0.29 |
| JsonWriterSerializerWithJsonContext | 1000 | 69,257.1 ns | 267.03 ns | 222.98 ns | 69,241.4 ns | 0.25 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106226 B | 0.29 |
| JsonWriterSerializerWithJsonContextUnsafe | 1000 | 69,293.8 ns | 320.33 ns | 299.64 ns | 69,175.4 ns | 0.25 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106226 B | 0.29 |


@JanHyka JanHyka requested a review from Copilot August 20, 2025 07:14

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces optimized envelope and message serializers using Utf8JsonWriter to reduce memory allocations and improve performance for high-throughput scenarios.

  • Added experimental UTF-8 JSON writer-based serializers that significantly reduce memory allocations
  • Introduced EnableExperimentalFeatures() configuration method to opt into the optimized serializers
  • Added benchmarking infrastructure and configuration option for controlling buffer cleanup behavior

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| EnvelopeSerializerUtf8JsonWriter.cs | New high-performance envelope serializer using Utf8JsonWriter and pooled memory |
| MessageSerializerUtf8JsonWriter.cs | New optimized message serializer with direct buffer writing capabilities |
| RentArrayBufferWriter.cs | Pooled memory buffer writer implementation for allocation reduction |
| MessageBusBuilder.cs | Added EnableExperimentalFeatures() method to configure optimized serializers |
| SerializerOptions.cs | Added CleanRentedBuffers option for security vs performance trade-off |
| IMessageSerializerUtf8JsonWriter.cs | Interface for buffer-based serialization methods |
| Test files | Comprehensive unit tests covering new serializer functionality |
| Benchmark files | Performance testing infrastructure showing significant improvements |


{
    _rentedBuffer.AsSpan(0, _written).Clear();
}

Copilot AI Aug 20, 2025


The conditional clearing of buffers based on _cleanRentedBuffers creates a potential security vulnerability. Sensitive data could remain in pooled memory when this flag is false. Consider always clearing buffers containing sensitive data regardless of the performance setting, or provide clear documentation about the security implications.

Suggested change
}
_rentedBuffer.AsSpan(0, _written).Clear();


Contributor Author


Depends on the owner's (@normj) decision and is already mentioned in the PR itself.
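
As an aside, a minimal sketch of the two behaviours under discussion (the field names mirror the snippet above, but the method itself is illustrative, not the PR's actual RentArrayBufferWriter):

```csharp
// Illustrative return-to-pool path; _cleanRentedBuffers/_rentedBuffer/_written follow
// the quoted snippet, the rest is a sketch of the trade-off, not the actual PR code.
private void ReturnBuffer()
{
    if (_cleanRentedBuffers)
    {
        // Wipe only the bytes that were written so serialized payloads
        // (potentially credentials or PII) do not linger in pooled memory.
        _rentedBuffer.AsSpan(0, _written).Clear();
    }

    // The review's alternative of always clearing could also be expressed as
    // ArrayPool<byte>.Shared.Return(_rentedBuffer, clearArray: true),
    // which wipes the entire rented array regardless of the option.
    ArrayPool<byte>.Shared.Return(_rentedBuffer);
    _rentedBuffer = Array.Empty<byte>();
    _written = 0;
}
```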

/// When set to true, it will clean the rented buffers after each use.
/// </summary>
/// <remarks>
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues

Copilot AI Aug 20, 2025


The documentation should be more explicit about the security risks. Consider adding details about what type of sensitive data could be exposed and under what conditions, to help users make informed decisions about this trade-off.

Suggested change
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues
/// Setting this to false can improve performance in high-throughput scenarios, but introduces security risks.
/// If buffers are not cleaned after use, sensitive data such as user credentials, personal information, authentication tokens,
/// or cryptographic keys may remain in memory. This residual data could potentially be accessed by other code or processes
/// that reuse the same buffers, leading to unintended data exposure. Consider the sensitivity of the data being processed
/// and the threat model of your application before disabling buffer cleaning.


~ altered the related tests to use RentArrayBufferWriter instead of ArrayBufferWriter
~ rerun benchmark
~ moved serializerWriterOptions outside of the SerializeAsync method
@GarrettBeatty
Contributor

thanks for the PR! I can try and take a deeper look at it today!

writer.WriteStartObject();

writer.WriteString(s_idProp, envelope.Id);
writer.WriteString(s_sourceProp, envelope.Source?.OriginalString);
Contributor


i think in the regular EnvelopeSerializer we do ToString(). is there any difference between that vs OriginalString? was seeing this when searching online

ToString():
This method returns a canonicalized and unescaped representation of the URI. This means it might perform actions like converting the scheme and host to lowercase, removing default port numbers, and unescaping percent-encoded characters (like %20 becoming a space). The goal is to provide a "human-readable" or standardized form of the URI.
OriginalString:
This property returns the exact string that was used to initialize the Uri object, without any modifications, canonicalization, or unescaping. If the original string contained leading/trailing spaces, non-standard casing, or percent-encoded characters, OriginalString will preserve them precisely as they were provided.

can we verify?

Contributor Author


From the path I was able to track, Source is set in CreateEnvelopeAsync as Source = MessageSource, which comes from MessageSourceHandler.ComputeMessageSource()...
Inside that, GetFullSourceUri() is called - it trims both the source and suffix parts and normalizes the slash boundaries of the relative URI. In the end the Uri is created as relative from the string $"{source}{suffix}". That should already result in a sanitized OriginalString, unless there is some code path I have missed?

Contributor Author


Also, another point of view comes from checking the Uri implementation:

public bool IsAbsoluteUri { get { return _syntax != null; } }

so, when Uri is not absolute, which it is not in our case,

public override string ToString()
{
    if (_syntax == null)
    {
        return _string;
    }
    ...

and

//
//  Gets the exact string passed by a user.
//  The original string will switched from _string to _originalUnicodeString if
//  iri is turned on and we have non-ascii chars
//
public string OriginalString => _originalUnicodeString ?? _string;

will return exactly the same thing. We can easily keep ToString() if you prefer, since in our use case both are and behave exactly the same.
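
A quick way to confirm this for the relative URIs used here (the path value is made up):

```csharp
using System;

var source = new Uri("/publisher/order-service", UriKind.Relative);

// Relative URIs skip canonicalization entirely (_syntax is null), so both members
// hand back the exact string the Uri was constructed with.
Console.WriteLine(source.ToString());                          // /publisher/order-service
Console.WriteLine(source.OriginalString);                      // /publisher/order-service
Console.WriteLine(source.ToString() == source.OriginalString); // True
```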

writer.WritePropertyName(s_dataProp);
if (IsJsonContentType(response.ContentType))
{
writer.WriteRawValue(response.Data, skipInputValidation: true);
Contributor


why is it safe to skip validation?

Contributor Author


my reasoning is the following:

  • we have full control over the envelope JSON structure and property names (they are pre-encoded anyway)
  • Utf8JsonWriter handles escaping internally for all string-writing calls
  • metadata is emitted via JsonElement.WriteTo, which guarantees valid JSON
  • for data we either have the Utf8 serializer, which writes directly and safely, or we fall back to the non-Utf8 ones. Of those, the ones returning non-JSON go through the string-writing route, and for the others we rely on the message serializer returning valid JSON for its advertised content type (sketched below)
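
Condensed, the branch in question looks roughly like this (s_dataProp, IsJsonContentType and response come from the quoted snippet; the else branch is the fallback described above):

```csharp
writer.WritePropertyName(s_dataProp);
if (IsJsonContentType(response.ContentType))
{
    // The payload was produced by a serializer that advertises a JSON content type,
    // so it is already valid JSON and re-validating it here would only add cost.
    writer.WriteRawValue(response.Data, skipInputValidation: true);
}
else
{
    // Non-JSON payloads take the string route; Utf8JsonWriter escapes them as needed.
    writer.WriteStringValue(response.Data);
}
```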


namespace AWS.Messaging.UnitTests.SerializationTests;

public class MessageSerializerUtf8JsonWriterTests
Contributor


question: for these test cases do we expect them to be equivalent to the message serializer test cases? (i.e. the same input will give the same output?) i was thinking the answer is "yes" because the new message serializer only changes how it produces the result (with all of its optimizations and everything), but the final json should be the same.

if that's the case, wondering why we didn't do something similar to the envelope serializer, where we run the same test cases against both implementations and verify they produce identical results?

Contributor Author


I was treating IEnvelopeSerializer as a contract we do not want to touch (definitely not in the scope of these changes, they are big enough as is), so the contract (interface) is intact - and the same test cases applied to any implementation of the interface need to return the same result (as long as they use the same kind of message serializer). I already have the deserialization part in progress, but that one is a much tougher nut to crack due to the polymorphic nature of the envelope (bus x sns x sqs) - we will still require exactly the same IEnvelopeSerializer schema, and the test suite has to work flawlessly and identically at that level, no matter what we do in the IEnvelopeSerializer implementations.

Otoh, MessageSerializer cannot do with the same interface, so that contract is not as strict: we have a new method with different input and output types, so it needs extra tests. You could consider marking the 'old' methods [Obsolete], but since I do not know your development strategy I tried to keep the impact as low as possible. So I went with a deep copy for backwards compatibility while not creating any tight coupling - we could encapsulate or inherit too if you prefer; I went with what seemed the lesser evil. How to tackle the duality of the envelope/message serializers is best defined by you to fit the overall strategy for transitions, breaking changes etc.

While we keep both envelope/message serializers we could add cross tests verifying that any EnvelopeSerializer x any JSON MessageSerializer pairing returns a bitwise-exact result, if you want.
After all, the pairing of the original envelope + original message serializer (or the new ones) is only defined in DI, so if some consumer feels adventurous the combinations could change.
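
For what such a cross test could look like, a self-contained sketch (the record, property names and the hand-rolled writer stand in for the real envelope/message serializer pair, which would be resolved from DI):

```csharp
using System.Buffers;
using System.Text;
using System.Text.Json;
using Xunit;

public class SerializerEquivalenceTests
{
    private sealed record SampleMessage(string Id, string Body);

    // Illustrates the "bitwise exact result" idea: two serialization paths
    // (reflection-based JsonSerializer vs. a hand-written Utf8JsonWriter pass)
    // must emit identical JSON for the same input. A real cross test would pair
    // the legacy and Utf8JsonWriter envelope serializers instead of these stand-ins.
    [Fact]
    public void BothPathsProduceIdenticalJson()
    {
        var message = new SampleMessage("42", "hello");

        var viaJsonSerializer = JsonSerializer.Serialize(message);

        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteString("Id", message.Id);
            writer.WriteString("Body", message.Body);
            writer.WriteEndObject();
        }
        var viaUtf8Writer = Encoding.UTF8.GetString(buffer.WrittenSpan);

        Assert.Equal(viaJsonSerializer, viaUtf8Writer);
    }
}
```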

