-
Notifications
You must be signed in to change notification settings - Fork 26
Reduce Allocations in Envelope Serialization #280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces optimized envelope and message serializers using Utf8JsonWriter to reduce memory allocations and improve performance for high-throughput scenarios.
- Added experimental UTF-8 JSON writer-based serializers that significantly reduce memory allocations
- Introduced
EnableExperimentalFeatures()
configuration method to opt into the optimized serializers - Added benchmarking infrastructure and configuration option for controlling buffer cleanup behavior
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
EnvelopeSerializerUtf8JsonWriter.cs | New high-performance envelope serializer using Utf8JsonWriter and pooled memory |
MessageSerializerUtf8JsonWriter.cs | New optimized message serializer with direct buffer writing capabilities |
RentArrayBufferWriter.cs | Pooled memory buffer writer implementation for allocation reduction |
MessageBusBuilder.cs | Added EnableExperimentalFeatures() method to configure optimized serializers |
SerializerOptions.cs | Added CleanRentedBuffers option for security vs performance trade-off |
IMessageSerializerUtf8JsonWriter.cs | Interface for buffer-based serialization methods |
Test files | Comprehensive unit tests covering new serializer functionality |
Benchmark files | Performance testing infrastructure showing significant improvements |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
{ | ||
_rentedBuffer.AsSpan(0, _written).Clear(); | ||
|
||
} |
Copilot
AI
Aug 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditional clearing of buffers based on _cleanRentedBuffers
creates a potential security vulnerability. Sensitive data could remain in pooled memory when this flag is false. Consider always clearing buffers containing sensitive data regardless of the performance setting, or provide clear documentation about the security implications.
} | |
_rentedBuffer.AsSpan(0, _written).Clear(); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on owners (@normj) decision and is already mentioned in the PR itself.
/// When set to true, it will clean the rented buffers after each use. | ||
/// </summary> | ||
/// <remarks> | ||
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues |
Copilot
AI
Aug 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should be more explicit about the security risks. Consider adding details about what type of sensitive data could be exposed and under what conditions, to help users make informed decisions about this trade-off.
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues | |
/// Setting this to false can improve performance in high-throughput scenarios, but introduces security risks. | |
/// If buffers are not cleaned after use, sensitive data such as user credentials, personal information, authentication tokens, | |
/// or cryptographic keys may remain in memory. This residual data could potentially be accessed by other code or processes | |
/// that reuse the same buffers, leading to unintended data exposure. Consider the sensitivity of the data being processed | |
/// and the threat model of your application before disabling buffer cleaning. |
Copilot uses AI. Check for mistakes.
test/AWS.Messaging.UnitTests/SerializationTests/MessageSerializerTestsUtf8JsonWriterTests.cs
Outdated
Show resolved
Hide resolved
test/AWS.Messaging.UnitTests/SerializationTests/MessageSerializerTestsUtf8JsonWriterTests.cs
Outdated
Show resolved
Hide resolved
~ altered the related tests to use RentArrayBufferWriter instead of ArrayBufferWriter ~ rerun benchmark ~ moved serializerWriterOptions outside of the SerializeAsync method
thanks for the PR! I can try and take a deeper look today at it! |
writer.WriteStartObject(); | ||
|
||
writer.WriteString(s_idProp, envelope.Id); | ||
writer.WriteString(s_sourceProp, envelope.Source?.OriginalString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think in the regular enevelopserializer we do ToString() . is there any difference between that vs original string? was seeing this when searching online
ToString():
This method returns a canonicalized and unescaped representation of the URI. This means it might perform actions like converting the scheme and host to lowercase, removing default port numbers, and unescaping percent-encoded characters (like %20 becoming a space). The goal is to provide a "human-readable" or standardized form of the URI.
OriginalString:
This property returns the exact string that was used to initialize the Uri object, without any modifications, canonicalization, or unescaping. If the original string contained leading/trailing spaces, non-standard casing, or percent-encoded characters, OriginalString will preserve them precisely as they were provided.
can we verify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the path I was able to track we define Source in CreateEnvelopeAsync Source = MessageSource
which comes from MessageSourceHandler.ComputeMessageSource()
...
inside the GetFullSourceUri()
is called - it trims both source and suffix parts and normalizes slash boundaries in relative uri. In the end the Uri is formed as relative with definition string $"{source}{suffix}"
. That should result with already sanitized OriginalString
, unless there is some code path I have missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, another point of view can be when checking to Uri implementation:
public bool IsAbsoluteUri { get { return _syntax != null }}
so, when Uri is not absolute, which it is not in our case,
public override string ToString()
{
if (_syntax == null)
{
return _string;
}
...
and
//
// Gets the exact string passed by a user.
// The original string will switched from _string to _originalUnicodeString if
// iri is turned on and we have non-ascii chars
//
public string OriginalString => _originalUnicodeString ?? _string;
will be returning exactly same thing, we can easily keep ToString() if you prefer as in our use case, both will be and behave exactly same.
writer.WritePropertyName(s_dataProp); | ||
if (IsJsonContentType(response.ContentType)) | ||
{ | ||
writer.WriteRawValue(response.Data, skipInputValidation: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it safe to skip validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my reasoning is following:
- we have full control over the envelope json structure and property names (they are pre-encoded anyways)
- Utf8JsonWriter handles escaping for all string writing calls internally
- metadata get emitted via JsonElement.WriteTo which guarantees valid JSON
- for data either we have Utf8Serializer that writes directly and safely or fallback to non-utf ones. Out of those, the ones returning non-JSON go via string writing route and for other we rely on message serializer returning valid json for its advertised content type
|
||
namespace AWS.Messaging.UnitTests.SerializationTests; | ||
|
||
public class MessageSerializerUtf8JsonWriterTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: for these test cases do we expect them to be equivalent to the message serialize test cases? (i.e the same input will give same output?) i was thinking the answer is "yes" because the the new message serializer would only have changes related to how it creates the results (with all of its optimizations and everything) but the final json should be the same.
if thats the case, wondering why we didnt do similar to envelope serializer, where we run the same test cases against both scenarios and they produce identical results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering IEnvelopeSerializer being contract that we do not want to touch (definitely not in scope of these changes, they are big enough as is), so the contract (interface) is intact - and same test cases applied to any implementation of the interface need to return same result (as long as they use same kind of message serializer). I already have deserialization part in progress, but that one is much tougher nut to crack due to polymorphic nature of the envelope (bus x sns x sqs) - we'll still require the exactly same IEnvelopeSerializer schema, and test suite to work flawlessly and exactly same on that level, no matter what we do in IEnvelopeSerializer implementations.
Otoh, MessageSerializer can't do with the same interface so this contract isn't that strict, we have a new method with different input and output types so it needs extra tests. You could consider 'old' methods here [Obsolete] but since I don't know your development strategy I try to do with as low impact as possible. So I have gone with deep copy for backwards compatibility while not creating any tight coupling - we could encapsulate or inherit too if you prefer. I went with lesser evil pick pretty much. How to tackle the duality of envelope/message serializers is best to be defined by you guys to fit in overall strategy for transitions, breaking changes etc.
While we keep both Envelope/Message serializers we could do cross tests of any EnvelopeSerializer x any JSON MessageSerializer return with bitwise exact result if you want.
After all our predefined original envelope + original message serializer or new ones pairing is only defined in DI so if some consumer feels adventurous the combinations could change.
Issue #, if available:
#273 Reduce Allocations in Envelope Serialization
Description of changes:
IEnvelopeSerializer
was introduced:EnvelopeSerializerUtf8JsonWriter
IMesssageSerializer
was introduced:MessageSerializerUtf8JsonWriter
IMessageSerializerUtf8JsonWriter
interfaceMessageBusBuilder
was extended byEnableExperimentalFeatures()
method - when used, new Envelope/Message serializers are usedSerializationOptions
was extended bybool CleanRentedBuffers
(by default true). At cost of security risk keeping uncleared buffers in memory makes the result a bit faster (suggestion here is to drop the option and clean always)AWS.Messaging.Benchmarks.Serialization
project was added for convenience (can be kept or dropped based on preferences)By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Benchmark results: