
Conversation

Contributor

@JanHyka JanHyka commented Aug 16, 2025

Issue #, if available:
#273 Reduce Allocations in Envelope Serialization

Description of changes:

  • a new implementation of IEnvelopeSerializer was introduced: EnvelopeSerializerUtf8JsonWriter
  • its serialization method was rewritten using an optimized approach based on Utf8JsonWriter
  • a new implementation of IMessageSerializer was introduced: MessageSerializerUtf8JsonWriter
  • the necessary interface changes were added via the IMessageSerializerUtf8JsonWriter interface
  • MessageBusBuilder was extended with an EnableExperimentalFeatures() method - when called, the new envelope/message serializers are used (see the opt-in sketch after this list)
  • SerializationOptions was extended with bool CleanRentedBuffers (true by default). Keeping uncleared buffers in memory makes serialization slightly faster at the cost of a security risk (the suggestion here is to drop the option and always clean)
  • unit tests were added to cover the changed and newly added code
  • the AWS.Messaging.Benchmarks.Serialization project was added for convenience (it can be kept or dropped based on preference)
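
For illustration, here is a minimal opt-in sketch. Only EnableExperimentalFeatures() and CleanRentedBuffers come from this PR; the surrounding wiring (AddAWSMessageBus, AddSQSPublisher, ConfigureSerializationOptions, the OrderCreated type and the queue URL) is illustrative and may not match the final API exactly:

```csharp
// Hedged sketch: shows where the new opt-in sits in typical AWS.Messaging wiring.
services.AddAWSMessageBus(builder =>
{
    // Illustrative publisher registration and queue URL.
    builder.AddSQSPublisher<OrderCreated>("https://sqs.us-east-1.amazonaws.com/123456789012/orders");

    // Switches the bus to EnvelopeSerializerUtf8JsonWriter / MessageSerializerUtf8JsonWriter.
    builder.EnableExperimentalFeatures();

    builder.ConfigureSerializationOptions(options =>
    {
        // Default: pooled buffers are wiped after each use. Setting this to false
        // trades that safety for a small throughput gain (see the discussion below).
        options.CleanRentedBuffers = true;
    });
});
```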

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Benchmark results:


BenchmarkDotNet v0.13.10, Windows 11 (10.0.26100.4652)
Unknown processor
.NET SDK 10.0.100-preview.2.25164.34
  [Host]     : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2


| Method | ItemCount | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| StandardSerializer | 1 | 1,047.7 ns | 9.08 ns | 8.49 ns | 1,047.1 ns | 1.00 | 0.00 | 0.1240 | - | - | 2368 B | 1.00 |
| StandardSerializerWithJsonContext | 1 | 941.7 ns | 2.85 ns | 2.67 ns | 942.0 ns | 0.90 | 0.01 | 0.1087 | - | - | 2056 B | 0.87 |
| JsonWriterSerializer | 1 | 458.4 ns | 2.27 ns | 2.12 ns | 458.6 ns | 0.44 | 0.00 | 0.0548 | - | - | 1040 B | 0.44 |
| JsonWriterSerializerWithJsonContext | 1 | 357.0 ns | 1.71 ns | 1.60 ns | 357.4 ns | 0.34 | 0.00 | 0.0386 | - | - | 728 B | 0.31 |
| JsonWriterSerializerWithJsonContextUnsafe | 1 | 276.4 ns | 0.64 ns | 0.57 ns | 276.3 ns | 0.26 | 0.00 | 0.0386 | - | - | 728 B | 0.31 |
| StandardSerializer | 10 | 3,162.9 ns | 19.17 ns | 17.93 ns | 3,168.6 ns | 1.00 | 0.00 | 0.2861 | - | - | 5432 B | 1.00 |
| StandardSerializerWithJsonContext | 10 | 2,617.0 ns | 6.00 ns | 5.61 ns | 2,616.1 ns | 0.83 | 0.01 | 0.2708 | - | - | 5120 B | 0.94 |
| JsonWriterSerializer | 10 | 1,112.1 ns | 5.10 ns | 4.26 ns | 1,112.6 ns | 0.35 | 0.00 | 0.1011 | - | - | 1920 B | 0.35 |
| JsonWriterSerializerWithJsonContext | 10 | 696.2 ns | 2.06 ns | 1.92 ns | 695.8 ns | 0.22 | 0.00 | 0.0849 | - | - | 1608 B | 0.30 |
| JsonWriterSerializerWithJsonContextUnsafe | 10 | 598.4 ns | 4.37 ns | 4.09 ns | 599.8 ns | 0.19 | 0.00 | 0.0849 | - | - | 1608 B | 0.30 |
| StandardSerializer | 100 | 27,274.1 ns | 491.14 ns | 820.58 ns | 27,431.5 ns | 1.00 | 0.00 | 1.9531 | 0.1526 | - | 37032 B | 1.00 |
| StandardSerializerWithJsonContext | 100 | 19,778.3 ns | 389.58 ns | 702.49 ns | 19,639.0 ns | 0.73 | 0.04 | 1.9226 | 0.1526 | - | 36720 B | 0.99 |
| JsonWriterSerializer | 100 | 7,880.9 ns | 157.43 ns | 303.32 ns | 7,744.5 ns | 0.29 | 0.01 | 0.5798 | - | - | 11104 B | 0.30 |
| JsonWriterSerializerWithJsonContext | 100 | 4,023.4 ns | 80.33 ns | 222.61 ns | 3,955.4 ns | 0.15 | 0.01 | 0.5722 | - | - | 10792 B | 0.29 |
| JsonWriterSerializerWithJsonContextUnsafe | 100 | 3,617.6 ns | 35.00 ns | 32.74 ns | 3,600.0 ns | 0.13 | 0.01 | 0.5722 | - | - | 10792 B | 0.29 |
| StandardSerializer | 1000 | 274,182.0 ns | 752.42 ns | 667.00 ns | 274,142.0 ns | 1.00 | 0.00 | 96.6797 | 96.6797 | 96.6797 | 361993 B | 1.00 |
| StandardSerializerWithJsonContext | 1000 | 238,880.0 ns | 441.12 ns | 391.04 ns | 238,938.5 ns | 0.87 | 0.00 | 96.6797 | 96.6797 | 96.6797 | 361681 B | 1.00 |
| JsonWriterSerializer | 1000 | 101,512.4 ns | 260.47 ns | 243.65 ns | 101,461.8 ns | 0.37 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106538 B | 0.29 |
| JsonWriterSerializerWithJsonContext | 1000 | 69,257.1 ns | 267.03 ns | 222.98 ns | 69,241.4 ns | 0.25 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106226 B | 0.29 |
| JsonWriterSerializerWithJsonContextUnsafe | 1000 | 69,293.8 ns | 320.33 ns | 299.64 ns | 69,175.4 ns | 0.25 | 0.00 | 33.3252 | 33.3252 | 33.3252 | 106226 B | 0.29 |


@JanHyka JanHyka requested a review from Copilot August 20, 2025 07:14

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces optimized envelope and message serializers using Utf8JsonWriter to reduce memory allocations and improve performance for high-throughput scenarios.

  • Added experimental UTF-8 JSON writer-based serializers that significantly reduce memory allocations
  • Introduced EnableExperimentalFeatures() configuration method to opt into the optimized serializers
  • Added benchmarking infrastructure and configuration option for controlling buffer cleanup behavior

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| EnvelopeSerializerUtf8JsonWriter.cs | New high-performance envelope serializer using Utf8JsonWriter and pooled memory |
| MessageSerializerUtf8JsonWriter.cs | New optimized message serializer with direct buffer writing capabilities |
| RentArrayBufferWriter.cs | Pooled memory buffer writer implementation for allocation reduction |
| MessageBusBuilder.cs | Added EnableExperimentalFeatures() method to configure optimized serializers |
| SerializerOptions.cs | Added CleanRentedBuffers option for security vs performance trade-off |
| IMessageSerializerUtf8JsonWriter.cs | Interface for buffer-based serialization methods |
| Test files | Comprehensive unit tests covering new serializer functionality |
| Benchmark files | Performance testing infrastructure showing significant improvements |


{
    _rentedBuffer.AsSpan(0, _written).Clear();
}

Copilot AI Aug 20, 2025


The conditional clearing of buffers based on _cleanRentedBuffers creates a potential security vulnerability. Sensitive data could remain in pooled memory when this flag is false. Consider always clearing buffers containing sensitive data regardless of the performance setting, or provide clear documentation about the security implications.

Suggested change
}
_rentedBuffer.AsSpan(0, _written).Clear();


Contributor Author


Depends on the owner's (@normj) decision and is already mentioned in the PR itself.
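
As an aside, a minimal sketch of the two behaviours under discussion (the field names mirror the snippet above, but the method itself is illustrative, not the PR's actual RentArrayBufferWriter):

```csharp
// Illustrative return-to-pool path; _cleanRentedBuffers/_rentedBuffer/_written follow
// the quoted snippet, the rest is a sketch of the trade-off, not the actual PR code.
private void ReturnBuffer()
{
    if (_cleanRentedBuffers)
    {
        // Wipe only the bytes that were written so serialized payloads
        // (potentially credentials or PII) do not linger in pooled memory.
        _rentedBuffer.AsSpan(0, _written).Clear();
    }

    // The review's alternative of always clearing could also be expressed as
    // ArrayPool<byte>.Shared.Return(_rentedBuffer, clearArray: true),
    // which wipes the entire rented array regardless of the option.
    ArrayPool<byte>.Shared.Return(_rentedBuffer);
    _rentedBuffer = Array.Empty<byte>();
    _written = 0;
}
```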

/// When set to true, it will clean the rented buffers after each use.
/// </summary>
/// <remarks>
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues

Copilot AI Aug 20, 2025


The documentation should be more explicit about the security risks. Consider adding details about what type of sensitive data could be exposed and under what conditions, to help users make informed decisions about this trade-off.

Suggested change
/// Setting this to false can improve performance in high-throughput scenarios at cost of potential security issues
/// Setting this to false can improve performance in high-throughput scenarios, but introduces security risks.
/// If buffers are not cleaned after use, sensitive data such as user credentials, personal information, authentication tokens,
/// or cryptographic keys may remain in memory. This residual data could potentially be accessed by other code or processes
/// that reuse the same buffers, leading to unintended data exposure. Consider the sensitivity of the data being processed
/// and the threat model of your application before disabling buffer cleaning.


~ altered the related tests to use RentArrayBufferWriter instead of ArrayBufferWriter
~ rerun benchmark
~ moved serializerWriterOptions outside of the SerializeAsync method
@GarrettBeatty
Contributor

thanks for the PR! I can try and take a deeper look at it today!

writer.WriteStartObject();

writer.WriteString(s_idProp, envelope.Id);
writer.WriteString(s_sourceProp, envelope.Source?.OriginalString);
Contributor


i think in the regular EnvelopeSerializer we do ToString(). is there any difference between that vs OriginalString? was seeing this when searching online

ToString():
This method returns a canonicalized and unescaped representation of the URI. This means it might perform actions like converting the scheme and host to lowercase, removing default port numbers, and unescaping percent-encoded characters (like %20 becoming a space). The goal is to provide a "human-readable" or standardized form of the URI.
OriginalString:
This property returns the exact string that was used to initialize the Uri object, without any modifications, canonicalization, or unescaping. If the original string contained leading/trailing spaces, non-standard casing, or percent-encoded characters, OriginalString will preserve them precisely as they were provided.

can we verify?

Contributor Author


From the path I was able to track, Source is set in CreateEnvelopeAsync as Source = MessageSource, which comes from MessageSourceHandler.ComputeMessageSource()...
Inside that, GetFullSourceUri() is called - it trims both the source and suffix parts and normalizes the slash boundaries of the relative URI. In the end the Uri is created as relative from the string $"{source}{suffix}". That should already result in a sanitized OriginalString, unless there is some code path I have missed?

Contributor Author


Also, another point of view comes from checking the Uri implementation:

public bool IsAbsoluteUri { get { return _syntax != null; } }

so, when Uri is not absolute, which it is not in our case,

public override string ToString()
{
    if (_syntax == null)
    {
        return _string;
    }
    ...

and

//
//  Gets the exact string passed by a user.
//  The original string will switched from _string to _originalUnicodeString if
//  iri is turned on and we have non-ascii chars
//
public string OriginalString => _originalUnicodeString ?? _string;

will return exactly the same thing. We can easily keep ToString() if you prefer, since in our use case both are and behave exactly the same.
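
A quick way to confirm this for the relative URIs used here (the path value is made up):

```csharp
using System;

var source = new Uri("/publisher/order-service", UriKind.Relative);

// Relative URIs skip canonicalization entirely (_syntax is null), so both members
// hand back the exact string the Uri was constructed with.
Console.WriteLine(source.ToString());                          // /publisher/order-service
Console.WriteLine(source.OriginalString);                      // /publisher/order-service
Console.WriteLine(source.ToString() == source.OriginalString); // True
```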

writer.WritePropertyName(s_dataProp);
if (IsJsonContentType(response.ContentType))
{
writer.WriteRawValue(response.Data, skipInputValidation: true);
Contributor


why is it safe to skip validation?

Contributor Author


my reasoning is the following:

  • we have full control over the envelope JSON structure and property names (they are pre-encoded anyway)
  • Utf8JsonWriter handles escaping internally for all string-writing calls
  • metadata is emitted via JsonElement.WriteTo, which guarantees valid JSON
  • for data we either have the Utf8 serializer, which writes directly and safely, or we fall back to the non-Utf8 ones. Of those, the ones returning non-JSON go through the string-writing route, and for the others we rely on the message serializer returning valid JSON for its advertised content type (sketched below)
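
Condensed, the branch in question looks roughly like this (s_dataProp, IsJsonContentType and response come from the quoted snippet; the else branch is the fallback described above):

```csharp
writer.WritePropertyName(s_dataProp);
if (IsJsonContentType(response.ContentType))
{
    // The payload was produced by a serializer that advertises a JSON content type,
    // so it is already valid JSON and re-validating it here would only add cost.
    writer.WriteRawValue(response.Data, skipInputValidation: true);
}
else
{
    // Non-JSON payloads take the string route; Utf8JsonWriter escapes them as needed.
    writer.WriteStringValue(response.Data);
}
```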


namespace AWS.Messaging.UnitTests.SerializationTests;

public class MessageSerializerUtf8JsonWriterTests
Contributor


question: for these test cases do we expect them to be equivalent to the message serializer test cases? (i.e. the same input will give the same output?) i was thinking the answer is "yes" because the new message serializer only changes how it produces the result (with all of its optimizations and everything), but the final json should be the same.

if that's the case, wondering why we didn't do something similar to the envelope serializer, where we run the same test cases against both implementations and verify they produce identical results?

Contributor Author


I was treating IEnvelopeSerializer as a contract we do not want to touch (definitely not in the scope of these changes, they are big enough as is), so the contract (interface) is intact - and the same test cases applied to any implementation of the interface need to return the same result (as long as they use the same kind of message serializer). I already have the deserialization part in progress, but that one is a much tougher nut to crack due to the polymorphic nature of the envelope (bus x sns x sqs) - we will still require exactly the same IEnvelopeSerializer schema, and the test suite has to work flawlessly and identically at that level, no matter what we do in the IEnvelopeSerializer implementations.

Otoh, MessageSerializer cannot do with the same interface, so that contract is not as strict: we have a new method with different input and output types, so it needs extra tests. You could consider marking the 'old' methods [Obsolete], but since I do not know your development strategy I tried to keep the impact as low as possible. So I went with a deep copy for backwards compatibility while not creating any tight coupling - we could encapsulate or inherit too if you prefer; I went with what seemed the lesser evil. How to tackle the duality of the envelope/message serializers is best defined by you to fit the overall strategy for transitions, breaking changes etc.

While we keep both envelope/message serializers we could add cross tests verifying that any EnvelopeSerializer x any JSON MessageSerializer pairing returns a bitwise-exact result, if you want.
After all, the pairing of the original envelope + original message serializer (or the new ones) is only defined in DI, so if some consumer feels adventurous the combinations could change.
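
For what such a cross test could look like, a self-contained sketch (the record, property names and the hand-rolled writer stand in for the real envelope/message serializer pair, which would be resolved from DI):

```csharp
using System.Buffers;
using System.Text;
using System.Text.Json;
using Xunit;

public class SerializerEquivalenceTests
{
    private sealed record SampleMessage(string Id, string Body);

    // Illustrates the "bitwise exact result" idea: two serialization paths
    // (reflection-based JsonSerializer vs. a hand-written Utf8JsonWriter pass)
    // must emit identical JSON for the same input. A real cross test would pair
    // the legacy and Utf8JsonWriter envelope serializers instead of these stand-ins.
    [Fact]
    public void BothPathsProduceIdenticalJson()
    {
        var message = new SampleMessage("42", "hello");

        var viaJsonSerializer = JsonSerializer.Serialize(message);

        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteString("Id", message.Id);
            writer.WriteString("Body", message.Body);
            writer.WriteEndObject();
        }
        var viaUtf8Writer = Encoding.UTF8.GetString(buffer.WrittenSpan);

        Assert.Equal(viaJsonSerializer, viaUtf8Writer);
    }
}
```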

