@josecorella josecorella commented Sep 15, 2025

For best reading and commenting experience, I suggest splitting your window in two; the review page and the rendered page.
Here are the rendered files:

Goals for 9-15-2025 Spec Review:

  • Agreement on optional metrics agents and how they will impact existing APIs
  • Agreement that the Metrics Agent interface and implementation will be written in Dafny only as wrappers, with extern implementations provided to make moving off of Dafny easier.
  • Agreement on interface supported operations.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Check any applicable:

  • Were any files moved? Moving files changes their URL, which breaks all hyperlinks to the files.

@josecorella josecorella requested a review from a team as a code owner September 15, 2025 16:05
metrics. Customers can then ask for updates to the implementations
CT provides or customers can go and implement their own interfaces that are fine-tuned
to their use cases.

Contributor

Is there a middle ground? For example, logging information locally without uploading it anywhere.
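
A minimal sketch of what such a middle ground could look like, assuming a hypothetical `MetricsAgent` interface with a single `addTime` operation. The type and method names are illustrative and are not part of the proposed model; the agent only writes to a local `java.util.logging` logger and never uploads anything.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical names throughout; nothing here is an existing CT API.
final class LocalLoggingSketch {

  /** Assumed minimal shape of a metrics agent: a label plus a duration. */
  interface MetricsAgent {
    void addTime(String label, long durationMillis);
  }

  /** Writes each metric to a local logger; no network calls are made. */
  static final class LocalLogAgent implements MetricsAgent {
    private final Logger logger;

    LocalLogAgent(Logger logger) {
      this.logger = logger;
    }

    @Override
    public void addTime(String label, long durationMillis) {
      // FINE keeps metrics out of default console output unless enabled.
      logger.log(
          Level.FINE,
          "metric label={0} durationMillis={1}",
          new Object[] {label, durationMillis});
    }
  }
}
```

Where the records end up (file, console, nowhere) would then be decided by the customer's own logging configuration rather than by the library.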

will feel like; getting no choice on the matter and opting to not upgrade.
Going from never emitting metrics to always emitting them says to customers
that their application no matter its use case will always benefit from metrics.
Without letting customers make that choice, CT looses hard earned customer trust.
Contributor

Suggested change
Without letting customers make that choice, CT looses hard earned customer trust.
Without letting customers make that choice, CT loses hard earned customer trust.

### Issue 1: What will be the default behavior?

As a client-side encryption library CT should be as cautious as possible.
Customers of CT libraries should be on the driver seat and determine for
Contributor

Suggested change
Customers of CT libraries should be on the driver seat and determine for
Customers of CT libraries should be in the driver's seat and determine for

english is silly

become unsupported.

Additionally, requiring customers to start emitting metrics
almost certainly guarantees a breaking change across supported libraries.
Contributor

I don't think this is true

Contributor Author

How so? If we require that all customer facing APIs now require a metric agent, then you can't pick up this change in a minor version because you are required to make a code change.

Contributor

Well, not that I am advocating for this, but we could just provide a metric agent.
I am against it, but there is probably a way to do this in a non-breaking way.

Contributor

oh sure, I apparently forgot this was in the context of specific API changes, I retract my objection

at customer facing API call sites.

Currently, the ESDK client APIs models are defined [here](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy#L60-L126).
This change would see that the client APIs be changed as follows:
Contributor

This can't be the only change. I like the detail, but we should explicitly add the CMM/Keyring inputs to this list.

non-breaking way as the Metrics Agent will be an optional parameter
at customer facing API call sites.

Currently, the ESDK client APIs models are defined [here](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy#L60-L126).
Contributor

I assume that the DB-ESDK item encryptor would be similar.
What about the DDB client interface?
Also, are there plans for S3EC?

### label

A label is a string that is used
as a an attribute name to aggregate
Contributor

Suggested change
as a an attribute name to aggregate
as an attribute name to aggregate

to either a local application or to an observability service like AWS CloudWatch.

As client side encryption libraries emitting metrics must be done carefully as
to avoid accidentally [leaking](https://github.com/aws/aws-encryption-sdk-python/pull/105/files) any information related to the plaintext that could lead to a
Contributor

I'm not sure if we want to link to this :)

Contributor Author

I am planning on removing it before merging but wanted it here for folks to get context.

## Summary

Existing users of Crypto Tools (CT) libraries do not have any insights as to
how the librar(y/ies) behave(s) in their application.
Contributor

Suggested change
how the librar(y/ies) behave(s) in their application.
how the libraries behave in their application.

simplify, we know what you mean

@required
label: String,
@required
duration: Long, // Duration in milliseconds
Contributor

@texastony
Contributor

@josecorella I have a meta/macro question.
Is there a proposal doc that accompanies these change docs?

I appreciate that the Background doc highlights issues and alternatives, but I feel like we are missing a "User Stories" document that can be used to measure success criteria and define the table stakes of this work.

It is also possible I just missed such a proposal doc; but without it, it is difficult to work backwards.

Additionally, requiring customers to start emitting metrics
almost certainly guarantees a breaking change across supported libraries.

### Issue 2: Should Data Plane APIs fail if metrics fail to publish?
Contributor

What about latency?

This is somewhat relevant to the discussion around availability, particularly in technical terms.

What I'm getting at is, are we planning to make blocking calls to CT or wherever, or have a separate thread/pool that does so periodically? Do we get this for free from the CT metrics agent (or whatever) or do we have to write this ourselves?

In any case we should specify what the bar is here for performance.
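
To make the blocking versus background question concrete, here is a minimal sketch of the non-blocking option, assuming a hypothetical `MetricsSink` that does the slow publishing. All names are illustrative. The data plane thread only enqueues work; a single daemon thread publishes, and a full queue or a failing sink drops the metric and counts it instead of failing the caller.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; this is not the proposed CT interface.
final class AsyncPublishSketch {

  /** Assumed destination for metrics; publishing may be slow or may fail. */
  interface MetricsSink {
    void publish(String label, long value) throws Exception;
  }

  static final class AsyncMetricsAgent {
    private final MetricsSink sink;
    private final ExecutorService worker;
    private final AtomicLong dropped = new AtomicLong();

    AsyncMetricsAgent(MetricsSink sink, int queueCapacity) {
      this.sink = sink;
      // One daemon thread and a bounded queue, so memory use stays capped.
      this.worker = new ThreadPoolExecutor(
          1, 1, 0L, TimeUnit.MILLISECONDS,
          new ArrayBlockingQueue<>(queueCapacity),
          runnable -> {
            Thread t = new Thread(runnable, "metrics-publisher");
            t.setDaemon(true);
            return t;
          });
    }

    /** Returns immediately; never throws to the data plane caller. */
    void addCount(String label, long value) {
      try {
        worker.execute(() -> {
          try {
            sink.publish(label, value);
          } catch (Exception e) {
            dropped.incrementAndGet(); // publish failure is swallowed
          }
        });
      } catch (RejectedExecutionException e) {
        dropped.incrementAndGet(); // queue full or agent already drained
      }
    }

    long droppedCount() {
      return dropped.get();
    }

    /** Stops accepting work and waits for queued metrics; false on timeout. */
    boolean drain(long timeoutMillis) {
      worker.shutdown();
      try {
        return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

With this shape the added latency on the calling thread is one enqueue, which might be a reasonable starting point for stating a performance bar.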

+1. We should delegate the metrics work to a known framework. Maintaining our own would be difficult, and in any case we shouldn't be reinventing the wheel.

operation AddDate {
input: AddDateInput,
output: AddOutput,
errors: [MetricsPutError]
Contributor

Nit: You say that the interface should not error, but you have errors here.

Comment on lines +234 to +235
// Common output structure
structure AddOutput {}
Contributor

I get why you might optimize this, but is this really the best choice? Why not have an output per operation?

## Issues and Alternatives

Crypto Tools (CT) publishes software libraries. The latest
versions of these libraries have no logging or metrics publishing
Contributor

Nit/Issue: We have not defined metrics.
While the definition might seem obvious to some, metrics can either be a description of the library's performance in the customer's application OR they could be telemetry on the usage of Crypto Tools products.

Having read the docs, it is clear that this is NOT telemetry, and maybe it is only Tony Knapp who gets the two confused, but we could clarify this.


This change will allow Crypto Tools to introduce a Metrics Agent in a
non-breaking way as the Metrics Agent will be an optional parameter
at customer facing API call sites.
Contributor

Should we support providing a metrics agent to the client constructor? Such that the same instance would then be used for each API call.

I don't necessarily think we should but it might be worth discussing here.
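
For discussion, a minimal sketch of how a client-level agent and a per-call agent could coexist. The `Client`, `MetricsAgent`, and `ListAgent` names are hypothetical and only illustrate the precedence rule: a per-call agent wins, otherwise the constructor-supplied one is used, otherwise nothing is emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not the actual ESDK client.
final class AgentScopeSketch {

  interface MetricsAgent {
    void addCount(String label, long value);
  }

  /** In-memory agent used only to observe which agent received a metric. */
  static final class ListAgent implements MetricsAgent {
    final List<String> labels = new ArrayList<>();

    @Override
    public void addCount(String label, long value) {
      labels.add(label);
    }
  }

  static final class Client {
    private final MetricsAgent clientAgent; // null means "no client-level agent"

    Client(MetricsAgent clientAgent) {
      this.clientAgent = clientAgent;
    }

    /** Stand-in for a data plane call; perCallAgent may be null. */
    void encrypt(MetricsAgent perCallAgent) {
      MetricsAgent agent = perCallAgent != null ? perCallAgent : clientAgent;
      if (agent != null) {
        agent.addCount("encrypt.calls", 1);
      }
    }
  }
}
```

Both parameters stay optional under this sketch, so existing call sites would keep compiling.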

.MaterialProvidersConfig(MaterialProvidersConfig.builder().build())
.build();

final IKeyring rawAesKeyring = matProv.CreateRawAesKeyring(keyringInput);

Why are we implementing our own metric agent? I'm pretty sure there exists some framework which we can leverage. Customers can optionally provide the agent, or we do no-op (like logging frameworks such as slf4j etc).
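
A minimal sketch of the optional-agent-with-no-op pattern described here, assuming a hypothetical `MetricsAgent` interface. The names and the `encrypt` stand-in are illustrative, not an existing API. When the caller passes nothing, a shared no-op agent is used, so call sites need no null checks and nothing is emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these types exist in the ESDK or MPL.
final class OptionalAgentSketch {

  /** Assumed minimal metrics interface. */
  interface MetricsAgent {
    void addTime(String label, long durationMillis);
  }

  /** Default when the caller supplies nothing: records nothing, never throws. */
  static final MetricsAgent NO_OP = (label, durationMillis) -> {};

  /** Demo agent that keeps emitted labels in memory. */
  static final class RecordingAgent implements MetricsAgent {
    final List<String> labels = new ArrayList<>();

    @Override
    public void addTime(String label, long durationMillis) {
      labels.add(label);
    }
  }

  /** Stand-in for a data plane call; agentOrNull may be null. */
  static byte[] encrypt(byte[] plaintext, MetricsAgent agentOrNull) {
    MetricsAgent agent = agentOrNull != null ? agentOrNull : NO_OP;
    long start = System.nanoTime();
    byte[] result = plaintext.clone(); // placeholder for the real work
    agent.addTime("encrypt", (System.nanoTime() - start) / 1_000_000);
    return result;
  }
}
```

A customer-provided implementation could wrap whatever framework they already use, while the library itself would only depend on the small interface.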

algorithmSuiteId: aws.cryptography.materialProviders#ESDKAlgorithmSuiteId,

frameLength: FrameLength,

Contributor

I guess this probably isn't possible in Smithy/Smithy-Dafny but we might want to consider having a more open-ended "request override" option for things like this. (This is the same way the SDK does it.)

@texastony
Contributor

Potential Issue/Alternative/User Story:

Users of the MPL/ESDK/DB-ESDK are also users of the AWS SDKs. The AWS SDKs have established logging and metric interfaces.
AWS Crypto Tools products are AWS Products, just like the AWS SDKs.

There likely is an implicit customer expectation that Crypto Tools products behave and appear to be consistent with the AWS SDKs.

Therefore, I suggest we carefully evaluate whether we can utilize the SDKs' metric and logging tooling, and offer a customer experience that closely mimics the SDKs' experience.

The current collection of docs does not state this as a goal, but it does leave it open as an opportunity.

i.e., the proposed metric interface could wrap an SDK metric class.

6 participants