Skip to content

Conversation

sleepyStick
Copy link

Please complete the following before merging:

  • Is the relevant DRIVERS ticket in the PR title?
  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, and sharded
    clusters).

callbacks.

A previous design had no limits for retrying commits or entire transactions. The callback is always able indicate that
A previous design had no limits for retrying commits or entire transactions. The callback is always able to indicate that
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I read it wrong, or maybe its a typo? Not sure.

[abortTransaction](../transactions/transactions.md#aborttransaction) on the session.
2. If the callback's error includes a "TransientTransactionError" label and the elapsed time of `withTransaction` is
less than 120 seconds, jump back to step two.
less than 120 seconds, sleep for `jitter * min(BACKOFF_INITIAL * (1.25**retry), BACKOFF_MAX)` where:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know python uses ** as the exponential operator. I don't believe that is a standard across other programming languages. Is it clear that its exponent? Is there a different preferred symbol for exponent? (I know math commonly uses ^ but it typically also means bitwise XOR in code so I felt like that could be confusing.)

### Retry Backoff is Enforced

Drivers should test that retries within `withTransaction` do not occur immediately. Ideally, set BACKOFF_INITIAL 500ms
and configure a failpoint that forces one retry. Ensure that the operation took more than 500ms so succeed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would not work as described because jitter is non-deterministic.

An alternative test would be to use a failpoint to fail the transaction X times and then assert the overall time is larger than some threshold.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I realized that when implementing it. I set the fail point to fail 3 times and that seems to be consistently working (and without jitter failing 3 times would still cause the success to happen within 500ms)


A previous design had no limits for retrying commits or entire transactions. The callback is always able indicate that
`withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly;
A previous design had no limits for retrying commits or entire transactions. The callback is always able to indicate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add a section in the Design Rationale to cover why we think it's beneficial to introduce backoff. And another section explaining why we choose the parameters 1ms initial, 1.25 growth, and 500ms max.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants