Pathological behavior triggered by "slow" DNS #2905

@hacst

Describe the bug

In environments where YARP has to (re-)establish many connections to the same domain and DNS resolution is not effectively instant, resolution times can grow excessively, leading to pathological behavior.

This is triggered by dotnet serializing all DNS requests for a domain, under the assumption that a local resolver cache will make lookups fast once the first query completes. That assumption generally does not hold on Linux/Kubernetes.

To Reproduce

  1. Get a DNS resolver with some latency (~5ms or so is sufficient)
  2. Add a backend that forces reconnects
  3. Put sufficient load on the instance (e.g. >200rps @ 5ms)
  4. Observe DNS metrics. The time for resolution of the domain will keep climbing.
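
The numbers in step 3 follow from simple queueing arithmetic: if lookups for one domain are handled one at a time, a 5ms lookup allows at most 1000 / 5 = 200 lookups per second. Below is a minimal sketch of that model in Python. It is not the dotnet implementation; `serialized_dns_waits` is a hypothetical helper that assumes evenly spaced requests, one new connection (and so one lookup) per request, and a constant lookup time.

```python
def serialized_dns_waits(rps, lookup_ms, seconds):
    """Model one domain's lookups as a single FIFO queue (one lookup in flight at a time).

    Assumption: every request needs its own lookup, arrivals are evenly spaced,
    and each lookup takes exactly lookup_ms.
    Returns the total resolution time (queueing + lookup) in ms seen by each request.
    """
    interval_ms = 1000.0 / rps
    free_at = 0.0  # time at which the previous lookup finishes
    waits = []
    for i in range(int(rps * seconds)):
        arrival = i * interval_ms
        start = max(arrival, free_at)  # queue behind the previous lookup if it is still running
        free_at = start + lookup_ms
        waits.append(free_at - arrival)
    return waits


for rps in (100, 200, 250):
    waits = serialized_dns_waits(rps, lookup_ms=5.0, seconds=10)
    print(f"{rps} rps: utilization={rps * 5.0 / 1000:.2f}, "
          f"first={waits[0]:.0f} ms, last={waits[-1]:.0f} ms")
```

In this idealized model the resolution time stays flat up to 200 rps and grows without bound above it (at 250 rps each request waits about 1ms longer than the one before). Real traffic is bursty, so the climb can start below that threshold.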

Further technical details

Found with YARP 2.3.0, dotnet 8 on AKS (Kubernetes, Linux). Probably not a problem on systems with built-in resolver caches, but it should occur on any system where DNS has some latency. In Kubernetes especially, many queries are "slow" due to the way the default search behavior for local domains is set up. I checked, and the serialization behavior still seems to be in place in dotnet 10 preview.

In our case the pathological behavior was triggered by a relatively short-term backend overload that caused lots of reconnects. Due to the high rps, this was sufficient to drive DNS resolution times over the connection timeout, making it pretty much impossible for the instance to recover.

Imo it might make sense to provide some form of mitigation, workaround, or guidance in YARP for this. I am also considering reporting this to dotnet runtime, as I think it was written under the assumption of being in a Windows environment with very fast local caching.

Labels: External: Runtime (this work will mostly be done in the dotnet/runtime repo), Type: Tracking (tracking work to be done in other repositories)
