GitHub client is fragile with recent GitHub API flakiness

The bug is in the following lines of code:

https://github.com/hub4j/github-api/blob/1cb9e66f7a762ad35f22a19ec854e2f8c4c6d45e/src/main/java/org/kohsuke/github/GitHubClient.java#L47-L52

It retries only twice, sleeping a fixed 100 ms before each retry, so it gives up after about 200 ms of total waiting.

# Why this is a bug

The GitHub Branch Source (GHBS) Jenkins plugin occasionally drops builds for webhooks it has received. GHBS relies on the GitHub plugin, which relies on the github-api plugin, which provides this library as its client. Here's an exception from multibranch pipeline events.

```
[Mon Oct 16 14:31:28 GMT 2023] Received Push event to branch master in repository REDACTED UPDATED event from REDACTED ⇒ https://jenkins-webhooks.REDACTED.com/github-webhook/ with timestamp Mon Oct 16 14:31:22 GMT 2023
14:31:26 Connecting to https://api.github.com using GitHub app
ERROR: Ran out of retries for URL: https://api.github.com/repos/REDACTED
org.kohsuke.github.GHIOException: Ran out of retries for URL: https://api.github.com/repos/REDACTED
	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:456)
	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:403)
	at org.kohsuke.github.Requester.fetch(Requester.java:85)
	at org.kohsuke.github.GHRepository.read(GHRepository.java:145)
	at org.kohsuke.github.GitHub.getRepository(GitHub.java:684)
	at org.jenkinsci.plugins.github_branch_source.GitHubSCMSource.retrieve(GitHubSCMSource.java:1005)
	at jenkins.scm.api.SCMSource._retrieve(SCMSource.java:372)
	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:326)
	at jenkins.branch.MultiBranchProject$SCMEventListenerImpl.processHeadUpdate(MultiBranchProject.java:1614)
	at jenkins.branch.MultiBranchProject$SCMEventListenerImpl.onSCMHeadEvent(MultiBranchProject.java:1218)
	at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:246)
	at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:229)
	at jenkins.scm.api.SCMEvent$Dispatcher.run(SCMEvent.java:545)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
```

# How many retries should it be?

In practice, after rolling out the patch linked below in another plugin (one that does not use this library but does interact with GitHub), I've seen GitHub API calls retried 28 times over the course of 1 minute. The retry limit in that patch was 30, with a random sleep of 1000 to 3000 ms between retries. I've since increased the retry cap to 60 in my production system for that plugin.
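To put those numbers side by side, here is a rough back-of-the-envelope sketch. `RetryWindow` and its methods are hypothetical names used only for illustration; the calculation assumes each sleep is drawn uniformly from the configured range and ignores the time spent on the requests themselves.

```java
/** Rough bounds on how long a client keeps waiting between retries before it gives up. */
final class RetryWindow {
    /** Shortest possible total wait: every sleep hits the minimum. */
    static long minMillis(int retries, long minSleepMillis) {
        return retries * minSleepMillis;
    }

    /** Longest possible total wait: every sleep hits the maximum. */
    static long maxMillis(int retries, long maxSleepMillis) {
        return retries * maxSleepMillis;
    }

    /** Expected total wait when each sleep is uniform in [minSleepMillis, maxSleepMillis]. */
    static long expectedMillis(int retries, long minSleepMillis, long maxSleepMillis) {
        return retries * (minSleepMillis + maxSleepMillis) / 2;
    }

    public static void main(String[] args) {
        // Library defaults quoted in this issue: 2 retries, fixed 100 ms sleep.
        System.out.println(maxMillis(2, 100));              // 200
        // Other plugin: retry limit 30, sleeping 1000 to 3000 ms.
        System.out.println(expectedMillis(30, 1000, 3000)); // 60000
        // The 28 retries observed in production.
        System.out.println(expectedMillis(28, 1000, 3000)); // 56000
    }
}
```

Under these assumptions, 28 retries account for roughly 56 seconds of sleeping, which matches the "28 times over the course of 1 minute" observation, while the current defaults cover only about 200 ms of GitHub API unavailability.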

https://github.com/jenkinsci/scm-filter-jervis-plugin/commit/c21d0c1d936d1ab71d93aba7684e00fa68d6e67e#diff-46da075f8e1e17ccf594a86bc96e8c8eaf295617b81d5ae5206634e37021bb49R119-R122

# Ideal solution

You can keep the same default retry behavior, but I would like it to be tunable by the user via system properties on the fly. That means static constants should be used only as default values, not as the values actually applied at retry time.

```java
Integer minInterval = Integer.getInteger(GitHubClient.class.getName() + ".minRetryInterval", retryTimeoutMillis);
Integer maxInterval = Integer.getInteger(GitHubClient.class.getName() + ".maxRetryInterval", retryTimeoutMillis) + 1;
Integer retryLimit = Integer.getInteger(GitHubClient.class.getName() + ".retryLimit", CONNECTION_ERROR_RETRIES);
```
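As a minimal sketch of what "on the fly" could mean here: the settings are re-read from system properties on every call, so a change made while the JVM is running (for example from the Jenkins script console) applies to the next retry. `TunableRetrySettings` is a hypothetical helper, not part of the library; the property prefix assumes `GitHubClient.class.getName()` resolves to `org.kohsuke.github.GitHubClient`, and the clamping of bad values is my own addition.

```java
/** Hypothetical holder for retry settings that are looked up on every call. */
final class TunableRetrySettings {
    // Defaults mirror the constants quoted in this issue.
    static final int DEFAULT_RETRIES = 2;
    static final int DEFAULT_INTERVAL_MILLIS = 100;
    // Assumed value of GitHubClient.class.getName().
    static final String PREFIX = "org.kohsuke.github.GitHubClient";

    /** Number of retries; unset or non-numeric properties fall back to the default. */
    static int retryLimit() {
        return Math.max(0, Integer.getInteger(PREFIX + ".retryLimit", DEFAULT_RETRIES));
    }

    /** Lower bound of the sleep range in milliseconds, never negative. */
    static int minRetryInterval() {
        return Math.max(0, Integer.getInteger(PREFIX + ".minRetryInterval", DEFAULT_INTERVAL_MILLIS));
    }

    /** Upper bound of the sleep range in milliseconds, never below the lower bound. */
    static int maxRetryInterval() {
        return Math.max(minRetryInterval(),
                Integer.getInteger(PREFIX + ".maxRetryInterval", DEFAULT_INTERVAL_MILLIS));
    }
}
```

With this shape, calling `System.setProperty("org.kohsuke.github.GitHubClient.retryLimit", "60")` at runtime changes what `retryLimit()` returns on its next call, whereas a value captured once in a `static final` field would stay fixed until a restart.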

And for sleeping between retries I would like it to be random [instead of a fixed value](https://github.com/hub4j/github-api/blob/1cb9e66f7a762ad35f22a19ec854e2f8c4c6d45e/src/main/java/org/kohsuke/github/GitHubClient.java#L633-L643).

```java
// Requires: import java.util.concurrent.ThreadLocalRandom;
    private static void logRetryConnectionError(IOException e, URL url, int retries) throws IOException {
        // Read the bounds on every call so that changed system properties apply without a restart.
        int minInterval = Math.max(0,
                Integer.getInteger(GitHubClient.class.getName() + ".minRetryInterval", retryTimeoutMillis));
        // Keep the upper bound at or above the lower bound; nextLong() rejects an empty range.
        int maxInterval = Math.max(minInterval,
                Integer.getInteger(GitHubClient.class.getName() + ".maxRetryInterval", retryTimeoutMillis));
        // The bound of nextLong() is exclusive, hence the + 1L.
        long sleepyTime = ThreadLocalRandom.current().nextLong(minInterval, maxInterval + 1L);
        // There are a range of connection errors where we want to wait a moment and just automatically retry
        LOGGER.log(INFO,
                e.getMessage() + " while connecting to " + url + ". Sleeping " + sleepyTime
                        + " milliseconds before retrying... ; will try " + retries + " more time(s)");
        try {
            Thread.sleep(sleepyTime);
        } catch (InterruptedException ie) {
            throw (IOException) new InterruptedIOException().initCause(e);
        }
    }
```

This should have sane defaults but allow clients to tune them on the fly. Random delay between retries (jitter) is a cloud best practice for interacting with distributed systems. See also https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
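For illustration, the overall retry behavior I have in mind could look roughly like the following standalone sketch. `JitteredRetry`, `IoCall`, and `withRetries` are hypothetical names and not this library's API; the sketch uses a uniform random delay as proposed above, not the exponential backoff variants that the AWS article also covers.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retry an I/O call with a random sleep between attempts. */
final class JitteredRetry {
    /** A call that may fail with an IOException, such as an HTTP request. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    static <T> T withRetries(IoCall<T> call, int retryLimit, long minSleepMillis, long maxSleepMillis)
            throws IOException {
        long min = Math.max(0L, minSleepMillis);
        long max = Math.max(min, maxSleepMillis);
        int retriesLeft = retryLimit;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                if (retriesLeft <= 0) {
                    throw new IOException("Ran out of retries", e);
                }
                retriesLeft--;
                // Uniform jitter in [min, max]; the bound of nextLong() is exclusive.
                long sleepMillis = ThreadLocalRandom.current().nextLong(min, max + 1L);
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw (IOException) new InterruptedIOException().initCause(e);
                }
            }
        }
    }
}
```

A caller would pass the tunable values at call time, for example `JitteredRetry.withRetries(() -> fetchRepository(), 30, 1000, 3000)`, where `fetchRepository()` stands in for whatever request is being retried.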

For reference, these are the current defaults at the lines linked at the top of this issue:

	/** The Constant CONNECTION_ERROR_RETRIES. */
	static final int CONNECTION_ERROR_RETRIES = 2;
	/**
	* If timeout issues let's retry after milliseconds.
	*/
	static final int retryTimeoutMillis = 100;
