In two separate bugs, we've observed that RPCs wrapped in retries often fail with "context deadline exceeded", even after being retried 5 times.
It seems unlikely that RPCs which normally work would fail 5 times in a row, so there may be a bug in a library somewhere that is causing this behavior.
The code follows this pattern:
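// c is the caller's context.Context.
err := retry.Retry(.....func() error {
	// Each attempt is meant to get a fresh 15-second deadline.
	nc, cancel := clock.WithDeadline(c, clock.Now(c).Add(15*time.Second))
	defer cancel()
	return doRPC(nc)
})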
retry.Retry retries up to 5 times. If the first attempt fails, the second attempt always fails as well.
Comment 1 by tandrii@chromium.org, Jul 3