
Issue 23

Starred by 17 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:




Implement better rate-limiting/retry logic

Project Member Reported by mark@chromium.org, Mar 26 2015

Issue description

The current crashpad_handler makes at most one upload attempt per hour. Any upload that would exceed this limit is never attempted, and a failed upload is never retried.

This logic exactly matches the Mac Breakpad client as configured for use by Chrome, but it’s not elegant or flexible.

We should implement something better, possibly with multiple stacked rules, like

limit 2 upload attempts within 2 hours
limit 6 upload attempts within 12 hours
initial retry attempt for a report is a 1 hour delay after initial upload attempt
subsequent retry attempts for a report back off by doubling the delay to a maximum delay of 8 hours
reports that failed to upload are retired after 48 hours

or something like that.
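The two "limit N upload attempts within M hours" rules above could be stacked roughly as follows. This is only a sketch under assumptions: `WindowRule` and `StackedRateLimiter` are hypothetical names, not Crashpad types, and it assumes each window is a sliding window and that a denied attempt does not count against any limit.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using Seconds = std::chrono::seconds;

// One rule: at most `max_attempts` upload attempts within any `window`.
struct WindowRule {
  std::size_t max_attempts;
  Seconds window;
};

// Hypothetical helper, not part of Crashpad. An attempt is allowed only if
// every stacked rule would still be satisfied after recording it.
class StackedRateLimiter {
 public:
  explicit StackedRateLimiter(std::vector<WindowRule> rules)
      : rules_(std::move(rules)) {}

  // `now` is the time since an arbitrary epoch and must not go backwards.
  // Returns true and records the attempt if all rules permit it.
  bool TryAcquire(Seconds now) {
    for (const WindowRule& rule : rules_) {
      std::size_t recent = 0;
      for (Seconds t : attempts_) {
        if (now - t < rule.window) ++recent;
      }
      if (recent >= rule.max_attempts) return false;
    }
    attempts_.push_back(now);
    Prune(now);
    return true;
  }

 private:
  // Drops attempts older than the longest window; no rule can see them.
  void Prune(Seconds now) {
    Seconds longest{0};
    for (const WindowRule& rule : rules_) {
      if (rule.window > longest) longest = rule.window;
    }
    while (!attempts_.empty() && now - attempts_.front() >= longest) {
      attempts_.pop_front();
    }
  }

  std::vector<WindowRule> rules_;
  std::deque<Seconds> attempts_;  // Times of allowed attempts, oldest first.
};
```

Constructed with the rules `{2, 2 hours}` and `{6, 12 hours}`, a third attempt within the same two hours is refused, and so is a seventh within twelve hours even if the two-hour rule alone would allow it.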
 
Project Member

Comment 1 by mark@chromium.org, Oct 30 2015

Labels: -priority-medium Priority-2
Project Member

Comment 2 by mark@chromium.org, Oct 30 2015

Labels: -type-defect Type-Bug
Project Member

Comment 3 by mark@chromium.org, Oct 30 2015

Status: Available
Project Member

Comment 4 by mark@chromium.org, Oct 30 2015

Labels: Component-Networking
Components: Networking
Labels: -component-networking
Changing Component-* labels to full components.
Having the ability to set a higher limit (even w/o stacked rules) would be a great start for my use case. Maybe allow setting a callback which decides whether a report should be uploaded or not?
Project Member

Comment 7 by scottmg@chromium.org, Jan 5 2016

I just realized that Breakpad Chrome on Windows doesn't throttle uploads in the same way (it's a per day throttle, set to 20/day by GoogleUpdate). This is a potential cause of great consternation, because the crash dashboard has many fewer reports after the switch to Crashpad. (Of course, it could be something completely different too.)

In the near term, I'd like to emulate that for comparison purposes.
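For comparison, a per-day throttle like the 20/day one described above might look like the sketch below. `DailyThrottle` is a hypothetical name. How Breakpad or GoogleUpdate define a "day" is not stated in this thread, so the sketch assumes a 24-hour window that opens at the first upload after the previous window expired.

```cpp
#include <chrono>

// Hypothetical fixed-window throttle, not Breakpad or Crashpad code.
// Allows at most `max_per_day` uploads per 24-hour window.
class DailyThrottle {
 public:
  explicit DailyThrottle(int max_per_day) : max_per_day_(max_per_day) {}

  // `now` is the time since an arbitrary epoch and must not go backwards.
  bool ShouldUpload(std::chrono::seconds now) {
    // Assumption: a new window starts once the previous one is 24 hours old.
    if (!window_open_ || now - window_start_ >= std::chrono::hours(24)) {
      window_open_ = true;
      window_start_ = now;
      count_ = 0;
    }
    if (count_ >= max_per_day_) return false;
    ++count_;
    return true;
  }

 private:
  const int max_per_day_;
  bool window_open_ = false;
  std::chrono::seconds window_start_{0};
  int count_ = 0;
};
```

With `DailyThrottle throttle(20)`, the first 20 calls in a window return true and later calls return false until 24 hours after the window opened.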
The following revision refers to this bug:
  https://chromium.googlesource.com/crashpad/crashpad.git/+/330adfb02984692135fc5989dbaff49adb3b757e

commit 330adfb02984692135fc5989dbaff49adb3b757e
Author: Scott Graham <scottmg@chromium.org>
Date: Wed Jan 06 17:59:54 2016

Allow disabling upload rate-limiting in crashpad_handler

This is a temporary measure to try to account for lower than expected
upload volume from Chrome in the wild. So this doesn't fix bug 23, but
is related. The ability to delimit the upload rate is useful when
testing locally too.

R=mark@chromium.org
BUG=crashpad:23

Review URL: https://codereview.chromium.org/1563683002 .

[modify] http://crrev.com/330adfb02984692135fc5989dbaff49adb3b757e/handler/crash_report_upload_thread.cc
[modify] http://crrev.com/330adfb02984692135fc5989dbaff49adb3b757e/handler/crash_report_upload_thread.h
[modify] http://crrev.com/330adfb02984692135fc5989dbaff49adb3b757e/handler/crashpad_handler.ad
[modify] http://crrev.com/330adfb02984692135fc5989dbaff49adb3b757e/handler/handler_main.cc

Could we do something smarter than using a fixed limit? Could we look at the crash rate instead, and stop uploading if we realize that Chrome is just continuously sending crash reports (e.g. it's stuck in a crash loop at startup)?
By the way, any kind of limit introduces a discrepancy with UMA crash numbers, which has been a long-standing problem for people on Chrome Stability trying to understand things.
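One possible heuristic for the startup crash loop case, sketched under assumptions: treat a run of consecutive crashes that each happened shortly after process start as a loop. `CrashLoopDetector` and both of its thresholds are hypothetical, and nothing in this thread says Crashpad tracks process uptime this way.

```cpp
#include <chrono>

// Hypothetical heuristic, not Crashpad code. A "crash loop" is assumed to be
// `threshold` consecutive crashes that each occurred within `max_uptime` of
// process start.
class CrashLoopDetector {
 public:
  CrashLoopDetector(std::chrono::seconds max_uptime, int threshold)
      : max_uptime_(max_uptime), threshold_(threshold) {}

  // Records one crash, given how long the process ran before crashing.
  void RecordCrash(std::chrono::seconds uptime) {
    if (uptime <= max_uptime_) {
      ++consecutive_early_crashes_;
    } else {
      consecutive_early_crashes_ = 0;  // A long-lived run breaks the streak.
    }
  }

  bool InCrashLoop() const { return consecutive_early_crashes_ >= threshold_; }

 private:
  const std::chrono::seconds max_uptime_;
  const int threshold_;
  int consecutive_early_crashes_ = 0;
};
```

An uploader could keep sending the first reports of a streak and stop once `InCrashLoop()` becomes true, instead of applying a fixed hourly limit.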

Would it be possible to log how many crash reports were not reported up due to throttling to UMA so that it's easier to reason when comparing the numbers? (In fact, we should also log how many *were* uploaded.)
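A minimal sketch of the bookkeeping this would need: count each outcome so the totals can be reported later. `UploadOutcome` and `UploadStats` are hypothetical names, and how the counts would reach UMA is left out because this thread does not define that interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical outcome categories, not Crashpad's own.
enum class UploadOutcome { kUploaded = 0, kThrottled = 1, kFailed = 2 };

// Hypothetical counter set. Not thread-safe; a real handler would need to
// synchronize or persist these counts.
class UploadStats {
 public:
  void Record(UploadOutcome outcome) { ++counts_[Index(outcome)]; }

  std::uint64_t Count(UploadOutcome outcome) const {
    return counts_[Index(outcome)];
  }

 private:
  static std::size_t Index(UploadOutcome outcome) {
    return static_cast<std::size_t>(outcome);
  }

  std::array<std::uint64_t, 3> counts_{};  // Zero-initialized.
};
```

Comparing `Count(UploadOutcome::kThrottled)` with `Count(UploadOutcome::kUploaded)` gives the ratio of dropped to uploaded reports.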
Project Member

Comment 11 by mark@chromium.org, Aug 3 2016

Seb (#9), how would you detect the character of crash reports?

Alexei (#10), I’m into providing an interface to get stats out of Crashpad that Chrome could pick up and log via UMA. I think we can have a separate bug for that.
Project Member

Comment 12 by bugdroid1@chromium.org, Sep 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d922b3cacfcee6958d71de501a8ced7c25801965

commit d922b3cacfcee6958d71de501a8ced7c25801965
Author: scottmg <scottmg@chromium.org>
Date: Mon Sep 12 21:48:56 2016

Re-enable rate limiting for Windows Crashpad

Chrome can get itself into crash loops that cause massive numbers of
uploads. We'd like to have UMA to see what we're dropping, but for
the time being, mitigate by re-instituting the Mac-style 1/hr upload
limit.

BUG=crashpad:23

Review-Url: https://codereview.chromium.org/2332103002
Cr-Commit-Position: refs/heads/master@{#418063}

[modify] https://crrev.com/d922b3cacfcee6958d71de501a8ced7c25801965/components/crash/content/app/crashpad_win.cc

Project Member

Comment 13 by bugdroid1@chromium.org, Sep 16 2016

Labels: merge-merged-2840
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9b50a69f641ced25720eb16e05b7da73879108eb

commit 9b50a69f641ced25720eb16e05b7da73879108eb
Author: Scott Graham <scottmg@chromium.org>
Date: Fri Sep 16 21:37:18 2016

Re-enable rate limiting for Windows Crashpad

Chrome can get itself into crash loops that cause massive numbers of
uploads. We'd like to have UMA to see what we're dropping, but for
the time being, mitigate by re-instituting the Mac-style 1/hr upload
limit.

BUG=crashpad:23

Review-Url: https://codereview.chromium.org/2332103002
Cr-Commit-Position: refs/heads/master@{#418063}
(cherry picked from commit d922b3cacfcee6958d71de501a8ced7c25801965)

Review URL: https://codereview.chromium.org/2343283002 .

Cr-Commit-Position: refs/branch-heads/2840@{#398}
Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607}

[modify] https://crrev.com/9b50a69f641ced25720eb16e05b7da73879108eb/components/crash/content/app/crashpad_win.cc

Project Member

Comment 14 by bugdroid1@chromium.org, Sep 23 2016

Labels: merge-merged-2785
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c1490fd145eda148dd788e554426aa7cb7fde5b2

commit c1490fd145eda148dd788e554426aa7cb7fde5b2
Author: Scott Graham <scottmg@chromium.org>
Date: Fri Sep 23 16:23:55 2016

Re-enable rate limiting for Windows Crashpad

Chrome can get itself into crash loops that cause massive numbers of
uploads. We'd like to have UMA to see what we're dropping, but for
the time being, mitigate by re-instituting the Mac-style 1/hr upload
limit.

BUG=crashpad:23, 647464

Review-Url: https://codereview.chromium.org/2332103002
Cr-Commit-Position: refs/heads/master@{#418063}
(cherry picked from commit d922b3cacfcee6958d71de501a8ced7c25801965)

Review URL: https://codereview.chromium.org/2365833003 .

Cr-Commit-Position: refs/branch-heads/2785@{#918}
Cr-Branched-From: 68623971be0cfc492a2cb0427d7f478e7b214c24-refs/heads/master@{#403382}

[modify] https://crrev.com/c1490fd145eda148dd788e554426aa7cb7fde5b2/components/crash/content/app/crashpad_win.cc

Project Member

Comment 15 by bugdroid1@chromium.org, Sep 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/815445561158768341c730ac4609a5be81b97b27

commit 815445561158768341c730ac4609a5be81b97b27
Author: scottmg <scottmg@chromium.org>
Date: Fri Sep 23 18:56:59 2016

Revert of Re-enable rate limiting for Windows Crashpad (patchset #1 id:1 of https://codereview.chromium.org/2365833003/ )

Reason for revert:
After further discussion, decided against taking this in 53

Original issue's description:
> Re-enable rate limiting for Windows Crashpad
>
> Chrome can get itself into crash loops that cause massive numbers of
> uploads. We'd like to have UMA to see what we're dropping, but for
> the time being, mitigate by re-instituting the Mac-style 1/hr upload
> limit.
>
> BUG=crashpad:23, 647464
>
> Review-Url: https://codereview.chromium.org/2332103002
> Cr-Commit-Position: refs/heads/master@{#418063}
> (cherry picked from commit d922b3cacfcee6958d71de501a8ced7c25801965)
>
> Committed: https://chromium.googlesource.com/chromium/src/+/c1490fd145eda148dd788e554426aa7cb7fde5b2

TBR=
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=crashpad:23, 647464

Review-Url: https://codereview.chromium.org/2367723003
Cr-Commit-Position: refs/branch-heads/2785@{#920}
Cr-Branched-From: 68623971be0cfc492a2cb0427d7f478e7b214c24-refs/heads/master@{#403382}

[modify] https://crrev.com/815445561158768341c730ac4609a5be81b97b27/components/crash/content/app/crashpad_win.cc

Project Member

Comment 16 by scottmg@chromium.org, Sep 30 2016

xref to https://bugs.chromium.org/p/chromium/issues/detail?id=651862#c6 where rate limiting resulted in not uploading the important crash.
Project Member

Comment 17 by mark@chromium.org, Sep 30 2016

Noting that we’re discussing revamping throttling/rate-limiting internally.
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9b50a69f641ced25720eb16e05b7da73879108eb

commit 9b50a69f641ced25720eb16e05b7da73879108eb
Author: Scott Graham <scottmg@chromium.org>
Date: Fri Sep 16 21:37:18 2016

Re-enable rate limiting for Windows Crashpad

Chrome can get itself into crash loops that cause massive numbers of
uploads. We'd like to have UMA to see what we're dropping, but for
the time being, mitigate by re-instituting the Mac-style 1/hr upload
limit.

BUG=crashpad:23

Review-Url: https://codereview.chromium.org/2332103002
Cr-Commit-Position: refs/heads/master@{#418063}
(cherry picked from commit d922b3cacfcee6958d71de501a8ced7c25801965)

Review URL: https://codereview.chromium.org/2343283002 .

Cr-Commit-Position: refs/branch-heads/2840@{#398}
Cr-Branched-From: 1ae106dbab4bddd85132d5b75c670794311f4c57-refs/heads/master@{#414607}

[modify] https://crrev.com/9b50a69f641ced25720eb16e05b7da73879108eb/components/crash/content/app/crashpad_win.cc

As a downstream consumer of Chromium I would like rate limiting logic such as:

1. Upload 5 crash reports per day max.
2. Upload reports as quickly as possible and retry uploads that fail (don't upload or retry while offline).
3. Discard additional reports from the same day.

Can we expose an API whereby the Chromium consumer can implement their own rate limiting logic? For example, have CrashReportUploadThread::ProcessPendingReport call the consumer to determine whether to upload or skip pending reports.
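A sketch of what such a consumer hook could look like, assuming a callback that returns an upload-or-skip decision per report. None of these names exist in Crashpad: `PendingReport`, `UploadDecision`, `UploadPolicy`, `SelectReportsToUpload`, and `MakeDailyCapPolicy` are all hypothetical. The example policy covers items 1 and 3 of the list above (a daily cap keyed on the report's creation day, assumed to be a UTC day); the retry and offline handling of item 2 are not shown.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the report metadata a consumer would see.
struct PendingReport {
  std::string id;
  long long creation_time;  // Seconds since the Unix epoch.
};

enum class UploadDecision { kUpload, kSkip };

// Hypothetical consumer-supplied callback type.
using UploadPolicy = std::function<UploadDecision(const PendingReport&)>;

// Stand-in for the upload loop: asks the policy about each pending report
// and returns the IDs of the reports it approved.
std::vector<std::string> SelectReportsToUpload(
    const std::vector<PendingReport>& pending, const UploadPolicy& policy) {
  std::vector<std::string> approved;
  for (const PendingReport& report : pending) {
    if (policy(report) == UploadDecision::kUpload) {
      approved.push_back(report.id);
    }
  }
  return approved;
}

// Example policy: approve at most `max_per_day` reports per creation day and
// skip the rest from that day.
UploadPolicy MakeDailyCapPolicy(int max_per_day) {
  auto per_day = std::make_shared<std::map<long long, int>>();
  return [per_day, max_per_day](const PendingReport& report) {
    int& count = (*per_day)[report.creation_time / 86400];
    if (count >= max_per_day) return UploadDecision::kSkip;
    ++count;
    return UploadDecision::kUpload;
  };
}
```

Calling `SelectReportsToUpload(pending, MakeDailyCapPolicy(5))` approves the first five reports created on each day and skips the rest, leaving the policy itself entirely in the consumer's hands.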
Project Member

Comment 21 by mark@chromium.org, Jun 26 2017

Cc: mosescu@chromium.org
Are there any plans to, at a minimum, not move reports to completed that simply failed to upload? What would such a change look like?

In particular, if a crash report was not marked as skipped on failure to upload, is there concern with loops or something as we repeatedly try to upload that report in the future?
Project Member

Comment 23 by mark@chromium.org, Nov 16

Yes. That’s the retry aspect. It’s distinct from improving rate limiting, but closely related enough that I’ll fix both at the same time.
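The retry schedule proposed in the issue description (first retry after 1 hour, delay doubling up to 8 hours, retirement after 48 hours) can be sketched as below. `RetryDelay`, `RetryPlan`, and `PlanRetry` are hypothetical names, and the sketch assumes the 48 hours are measured from the first upload attempt, which the description leaves open.

```cpp
#include <algorithm>
#include <chrono>

using std::chrono::hours;
using std::chrono::seconds;

// Delay before the next retry after `failed_attempts` failures (1-based):
// 1h, 2h, 4h, 8h, 8h, ...
seconds RetryDelay(int failed_attempts) {
  const seconds kMaxDelay = hours(8);
  seconds delay = hours(1);
  for (int i = 1; i < failed_attempts && delay < kMaxDelay; ++i) {
    delay *= 2;
  }
  return std::min(delay, kMaxDelay);
}

struct RetryPlan {
  bool retire;           // True if the report should be given up on.
  seconds next_attempt;  // Meaningful only when `retire` is false.
};

// Hypothetical planner. All times share one arbitrary epoch. Assumption: a
// report is retired if its next retry would fall 48 hours or more after its
// first upload attempt.
RetryPlan PlanRetry(seconds first_attempt, seconds last_attempt,
                    int failed_attempts) {
  const seconds next = last_attempt + RetryDelay(failed_attempts);
  if (next - first_attempt >= hours(48)) return {true, seconds(0)};
  return {false, next};
}
```

Under these assumptions a report that keeps failing is attempted at 0, 1, 3, 7, 15, 23, 31, 39, and 47 hours and is then retired, so retries cannot loop indefinitely.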
