Issue 637935

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Feature




Investigate how to better fix chrome crash overload problem

Project Member Reported by aut...@google.com, Aug 15 2016

Issue description

See discussion on https://b.corp.google.com/issues/29401582

Maybe clear Chrome crashes after each successful test run?
 
Do we want to upload core dumps at all? Are the logs usually sufficient for debugging?
I think for a first pass, we should do two things:
 1) Put a cap on the amount of crash data we're willing to
    upload.
 2) Make removing all crashes part of the upload code,
    _whether or not upload succeeds_.
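
A minimal sketch of those two steps, assuming the crash files sit in a local directory and that "upload" is a plain directory copy. The names `dir_size` and `collect_crashes`, the cap parameter, and the directory layout are hypothetical and not taken from autotest; the sketch is written for Python 3.

```python
import os
import shutil


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def collect_crashes(crash_dir, dest_dir, cap_bytes):
    """Copy crash_dir to dest_dir if it fits under cap_bytes, then clear it.

    Returns True if the data was copied. The crash files are removed
    whether or not the copy happened or succeeded.
    """
    try:
        # Step 1: refuse to copy more than the cap.
        if dir_size(crash_dir) > cap_bytes:
            return False
        shutil.copytree(crash_dir, dest_dir)
        return True
    finally:
        # Step 2: always clear the crashes, even if the copy raised.
        shutil.rmtree(crash_dir, ignore_errors=True)
```

Putting the removal in `finally` is what makes cleanup independent of upload success: a failed or skipped copy still leaves the device with an empty crash directory.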

We do need to upload crashes when they're not too large;
the crash dumps are important in debugging certain classes
of bugs in the lab.

Project Member

Comment 3 by sheriffbot@chromium.org, Aug 16 2016

Labels: Hotlist-Google
For YET ANOTHER example of what we want to prevent, see
bug 638257.

Labels: -Pri-2 Pri-1
Reflecting on the "Put a cap on [ ... ] crash data", I think a
strict cap isn't the best strategy; we'd hate to lose our ability
to debug some critical problem merely because the dump was a few
kilobytes over a multi-gigabyte cap.

Two ideas might mitigate the problems with a strict cap:

1) Transmit probabilistically:

Let Q be the transmission quota per job
Let R be a random number drawn from U(0,1)
Let S be the size of a single job's worth of data

Then, copy data when R < Q / S.

This would rate limit the size copied to be no more
than Q per job, on average.

2) Copy based on local available storage.  That is, copy
unconditionally when local free disk space is above some
threshold.  This would generally prevent or mitigate problems
like bug 638257, while guaranteeing that (typically) the first
failure incident could be fully copied.
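
Both ideas can be sketched as small predicates. This is only an illustration under stated assumptions: the function names `should_copy` and `has_free_space` and the `rng` parameter are hypothetical, sizes are in bytes, and the code targets Python 3 (true division).

```python
import random
import shutil


def should_copy(size_bytes, quota_bytes, rng=random.random):
    """Idea 1: copy with probability min(1, Q / S).

    Data no larger than the quota is always copied; larger data is
    copied when R < Q / S, with R drawn from U(0, 1).
    """
    if size_bytes <= quota_bytes:
        return True
    return rng() < quota_bytes / size_bytes


def has_free_space(path, min_free_bytes):
    """Idea 2: True when the filesystem holding path has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

For data larger than Q, the expected amount copied is S * (Q / S) = Q, which is where the "no more than Q per job, on average" bound comes from. The early return for small data also avoids dividing by zero when S is 0.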

Status: Started (was: Untriaged)
Project Member

Comment 7 by bugdroid1@chromium.org, Aug 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1641867d310df1ec429e72ac68630e2d6d0b8995

commit 1641867d310df1ec429e72ac68630e2d6d0b8995
Author: Allen Li <ayatane@chromium.org>
Date: Wed Aug 17 18:54:09 2016

[autotest] Remove crash files after copying

BUG= chromium:638641 , chromium:637935 
TEST=Run control file that forces core dumps

Change-Id: Ibe0ad43a568eca7fc0f13dc92d1183c0185804b3
Reviewed-on: https://chromium-review.googlesource.com/372458
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/1641867d310df1ec429e72ac68630e2d6d0b8995/server/crashcollect.py
[modify] https://crrev.com/1641867d310df1ec429e72ac68630e2d6d0b8995/server/site_crashcollect.py

Project Member

Comment 8 by bugdroid1@chromium.org, Sep 7 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4

commit 5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4
Author: Allen Li <ayatane@chromium.org>
Date: Wed Aug 17 18:54:09 2016

[autotest] Remove crash files after copying

Retry of https://chromium-review.googlesource.com/#/c/372458/ to fix
unintentionally removing files, for example, /var/log

Cleanup:

- Replace usage of stdout_tee=devnull with None.
- Move tmpdir removal into finally

BUG= chromium:638641 , chromium:637935 
TEST=Run control file that forces core dumps

Change-Id: If39b94510891fb507212d078609effc30efcb696
Reviewed-on: https://chromium-review.googlesource.com/376961
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4/server/crashcollect.py
[modify] https://crrev.com/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4/server/site_crashcollect.py

Project Member

Comment 9 by bugdroid1@chromium.org, Sep 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/45da8479324dee178613ba11963aaf5b20f1fa59

commit 45da8479324dee178613ba11963aaf5b20f1fa59
Author: Allen Li <ayatane@chromium.org>
Date: Fri Aug 26 00:39:07 2016

[autotest] Upload large crash files stochastically

Uploading large crash files can overwhelm the network capacity of the
lab.  However, these files may be useful.

This commit changes the crash collection to upload large files
stochastically, such that on average we only upload files with a
capped maximum size.

The maximum size was determined thus:

Chrome core dump is roughly 350-400 MiB.  Assuming 6 DUTs, 64 MiB * 6 =
384 MiB, so on average one core dump will be uploaded per 6 DUTs.

BUG= chromium:637935 
TEST=Run a job with a control file that forces crashes

Change-Id: I37a38f80ecadd8744631724ba6183e5e24a5c65d
Reviewed-on: https://chromium-review.googlesource.com/376163
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/45da8479324dee178613ba11963aaf5b20f1fa59/server/crashcollect.py
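
The sizing arithmetic in the commit message above can be checked directly. This sketch assumes the commit applies the Q / S rule proposed earlier in this thread, which the commit message does not spell out; the variable names are illustrative only.

```python
from fractions import Fraction

MIB = 1024 * 1024
quota = 64 * MIB   # per-job cap from the commit message
dump = 384 * MIB   # 64 MiB * 6, within the quoted 350-400 MiB range

p_upload = Fraction(quota, dump)   # probability that one dump is uploaded
expected_bytes = p_upload * dump   # average bytes uploaded per dump

print(p_upload)               # 1/6
print(expected_bytes // MIB)  # 64
```

A probability of 1/6 per dump matches the commit's statement that, on average, one core dump is uploaded per 6 DUTs.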

Status: Fixed (was: Started)
Labels: VerifyIn-55

Comment 13 by dchan@chromium.org, Oct 10 2016

Labels: -VerifyIn-55

Comment 14 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 15 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 16 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 17 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 18 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 20 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
