Investigate how to better fix chrome crash overload problem
Issue description
See discussion on https://b.corp.google.com/issues/29401582
Maybe clear Chrome crashes after each successful test run?
,
Aug 16 2016
I think for a first pass, we should do two things:
1) Put a cap on the amount of crash data we're willing to
upload.
2) Make removing all crashes part of the upload code,
_whether or not upload succeeds_.
We do need to upload crashes when they're not too large;
the crash dumps are important in debugging certain classes
of bugs in the lab.
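The two points above could be sketched roughly as follows. This is only an illustration, not the autotest implementation: `collect_crashes`, `dir_size`, the `upload` callback, and the `MAX_UPLOAD_BYTES` value are all hypothetical names and numbers chosen for the example.

```python
import os
import shutil

# Hypothetical cap; this thread does not settle on a value.
MAX_UPLOAD_BYTES = 512 * 1024 * 1024


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def collect_crashes(crash_dir, upload, max_bytes=MAX_UPLOAD_BYTES):
    """Upload crash_dir if it is under the cap, then always empty it.

    Returns True if the upload ran, False if it was skipped for size.
    An exception from upload() propagates after the cleanup has run.
    """
    try:
        if dir_size(crash_dir) > max_bytes:
            return False
        upload(crash_dir)
        return True
    finally:
        # Runs on success, on skip, and when upload() raises.
        for entry in os.listdir(crash_dir):
            full = os.path.join(crash_dir, entry)
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full, ignore_errors=True)
            else:
                os.remove(full)
```

Putting the removal in `finally` is what makes it happen whether or not the upload succeeds.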
,
Aug 16 2016
,
Aug 16 2016
For YET ANOTHER example of what we want to prevent, see bug 638257.
,
Aug 17 2016
Reflecting on the "Put a cap on [ ... ] crash data", I think a strict cap isn't the best strategy; we'd hate to lose our ability to debug some critical problem merely because the dump was a few kilobytes over a multi-gigabyte cap.

Two ideas might mitigate the problems with a strict cap:
1) Transmit probabilistically:
     Let Q be the transmission quota per job
     Let R be a random number drawn from U(0,1)
     Let S be the size of a single job's worth of data
   Then, copy data when R < Q / S. This would rate limit the size copied to be no more than Q per job, on average.
2) Copy based on local available storage. That is, copy unconditionally when local free disk space is above some threshold. This would generally prevent or mitigate problems like bug 638257, while guaranteeing that (typically) the first failure incident could be fully copied.
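As a rough sketch of the two ideas (the function names and the `rng` parameter are assumptions made for illustration, not autotest code): under rule 1 the expected amount copied is S * min(1, Q/S) = min(S, Q), which is where the "no more than Q per job, on average" bound comes from.

```python
import random
import shutil


def should_copy_probabilistic(size, quota, rng=random.random):
    """Idea 1: copy when R < Q / S, with R drawn from U(0, 1).

    When size <= quota, Q / S >= 1, so the data is always copied.
    """
    if size <= 0:
        return False  # nothing to copy
    return rng() < quota / size


def should_copy_by_free_space(local_dir, min_free_bytes):
    """Idea 2: copy unconditionally while local free space exceeds a threshold."""
    return shutil.disk_usage(local_dir).free > min_free_bytes
```

The `rng` argument exists only so the decision can be made deterministic in tests; a caller would normally leave it at the default.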
,
Aug 23 2016
,
Aug 25 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1641867d310df1ec429e72ac68630e2d6d0b8995

commit 1641867d310df1ec429e72ac68630e2d6d0b8995
Author: Allen Li <ayatane@chromium.org>
Date: Wed Aug 17 18:54:09 2016

[autotest] Remove crash files after copying

BUG=chromium:638641, chromium:637935
TEST=Run control file that forces core dumps

Change-Id: Ibe0ad43a568eca7fc0f13dc92d1183c0185804b3
Reviewed-on: https://chromium-review.googlesource.com/372458
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/1641867d310df1ec429e72ac68630e2d6d0b8995/server/crashcollect.py
[modify] https://crrev.com/1641867d310df1ec429e72ac68630e2d6d0b8995/server/site_crashcollect.py
,
Sep 7 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4

commit 5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4
Author: Allen Li <ayatane@chromium.org>
Date: Wed Aug 17 18:54:09 2016

[autotest] Remove crash files after copying

Retry of https://chromium-review.googlesource.com/#/c/372458/ to fix
unintentionally removing files, for example, /var/log

Cleanup:
- Replace usage of stdout_tee=devnull with None.
- Move tmpdir removal into finally

BUG=chromium:638641, chromium:637935
TEST=Run control file that forces core dumps

Change-Id: If39b94510891fb507212d078609effc30efcb696
Reviewed-on: https://chromium-review.googlesource.com/376961
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4/server/crashcollect.py
[modify] https://crrev.com/5889a8a4cb7a9adf6d96934fe4d1e30411ce21c4/server/site_crashcollect.py
,
Sep 13 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/45da8479324dee178613ba11963aaf5b20f1fa59

commit 45da8479324dee178613ba11963aaf5b20f1fa59
Author: Allen Li <ayatane@chromium.org>
Date: Fri Aug 26 00:39:07 2016

[autotest] Upload large crash files stochastically

Uploading large crash files can overwhelm the network capacity of the
lab. However, these files may be useful. This commit changes the
crash collection to upload large files stochastically, such that on
average we only upload files with a capped maximum size.

The maximum size was determined thus: Chrome core dump is roughly
350-400 MiB. Assuming 6 DUTs, 64 MiB * 6 = 384 MiB, so on average one
core dump will be uploaded per 6 DUTs.

BUG=chromium:637935
TEST=Run a job with a control file that forces crashes

Change-Id: I37a38f80ecadd8744631724ba6183e5e24a5c65d
Reviewed-on: https://chromium-review.googlesource.com/376163
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/45da8479324dee178613ba11963aaf5b20f1fa59/server/crashcollect.py
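To make the sizing arithmetic in this commit message concrete, here is a small worked example. It assumes the upload decision follows the R < Q / S rule proposed earlier in this thread, with Q = 64 MiB; the actual code in crashcollect.py is not shown in this bug, and `upload_probability` is a hypothetical helper.

```python
MIB = 1024 * 1024
QUOTA = 64 * MIB  # per-job figure quoted in the commit message


def upload_probability(size, quota=QUOTA):
    """Probability of uploading `size` bytes under a min(1, quota / size) rule."""
    return min(1.0, quota / size)


# Core dump sizes in the 350-400 MiB range mentioned in the commit message.
for dump_mib in (350, 384, 400):
    p = upload_probability(dump_mib * MIB)
    print("%d MiB dump: p=%.3f, expected upload %.1f MiB"
          % (dump_mib, p, p * dump_mib))
```

For a 384 MiB dump the probability is 64 / 384 = 1/6, which matches the "one core dump per 6 DUTs" estimate.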
,
Sep 13 2016
,
Oct 7 2016
,
Oct 10 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
Comment 1 by ayatane@chromium.org
, Aug 16 2016