Always upload moblab logs
Issue description
In the CQ / BVT: We upload the logs from moblab's autotest setup (i.e. logs from /usr/local/autotest/logs and /usr/local/autotest/results/) when the moblab_RunSuite test fails. We do not upload them if the job is aborted.
This means that if something goes wrong so that the run_suite call inside moblab takes too long, our lab aborts the moblab_RunSuite test and we end up with no logs to figure out why moblab didn't run its (internal) tests.
Example aborted test: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=120599578
Example logs (that don't contain the moblab logs): https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/120599578-chromeos-test/chromeos2-row2-rack8-host1/
sbasi@: What would be involved in making this happen?
,
May 31 2017
I had a thought about changing the GS bucket to a moblab-lab bucket: if that bucket were writable, gs_offloader would upload the internal test results to it and notify the partner console within a minute or so of the moblab test completing. However, if the DUT were re-imaged very quickly, it might fail to upload some DUTs' results. My main concern is that gs_offloader keeps dying and respawning on the moblab because the configured bucket is read-only. Any concerns about that approach?
,
May 31 2017
What is failing to upload the logs, gs_offloader on the moblab, or gs_offloader on the shard?
,
May 31 2017
Re comment #1: I think you need to update the log gatherer script to fetch the logs for the MobLab use case. I feel like I looked at it a long time ago, and it required a lot of refactoring and hence got dropped. Sorry, I don't remember exactly where my investigations took me.
,
May 31 2017
Per #3: gs_offloader on the moblab fails because the configured bucket is gs://chromeos-image-archive/ and I believe that is read only to the moblab. I see this in /var/log/messages:
2017-05-31T22:14:57.715832+00:00 WARNING kernel: [ 3718.734683] init: moblab-gsoffloader-init main process (9310) terminated with status 1
2017-05-31T22:14:57.715840+00:00 WARNING kernel: [ 3718.734703] init: moblab-gsoffloader-init main process ended, respawning
It would be possible to set the test up to be much more like the way partners use the moblab and upload the logs.
,
Jun 1 2017
I really like the idea in #2, with a twist:
- Provide a path within the target results directory on GS to moblab as its image archive. This way, moblab's gs_offloader will try to offload the results into a subdirectory in the usual results folder directly.
- At the end of a successful / failed test, add an extra step to give gs_offloader on the moblab one more one-off spin to try to offload everything.
In the case of aborted tests, we still don't guarantee that all from-moblab logs are offloaded (since the autoserv process is terminated brutally; my gut feeling is also that fixing this may be non-trivial). But if the test is aborted due to a timeout, we should get plenty of logs.
,
Jun 2 2017
,
Jun 2 2017
,
Jun 2 2017
,
Jun 5 2017
,
Jun 5 2017
,
Jun 5 2017
,
Jun 6 2017
We actually already have a partial solution for this in place. We symlink the autotest service logs into /var/log/autotest, so that they get collected as part of sysinfo collection. Technically, the result logs that we want here are part of logs collected from the DUT. So one proposal would be to:
- Also symlink /usr/local/autotest/results into /var/log/autotest/results and let sysinfo collection do its magic.
- Stop the special-cased log collection from moblab_RunSuite.
Example of logs collected this way: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/120599578-chromeos-test/chromeos2-row2-rack8-host1/sysinfo/var/log/autotest/
sbasi@: wdyt?
This means that we'll get better moblab logs for _all_ tests. The downside is that /usr/local/autotest/results can be large, and sysinfo log collection restrictions (if any) will apply.
,
Jun 8 2017
Sounds good to me.
,
Jun 9 2017
Posted: https://chromium-review.googlesource.com/c/528471/ If this works, I'll also remove the current result collection logic in moblab_RunSuite because it'll be redundant.
,
Jun 13 2017
,
Jun 15 2017
#15 didn't go far enough, as expected. I have an alternative approach: https://chromium-review.googlesource.com/c/536257/ Trying it out now.
,
Jun 28 2017
Issue 642157 has been merged into this issue.
,
Jun 30 2017
Update: there was a bug in sysinfo collection (affecting sysinfo collection for all DUTs). I've verified locally that my approach in #17 works along with the fix. I'm running a trybot with both changes now.
,
Jul 1 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52

commit 2e5c609b2c7a14cff4f51c749ae9d5835b79ff52
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sat Jul 01 05:24:49 2017

[autotest] Make sysinfo logging more robust

We ignore errors in sysinfo log creation / collection. But we bailed on the first error encountered so that the rest of the requested loggables were never run. Since we're anyway ignoring errors, ignore them for each loggable as well so that we get more information on failures.

BUG= chromium:728290
TEST=(1) new unittest. (2) Run a client test and notice some error messages about (current failing) log collection, and verify that other loggables are still executed.

Change-Id: Ie09df2d0510f8a225a5d6cdde0f4dc59d55e3c9c
Reviewed-on: https://chromium-review.googlesource.com/558202
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[add] https://crrev.com/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52/client/bin/base_sysinfo_unittest.py
[modify] https://crrev.com/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52/client/bin/base_sysinfo.py
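
For readers who have not opened the CL, here is a minimal sketch of the behaviour change the commit message describes: catch errors per loggable instead of around the whole loop. The `Loggable` class and `run_all_loggables` function are hypothetical names made up for illustration; they are not the actual code in base_sysinfo.py.

```python
import logging


class Loggable:
    """Hypothetical stand-in for one sysinfo item to collect."""

    def __init__(self, name, action):
        self.name = name
        self.action = action

    def run(self, logdir):
        return self.action(logdir)


def run_all_loggables(loggables, logdir):
    """Run every loggable; record failures instead of stopping at the first."""
    failed = []
    for loggable in loggables:
        try:
            loggable.run(logdir)
        except Exception:
            # Errors were already being ignored overall, so ignore them
            # per item and keep going to gather as much as possible.
            logging.exception("sysinfo loggable %s failed; continuing",
                              loggable.name)
            failed.append(loggable.name)
    return failed
```

Returning the list of failed names is a choice made for this sketch so the caller can report what was skipped; the commit message only says that the remaining loggables still get executed.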
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub.
P1 Bugs in this component should be important enough to get weekly status updates.
Is this already fixed? -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.
Bugs that remain in this state next week will be downgraded to P2.
,
Jul 20 2017
,
Jul 22 2017
Actually works with this HACK: https://chromium-review.googlesource.com/c/582507/ Will upload a principled CL to do the same thing.
,
Jul 22 2017
Re:24 The real CL is https://chromium-review.googlesource.com/c/536257/ but it needs an extra hack.
,
Jul 27 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/aab59c0517e31caabc4941eea32774a1b48410c7

commit aab59c0517e31caabc4941eea32774a1b48410c7
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Thu Jul 27 05:41:53 2017

Make sysinfo collection more aggressive about resolving source paths

Two changes really:
- Resolve symlinks to determine the source path.
- Don't apply path exclusions to the leading prefix of the source path, only within the source root.

+ unittests for old and new behaviour.

BUG= chromium:728290
TEST=(new) unittests.

Change-Id: I43a773d695d2062eceefe7a6b65bc99ae3d920c1
Reviewed-on: https://chromium-review.googlesource.com/584087
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/aab59c0517e31caabc4941eea32774a1b48410c7/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/aab59c0517e31caabc4941eea32774a1b48410c7/client/bin/site_sysinfo.py
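
The two changes in this commit can be illustrated with a small standalone sketch. It assumes exclusions are shell-style patterns matched against path components, which may differ from how site_sysinfo.py actually applies them (the real collection goes through rsync). All function names below are hypothetical.

```python
import fnmatch
import os


def resolve_source(path):
    """Follow symlinks so collection starts from the real directory."""
    return os.path.realpath(path)


def is_excluded(file_path, source_root, patterns):
    """Match patterns only against the part of the path below source_root."""
    rel = os.path.relpath(file_path, source_root)
    parts = rel.split(os.sep)
    return any(fnmatch.fnmatch(part, pat)
               for part in parts for pat in patterns)


def collectable_files(path, patterns):
    """List files under path (after resolving symlinks) that survive exclusion."""
    root = resolve_source(path)
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_excluded(full, root, patterns):
                kept.append(os.path.relpath(full, root))
    return sorted(kept)
```

With this shape, a symlink such as /var/log/autotest/results pointing at /usr/local/autotest/results would still be collected even if a pattern like `autotest` is in the exclusion list, because the pattern is only tested against components inside the resolved root.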
,
Jul 28 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b31263bae188d9bb598b81f818a8bff93d8f25da

commit b31263bae188d9bb598b81f818a8bff93d8f25da
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Jul 28 04:39:03 2017

[autotest] Accept a sysinfo.logdir object for sysinfo.add_logdir

Instead of duplicating the constructor of sysinfo.logdir, accept a pre-constructed logdir.

BUG= chromium:728290
TEST=Run provision on local AFE without SSP.

Change-Id: Id7841dc03a30e0d82b52a88b93c7533779b26acd
Reviewed-on: https://chromium-review.googlesource.com/585378
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/server/site_tests/provision_AutoUpdate/control
[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/server/control_segments/provision
[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/server/site_tests/autoupdate_EndToEndTest/control
[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/client/bin/site_sysinfo.py
[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/client/bin/sysinfo.py
[modify] https://crrev.com/b31263bae188d9bb598b81f818a8bff93d8f25da/server/site_tests/provision_AutoUpdate/control.double
,
Jul 28 2017
#27 is getting reverted because of issue 750254.
I need to re-land in two steps:
- First add the sysinfo support.
- push-to-prod so that the stupid paygen tests can use the new import
- then land the use in the autoupdate control files.
,
Aug 4 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0b3ac67a34b83952e395c7baffc46ba3b034f697

commit 0b3ac67a34b83952e395c7baffc46ba3b034f697
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Aug 04 02:23:14 2017

Allow overriding exclude paths for sysinfo collection

Before this CL, we could specify additional patterns for exclusions during sysinfo rsync. This CL adds the ability to override the list entirely. This is needed for moblab sysinfo collection to be able to gather logs that were originally blacklisted internally.

BUG= chromium:728290
TEST=(1) new unittests. (2) With CL:536257, moblab_RunSuite collects results including autoserv.DEBUG files.

Change-Id: I7b569bb73bbf461c4af8d54de3ae2da866565df8
Reviewed-on: https://chromium-review.googlesource.com/585379
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/0b3ac67a34b83952e395c7baffc46ba3b034f697/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/0b3ac67a34b83952e395c7baffc46ba3b034f697/client/bin/site_sysinfo.py
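
A rough sketch of the "additional patterns versus full override" distinction described in this commit, assuming the excludes end up as `--exclude=` arguments to rsync. The default pattern list, the function name, and both parameter names are invented for illustration and are not taken from site_sysinfo.py.

```python
# Hypothetical defaults; the real list lives in autotest and may differ.
DEFAULT_EXCLUDES = ("autoserv*", "*.core")


def build_rsync_excludes(additional=None, override=None):
    """Return rsync --exclude arguments.

    additional: patterns appended to the defaults (the older behaviour).
    override:   if given, replaces the defaults entirely (the new ability).
    """
    if override is not None:
        patterns = list(override)
    else:
        patterns = list(DEFAULT_EXCLUDES) + list(additional or [])
    return ["--exclude=%s" % pattern for pattern in patterns]
```

Under these assumptions, calling `build_rsync_excludes(override=[])` produces no exclusions at all, which is the kind of setting a moblab collection would need in order to pick up autoserv.DEBUG files that the defaults would otherwise filter out.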
,
Aug 6 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c314a4e179c143f5145181cbb35046740053cf81

commit c314a4e179c143f5145181cbb35046740053cf81
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sun Aug 06 07:30:47 2017

[autotest] Fix backwards compatibility of sysinfo excludes.

CL:585379 didn't correctly handle backward compatibility requirement of the deprecated additiona_exclude property of sysinfo.logdir
FixIt.

BUG= chromium:728290
TEST=(fixed) unittests.

Change-Id: I1e89ebbe41516587195d3a220b0cb6ef9c18a46b
Reviewed-on: https://chromium-review.googlesource.com/602076
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c314a4e179c143f5145181cbb35046740053cf81/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/c314a4e179c143f5145181cbb35046740053cf81/client/bin/site_sysinfo.py
,
Aug 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/78736c7b05d00b30d335128984633eeedd57cb26

commit 78736c7b05d00b30d335128984633eeedd57cb26
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Aug 08 02:29:16 2017

moblab_RunSuite: Use sysinfo collection to collect moblab logs

We always want to collect logs and results generated by the autotest instance running on moblab. This is already supported via sysinfo collection, so use it.

BUG= chromium:728290
TEST=trybot

Change-Id: I747fb5eda55d835bb1e74cdf675d5048a958a893
Reviewed-on: https://chromium-review.googlesource.com/536257
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/control.dummyServer
[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/moblab_RunSuite.py
[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/control.smoke
,
Aug 8 2017
Work here is done. Need to wait for a prod push so that moblab woes get over (unrelated to this CL) and I can verify that logs are being collected correctly.
,
Aug 9 2017
This is actually "done": https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/133966542-chromeos-test/chromeos2-row1-rack8-host1/
There is one loose end:
- gs_offloader is tarring up moblab_RunSuite.tgz on the top-level, probably because it decides the logs are too large. I'll file a separate bug to figure out if this is a good idea in general.
- I don't know how I found those logs -- a successful CQ run (as in this case) is supposed to delete all logs. It didn't, but meh.
,
Aug 9 2017
Not necessarily in the lab, but we need that tar/zip for partners: it reduces the file sizes for the CTS suite from 30GB to about 3GB, and by reducing the number of small files on the external disk we avoid very long boot-up times (the fsck check seems to take a very long time with lots of files). No problem with changing it for the lab, but please do not switch it off for partners.
,
Aug 9 2017
Ack. I'm not sure I'll chase it in the lab either, unless enough people complain :)
,
Aug 9 2017
Not so fast, Sherlock. Looking inside the offloaded logs there, the test decided to collect just the log_diff, which didn't include the results sub-folder. Needs one more whack. Why does this take so long :(
,
Sep 11 2017
Fixed unless someone else complains.
Comment 1 by haddowk@chromium.org
, May 31 2017