
Issue 728290 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 747056




Always upload moblab logs

Project Member Reported by pprabhu@chromium.org, May 31 2017

Issue description

In the CQ / BVT: We upload the logs from moblab's autotest setup (i.e. logs from /usr/local/autotest/logs and /usr/local/autotest/results/) when the moblab_RunSuite test fails.

We do not upload them if the job is aborted. This means that if something goes wrong so that the run_suite call inside moblab takes too long, our lab aborts the moblab_RunSuite test and we end up with no logs to figure out why moblab didn't run its (internal) tests.

Example aborted test: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=120599578
Example logs (that don't contain the moblab logs): https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/120599578-chromeos-test/chromeos2-row2-rack8-host1/


sbasi@: What would be involved in making this happen?
 
Cc: haddowk@chromium.org
I had a thought about changing the GS bucket to a moblab-lab one. If that bucket were writable, gs_offloader would upload the internal test results to it and notify the partner console within a minute or so of the moblab test completing. However, if the DUT were re-imaged very quickly, the upload might fail for some DUTs.

My main concern is that gs_offloader keeps dying and respawning on the moblab because the configured bucket is read-only.

Any concerns about that approach?
Cc: pho...@chromium.org
What is failing to upload the logs: gs_offloader on the moblab, or gs_offloader on the shard?

Comment 4 by sbasi@chromium.org, May 31 2017

Re comment #1: I think you need to update the log-gatherer script for the MobLab use case to fetch the logs. I feel like I looked at it a long time ago, and it required a lot of refactoring and hence got dropped. Sorry, I don't remember exactly where my investigations took me.
Per #3, gs_offloader on the moblab fails because the configured bucket is gs://chromeos-image-archive/, which I believe is read-only to the moblab.

I see this in /var/log/messages:
2017-05-31T22:14:57.715832+00:00 WARNING kernel: [ 3718.734683] init: moblab-gsoffloader-init main process (9310) terminated with status 1
2017-05-31T22:14:57.715840+00:00 WARNING kernel: [ 3718.734703] init: moblab-gsoffloader-init main process ended, respawning
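
To gauge how often this happens, the respawn messages can be counted per job. The following is a minimal sketch, assuming the message format shown in the two lines above; the `count_respawns` helper and the regular expression are illustrative and not part of autotest or moblab.

```python
import re

# Matches the "respawning" line that init logs when a job's main process dies.
# The format is inferred from the two sample lines above (an assumption).
RESPAWN = re.compile(r"init: (?P<job>\S+) main process ended, respawning")


def count_respawns(lines):
    """Return a dict mapping each init job name to its respawn count."""
    counts = {}
    for line in lines:
        match = RESPAWN.search(line)
        if match:
            job = match.group("job")
            counts[job] = counts.get(job, 0) + 1
    return counts


sample = [
    "2017-05-31T22:14:57.715832+00:00 WARNING kernel: [ 3718.734683] "
    "init: moblab-gsoffloader-init main process (9310) terminated with status 1",
    "2017-05-31T22:14:57.715840+00:00 WARNING kernel: [ 3718.734703] "
    "init: moblab-gsoffloader-init main process ended, respawning",
]
counts = count_respawns(sample)
```

On a real moblab the same function could be fed the lines of /var/log/messages instead of the `sample` list.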


It would be possible to set the test up to be much more like the way partners use the moblab and upload the logs.
I really like the idea in #2, with a twist:

- Provide a path within the target results directory on GS to moblab as its image archive. This way, moblab's gs_offloader will try to offload the results into a subdirectory in the usual results folder directly.
- At the end of a successful / failed test, add an extra step to give gs_offloader on the moblab one more one-off spin to try to offload everything.

In the case of aborted tests, we still don't guarantee that all from-moblab logs are offloaded, since the autoserv process is terminated brutally (my gut feeling is also that fixing this may be non-trivial). But if the test is aborted due to a timeout, we should get plenty of logs.

Comment 7 by aut...@google.com, Jun 2 2017

Labels: -current-issue
Owner: haddowk@chromium.org
Status: Available (was: Untriaged)
Labels: ImpactsCQ
Labels: Chase-Pending
Labels: -ImpactsCQ

Comment 12 by aut...@google.com, Jun 5 2017

Labels: -Chase-Pending
We actually already have a partial solution for this in place.
We symlink the autotest service logs into /var/log/autotest, so that they get collected as part of sysinfo collection.
Technically, the result logs that we want here are part of the logs collected from the DUT. So one proposal would be to:

- also symlink /usr/local/autotest/results into /var/log/autotest/results and let sysinfo collection do its magic.
- Stop the special-cased log collection from moblab_RunSuite.
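
The first step of the proposal can be sketched as follows. This is only an illustration: the `link_into_sysinfo` helper is hypothetical, and the demo runs in a scratch directory that mirrors the two paths named above rather than touching a real moblab.

```python
import os
import tempfile


def link_into_sysinfo(source_dir, link_path):
    """Symlink source_dir at link_path unless a link already exists there.

    Returns True if a new symlink was created, False otherwise.
    """
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    if os.path.islink(link_path):
        return False  # Already linked; leave it alone.
    os.symlink(source_dir, link_path)
    return True


# Demo in a scratch directory instead of the real moblab filesystem.
root = tempfile.mkdtemp()
results = os.path.join(root, "usr/local/autotest/results")
os.makedirs(results)
link = os.path.join(root, "var/log/autotest/results")

created = link_into_sysinfo(results, link)
created_again = link_into_sysinfo(results, link)
```

Anything that later walks the link path then sees the contents of the results directory, provided the collector follows symlinks.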

Example of logs collected this way: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/120599578-chromeos-test/chromeos2-row2-rack8-host1/sysinfo/var/log/autotest/

sbasi@: wdyt?

This means that we'll get better moblab logs for _all_ tests.
The downside is that /usr/local/autotest/results can be large, and sysinfo log collection restrictions (if any) will apply.
Sounds good to me.
Owner: pprabhu@chromium.org
Status: Started (was: Available)
Posted: https://chromium-review.googlesource.com/c/528471/

If this works, I'll also remove the current result collection logic in moblab_RunSuite because it'll be redundant.
Cc: dshi@chromium.org msartori@chromium.org
Issue 509766 has been merged into this issue.
#15 didn't go far enough, as expected.

I have an alternative approach: https://chromium-review.googlesource.com/c/536257/

Trying it out now.
Issue 642157 has been merged into this issue.
Update: there was a bug in sysinfo collection (affecting sysinfo collection for all DUTs). I've verified locally that my approach in #17 works along with the fix. I'm running a trybot with both changes now.
Project Member

Comment 20 by bugdroid1@chromium.org, Jul 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52

commit 2e5c609b2c7a14cff4f51c749ae9d5835b79ff52
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sat Jul 01 05:24:49 2017

[autotest] Make sysinfo logging more robust

We ignore errors in sysinfo log creation / collection. But we bailed on
the first error encountered so that the rest of the requested loggables
were never run. Since we're anyway ignoring errors, ignore them for each
loggable as well so that we get more information on failures.

BUG= chromium:728290 
TEST=(1) new unittest.
     (2) Run a client test and notice some error messages about (current
         failing) log collection, and verify that other loggables are
         still executed.

Change-Id: Ie09df2d0510f8a225a5d6cdde0f4dc59d55e3c9c
Reviewed-on: https://chromium-review.googlesource.com/558202
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[add] https://crrev.com/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52/client/bin/base_sysinfo_unittest.py
[modify] https://crrev.com/2e5c609b2c7a14cff4f51c749ae9d5835b79ff52/client/bin/base_sysinfo.py
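
The behaviour described in this commit message, where a failing loggable no longer stops the remaining ones, can be sketched as below. This is not the code from base_sysinfo.py: the `Loggable` class and `run_all` function are hypothetical stand-ins used only to show per-item error handling.

```python
import logging


class Loggable:
    """Hypothetical stand-in for a sysinfo loggable."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def run(self, log_dir):
        if self.fail:
            raise OSError("cannot collect %s" % self.name)
        return "%s -> %s" % (self.name, log_dir)


def run_all(loggables, log_dir):
    """Run every loggable; log and skip the ones that raise."""
    collected, failed = [], []
    for loggable in loggables:
        try:
            collected.append(loggable.run(log_dir))
        except Exception:
            # Errors are ignored anyway, so ignore them per loggable
            # instead of abandoning the rest of the list.
            logging.exception("sysinfo: %s failed, continuing", loggable.name)
            failed.append(loggable.name)
    return collected, failed


collected, failed = run_all(
    [Loggable("dmesg"), Loggable("broken", fail=True), Loggable("messages")],
    "/tmp/sysinfo",
)
```

The key point is that the try/except sits inside the loop, so "messages" is still collected after "broken" raises.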

Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.

Comment 22 Deleted

Blocking: 747056
Actually works with this HACK: https://chromium-review.googlesource.com/c/582507/

Will upload a principled CL to do the same thing.
Re:24 The real CL is https://chromium-review.googlesource.com/c/536257/
but it needs an extra hack.
Project Member

Comment 26 by bugdroid1@chromium.org, Jul 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/aab59c0517e31caabc4941eea32774a1b48410c7

commit aab59c0517e31caabc4941eea32774a1b48410c7
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Thu Jul 27 05:41:53 2017

Make sysinfo collection more aggressive about resolving source paths

Two changes really:
- Resolve symlinks to determine the source path.
- Don't apply path exclusions to the leading prefix of the source path,
  only within the source root.

+ unittests for old and new behaviour.

BUG= chromium:728290 
TEST=(new) unittests.

Change-Id: I43a773d695d2062eceefe7a6b65bc99ae3d920c1
Reviewed-on: https://chromium-review.googlesource.com/584087
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/aab59c0517e31caabc4941eea32774a1b48410c7/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/aab59c0517e31caabc4941eea32774a1b48410c7/client/bin/site_sysinfo.py
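
The two changes in this commit message can be illustrated with a small sketch. It makes several assumptions: it walks the tree with `os.walk` and matches with `fnmatch` rather than using rsync as the real sysinfo collection does, and the `collect` helper and the demo directory names are hypothetical.

```python
import fnmatch
import os
import tempfile


def collect(source, exclude_patterns):
    """Return relative file paths under source that survive the excludes.

    The source is resolved through symlinks first, and patterns are matched
    against paths relative to the resolved root, never against the prefix.
    """
    root = os.path.realpath(source)
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            if any(fnmatch.fnmatch(rel, pat) for pat in exclude_patterns):
                continue
            kept.append(rel)
    return sorted(kept)


# Demo: the real directory has "autoserv" in its leading path, so the
# pattern would match every file if it were applied to absolute paths.
base = tempfile.mkdtemp()
real = os.path.join(base, "autoserv_home", "results")
os.makedirs(real)
for name in ("status.log", "autoserv.DEBUG"):
    open(os.path.join(real, name), "w").close()
link = os.path.join(base, "link_to_results")
os.symlink(real, link)

kept = collect(link, ["*autoserv*"])
```

Here status.log is kept even though its absolute path contains "autoserv", while autoserv.DEBUG is still excluded because the pattern matches inside the source root.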

#27 is getting reverted because of issue 750254.

I need to re-land in two steps:

- First, add the sysinfo support.
- Push-to-prod so that the stupid paygen tests can use the new import.
- Then land the use in the autoupdate control files.
Project Member

Comment 29 by bugdroid1@chromium.org, Aug 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0b3ac67a34b83952e395c7baffc46ba3b034f697

commit 0b3ac67a34b83952e395c7baffc46ba3b034f697
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Aug 04 02:23:14 2017

Allow overriding exclude paths for sysinfo collection

Before this CL, we could specify additional patterns for exclusions
during sysinfo rsync. This CL adds the ability to override the list
entirely.

This is needed for moblab sysinfo collection to be able to gather logs
that were originally blacklisted internally.

BUG= chromium:728290 
TEST=(1) new unittests.
     (2) With CL:536257, moblab_RunSuite collects results including
         autoserv.DEBUG files.

Change-Id: I7b569bb73bbf461c4af8d54de3ae2da866565df8
Reviewed-on: https://chromium-review.googlesource.com/585379
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/0b3ac67a34b83952e395c7baffc46ba3b034f697/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/0b3ac67a34b83952e395c7baffc46ba3b034f697/client/bin/site_sysinfo.py
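
The difference between adding exclusions and overriding the list can be sketched as follows. The `effective_excludes` function, its parameter names, and the default patterns are assumptions made for illustration; they are not copied from site_sysinfo.py.

```python
# Illustrative defaults only; the real list lives in site_sysinfo.py.
DEFAULT_EXCLUDES = ("**autoserv*", "**.journal")


def effective_excludes(additional=None, override=None):
    """Pick the exclude patterns for one collection run.

    override replaces the defaults entirely (an empty sequence means
    "exclude nothing"); additional only appends to the defaults.
    """
    if override is not None:
        return tuple(override)
    return DEFAULT_EXCLUDES + tuple(additional or ())


defaults = effective_excludes()
extended = effective_excludes(additional=["*.tmp"])
nothing_excluded = effective_excludes(override=[])
```

With an override, a caller such as the moblab collection can drop a default pattern and pick up files like autoserv.DEBUG, which appending patterns alone could never achieve.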

Project Member

Comment 30 by bugdroid1@chromium.org, Aug 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c314a4e179c143f5145181cbb35046740053cf81

commit c314a4e179c143f5145181cbb35046740053cf81
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sun Aug 06 07:30:47 2017

[autotest] Fix backwards compatibility of sysinfo excludes.

CL:585379 didn't correctly handle backward compatibility requirement of
the deprecated additiona_exclude property of sysinfo.logdir

FixIt.

BUG= chromium:728290 
TEST=(fixed) unittests.

Change-Id: I1e89ebbe41516587195d3a220b0cb6ef9c18a46b
Reviewed-on: https://chromium-review.googlesource.com/602076
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c314a4e179c143f5145181cbb35046740053cf81/client/bin/site_sysinfo_unittest.py
[modify] https://crrev.com/c314a4e179c143f5145181cbb35046740053cf81/client/bin/site_sysinfo.py

Project Member

Comment 31 by bugdroid1@chromium.org, Aug 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/78736c7b05d00b30d335128984633eeedd57cb26

commit 78736c7b05d00b30d335128984633eeedd57cb26
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Aug 08 02:29:16 2017

moblab_RunSuite: Use sysinfo collection to collect moblab logs

We always want to collect logs and results generated by the autotest
instance running on moblab. This is already supported via sysinfo
collection, so use it.

BUG= chromium:728290 
TEST=trybot

Change-Id: I747fb5eda55d835bb1e74cdf675d5048a958a893
Reviewed-on: https://chromium-review.googlesource.com/536257
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/control.dummyServer
[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/moblab_RunSuite.py
[modify] https://crrev.com/78736c7b05d00b30d335128984633eeedd57cb26/server/site_tests/moblab_RunSuite/control.smoke

Work here is done. I need to wait for a prod push to clear up the current moblab woes (unrelated to this CL) before I can verify that logs are being collected correctly.
Status: Verified (was: Started)
This is actually "done": https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/133966542-chromeos-test/chromeos2-row1-rack8-host1/

There are two loose ends:
- gs_offloader is tarring up moblab_RunSuite.tgz at the top level, probably because it decides the logs are too large. I'll file a separate bug to figure out if this is a good idea in general.
- I don't know how I found those logs: a successful CQ run (as in this case) is supposed to delete all logs. It didn't, but meh.
Not necessarily in the lab, but we need that tar/zip for partners. It reduces the file sizes for the CTS suite from 30GB to about 3GB, and by reducing the number of small files on the external disk it avoids very long boot-up times (the fsck check seems to take a very long time with lots of files).

No problem with changing it for the lab, but please do not switch it off for partners.
Ack. I'm not sure I'll chase it in the lab either, unless enough people complain :)
Status: Started (was: Verified)
Not so fast, Sherlock.

Looking inside the offloaded logs there, the test decided to collect just the log_diff, which didn't include the results sub-folder.

Needs one more whack
Why does this take so long :(
Status: Fixed (was: Started)
Fixed unless someone else complains.
