
Issue 878526


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




CrOS VM logs are captured only some of the time (also clean up old logs)

Project Member Reported by bpastene@chromium.org, Aug 28

Issue description

See https://chromium-review.googlesource.com/c/chromium/src/+/1193944/6 where a CL kept failing the VM sanity test. Chrome was crashing, and only one of the test runs had the VM logs:
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=3eb20d0a20213f0d487b9c7ffcd34fd68c7219fd

From there you can see one of the chrome logs w/ the source of the browser crash:
https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=36873fea20303b22607e48344098bec00e582b74&as=chrome_20180828-114804 

Unfortunately, all the other task runs didn't have any logs, eg:
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=adc857b53ab87de928795a68c13accef21a78d95

I should figure out why we seem to be capturing logs only some of the time. Might be a race condition when the test reaches its timeout.

Also, cleaning up some of the old logs before starting the test wouldn't be a bad idea.
 
Are we re-using the VM between test runs? A fresh VM should have minimal logs. 

I can have the test runner delete logs before each run?

> Are we re-using the VM between test runs? A fresh VM should have minimal logs.

Yeah. Until a new VM image gets published, we reuse the same one. (Though we start/stop it between tests, so it's just files on disk that get persisted.)

> I can have the test runner delete logs before each run?

That would definitely be helpful. Not sure what the cleanest way to do that is, though. If there's no "flush_logs" utility, then maybe just a "rm *" in the log dirs?

Yup, I'll just rm the logs. We do something similar between test runs for other state files:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py?l=585-586

That sounds like what we want. Could you add that to cros_run_vm_test? I'm also seeing some VMs that have GBs worth of chrome crashes in /var/spool/crash/, causing the VM to run out of disk space. I bet some change that makes chrome crash gets tested and retried in the CQ, causing those crash dumps to pile up in the VMs.
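
A rough sketch of what the pre-test cleanup could look like, for illustration only. It just builds the shell command string that the test runner would execute inside the VM before launching the test. The helper name `build_cleanup_command` and the `/var/log/chrome` path are assumptions; only `/var/spool/crash` is named in this thread, and this is not the actual cros_run_vm_test code. It also assumes `find` in the VM supports `-mindepth` and `-delete`.

```python
import shlex

# Hypothetical defaults: /var/spool/crash comes from this thread,
# /var/log/chrome is an assumed log location.
DEFAULT_DIRS = ("/var/log/chrome", "/var/spool/crash")


def build_cleanup_command(dirs=DEFAULT_DIRS):
    """Return a shell command that empties each directory but keeps the directory itself."""
    parts = []
    for d in dirs:
        stripped = d.rstrip("/")
        # Refuse relative paths and "/" so a bad argument can't wipe the VM.
        if not d.startswith("/") or not stripped:
            raise ValueError("refusing to clean %r" % d)
        # -mindepth 1 skips the directory itself; -delete removes everything below it.
        parts.append("find %s -mindepth 1 -delete" % shlex.quote(stripped))
    # ';' rather than '&&' so a missing directory doesn't stop the remaining cleanups.
    return "; ".join(parts)


if __name__ == "__main__":
    print(build_cleanup_command())
```

The resulting string would be handed to whatever mechanism the runner already uses to execute commands in the VM. Keeping the directories in place (rather than removing and recreating them) avoids having to restore ownership and permissions.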
Labels: -Pri-1 Pri-2
I think this problem should be solved with the copy-on-write CL which we will use on the chrome bots. Devs are probably ok with logs accumulating.

Cc: -achuith@chromium.org bpastene@chromium.org
Owner: achuith@chromium.org
Yep, log cleanup should be a nonissue (for chrome at least) after bug 887753 is done.
