Issue 885368

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




betty-arcnext-chrome-pfq VMTest timing out

Project Member Reported by steve...@chromium.org, Sep 18

Issue description

betty-arcnext-chrome-pfq VMTest timed out the last three runs:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2951782
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935069237701676848
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935044147339897552

Logs include:
TimeoutError: Timeout occurred- waited 5400.0 seconds. Reached VMTestStage test run timeout.

The test_that log output runs for about 70 minutes and then receives an interrupt, which is consistent with a timeout (5400 seconds is 90 minutes). Presumably the other 20 minutes were spent provisioning?

09/18 02:41:56.029 WARNI| test_runner_utils:0648| Received SIGINT or SIGTERM. Cleaning up and exiting.
09/18 02:41:56.029 WARNI| test_runner_utils:0652| Sending SIGINT to autoserv process. Waiting up to 5 seconds for cleanup.
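
To sanity-check the 70 + 20 minute split, here is a rough sketch that measures how much of the 5400-second stage budget the test_that log actually covers. The helper names, the assumed year, and the first log line are hypothetical (only the final timestamp comes from the log above); treat it as an illustration, not part of the builder code.

```python
from datetime import datetime, timedelta

# Stage budget taken from the TimeoutError message (5400.0 seconds).
STAGE_TIMEOUT = timedelta(seconds=5400)


def parse_ts(line, year=2018):
    """Parse the leading 'MM/DD HH:MM:SS.mmm' stamp of an autotest log line.

    The logs carry no year, so `year` is an assumption; it only matters
    if the run crosses a year boundary.
    """
    stamp = " ".join(line.split()[:2])
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")


def remaining_budget(first_line, last_line):
    """Return (time covered by the log, time left for everything else)."""
    elapsed = parse_ts(last_line) - parse_ts(first_line)
    return elapsed, STAGE_TIMEOUT - elapsed


# Hypothetical first line, placed 70 minutes before the interrupt.
first = "09/18 01:31:56.029 INFO | test_runner_utils:0000| (hypothetical start)"
# Real interrupt line from the log above.
last = ("09/18 02:41:56.029 WARNI| test_runner_utils:0648| "
        "Received SIGINT or SIGTERM. Cleaning up and exiting.")

elapsed, left = remaining_budget(first, last)
print(elapsed, left)
```

With a real first line from the test_that log, `left` is the time that must have gone to provisioning and other stage overhead.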

Also, both tests indicate that VMTest is taking 12/13% longer than average, and the previous test run completed in ~52 minutes.

This could be a load issue or a change to the tests.

One possible culprit autotest change:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1226161

 
Cc: bpastene@chromium.org ihf@chromium.org achuith@chromium.org
Note: This is the first betty-arcnext-chrome-pfq that started timing out:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2951782

Also, betty-arc64-release appears to be timing out in VMTest as well:

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=betty-arc64-release&buildBranch=master

https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1226161 just exempts a script that is running anyway from a security test, so I think it is unlikely to be the culprit.
Cc: kroot@chromium.org
Cc: kinaba@chromium.org
Owner: ihf@chromium.org
Status: Started (was: Untriaged)
It looks like a crash somewhere is confusing tradefed, even though every test [151/151] appears to pass.


09/18 03:21:34.387 INFO |             utils:0287| [stdout] 09-18 10:21:34 I/ConsoleReporter: [151/151 armeabi-v7a CtsAccountManagerTestCases 127.0.0.1:9227] android.accounts.cts.AccountManagerTest#testNewChooseAccountIntentDepracated pass
09/18 03:21:34.387 INFO |             utils:0287| [stdout] 09-18 10:21:34 I/ConsoleReporter: [127.0.0.1:9227] Test run failed to complete. Expected 151 tests, received 20. onError: commandError=false message=INSTRUMENTATION_ABORTED: System has crashed.
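
Because the per-test "pass" lines can hide an aborted run, it may help to scan the log for the "failed to complete" marker directly. The sketch below is a hypothetical helper (not part of autotest or tradefed) with a regular expression modeled only on the single line quoted above, so other message variants may not match.

```python
import re

# Pattern modeled on the ConsoleReporter line quoted above; the trailing
# "onError: ... message=..." part is treated as optional.
INCOMPLETE_RE = re.compile(
    r"Test run failed to complete\. Expected (\d+) tests, received (\d+)\."
    r"(?: onError: .*?message=(.*))?"
)


def find_incomplete_runs(lines):
    """Return (expected, received, message) for each aborted test run."""
    results = []
    for line in lines:
        match = INCOMPLETE_RE.search(line)
        if match:
            expected, received, message = match.groups()
            results.append((int(expected), int(received), (message or "").strip()))
    return results


log_lines = [
    "09/18 03:21:34.387 INFO |             utils:0287| [stdout] 09-18 10:21:34 "
    "I/ConsoleReporter: [127.0.0.1:9227] Test run failed to complete. "
    "Expected 151 tests, received 20. onError: commandError=false "
    "message=INSTRUMENTATION_ABORTED: System has crashed.",
]

for expected, received, message in find_incomplete_runs(log_lines):
    print(f"expected={expected} received={received} reason={message}")
```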
I guess #5 is b/115944638?
This type of failure was not expected to surface as a failure, but b/115944638 turned it into a FAIL (and I have already landed a remedy).

Yes, but apart from that, the crash is concerning.
I have found at least 3 different crashes hidden behind it and started analyzing them in b/115944638. There may be more to come.

A teardown-time crash is also observed elsewhere (b/116009991#comment9),
and the fix for that one has landed in b/115949068 (one of the bugs split out from b/115944638).

#5 may be the same one.
(I always have a hard time finding the actual failing test in the VMTest output, so I haven't looked into the failure log yet, though...)
Yeah
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2951782
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935069237701676848
those were due to b/115949068.


Summary for #comment5:

https://chromium-review.googlesource.com/c/1229814/   (R71-11078.0.0)
should be able to mask the crash as a warning, rather than a failure.

The root cause (the system crash) is fixed on the ToT ARC-P image.
However, we need a green Android PFQ for the change to propagate to the Chrome OS tree:
https://bugs.chromium.org/p/chromium/issues/detail?id=884828


tradefed appears to have recovered from the crash and run all [151/151] cases, but one of them appears to have crashed. (The I/ConsoleReporter lines are somewhat sorted, which makes them confusing.)
I'm not sure about the original timeout issue, but if the crash plus retry was the problem, a green Android-pi PFQ should fix it, I hope.
Don't worry, if it doesn't fix the issue we can just swap this out for another stable test. But for consistency it might be worth waiting for the fixes and keeping it as it is.
