calling login_LoginSuccess in autoupdate_EndToEndTest instead of just running it in bvt-inline hides failures from wmatrix and goldeneye
Issue description
Login is tested everywhere, so one would think the failure to login would show up in one of these suites:
login_LoginSuccess/control:ATTRIBUTES = "suite:bvt-inline, suite:push_to_prod, suite:smoke"
But the first caller of login_LoginSuccess is not one of the listed suites; it is
autoupdate_EndToEndTest.py: client_at.run_test('login_LoginSuccess', arc_mode='enabled')
And autoupdate_EndToEndTest is not run in any standard suite (only in the dynamic suite paygen_au_stable). One needs to search for these results explicitly, which then shows Intel was dead and ARM limping along:
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&tests=autoupdate_EndToEndTest.%2A&releases=53&days_back=30
Now that we have introduced a strong survivor bias, the only way login_LoginSuccess can fail is by flaking! Indeed, looking at the results of login_LoginSuccess for the .86 build, they all pass on ARM devices but are missing any (red, blue, yellow, green) result boxes on Intel devices. This is because on Intel the test was never scheduled inside of a suite like bvt-inline (as it is known to be failing), and the call from autoupdate_EndToEndTest was not recorded in the tko result DB (I think it should at least have been recorded as part of paygen_au_stable?):
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&tests=login_LoginSuccess&releases=53&days_back=30
Now, seemingly, the easiest fix would have been to remove login_LoginSuccess and let later stages of testing have plenty of opportunity to discover the failure.
,
Oct 6 2016
This type of failure is not completely hidden. It shows up as missing when we create a GoldenEye release for a build (which is when test/TPMs check it out in the morning). Compare the Installer Test column when you go here and click on .86 stable:
https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?milestone=53#/details
with the AU Tests column when you create a GoldenEye release for the same build:
https://cros-goldeneye.corp.google.com/chromeos/console/qaRelease?releaseName=M53-STABLE-4
,
Oct 6 2016
I would imagine a failing test and missing test both show up as red in GoldenEye under the AU Tests column, so if it was ignored for missing it probably would have been ignored for failed.
,
Oct 6 2016
It's not clear to me what change is actually being requested here. Can someone describe in more exact detail what's wrong? In general, tests are allowed to call other tests. In particular, it's _not_ a bug that the AU tests exercise login through LoginSuccess. If the problem is that wmatrix or GE are reporting unwanted results, it would help to have examples to motivate what needs to change.
,
Oct 12 2016
Back to ihf@, can you please clarify based on the question above?
,
Oct 13 2016
I thought about it, and nobody has brought compelling reasons why login should be tested in the autoupdate test. It does not belong there. (Notice nobody suggests cramming all tests from all suites in there either.) So my proposal is to remove login from the autoupdate test and rerun on the failing R53 build as discussed here: https://chromium-review.googlesource.com/#/c/397867/
,
Oct 14 2016
I don't get what you're trying to achieve with this change, Ilja. The Paygen phase failing doesn't stop bvt-inline from running. bvt-inline didn't get kicked off on SOME boards because the sanity (provision) phase failed:
https://uberchromegw.corp.google.com/i/chromeos_release/builders/buddy-release%20release-R53-8530.B/builds/74
https://uberchromegw.corp.google.com/i/chromeos_release/builders/gandof-release%20release-R53-8530.B/builds/73
https://uberchromegw.corp.google.com/i/chromeos_release/builders/mccloud-release%20release-R53-8530.B/builds/73
In these cases, removing login_LoginSuccess from autoupdate_EndToEndTest would have no effect because it still wouldn't have been run in bvt-inline. On other Intel boards (e.g. link), the sanity phase did pass and bvt-inline was kicked off, BUT it didn't run all of the bvt-inline tests, including login_LoginSuccess:
https://uberchromeos-server38.corp.google.com/new_tko/#tab_id=spreadsheet_view&row=job_name%252Ctest_name&column=status&show_incomplete=true&show_only_latest=false&show_invalid=true&condition=job_name%2520LIKE%2520%27link-release/R53-8530.86.0/bvt-inline/%2525%27%2520AND%2520test_name%2520%253C%253E%2520%27SERVER_JOB%27%2520AND%2520test_name%2520NOT%2520LIKE%2520%27CLIENT%255C_JOB%2525%27
,
Oct 14 2016
1) Good point. The provision phase failed due to another "Can Chrome login?" check. This needs to be removed as well.
https://bugs.chromium.org/p/chromium/issues/detail?id=597499#c31
https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/autoupdater.py?q=%22Chrome+failed+to+reach+login+screen%22+package:%5Echromeos_public$&l=667

2) You do understand that this whole postmortem is about tko/wmatrix/goldeneye not showing missing tests? Hence giving me a link to tko misses the point. If you look at the link for the link board you will see that all tests it shows are "green". Yet if you find the parent suite you will see that some tests failed (provision error due to lab problems!), while others were aborted due to the 1800s timeout: http://cautotest/afe/#tab_id=view_job&object_id=76996110
Yes, it did not run login_LoginSuccess because it was a victim of the timeout. I installed the R53-8530.86.0 image just fine. The screen is black. The reason tests are very slow, and hence the suite timeout, is that Chrome continuously crashes a few times a second, slowing everything to a crawl, writing dumps, etc. Interestingly, login_LoginSuccess is very slow, but it passes, just like several other tests that I would have expected to catch the problem (even though Chrome crashed a few hundred times during the test):

localhost autotest # bin/autotest tests/login_LoginSuccess/control
DEBUG:root:Failed to import elasticsearch. Mock classes will be used and calls to Elasticsearch server will be no-op. Test run is not affected by the missing elasticsearch module.
23:07:05 INFO | Writing results to /usr/local/autotest/results/default
23:07:05 ERROR| Symlink init scripts failed with [Errno 30] Read-only file system
23:09:17 INFO | START ---- ---- timestamp=1476425357 localtime=Oct 13 23:09:17
23:09:18 INFO | File /var/lib/cleanup_logs_paused was already present
23:09:18 INFO | START login_LoginSuccess login_LoginSuccess timestamp=1476425358 localtime=Oct 13 23:09:18
23:09:24 INFO | ChromeOS BOARD = link_1.8GHz_4GB
23:09:25 INFO | Chose browser: PossibleCrOSBrowser(browser_type=system)
23:09:25 INFO | (Re)starting the ui (logs the user out)
23:09:28 INFO | Restarting Chrome with flags and login
23:09:30 INFO | Invoking Oobe.loginForTesting
23:09:41 INFO | Browser is up!
23:09:41 INFO | OS: chromeos
[...] (hundreds if not thousands of Chrome crashes happening here)
23:09:42 INFO | (Re)starting the ui (logs the user out)
23:11:10 INFO | GOOD login_LoginSuccess login_LoginSuccess timestamp=1476425470 localtime=Oct 13 23:11:10 completed successfully
23:11:10 INFO | END GOOD login_LoginSuccess login_LoginSuccess timestamp=1476425470 localtime=Oct 13 23:11:10
23:11:10 INFO | Leaving existing /var/lib/cleanup_logs_paused file
23:11:11 INFO | END GOOD ---- ---- timestamp=1476425470 localtime=Oct 13 23:11:10

3) Here is another interesting observation: all login_LoginSuccess control files have the same NAME. This can't possibly be good. Do we run login_LoginSuccess sometimes 50 times and sometimes once?

ihf@ql:~/cros_tot/src/third_party/autotest/files/client/site_tests/login_LoginSuccess$ grep NAME control*
control:NAME = "login_LoginSuccess"
control.stress:NAME = "login_LoginSuccess"
control.stress2:NAME = "login_LoginSuccess"
control.stress3:NAME = "login_LoginSuccess"

4) Now graphics_WebGLAquarium, which is part of bvt-cq (which did not run), does detect problems:

01:00:22 INFO | autoserv| STATUS: FAIL graphics_WebGLAquarium graphics_WebGLAquarium timestamp=1476424649 localtime=Oct 14 00:57:29 Could not get idle CPU.

It also reports a gazillion Chrome crashes and GPU hangs:

[ 4269.655314] chrome[6148]: segfault at 101b2d145000 ip 00007f3bf6de29a4 sp 00007ffc9b846450 error 4 in libEGL.so.1.0.0[7f3bf6dd6000+1d000]

5) If I remove the idle CPU etc. checks from WebGLAquarium it still correctly detects the problem: TestFail: Could not get FPS count.

6) A philosophical question: if you still think that calling login_LoginSuccess (plus starting ARC!) is a good idea in update/provision tests, then why not add the whole bvt suite there? Do you agree that a test should focus on testing minimal functionality to be able to reason about its outcome?

7) Side note: I also want bvt-cq to run whenever possible. There is no reason for early aborts, and bvt-cq is what GoldenEye reports. Also notice that WebGLAquarium in bvt-cq would have detected the problem.
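[Editor's note] The duplicate-NAME observation in point 3 can be checked mechanically. The sketch below is a hypothetical stand-alone script, not part of autotest; the function name and the in-memory file dict are illustrative, and it only looks for the simple `NAME = "..."` pattern the grep output shows:

```python
import collections
import re

def find_duplicate_names(control_files):
    """Map each NAME value to the control files that declare it.

    control_files: dict of {filename: file contents}. Returns only
    NAMEs declared by more than one control file.
    """
    by_name = collections.defaultdict(list)
    for fname, text in control_files.items():
        # Match a line like: NAME = "login_LoginSuccess"
        m = re.search(r'^NAME\s*=\s*[\'"]([^\'"]+)[\'"]', text, re.M)
        if m:
            by_name[m.group(1)].append(fname)
    return {name: sorted(files)
            for name, files in by_name.items() if len(files) > 1}

# Mirrors the grep output above: four control files, one shared NAME.
files = {
    'control': 'NAME = "login_LoginSuccess"\n',
    'control.stress': 'NAME = "login_LoginSuccess"\n',
    'control.stress2': 'NAME = "login_LoginSuccess"\n',
    'control.stress3': 'NAME = "login_LoginSuccess"\n',
}
print(find_duplicate_names(files))
```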
,
Oct 14 2016
Blerg, yes for no.2 I actually had the AFE page open in one tab but ended up pasting the TKO link (too many cautotest tabs open while I was replying ;)
,
Oct 14 2016
I get what you are trying to do now but as mentioned it wouldn't have had the desired effect since provision/sanity failing blocked bvt-inline anyway. Richard can hopefully comment about the chrome check during provision.
,
Oct 14 2016
> 1) Good point. provision phase failed due to another
> "Can Chrome login?" check. This needs to be removed as well.

No, this isn't a check for "can Chrome login". This is a check for "did Chrome start". Moreover, the particular check isn't even an error check; it's diagnosis. The real error is that the image wasn't marked good, and if the DUT is rebooted, it could roll back to the previous build. That's a critical failure, and it can only be reliably detected during provisioning: we _must_ make the check during provisioning, and we _must_ fail the build when that check fails.

IIUC, the key problem here is simply this: the Chrome bug in question left the system unstable, but able to limp along enough to pass some tests sometimes. The result was that the symptom of the bug was not "tests failed" but rather "tests didn't run". And the interactions among dynamic suites, TKO, and GoldenEye are such that our reporting system DOESN'T REPORT THE PROBLEM.

So, the question we ought to be discussing is: how do we plumb the fact that tests didn't run from dynamic suites (which know about it) to GoldenEye (which doesn't get told)?
,
Oct 15 2016
I disagree that "did Chrome start" is a meaningful question, as we are interested in the correct completion of a small but meaningful task. (Starting is not meaningful as I can point the CPU at any random bits.) Worse, the call is
client_at.run_test('login_LoginSuccess', arc_mode='enabled')
Clearly the intention is to test ARC in some way, which has absolutely nothing to do with autoupdate. Even if arc_mode was not enabled, the rollback to the previous image would happen only due to problems *before* the login screen, so there is no reason to log in. What bugs me about requiring Chrome is that about half of all tests (graphics and kernel) don't care about Chrome and even blow it away ("stop ui") to do their testing. We could make it the default on test images to "stop ui" and only "start ui" for tests that instantiate chrome.Chrome(). Then a crashing Chrome would stop causing fear for everyone else.
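[Editor's note] The "stop ui"/"start ui" proposal above could look something like this minimal sketch. The context-manager name and the injectable runner are hypothetical, not existing autotest API; on a real Chrome OS test image the default runner would issue the actual Upstart `stop ui` / `start ui` commands:

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def ui_stopped(run=subprocess.check_call):
    """Stop the Chrome 'ui' Upstart job around a test body, then restart it.

    'run' is an injectable command runner so the sketch can be exercised
    off-DUT; by default it would shell out to the real commands.
    """
    run(['stop', 'ui'])       # Chrome goes away; nothing can crash-loop
    try:
        yield
    finally:
        run(['start', 'ui'])  # restore Chrome for the next test

# Off-DUT demonstration using a recording stub instead of subprocess.
issued = []
with ui_stopped(run=issued.append):
    issued.append(['run', 'graphics_test'])  # stand-in for a UI-free test
print(issued)
```

The try/finally guarantees the ui job is restarted even if the test body raises, so a crashing test cannot leave the DUT without Chrome for subsequent tests.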
Anyhow, I do agree with the "the problem is not reporting" part. There are multiple levels to "not reporting". I will make sure this gets tracked.
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub. P1 bugs in this component should be important enough to get weekly status updates.
Is this already fixed? -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on the go/chromeos-infra-bug-slo rubric? -> lower priority
Is this a Feature Request rather than a bug? -> Type: Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign
Does this bug have the wrong owner? -> reassign
Bugs that remain in this state next week will be downgraded to P2.
,
Jul 24 2017
ChromeOS Infra P1 Bugscrub. Issue untouched in a week after previous message. Downgrading to P2.
,
Sep 19 2017
What ever happened here?
,
Sep 28 2017
The problem still exists. The current call sequence circumvents reporting of problems to goldeneye.
,
Sep 28 2017
> The current call sequence circumvents reporting of problems to goldeneye.
What does "circumvents reporting" mean here? And what's the mechanism?
I've reviewed the bug history, and I've yet to see a clear explanation
of what problem is meant to be solved here. The subject implies that
login_LoginSuccess isn't run from bvt-inline, but that's never been
true.
Regarding, specifically, the invocation of login_LoginSuccess from AU
tests, the history is this:
* There was an AU bug where the presence of a user profile broke
AU. I forget the exact symptom, but the problem could only
occur if a user had already logged in at least once.
* The bug slipped through AU testing because there was no
coverage of "update from a system with an existing profile".
Indeed, since AU testing always starts from a fresh system,
the only test case was "update from a system at OOBE."
This bug left us with a dilemma: AU tests are expensive, so we
can't test every system state that might affect update. However,
the bug was important, and we'd missed it merely for lack of coverage.
We decided that testing with a user profile present was more important
than testing with no profile, so we changed the test to make sure
there would be a profile present during update.
I'm still happy with the decision to change the test, and I don't
think we should change the coverage without a compelling reason.
,
Jun 8 2018
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
Comment 1 by jrbarnette@chromium.org
, Oct 6 2016

> But the first caller of LoginSuccess is not one of the listed suites, but
> autoupdate_EndToEndTest.py: client_at.run_test('login_LoginSuccess', arc_mode='enabled')

I'm not sure of the basis for this assertion. As a general rule, there's no single test guaranteed to be first. "First" doesn't really even have meaning in a world where different DUTs have a different "first test", and there's no fixed set of tests run on any given set of DUTs. In practical terms, the 'bvt-inline' test suite usually runs to completion prior to the Paygen tests, so on a typical BVT run, the login_LoginSuccess test will run on some DUT prior to the Paygen test running on any DUT. However, there's no guarantee that the tests will run on the same DUT.