Make tast.py not report not-run tests as failed
Issue description

When a Tast test that was expected to run doesn't get executed (typically due to the DUT dying in the middle of the run), the tast.py Autotest code currently writes a FAIL entry with a "Test was not run" error for it in the status.log file. If this happens due to an error that was encountered while the tast process was running, tast.py would have previously raised an exception to make the main "tast" Autotest test fail. In other words, we should already be notified about the actual error that caused the test to not be run.

With that background out of the way, I'm wondering if there's a better way to handle individual tests that weren't run. Saying that they failed isn't correct; they didn't even get started. My main goal is making sure that Tast test results are reported accurately in Stainless and in other tools that might be based on its BigQuery tables.

Stainless has an internal "NOT_RUN" pseudo-status that it uses if the TKO job can't be found or the job failed during provisioning, but I think that the only statuses used in Autotest are "TEST_NA", "ABORT", "ERROR", "FAIL", "WARN", "GOOD", and "ALERT". Stainless's queries appear to use the following mapping:

GOOD -> pass
WARN -> warn
FAIL, ERROR, ABORT -> fail
NOT_RUN -> notrun
everything else -> other

In issue 846895, ihf@ argues that tests should only pass or fail. There's also some discussion in https://crrev.com/c/1018588 and https://crrev.com/c/1046105.

I'm now thinking it would be best for status.log to omit tests that didn't get started (for whatever reason). To make sure that we won't miss errors in Tast itself, I'd update tast.py to also make the main Autotest test fail if any Tast test results are missing, even if the tast process exited successfully. Does that sound reasonable?
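For concreteness, here is a minimal sketch of the proposed behavior. The helper name write_status_entries and the shape of the results dict are assumptions for illustration, not actual tast.py internals:

# Minimal sketch of the proposal; `requested`, `results`, and
# `write_status_entries` are hypothetical names, not real tast.py code.
from autotest_lib.client.common_lib import error

def report_results(job, requested, results):
    # `results` maps test name -> result object for tests that actually
    # produced output; anything missing from it never got a chance to run.
    missing = sorted(set(requested) - set(results))

    # Write status.log entries only for tests that ran; not-run tests are
    # omitted instead of being logged as FAIL with "Test was not run".
    for name in requested:
        if name in results:
            write_status_entries(job, name, results[name])

    # Fail the main "tast" server test so the underlying problem (e.g. the
    # DUT dying mid-run) still gets surfaced and investigated.
    if missing:
        raise error.TestFail('%d test(s) did not run: %s'
                             % (len(missing), ', '.join(missing)))

The point is that not-run tests simply produce no status.log entries, while the missing-results condition still fails the overall server test.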
Jul 19
If the Tast tests are part of the CQ or a PFQ or on canaries, then the guardian responsible for those builders should be on the hook for investigating run-level failures in the main Tast autotest. I'm not knowledgeable about the current Autotest graphics guardian role. Do the tests run as part of a dedicated suite? How are failures reported? With ~250 runs of tast.must_pass daily, I see a couple of instances of tests not getting started due to DUT issues per day. It's enough to be a bit annoying -- I'm trying to find flaky tests, but I also see failures in completely-innocent tests that didn't get run. I don't think that having a deterministic ordering of tests really helps much here, unfortunately. A DUT can keel over at any time, so I really need to look at the full.txt log file written by the tast process for more clues about where things went south.
Jul 19
The deterministic ordering helps with reproducibility. With the current implementation, I see a failing graphics test. I click on it. I see from the failure message and logs that it didn't run because an earlier test caused problems. I file a bug and assign it to that team. I nag them until they fix it. With the new implementation, graphics tests don't show up in the dashboard. I sleep soundly, as there are no failures. Even if I suspect that something is wrong because I haven't seen a graphics test result in weeks, where do I start? Current example: issue 865563.
Jul 19
Issue 865563 seems like an Autotest scheduling issue, right? If I'm interpreting it correctly, this wouldn't help there, since tast.py would never get run in the first place. My feeling here is that we need some other process in place to detect cases where a test unexpectedly and consistently isn't getting run. I can think of many reasons why this could happen:

a) Someone changed the suite's test attribute expression and the test is no longer matched.
b) Autotest isn't scheduling tast.py for some reason.
c) Someone changed the test's attributes such that it's no longer matched by the expression.
d) There's a bug in Tast itself that results in the test not being matched.
e) There's a bug in Tast or in an earlier test that consistently makes the test not get run.

I'm unsure of the relative probability of these, but the only one that would be caught by tast.py reporting not-run tests as failing is e). Making tast.py always report overall failure for missing tests (like I'm proposing) would result in e) being brought to people's attention as long as someone is responsible for the success of the overall Autotest server test. Do you have thoughts about where graphics tests would run?
Jul 20
Yes, it is an Autotest scheduling issue, but it took 10 days to notice that nothing was running. Also, the BVT was red, and still nobody noticed. Tests that are absent are as good as tests that are passing, and this change might add one more way for tests to go absent. I think there are good reasons for your proposal. I am not completely against it. And maybe we can come up with a workflow that is intuitive.
Jul 20
I haven't completely convinced myself that this is a good idea yet, but would you be more comfortable with the proposed change if there were a new attribute that could be set in a test to tell tast.py to stick with the current behavior of reporting a failure when the test isn't run?
Jul 20
No, I think you should implement your proposed change and keep tast as simple as possible.
Jul 20
Thanks, I appreciate that. :-)
Jul 20
Hmm. I'm actually afraid that removing skipped tests from status.log might make them far less visible. I guess showing them as NOT_RUN in reporting is good from a user perspective. But yes, as Dan pointed out in #c4, even if we generate NOT_RUN entries, it isn't enough to catch all the ways tests can end up not running, so our direction is to catch them by other means, right? I'm not sure what a good way to do that is, but then I think it's okay to just remove the entries...
Jul 20
Displaying them as NOT_RUN is fine with me too, but I wasn't sure how I could do that solely by changing the way that they're logged in status.log. I suspect that Autotest won't be happy if I start using a new status code for this. I think that this is the expression that Stainless currently uses for NOT_RUN:
IF(tj.tko_job_id IS NULL OR tj.first_bad_test.test = "provision",
"NOT_RUN",
IFNULL(tj.first_bad_test.status, "GOOD")) AS status,
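Roughly, in Python terms, this is what the expression above computes (illustrative only; tj stands for the joined job row):

def stainless_status(tj):
    # A missing TKO job or a provisioning failure maps to the NOT_RUN
    # pseudo-status; otherwise fall back on the first bad test's status,
    # defaulting to GOOD when there is none.
    if tj.tko_job_id is None or (tj.first_bad_test is not None and
                                 tj.first_bad_test.test == 'provision'):
        return 'NOT_RUN'
    if tj.first_bad_test is not None and tj.first_bad_test.status is not None:
        return tj.first_bad_test.status
    return 'GOOD'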
Jul 25
It's possible to update Stainless to treat some failures as NOT_RUN (e.g. if a reason message is "Test was not run"), but it would be confusing if a test is reported as "not run" in Stainless but as "fail" in other tools like GoldenEye. One idea: how about recording the names of not-run tests in TKO keyvals? Then tools aware of the keyvals can show not-run entries.
Jul 25
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c

commit e64ebab7c783f2aea26b7f5d2f61aca76c839a8c
Author: Daniel Erat <derat@chromium.org>
Date: Wed Jul 25 07:14:59 2018

autotest: Make tast.py not write statuses for missing tests.

Make the "tast" server test avoid writing tests with missing results to status.log. Missing results typically indicate earlier DUT flakiness, and it can be confusing to report that a test failed when it didn't even get a chance to run.

Also make the server test itself report failure whenever any test results are missing or incomplete. This highlights the underlying issue that needs to be investigated.

Finally, improve unit test coverage of messages attached to TestFail exceptions (since these are displayed in builders).

BUG=chromium:865564
TEST=updated/added unit tests

Change-Id: Id14dd2de89cb84c6f8db3154ae78066f6dee8b6b
Reviewed-on: https://chromium-review.googlesource.com/1144648
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c/server/site_tests/tast/tast_unittest.py
[modify] https://crrev.com/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c/server/site_tests/tast/tast.py
Jul 26
#11: Sure! I'm still not super-familiar with the TKO format, though. Are you suggesting passing an arbitrarily-named key/value pair via the 'fields' arg at http://cs/chromeos_public/src/third_party/autotest/files/client/common_lib/base_job.py?l=463&rcl=7182da3082d29b39f9ee4c1c6b7d29b7c472d72c ? For example, should I pass something like {'not_run': '1'}, and then Stainless's table builder could be updated to use that to add a NOT_RUN entry for the test? If that's correct, do you have opinions about whether I should report not-run tests as passing or failing? And should the key/value pair be logged in the START event or in the END FAIL/GOOD event? (Or in the inner FAIL event, if these tests should be logged as having failed?)
Jul 30
TKO job keyvals are written to a "keyval" file, e.g. https://storage.cloud.google.com/chromeos-autotest-results/220790363-chromeos-test/chromeos2-row3-rack10-host11/keyval (note that there are also "keyval" files in test subdirectories, but those are different). The keyval file is read by the TKO parser, saved to the tko_job_keyvals table in the TKO DB, and archived to BigQuery tables. https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/tko/db.py?l=547
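As a rough illustration of the #c14 direction (this is not the code in the actual CL, and the "tast_missing_test.N" key name is made up), tast.py could append entries for missing tests to the job-level keyval file, which the TKO parser would then load into tko_job_keyvals and onward to BigQuery:

import os

def record_missing_tests(results_dir, missing_tests):
    # Append to the job-level "keyval" file (one key=value pair per line);
    # the TKO parser picks these up into tko_job_keyvals. The key name used
    # here is only an example.
    keyval_path = os.path.join(results_dir, 'keyval')
    with open(keyval_path, 'a') as f:
        for i, name in enumerate(sorted(missing_tests)):
            f.write('tast_missing_test.%d=%s\n' % (i, name))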
Aug 1
I sent a change to implement #c14, though I'm not sure if you'll like the idea... https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1158121
Aug 1
Sounds great to me -- thanks!
Aug 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c22d07d9cf20bc48e918a96d6861a0fed41d2b05

commit c22d07d9cf20bc48e918a96d6861a0fed41d2b05
Author: Shuhei Takahashi <nya@chromium.org>
Date: Tue Aug 07 05:11:35 2018

tast: Record missing tests in job keyvals.

Those keyvals are stored in tko_job_keyvals table in TKO DB and archived to BigQuery tables. This information can be used by dashboards like Stainless to show scheduled-but-not-run Tast tests.

BUG=chromium:865564
TEST=test_that DUT tast.informational # w/ mod to set timeout to 1s
TEST=tast_unittest.py

Change-Id: I9e8b4c438d68afcc9cd30d0ac1ed711d3f1a6f14
Reviewed-on: https://chromium-review.googlesource.com/1158121
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Shuhei Takahashi <nya@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>

[modify] https://crrev.com/c22d07d9cf20bc48e918a96d6861a0fed41d2b05/server/site_tests/tast/tast_unittest.py
[modify] https://crrev.com/c22d07d9cf20bc48e918a96d6861a0fed41d2b05/server/site_tests/tast/tast.py