Issue 865564

Issue metadata

Status: Fixed
Closed: Aug 15
OS: Chrome
Pri: 2
Type: Bug


Participants' hotlists:
Hotlist-Tast



Make tast.py not report not-run tests as failed

Reported by derat@chromium.org (Project Member), Jul 19

Issue description

When a Tast test that was expected to run doesn't get executed (typically due to the DUT dying in the middle of the run), the tast.py Autotest code currently writes a FAIL entry with a "Test was not run" error for it in the status.log file.
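
For reference, a rough sketch of that current behavior, assuming Autotest's generic job.record() interface (this is not the actual tast.py code, and the helper name below is made up):

  # Rough sketch only (not the actual tast.py code): for each test that was
  # expected to run but has no result, write START/FAIL/END FAIL entries.
  def record_missing_as_failed(job, missing_test_names):
      for name in missing_test_names:
          job.record('START', None, name)
          job.record('FAIL', None, name, 'Test was not run')
          job.record('END FAIL', None, name)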

If this happens due to an error encountered while the tast process was running, tast.py will already have raised an exception to make the main "tast" Autotest test fail. In other words, we should already be notified about the actual error that caused the test not to be run.

With that background out of the way, I'm wondering if there's a better way to handle individual tests that weren't run. Saying that they failed isn't correct; they didn't even get started. My main goal is making sure that Tast test results are reported accurately in Stainless and in other tools that might be based on its BigQuery tables.

Stainless has an internal "NOT_RUN" pseudo-status that it uses if the TKO job can't be found or the job failed during provisioning, but I think that the only statuses used in Autotest are "TEST_NA", "ABORT", "ERROR", "FAIL", "WARN", "GOOD", and "ALERT". Stainless's queries appear to use the following mapping:

GOOD -> pass
WARN -> warn
FAIL, ERROR, ABORT -> fail
NOT_RUN -> notrun
everything else -> other
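
A rough Python equivalent of that mapping (not Stainless's actual code):

  # Rough equivalent of the mapping above (not Stainless's actual code).
  STATUS_TO_VERDICT = {
      'GOOD': 'pass',
      'WARN': 'warn',
      'FAIL': 'fail',
      'ERROR': 'fail',
      'ABORT': 'fail',
      'NOT_RUN': 'notrun',
  }

  def to_verdict(status):
      return STATUS_TO_VERDICT.get(status, 'other')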

In issue 846895, ihf@ argues that tests should only pass or fail. There's also some discussion in https://crrev.com/c/1018588 and https://crrev.com/c/1046105.

I'm now thinking it would be best for status.log to omit tests that didn't get started (for whatever reason). To make sure that we won't miss errors in Tast itself, I'd update tast.py to also make the main Autotest test fail if any Tast test results are missing even if the tast process exited successfully.

Does that sound reasonable?
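
A minimal sketch of what I have in mind, with hypothetical helper names (not the final implementation):

  # Sketch of the proposal; names are illustrative, not the real tast.py.
  from autotest_lib.client.common_lib import error

  def report_results(job, expected_tests, results_by_name):
      missing = sorted(t for t in expected_tests if t not in results_by_name)
      for name, result in results_by_name.items():
          write_status_log_entries(job, name, result)  # hypothetical helper
      if missing:
          # Fail the overall server test instead of logging per-test FAILs.
          raise error.TestFail('%d test(s) did not run: %s' %
                               (len(missing), ', '.join(missing)))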
 
I understand the argument and there are good pros and cons.

I still think it is reasonable for these tests to fail, even though they just failed to run due to some unmet precondition. This can be indicated in the failure reason string. Failing is the only way to force humans to notice the problem and correct the failing behavior quickly, to get back into a regularly passing state. (The graphics guard will not monitor the parent job, only individual graphics tests. If tast doesn't run them, we are most likely not going to investigate why it isn't running them. So you will need a separate human rotation for the parent.) Hunting for unreported/missing tests is pure pain.

Note that ordered test execution should have made this situation deterministic (unlike Autotest with its dynamic scheduling, which randomizes test outcomes).

Then again, how often does this happen right now, e.g. how big is the problem?
If the Tast tests are part of the CQ or a PFQ or on canaries, then the guardian responsible for those builders should be on the hook for investigating run-level failures in the main Tast autotest.

I'm not knowledgeable about the current Autotest graphics guardian role. Do the tests run as part of a dedicated suite? How are failures reported?

With ~250 runs of tast.must_pass daily, I see a couple of instances of tests not getting started due to DUT issues per day. It's enough to be a bit annoying -- I'm trying to find flaky tests, but I also see failures in completely-innocent tests that didn't get run.

I don't think that having a deterministic ordering of tests really helps much here, unfortunately. A DUT can keel over at any time, so I really need to look at the full.txt log file written by the tast process for more clues about where things went south.
The deterministic ordering helps with reproducibility.

With the current implementation I see a failing graphics test. I click on it. I see from the failure message and logs that it didn't run, because an earlier test caused problems. I file a bug and assign it to that team. I nag them until they fix it.

With the new implementation graphics tests don't show up in the dashboard. I sleep soundly as there are no failures. Even if I suspect that something is wrong because I have not seen a graphics test result in weeks, where do I start? Current example: issue 865563.

Issue 865563 seems like an Autotest scheduling issue, right? If I'm interpreting it correctly, this change wouldn't help there, since tast.py would never get run in the first place.

My feeling here is that we need some other process in place to detect cases where a test unexpectedly and consistently isn't getting run. I can think of many reasons why this could happen:

a) Someone changed the suite's test attribute expression and the test is no longer matched
b) Autotest isn't scheduling tast.py for some reason
c) Someone changed the test's attributes such that it's no longer matched by the expression
d) There's a bug in Tast itself that results in the test not being matched
e) There's a bug in Tast or in an earlier test that consistently makes the test not get run

I'm unsure of the relative probability of these, but the only one that would be caught by tast.py reporting not-run tests as failing is e).

Making tast.py always report overall failure for missing tests (like I'm proposing) would result in e) being brought to people's attention as long as someone is responsible for the success of the overall Autotest server test. Do you have thoughts about where graphics tests would run?
Yes, it is an Autotest scheduling issue, but it took 10 days to notice that nothing was running. Also, the BVT was red, and still nobody noticed. Tests that are absent are as good as tests that are passing. And this change might add one more reason for tests to be absent.

I think there are good reasons for your proposal. I am not completely against it. And maybe we can come up with a workflow that is intuitive.
I haven't completely convinced myself that this is a good idea yet, but would you be more comfortable with the proposed change if there were a new attribute that could be set in a test to tell tast.py to stick with the current behavior of reporting a failure when the test isn't run?
No, I think you should implement your proposed change and keep tast as simple as possible.
Thanks, I appreciate that. :-)
Hmm. I'm actually afraid that removing skipped tests from status.log might make them far less visible. I guess showing them as NOT_RUN in reporting is good from a user perspective.

But yes, as Dan pointed out in #c4, even if we generate NOT_RUN entries it is not enough to catch all tests that aren't run for various reasons, so our direction is to catch them by other means, right? I'm not sure what a good way to do that is, but then I think it's okay to just remove the entries...

Displaying them as NOT_RUN is fine with me too, but I wasn't sure how I could do that solely by changing the way that they're logged in status.log. I suspect that Autotest won't be happy if I start using a new status code for this. I think that this is the expression that Stainless currently uses for NOT_RUN:

   IF(tj.tko_job_id IS NULL OR tj.first_bad_test.test = "provision",
     "NOT_RUN",
     IFNULL(tj.first_bad_test.status, "GOOD")) AS status,
It's possible to update Stainless to treat some failures as NOT_RUN (e.g. if a reason message is "Test was not run"), but it would be confusing if a test is reported as "not run" in Stainless but as "fail" in other tools like GoldenEye.

One idea: how about recording the names of not-run tests in TKO keyvals? Then tools aware of the keyvals can show not-run entries.
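
For example, the recorded keyvals could look roughly like this (key names are purely hypothetical):

  # Hypothetical key names; the point is one keyval per not-run test so
  # dashboards reading tko_job_keyvals can reconstruct the list.
  def missing_test_keyvals(missing_test_names):
      return {'tast_missing_test.%d' % i: name
              for i, name in enumerate(sorted(missing_test_names))}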

Comment 12 by bugdroid1@chromium.org (Project Member), Jul 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c

commit e64ebab7c783f2aea26b7f5d2f61aca76c839a8c
Author: Daniel Erat <derat@chromium.org>
Date: Wed Jul 25 07:14:59 2018

autotest: Make tast.py not write statuses for missing tests.

Make the "tast" server test avoid writing tests with missing
results to status.log. Missing results typically indicate
earlier DUT flakiness, and it can be confusing to report
that a test failed when it didn't even get a chance to run.

Also make the server test itself report failure whenever any
test results are missing or incomplete. This highlights the
underlying issue that needs to be investigated.

Finally, improve unit test coverage of messages attached to
TestFail exceptions (since these are displayed in builders).

BUG=chromium:865564
TEST=updated/added unit tests

Change-Id: Id14dd2de89cb84c6f8db3154ae78066f6dee8b6b
Reviewed-on: https://chromium-review.googlesource.com/1144648
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c/server/site_tests/tast/tast_unittest.py
[modify] https://crrev.com/e64ebab7c783f2aea26b7f5d2f61aca76c839a8c/server/site_tests/tast/tast.py

#11: Sure!

I'm still not super-familiar with the TKO format, though. Are you suggesting passing an arbitrarily-named key/value pair via the 'fields' arg at http://cs/chromeos_public/src/third_party/autotest/files/client/common_lib/base_job.py?l=463&rcl=7182da3082d29b39f9ee4c1c6b7d29b7c472d72c ?

For example, should I pass something like {'not_run': '1'}, and then Stainless's table builder could be updated to use that to add a NOT_RUN entry for the test?

If that's correct, do you have opinions about whether I should report not-run tests as passing or failing?

And should the key/value pair be logged in the START event or in the END FAIL/GOOD event? (Or in the inner FAIL event, if these tests should be logged as having failed?)
TKO job keyvals are written to a "keyval" file, e.g.
https://storage.cloud.google.com/chromeos-autotest-results/220790363-chromeos-test/chromeos2-row3-rack10-host11/keyval
(note that there are also "keyval" files in test subdirs, but they are different)

The keyval file is read by the TKO parser, saved to the tko_job_keyvals table in the TKO DB, and archived to BigQuery tables.
https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/tko/db.py?l=547
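
For illustration, the keyval file is just key=value lines; a toy writer might look like the sketch below (the path handling is an assumption, and Autotest has its own helpers for this):

  import os

  # Toy illustration of the keyval file format: one key=value pair per line,
  # appended to the job-level "keyval" file in the results directory.
  def append_job_keyvals(results_dir, keyvals):
      with open(os.path.join(results_dir, 'keyval'), 'a') as f:
          for key, value in sorted(keyvals.items()):
              f.write('%s=%s\n' % (key, value))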

I sent a change to implement #c14, though I'm not sure if you like the idea...
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1158121

Sounds great to me -- thanks!

Comment 17 by bugdroid1@chromium.org (Project Member), Aug 7

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c22d07d9cf20bc48e918a96d6861a0fed41d2b05

commit c22d07d9cf20bc48e918a96d6861a0fed41d2b05
Author: Shuhei Takahashi <nya@chromium.org>
Date: Tue Aug 07 05:11:35 2018

tast: Record missing tests in job keyvals.

Those keyvals are stored in tko_job_keyvals table in TKO DB and
archived to BigQuery tables.

This information can be used by dashboards like Stainless to show
scheduled-but-not-run Tast tests.

BUG=chromium:865564
TEST=test_that DUT tast.informational  # w/ mod to set timeout to 1s
TEST=tast_unittest.py

Change-Id: I9e8b4c438d68afcc9cd30d0ac1ed711d3f1a6f14
Reviewed-on: https://chromium-review.googlesource.com/1158121
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Shuhei Takahashi <nya@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>

[modify] https://crrev.com/c22d07d9cf20bc48e918a96d6861a0fed41d2b05/server/site_tests/tast/tast_unittest.py
[modify] https://crrev.com/c22d07d9cf20bc48e918a96d6861a0fed41d2b05/server/site_tests/tast/tast.py

Status: Fixed (was: Started)
