New issue
Advanced search Search tips

Issue 846770 link

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

TEST_NA exception is being flagged as test failure

Reported by jrbarnette@chromium.org, May 25 2018

Issue description

A recent CQ run failed with some related failures.

Master run:
    https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/18688

Failed slaves:
    https://luci-milo.appspot.com/buildbot/chromeos/eve-paladin/3237
    https://luci-milo.appspot.com/buildbot/chromeos/kip-paladin/5213

The failures, as reported:
    video_ChromeHWDecodeUsed.vp8: retry_count: 1, TEST_NA: Missing Capability: hw_dec_vp8_1080_30
    video_ChromeHWDecodeUsed.vp9: retry_count: 1, TEST_NA: Missing Capability: hw_dec_vp9_1080_30
    video_ChromeRTCHWDecodeUsed.vp8: retry_count: 1, TEST_NA: Missing Capability: hw_dec_vp8_1080_30
    video_ChromeRTCHWEncodeUsed: retry_count: 2, TEST_NA: Missing Capability: hw_enc_vp8_1080_30

TEST_NA seems to have been an appropriate response for the test in this case.
However, the system treated the N/A as a failure, rather than "ignore it".
That seems wrong.

 
... It's possible that this bug misdiagnoses the reason for the
rejection.

The kip-paladin failure also included this:
    p2p_ConsumeFiles: FAIL: Wrong number of connections reported: 3

That may be the reason for rejections.  However, there's also this:
    video_ChromeRTCHWEncodeUsed: retry_count: 2, TEST_NA: Missing Capability: hw_enc_vp8_1080_30

It looks like we retried the test after TEST_NA.  We shouldn't have done that.

Also, eve-paladin didn't have the p2p_ConsumeFiles failures, but it still
showed red.

Comment 2 by ihf@chromium.org, May 25 2018

There are errors which are expected from tests (warn/fail) and then there are exceptions which are mostly used internally. (I am not a fan of how autotest uses exceptions for handling the status of a test. I would prefer explicit return codes.) My observation is that TEST_NA is used by the scheduling, but not expected by tests, even though many tests outside of the CQ use it. I am sure it is possible to make TEST_NA something like a pass by handling it everywhere, but we never raise a PASS so it will always be something slightly different.

Personally I think what people really want it a PASS with annotation/different colors on a dashboard.

Comment 3 by xixuan@chromium.org, May 25 2018

I may misunderstand TEST_NA's real meaning in autotest.

My previous understanding: TEST_NA means a test is detected not suitable for given board & pool & labels in scheduling process, which won't trigger a test failure.

In this case:

05/23 23:43:40.212 WARNI|              test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/cros/video/helper_logger.py", line 82, in call
    return func(*args, **kwargs)
  File "/usr/local/autotest/tests/video_ChromeRTCHWEncodeUsed/video_ChromeRTCHWEncodeUsed.py", line 93, in run_once
    device_capability.DeviceCapability().ensure_capability(capability)
  File "/usr/local/autotest/cros/video/device_capability.py", line 130, in ensure_capability
    raise error.TestNAError("Missing Capability: %s" % cap)
TestNAError: Missing Capability: hw_enc_vp8_1080_30

Seems the TEST_NA is raised from test internally. Is it real NA or a test failure about device lacking some essential labels? If it's NA let's make it not raise.

Comment 4 by hiroh@chromium.org, May 25 2018

My understanding was testNA can be raised everywhere, that is, scheduler and internal test.
I detect test capability inside a test and raise TestNAError if the test is not runnable.
What should a test raise in the case? We're discussing this in mail thread.
Cc: avkodipelli@chromium.org vsu...@chromium.org
Owner: xixuan@chromium.org
Status: Assigned (was: Untriaged)
Any future work here falls under skylab suites
Cc: newcomer@chromium.org
 Issue 894559  has been merged into this issue.
I am willing to change all tests (especially in bvt-*) to spew logging.warning() and return instead of raising TestNAError. After that we should remove this error, quite possibly together with TestError (instead of TestFail).

Objections?
+1 to fixing security_AltSyscall which is current failure vexing chrome-pfq 

NA added here: crrev.com/c/1258524

As for changing all tests ... not sure I agree given that things like stainless now support using TEST_NA to distinguish not-applicable vs pass vs fail  
Cc: jorgelo@chromium.org nya@chromium.org
The fundamental problem here is that nobody seems to agree what TEST_NA means. Does it pass, does it fail? There is no agreement. All the test writer wants is a human parse-able message.

It is probably mostly harmless to have TestNA outside of the CQ (except a bad example).

But we can't afford tests running multiple times in the CQ because of TEST_NA. So either the TEST_NA's go or affected tests leave the CQ/PFQ (suite:bvt-*).

Convert to a logging warning
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1277969
We have so many ways tests can fail, why can't TEST_NA just be that, N/A? It signals something useful: this test doesn't apply to this platform, but it doesn't hide this fact by just passing.
Corp infra only knows pass, fail and timeout
https://docs.bazel.build/versions/master/test-encyclopedia.html#role-of-the-test-runner
Notice it can pass with a message!

Autotest has many ways to fail with a message. But it currently has no clean way to pass with a message. So people are tempted to pass messages while raising a TestError(NA/Warning) and hope to get away with it.

Problem is, the CQ builders keep retrying this condition, making it a bad test if you want. Until autotest supports passing messages in a PASS we need to fix the test or make them get off the CQ.
If that's what infra supports, then I guess we have to PASS with a message for now.
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/896c93a463869f47ad4ae23e4dae3f0eaa4b0bf1

commit 896c93a463869f47ad4ae23e4dae3f0eaa4b0bf1
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Fri Oct 12 04:33:12 2018

security_AltSyscall: reduce TestNAError to warning.

When launched from builders it causes retries for no good reason.

BUG=chromium:846770,  chromium:894559 
TEST=pylint

Change-Id: I6d299bd8eafbede2b9e5c0ebbc1916df80ba1d87
Reviewed-on: https://chromium-review.googlesource.com/c/1277969
Tested-by: Ilja H. Friedel <ihf@chromium.org>
Trybot-Ready: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org>
Reviewed-by: Todd Broch <tbroch@chromium.org>

[modify] https://crrev.com/896c93a463869f47ad4ae23e4dae3f0eaa4b0bf1/client/site_tests/security_AltSyscall/security_AltSyscall.py

Sign in to add a comment