Kernel Daily Regression Suite: DarkResumeDisplay test is broken on Peppy, Oak, Daisy & x86-alex
Issue description

The DarkResumeDisplay test in the Kernel Daily Regression autotest suite is broken on peppy; it has been for at least a week, maybe much longer. You can see this by looking at the test on the wmatrix website: https://wmatrix.googleplex.com/failures/kernel_daily?platforms=peppy&tests=power_DarkResumeDisplay

The error (from the log file) is:

06/22 11:58:37.325 DEBUG| ssh_host:0180| Running (ssh) 'test -r /sys/kernel/debug/dri/0/i915_crtc_errors'
06/22 11:58:37.597 INFO |power_DarkResumeDi:0042| node_exists=1
06/22 11:58:37.599 WARNI| test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/power_DarkResumeDisplay/power_DarkResumeDisplay.py", line 84, in run_once
    self.verify_host_supports_test(host)
  File "/usr/local/autotest/server/site_tests/power_DarkResumeDisplay/power_DarkResumeDisplay.py", line 44, in verify_host_supports_test
    raise error.TestError('%s file not found.' % ERROR_FILE)
TestError: /sys/kernel/debug/dri/0/i915_crtc_errors file not found.
,
Jul 2 2016
Derek can you take a look?
,
Jul 25 2016
Ping? Any progress on this?
,
Jul 26 2016
It appears that this test is broken on many boards: It is consistently failing on daisy, x86-alex and oak as well.
,
Jul 26 2016
,
Jul 26 2016
How do we typically limit which boards a given test will run on? Is it via a DEPENDENCIES entry? Glancing around, I don't see one that would obviously correspond to dark resume support. Should I add one to server/hosts/cros_label.py that e.g. looks for powerd prefs enabling dark resume?
,
Jul 26 2016
Hmm, but I guess that this test also requires i915, so maybe it's best if it just depends on board labels. Where should it be running right now? Dark resume is disabled on samus, right?
,
Jul 26 2016
Derek is out this week. Sean, could you PTAL?
,
Jul 26 2016
Oh, heh. A label already exists but it's using a different name (and just a hardcoded list right now):
class LucidSleepLabel(base_label.BaseLabel):
    """Label that determines if device has support for lucid sleep."""

    # TODO(kevcheng): See if we can determine if this label is applicable a
    # better way (crbug.com/592146).
    _NAME = 'lucidsleep'
    LUCID_SLEEP_BOARDS = ['samus', 'lulu']

    def exists(self, host):
        board = host.get_board().replace(ds_constants.BOARD_PREFIX, '')
        return board in self.LUCID_SLEEP_BOARDS
,
Jul 26 2016
I assume the list is out of date, because I see the following boards configured to use dark resume:
elm
link
lulu (disabled)
oak (disabled)
parrot
samus (disabled)
The only ones it lists actually have the disable_dark_resume pref also set. :-/
,
Jul 27 2016
The test will only work on select 3.14 and 3.18 kernels at the moment, so perhaps that's why the list seems more restrictive?
,
Jul 27 2016
Would there be any way of updating the 'kernel_daily_regression' test suite so that tests which shouldn't be run on a particular board are skipped or automatically marked as passed when that board attempts to run the suite? I'm not running this test as an individual test on the board; I asked the entire suite to run on the board, and this particular test (and some others) fail, which marks the entire suite as FAILED. It would be nice if the suite were marked as PASSED when all the *appropriate* tests for a board pass (i.e. as I said at the start, if a particular test isn't supposed to work on a particular board, the suite doesn't try to run it on that board).
,
Jul 27 2016
If the target board doesn't support a test's dependencies, does the test get skipped?
,
Jul 27 2016
That does not appear to be the case. For example, when I run the kernel_daily_regression suite on an x86-alex machine, which runs a 3.8 kernel, I get a FAIL on power_DarkResumeDisplay, which according to comment #11 shouldn't even try to run on that board, since it's the wrong kernel version.
,
Jul 27 2016
The only dependency listed in power_DarkResumeDisplay's control file is on the "servo" label[1]. I'm wondering if it'd be skipped if it listed the correct dependencies. :-) 1. which is incorrect, since it apparently also depends on a particular kernel version... but maybe not on dark resume being enabled in powerd, since it looks like dark_resume_utils modifies powerd's pref files.
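
To make the question concrete, here is a minimal sketch of label-based matching, assuming a test is only eligible for a host whose labels include every entry in the test's DEPENDENCIES. The function name `can_run_on` and the example host labels are hypothetical; this models the idea being discussed, not autotest's actual scheduler.

```python
def can_run_on(dependencies, host_labels):
    """Return True if every dependency label is present on the host.

    Hypothetical helper for illustration only; it is not an autotest API.
    """
    return set(dependencies) <= set(host_labels)


# Assumed labels for an x86-alex host with a servo attached.
alex_labels = {'board:x86-alex', 'servo'}

# With only 'servo' listed, the test is eligible on this host.
print(can_run_on(['servo'], alex_labels))

# Adding the existing 'lucidsleep' label as a dependency would exclude it.
print(can_run_on(['servo', 'lucidsleep'], alex_labels))
```

Under this model, the test still lands on unsupported boards as long as its only dependency is "servo".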
,
Jul 27 2016
I suppose we could also mark the test as passed if the error_count file in question doesn't exist on the device (ie: board and/or kernel not supported).
,
Sep 8 2016
Ping? What's the status of this now?
,
Sep 8 2016
I looked at the consistent failures of this test suite (kernel_daily_regression) on go/wmatrix for R55, for the 14 boards we test regularly. Most of the boards are passing most of the tests (or sometimes passing a test & sometimes failing it). There are 3 tests that appear to be major problems:

power_DarkResumeDisplay (10 boards consistently fail test; 3 boards don't run it)
power_DarkResumeShutdownServer (6 boards consistently fail; 3 boards don't run it)
power_SuspendStress.bare (7 boards consistently fail; 1 board doesn't run it)

1 board consistently fails power_SuspendShutdown.

Which tests fail on what boards (consistently):

power_DarkResumeDisplay: falco, x86-alex, oak, squawks, terra, peppy, veyron_jaq, sentry, chell, nyan_big
power_DarkResumeShutdownServer: falco, x86-alex, sentry, chell, nyan_big, link
power_SuspendStress.bare: falco, daisy, x86-alex, oak, squawks, veyron_jaq, lumpy
power_SuspendShutdown: falco
,
Sep 8 2016
Derek and Sameer (and Sean, in the case of power_DarkResumeDisplay): You're listed as the owners of power_DarkResumeShutdownServer and power_DarkResumeDisplay. I believe that we have dark resume disabled on most/all boards due to kernel issues. Are these tests expected to pass anyway, or can we disable them until the underlying issues are fixed? It may make sense to use a different bug to track the power_SuspendStress.bare and power_SuspendShutdown failures, since those are probably due to unrelated flakiness problems in those tests. (Should we get rid of power_SuspendShutdown? powerd already has pretty good unit test coverage for that functionality.)
,
Sep 8 2016
Copying Derek's comment from issue 631504 : "If power_DarkResumeShutdownServer is having problems, it likely that mosys is just broken. It should just call into mosys to check the platform and return on daisy, oak, and alex."
,
Sep 9 2016
(copying comment from similar issue) Can we please put the onus of these tests on you instead of on the users of the suite? that is, if these tests are flaky or need some setup to work on only a few boards, then disable the tests until you decide what to do with them instead of having the users of the suite analyze these failures every day. I think we can all agree that flaky or constantly failing tests are BAD.
,
Sep 10 2016
That comment was from power_DarkResumeShutdownServer, not power_DarkResumeDisplay. This seems to be because the test raises an error when it should just silently exit.
,
Sep 10 2016
fix to address power_DarkResumeDisplay & Derek's observation (#c22) here: https://chromium-review.googlesource.com/#/c/383980
,
Sep 12 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/5e38889798004f055a3e9a9f901d1049edf07cb3

commit 5e38889798004f055a3e9a9f901d1049edf07cb3
Author: Todd Broch <tbroch@chromium.org>
Date: Sat Sep 10 13:11:09 2016

power_DarkResumeDisplay: raise TestNAError if unsupported.

Only some platforms support this test. For those that dont raise TestNAError instead of TestError accordingly.

BUG= 625281
TEST=test_that <ip> power_DarkResumeDisplay on unsupported platform and it passes

Change-Id: Ib8dc564281fe652dc2570e6f629272a2fc883d0d
Reviewed-on: https://chromium-review.googlesource.com/383980
Commit-Ready: Todd Broch <tbroch@chromium.org>
Tested-by: Todd Broch <tbroch@chromium.org>
Reviewed-by: Derek Basehore <dbasehore@chromium.org>

[modify] https://crrev.com/5e38889798004f055a3e9a9f901d1049edf07cb3/server/site_tests/power_DarkResumeDisplay/power_DarkResumeDisplay.py
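
A minimal sketch of the behavior this commit message describes, assuming the support check boils down to "is the i915_crtc_errors node present". The exception classes are local stand-ins for autotest's error.TestError and error.TestNAError, and the injectable `path_exists` parameter is a hypothetical convenience (the real test probes the device over ssh); the actual change is in the linked review.

```python
import os


class TestError(Exception):
    """Stand-in for autotest's error.TestError (reported as an error)."""


class TestNAError(Exception):
    """Stand-in for autotest's error.TestNAError (test not applicable)."""


ERROR_FILE = '/sys/kernel/debug/dri/0/i915_crtc_errors'


def verify_supports_test(path_exists=os.path.exists, error_file=ERROR_FILE):
    """Raise TestNAError, not TestError, when the debugfs node is missing.

    path_exists is a hypothetical hook so the check can be exercised
    without a device.
    """
    if not path_exists(error_file):
        raise TestNAError('%s file not found.' % error_file)


# Simulate an unsupported platform: the node is reported as missing.
try:
    verify_supports_test(path_exists=lambda path: False)
except TestNAError as e:
    print('not applicable: %s' % e)
```

The point of the switch is that a missing node signals "this platform is unsupported" rather than "the test broke".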
,
Sep 14 2016
We are still hitting an issue with this test. From this builder: https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-gcc-toolchain/builds/1 you can see that kernel_daily_regression still fails with:

[Test-Logs]: power_DarkResumeDisplay: ERROR: /sys/kernel/debug/dri/0/i915_crtc_errors file not found.

Please fix this last one. That is the only remaining failure that we have for the suite.
,
Sep 14 2016
https://chromium-review.googlesource.com/#/c/385701 should take care of remaining issue.
,
Sep 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/ae6129d620a528a771764c7e5973f0b327b2bec6

commit ae6129d620a528a771764c7e5973f0b327b2bec6
Author: Todd Broch <tbroch@chromium.org>
Date: Wed Sep 14 23:15:56 2016

power_DarkResumeDisplay: check kernel version.

This test will only work on 3.14 & 3.18 kernels at the moment. Check that and if its not the case raise TestNAError accordingly. Also cleaned up a few pylint warnings.

BUG= chromium:625281
TEST=power_DarkResumeDisplay
1. run on 4.4 device and raises TestNAError but fails
2. run on 3.18 device and it succeeds.

Change-Id: Ib74c909e0513ccd98b0cc23e7e90cc28f725af2d
Reviewed-on: https://chromium-review.googlesource.com/385701
Commit-Ready: Todd Broch <tbroch@chromium.org>
Tested-by: Todd Broch <tbroch@chromium.org>
Reviewed-by: Derek Basehore <dbasehore@chromium.org>

[modify] https://crrev.com/ae6129d620a528a771764c7e5973f0b327b2bec6/server/site_tests/power_DarkResumeDisplay/power_DarkResumeDisplay.py
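
A minimal sketch of a kernel-version gate like the one this commit message describes, assuming the check compares the major.minor part of a `uname -r` style release string against 3.14 and 3.18. The function names and the TestNAError stand-in are hypothetical; the real implementation is in the linked review.

```python
import re


class TestNAError(Exception):
    """Stand-in for autotest's error.TestNAError (test not applicable)."""


SUPPORTED_KERNELS = ('3.14', '3.18')


def kernel_major_minor(release):
    """Extract 'major.minor' from a release string such as '3.18.0-1234'."""
    match = re.match(r'(\d+)\.(\d+)', release)
    if match is None:
        raise ValueError('unrecognized kernel release: %r' % release)
    return '%s.%s' % match.groups()


def verify_kernel_supported(release):
    """Raise TestNAError unless the kernel is one of SUPPORTED_KERNELS."""
    if kernel_major_minor(release) not in SUPPORTED_KERNELS:
        raise TestNAError('Test support on %s kernels only'
                          % ' | '.join(SUPPORTED_KERNELS))


# A 3.18 kernel passes silently; a 4.4 kernel is reported as not applicable.
verify_kernel_supported('3.18.0-12345-gabcdef')
try:
    verify_kernel_supported('4.4.21')
except TestNAError as e:
    print(e)
```

Comparing the major.minor string directly avoids treating 3.8 (x86-alex) as a match for 3.18.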
,
Sep 16 2016
Please re-open if #c27 didn't squash the last issue.
,
Sep 19 2016
Reopening per llozano@ at https://bugs.chromium.org/p/chromium/issues/detail?id=631504#c20:

----
There is still a failure in ARM from this builder: https://uberchromegw.corp.google.com/i/chromeos/builders/arm-gcc-toolchain/builds/6

The error is

[Test-Logs]: power_DarkResumeShutdownServer: ERROR: Unhandled RemotePowerException: Failed to change outlet status for host: chromeos2-row3-rack5-host21 to state: ON.

which links to: http://cautotest/tko/retrieve_logs.cgi?job=/results/77296329-chromeos-test/

Please help. This seems to be the last one. Testing with this suite is looking much better now.
,
Sep 19 2016
The Unhandled RemotePowerException is an old lab RPM infra issue 517233 . +fdeng,jrbarnette
,
Sep 19 2016
Thanks. I pasted #29 to issue 517233 ; let's use it to track that failure.
,
Sep 19 2016
adding 517233 to blocked on. the failure seems pretty consistent (does not seem like a flake) Is there a short term solution? Issue 517233 has been open for a while.
,
Oct 7 2016
,
Nov 19 2016
,
Nov 21 2016
Verified in R57-9008.0.0 that the tests are skipped with "Test support on 3.14 | 3.18 kernels only": https://wmatrix.googleplex.com/testrun/kernel_daily?test_ids=384337989
Comment 1 by cmt...@chromium.org
, Jul 1 2016