
Issue 631504

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




[Kernel HW Tests] power_SuspendShutdown and power_SuspendStress.bare are broken

Project Member Reported by cmt...@chromium.org, Jul 26 2016

Issue description

These HW tests are consistently failing or having issues on daisy, oak & x86-alex boards in ChromeOS 54 (see wmatrix results for more info).
 
Cc: derat@chromium.org seanpaul@chromium.org snanda@chromium.org
Ping?  Could somebody look at this please?
If power_DarkResumeShutdownServer is having problems, it is likely that mosys is just broken. The test should just call into mosys to check the platform and return on daisy, oak, and alex.
Cc: tbroch@chromium.org
Can someone please own this issue?
This is a P1 and has been open for a while. We are trying to run kernel tests as part of compiler validation, and these few failures make it impossible for us to get green builders.
If some tests are board specific, maybe you could split the suite into board specific and non-board specific so that we can run the non-board specific tests?
I looked at the consistent failures of this test suite (kernel_daily_regression) on go/wmatrix for R55, for the 14 boards we test regularly.  Most of the boards are passing most of the tests (or sometimes passing a test & sometimes failing it).   There are 3 tests that appear to be major problems:

power_DarkResumeDisplay  (10 boards consistently fail test; 3 boards don't run it)
power_DarkResumeShutdownServer (6 boards consistently fail; 3 boards don't run it)
power_SuspendStress.bare (7 boards consistently fail; 1 board doesn't run it)

1 board consistently fails power_SuspendShutdown.

Which tests fail on what boards (consistently):

power_DarkResumeDisplay: falco, x86-alex, oak, squawks, terra, peppy, veyron_jaq, sentry, chell, nyan_big
power_DarkResumeShutdownServer: falco, x86-alex, sentry, chell, nyan_big, link
power_SuspendStress.bare: falco, daisy, x86-alex, oak, squawks, veyron_jaq, lumpy
power_SuspendShutdown: falco
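
To see which boards account for most of the noise, the per-test lists above can be inverted into a per-board view. The following is only a sketch: the board names are copied from this comment, and the helper name failures_by_board is made up for illustration and is not part of autotest or wmatrix.

```python
from collections import defaultdict

# Consistent failures per test, copied from the lists in this comment.
FAILURES = {
    "power_DarkResumeDisplay": [
        "falco", "x86-alex", "oak", "squawks", "terra", "peppy",
        "veyron_jaq", "sentry", "chell", "nyan_big",
    ],
    "power_DarkResumeShutdownServer": [
        "falco", "x86-alex", "sentry", "chell", "nyan_big", "link",
    ],
    "power_SuspendStress.bare": [
        "falco", "daisy", "x86-alex", "oak", "squawks", "veyron_jaq", "lumpy",
    ],
    "power_SuspendShutdown": ["falco"],
}


def failures_by_board(failures):
    """Invert a test -> boards mapping into board -> sorted list of tests."""
    by_board = defaultdict(list)
    for test, boards in failures.items():
        for board in boards:
            by_board[board].append(test)
    return {board: sorted(tests) for board, tests in by_board.items()}


if __name__ == "__main__":
    by_board = failures_by_board(FAILURES)
    # List boards with the most consistently failing tests first.
    ranked = sorted(by_board.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for board, tests in ranked:
        print(f"{board}: {len(tests)} ({', '.join(tests)})")
```

Sorting by failure count puts boards that fail several tests at the top, which may help separate board-level problems from test-level ones.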

Comment 6 by derat@chromium.org, Sep 8 2016

Summary: [Kernel HW Tests] power_SuspendShutdown and power_SuspendStress.bare are broken (was: [Kernel HW Tests] power_SuspendShutdown, power_SuspendStress.bare & power_DarkResumeShutdown Server are broken)
Let's keep discussion of the dark resume tests in issue 625281 so it doesn't get spread across two bugs.

Julius, you're listed as the owner (well, author) of power_SuspendStress.bare. Any thoughts about the failures? Looking at https://wmatrix.googleplex.com/unfiltered?hide_missing=True&releases=tot&tests=power_SuspendStress.bare, "Autotest client terminated unexpectedly: DUT is no longer pingable, it may have rebooted or hung." appears to be a common reason.

I'll volunteer to delete power_SuspendShutdown unless someone speaks up in its defense. powerd already has unit tests ensuring that the Suspender class requests shutdown after repeated failures. I suppose that the autotest ensures that we report failures from the powerd_suspend script back to powerd, but that doesn't feel too hard to get right, and I'm not sure that it justifies the noise from this autotest.
Cc: hennessywill@chromium.org
Owner: hennessywill@chromium.org
Status: Assigned (was: Untriaged)
crbug.com/626467 has more details on recent power_SuspendStress efforts.

@#c4, we definitely don't want power_SuspendStress/control.bareDaily in its current state to block compiler validation. The test itself isn't board-specific, but expectations of it succeeding on older platforms are lower, since it wasn't run there across many iterations like FSI (10k).

Possible solutions:
1. Migrate control.bareDaily in its current form to another daily suite.  I'm not aware of one to add it to though.  Should we create another suite?
2. Add some whitelist mechanism for boards that we know to have a lower quality bar for s2r stress.
3. Remove kernel_daily_regression from compiler qual and find a suite that parallels those tests excluding control.bareDaily
4. Fix all s2r stress bugs on all platforms.  Not a short-term reality.
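
Option 2 could look roughly like the sketch below. This is not an existing autotest mechanism: the names KNOWN_FLAKY_S2R_BOARDS, classify_result and suite_is_green are hypothetical, and the board set is simply the boards reported earlier in this bug as consistently failing power_SuspendStress.bare. The real list would be up to the test owners.

```python
# Hypothetical set of boards with a lower quality bar for s2r stress.
# Taken from the consistent power_SuspendStress.bare failures reported
# earlier in this bug; not an official list.
KNOWN_FLAKY_S2R_BOARDS = frozenset({
    "falco", "daisy", "x86-alex", "oak", "squawks", "veyron_jaq", "lumpy",
})


def classify_result(board, passed, known_flaky=KNOWN_FLAKY_S2R_BOARDS):
    """Classify one s2r stress run as PASS, EXPECTED_FAIL, or FAIL."""
    if passed:
        return "PASS"
    if board in known_flaky:
        # Failure on a listed board: record it, but don't block the suite.
        return "EXPECTED_FAIL"
    return "FAIL"


def suite_is_green(results):
    """Return True if no (board, passed) pair is an unexpected failure."""
    return all(
        classify_result(board, passed) != "FAIL" for board, passed in results
    )
```

With this shape, a failure on daisy would be reported as EXPECTED_FAIL and would not turn a compiler-validation run red, while a failure on a board outside the set still would.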

@#c6, agreed that the utility of power_SuspendShutdown is low given the unit tests. I also wasn't able to find any issues where it identified a real problem. I vote for removing it as well.

@Will, do you have time to take a look at resolving the power_SuspendStress side of this?
Can we please put the onus of these tests on you instead of on the users of the suite?
That is, if these tests are flaky or need some setup to work on only a few boards, then disable the tests until you decide what to do with them, instead of having the users of the suite analyze these failures every day.
I think we can all agree that flaky or constantly failing tests are BAD.
#9: Who's the "you" in your comment? I'm in complete agreement that flaky tests are worse-than-useless and should be disabled or deleted.
Unless there's disagreement I'm migrating power_SuspendStress.bareDaily to power_daily to decouple ongoing work in crbug.com/626467 from compiler validation using kernel_daily_regression.

Labels: M-55
Owner: tbroch@chromium.org
Status: Started (was: Assigned)
About #10: by "you" I mean the owner of the suites or tests in question.
I am not aware of all the different roles of different people copied in this bug.
Project Member

Comment 15 by bugdroid1@chromium.org, Sep 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/218f3d7f02dbab06ff09d8fbdfd4235d5f9beff3

commit 218f3d7f02dbab06ff09d8fbdfd4235d5f9beff3
Author: Daniel Erat <derat@chromium.org>
Date: Fri Sep 09 16:55:58 2016

autotest-server-tests: Unlist power_SuspendShutdown.

Unlist the power_SuspendShutdown test. This functionality is
already covered by the power_manager package's unit tests.

BUG=chromium:631504
TEST=none
CQ-DEPEND=Ie30f0679387550b1fd446a25734d8c5c436aba66

Change-Id: I42d38cce87903c261a7f7ea8291f6ed71fb8a261
Reviewed-on: https://chromium-review.googlesource.com/383712
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Todd Broch <tbroch@chromium.org>

[modify] https://crrev.com/218f3d7f02dbab06ff09d8fbdfd4235d5f9beff3/chromeos-base/autotest-server-tests/autotest-server-tests-9999.ebuild

Project Member

Comment 16 by bugdroid1@chromium.org, Sep 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/cc676563631416e3a190d29e7caacbee9c2ef866

commit cc676563631416e3a190d29e7caacbee9c2ef866
Author: Daniel Erat <derat@chromium.org>
Date: Fri Sep 09 16:56:51 2016

autotest: Remove power_SuspendShutdown.

Remove the power_SuspendShutdown test. This functionality is
already covered by the power_manager package's unit tests.

BUG=chromium:631504
TEST=none
CQ-DEPEND=I42d38cce87903c261a7f7ea8291f6ed71fb8a261

Change-Id: Ie30f0679387550b1fd446a25734d8c5c436aba66
Reviewed-on: https://chromium-review.googlesource.com/383751
Commit-Ready: Dan Erat (out monday) <derat@chromium.org>
Tested-by: Dan Erat (out monday) <derat@chromium.org>
Reviewed-by: Todd Broch <tbroch@chromium.org>

[delete] https://crrev.com/785cf458ee0ce7516687add68f102825d8891df5/server/site_tests/power_SuspendShutdown/power_SuspendShutdown.py
[delete] https://crrev.com/785cf458ee0ce7516687add68f102825d8891df5/server/site_tests/power_SuspendShutdown/control

Project Member

Comment 17 by bugdroid1@chromium.org, Sep 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e7adc633f998487f6396bc9b83da03f4e17df8a0

commit e7adc633f998487f6396bc9b83da03f4e17df8a0
Author: Todd Broch <tbroch@chromium.org>
Date: Fri Sep 09 18:53:17 2016

power_SuspendStress.bareDaily move to power_daily suite.

Migrate from kernel_daily_regression to power_daily.

BUG=chromium:631504
TEST=none

Change-Id: I34e26a6ecdeef75b1b24ae6541d89c56c6fe8f5a
Reviewed-on: https://chromium-review.googlesource.com/383752
Commit-Ready: Todd Broch <tbroch@chromium.org>
Tested-by: Todd Broch <tbroch@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>

[modify] https://crrev.com/e7adc633f998487f6396bc9b83da03f4e17df8a0/client/site_tests/power_SuspendStress/control.bareDaily
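
For readers unfamiliar with how a test moves between suites: autotest control files are Python, and suite membership is typically declared in an ATTRIBUTES string such as "suite:kernel_daily_regression". The actual contents of control.bareDaily are not shown in this thread, so the input strings below are assumptions, and move_suite is a hypothetical helper that only illustrates the kind of one-line change this commit describes.

```python
def move_suite(attributes, old_suite, new_suite):
    """Replace one suite entry in a comma-separated ATTRIBUTES string.

    Other entries are kept in their original order, and duplicates are
    dropped in case the test was already listed in the new suite.
    """
    entries = [e.strip() for e in attributes.split(",") if e.strip()]
    old_entry = f"suite:{old_suite}"
    new_entry = f"suite:{new_suite}"
    moved = [new_entry if e == old_entry else e for e in entries]

    deduped = []
    for entry in moved:
        if entry not in deduped:
            deduped.append(entry)
    return ", ".join(deduped)


if __name__ == "__main__":
    # Assumed original value; the real control file may list more attributes.
    before = "suite:kernel_daily_regression"
    print(move_suite(before, "kernel_daily_regression", "power_daily"))
```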

Status: Fixed (was: Started)
Marking fixed based on:

#c15 & #c16  :: power_SuspendShutdown failure addressed by removing test ... thanks Dan.
#c17         :: power_SuspendStress.bareDaily failure addressed by migrating test to different suite.
issue 625281 :: power_Dark* being addressed there.

thanks!
Status: Assigned (was: Fixed)
Sorry, re-opening this issue.

There is still a failure on ARM, from this builder:

https://uberchromegw.corp.google.com/i/chromeos/builders/arm-gcc-toolchain/builds/6

The error is

[Test-Logs]: power_DarkResumeShutdownServer: ERROR: Unhandled RemotePowerException: Failed to change outlet status for host: chromeos2-row3-rack5-host21 to state: ON.

which links to: http://cautotest/tko/retrieve_logs.cgi?job=/results/77296329-chromeos-test/

Please help. This seems to be the last one. Testing with this suite is looking much better now.


Comment 21 by derat@chromium.org, Sep 19 2016

Status: Fixed (was: Assigned)
Issue 625281 was tracking power_DarkResumeShutdownServer. I've reopened it.

Comment 22 by son...@google.com, Nov 15 2016

Status: Verified (was: Fixed)
Verified.
power_SuspendShutdown removed.
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&tests=power_SuspendShutdown

power_SuspendStress.bareDaily test migrated to power_daily suite.
https://wmatrix.googleplex.com/unfiltered?releases=tot&suites=power_daily&tests=power_SuspendStress.bare

