power_AudioDetect causes DUTs to sleep
Reported by
jrbarnette@chromium.org,
Apr 25 2016
|
||||||||||||
Issue description
I've observed multiple cases of DUT failure after power_AudioDetect.
The symptom is that the test passes, and the DUT is online and functioning
at the end of the test. However, within 2-3 minutes, the DUT drops offline.
The most prominent instance right now is three peach_pit. The relevant
logs are below:
chromeos4-row8-rack12-host19
2016-04-24 06:32:56 NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host19/424004-repair/
2016-04-24 06:28:26 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host19/423969-reset/
2016-04-24 06:26:15 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/61094653-chromeos-test/
2016-04-24 06:25:53 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host19/423953-reset/
chromeos4-row7-rack12-host13
2016-04-18 15:00:25 NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row7-rack12-host13/389281-repair/
2016-04-18 14:39:36 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row7-rack12-host13/389132-reset/
2016-04-18 14:37:24 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/60475720-chromeos-test/
2016-04-18 14:36:58 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row7-rack12-host13/389123-reset/
chromeos4-row8-rack12-host21
2016-04-19 14:30:37 NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host21/395345-repair/
2016-04-19 14:24:14 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host21/395311-reset/
2016-04-19 14:21:57 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/60586615-chromeos-test/
2016-04-19 14:21:30 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack12-host21/395289-reset/
Assigning to today's sheriff, who by chance, knows something about audio,
if not power management.
,
Apr 25 2016
One more thing: The three peach_pit DUTs are still down, and can be recovered so that logs can be extracted. If that's needed please contact me _today_ to get that done. There's no guarantee that the DUTs won't be selected for manual repair at any moment, after which, logs will be gone.
,
Apr 25 2016
That's strange. Was there any information after the DUT came back online? Did it suspend?
,
Apr 25 2016
Any DUTs that have actually come back online got re-imaged; most likely there's no data. There are at least three peach_pit DUTs still down that can be recovered.
,
Apr 26 2016
I've found a parrot2 DUT with this same symptom:
chromeos2-row3-rack1-host4
2016-04-22 17:45:57 NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack1-host4/54396392-repair/
2016-04-22 17:36:16 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack1-host4/54396030-reset/
2016-04-22 17:30:16 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/60924901-chromeos-test/
2016-04-22 17:27:25 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack1-host4/54395757-reset/
,
Apr 26 2016
power_AudioDetector looks like it uses PowerPrefChanger to set short timeouts in powerd. I'm not very familiar with the latter class, but it appears to use __del__ to unset the powerd prefs. I don't think that this is safe. My guess would be that the shortened timeouts are causing powerd to suspend the system after the test passes. If you can provide the full /var/log/power_manager/powerd* logs (i.e. not the incremental test logs) from one of the devices exhibiting the problem, I can verify that this is what's happening. It's also weird that power_AudioDetector restarts powerd in its cleanup() method. That's not going to accomplish much if the prefs don't get reset first. PowerPrefChanger should probably have a method that can be called explicitly to clear the prefs, and that should be called from cleanup() instead. (I don't know offhand who owns this class.)
,
Apr 26 2016
sque wrote this test, but he hasn't been on the team in years. The audio team should probably own it. chinyue, can you take it over?
,
Apr 26 2016
I've requested logs for all the DUTs with failures reported:
http://b2/28399296
,
Apr 27 2016
Logs from the 5 DUTs are now available.
,
Apr 27 2016
Looking at the messages file for chromeos2-row3-rack1-host4, you find (no surprise, really) that the DUT did in fact go to sleep right around the time of its failure, and woke up when it was checked for logs: 2016-04-22T17:32:21.784480-07:00 NOTICE ag[23426]: autotest server[stack::close|close|close] -> ssh_run(rm -rf "/tmp/autoserv-Q7Iw0E") 2016-04-22T17:32:21.826551-07:00 INFO sshd[21795]: Received disconnect from 172.25.71.54: 11: disconnected by user 2016-04-22T17:32:22.833848-07:00 INFO sshd[21784]: Received disconnect from 172.25.71.54: 11: disconnected by user 2016-04-22T17:32:23.840565-07:00 INFO sshd[23420]: Received disconnect from 172.25.71.54: 11: disconnected by user 2016-04-22T17:32:43.205969-07:00 NOTICE powerd_suspend[23461]: Going to suspend-to-RAM state: args=--suspend_duration=-1 --nosuspend_to_idle --wakeup_count=1 2016-04-22T17:32:43.270976-07:00 NOTICE powerd_suspend[23513]: Explicit sync 2016-04-22T17:32:44.667905-07:00 NOTICE powerd_suspend[23516]: Finalizing suspend 2016-04-26T15:20:56.574605-07:00 INFO kernel: [ 2304.522013] PM: Syncing filesystems ... done. 2016-04-26T15:20:56.574621-07:00 DEBUG kernel: [ 2304.541276] PM: Preparing system for mem sleep 2016-04-26T15:20:56.574694-07:00 NOTICE kernel: [ 2304.541452] Freezing user space processes ... (elapsed 0.01 seconds) done. 2016-04-26T15:20:56.574696-07:00 NOTICE kernel: [ 2304.552700] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. 2016-04-26T15:20:56.574697-07:00 DEBUG kernel: [ 2304.563838] PM: Entering mem sleep 2016-04-26T15:20:56.574699-07:00 DEBUG kernel: [ 2304.574846] ehci-pci 0000:00:1a.0: setting latency timer to 64 2016-04-26T15:20:56.574701-07:00 DEBUG kernel: [ 2304.585852] ehci-pci 0000:00:1d.0: setting latency timer to 64
,
Apr 27 2016
,
Apr 27 2016
I have a small patch that might fix the powerd pref problem, but when I tried to run the autotest, it fails with: 04:26:58 INFO | autoserv| FAIL power_AudioDetector power_AudioDetector timestamp=1461749218localtime=Apr 27 04:26:58 Unhandled AttributeError: BrowserFinderOptions instance has no attribute 'extensions_to_load' Do I need to set any *extensions* for the test to run?
,
Apr 28 2016
Got it work and test looks good. CL: https://chromium-review.googlesource.com/#/c/341074/
,
May 5 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/dfd44b3c58b12b08fbd0095c788f8072b0f186fb commit dfd44b3c58b12b08fbd0095c788f8072b0f186fb Author: Chinyue Chen <chinyue@chromium.org> Date: Thu Apr 28 03:16:03 2016 power_AudioDetector: Correctly restore powerd prefs. Correctly restore powerd prefs to prevent DUT suspend after test is run. BUG= chromium:606382 TEST=test_that --board=${BOARD} ${DUT_IP} power_AudioDetector Change-Id: I645b0a9f7f603595d35ebd3bd69a45efbe59d5de Reviewed-on: https://chromium-review.googlesource.com/341074 Commit-Ready: Chinyue Chen <chinyue@chromium.org> Tested-by: Chinyue Chen <chinyue@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/dfd44b3c58b12b08fbd0095c788f8072b0f186fb/client/site_tests/power_AudioDetector/power_AudioDetector.py
,
May 6 2016
,
May 6 2016
We run this test on branches, too.
,
May 6 2016
Your change meets the bar and is auto-approved for M51 (branch: 2704)
,
May 7 2016
Need a merge to M51, please.
,
May 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b237aa19bc51022c69c377502c686ab603d3d07b commit b237aa19bc51022c69c377502c686ab603d3d07b Author: Chinyue Chen <chinyue@chromium.org> Date: Thu Apr 28 03:16:03 2016 power_AudioDetector: Correctly restore powerd prefs. Correctly restore powerd prefs to prevent DUT suspend after test is run. BUG= chromium:606382 TEST=test_that --board=${BOARD} ${DUT_IP} power_AudioDetector Change-Id: I645b0a9f7f603595d35ebd3bd69a45efbe59d5de Reviewed-on: https://chromium-review.googlesource.com/341074 Commit-Ready: Chinyue Chen <chinyue@chromium.org> Tested-by: Chinyue Chen <chinyue@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> (cherry picked from commit dfd44b3c58b12b08fbd0095c788f8072b0f186fb) Reviewed-on: https://chromium-review.googlesource.com/343328 Commit-Queue: Chinyue Chen <chinyue@chromium.org> Trybot-Ready: Chinyue Chen <chinyue@chromium.org> Reviewed-by: Chinyue Chen <chinyue@chromium.org> [modify] https://crrev.com/b237aa19bc51022c69c377502c686ab603d3d07b/client/site_tests/power_AudioDetector/power_AudioDetector.py
,
May 9 2016
,
May 10 2016
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 10 2016
,
May 23 2016
Bulk verified
,
May 23 2016
bulk verified |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by jrbarnette@chromium.org
, Apr 25 2016I've also observed this symptom on lumpy (at least). chromeos2-row3-rack9-host20 2016-04-19 04:40:09 NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack9-host20/481637-repair/ 2016-04-19 04:06:40 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack9-host20/481515-reset/ 2016-04-19 04:04:47 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/60548827-chromeos-test/ 2016-04-19 04:04:24 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack9-host20/481501-reset/