[Stumpy] Device does not resume after suspended |
|||||||||||||||||
Issue descriptionChrome Version: 63.0.3239.39 OS: 10032.30.0 What steps will reproduce the problem? (1) Device recovered with above version (ASUS 1080 HD monitor connected with DP-DP & DVI-VGA cable) (2) Sign in to Device (3) Now suspend the device using 'powerd_dbus_suspend' or 'suspend_stress_test' (4) Now try to resume the device (by any keyboard press, mouse, power button) What is the expected result? Device should resume and external monitor(s) should display What happens instead? Device does not resume. Please use labels and text to provide additional information. Logs attached. For graphics-related bugs, please copy/paste the contents of the about:gpu page at the end of this report.
,
Nov 7 2017
,
Nov 7 2017
Are other Sandybridge devices affected?
,
Nov 7 2017
I see these three suspend attempts in powerd's logs. The logs all end immediately after we tried to suspend: powerd.20171107-100936: ... [1107/101009:INFO:daemon.cc(1212)] Got RequestSuspend message from :1.42 [1107/101009:INFO:suspender.cc(399)] Starting request 38469633 [1107/101009:INFO:daemon.cc(596)] Reading wakeup count from /sys/power/wakeup_count [1107/101009:INFO:daemon.cc(600)] Read wakeup count 0 [1107/101009:INFO:display_power_setter.cc(81)] Asking DisplayService to turn all displays off [1107/101010:INFO:suspend_delay_controller.cc(137)] Announcing suspend request 38469633 with 2 pending delay(s) and 0 outstanding delay(s) from previous request [1107/101010:INFO:audio_client.cc(125)] Updated audio devices: headphones unplugged, HDMI inactive [1107/101010:INFO:daemon.cc(1412)] Chrome is using normal display mode [1107/101010:INFO:suspend_delay_controller.cc(86)] Got notification that delay 38469633 (shill) is ready for suspend request 38469633 from :1.10 [1107/101010:INFO:suspend_delay_controller.cc(86)] Got notification that delay 38469634 (chrome) is ready for suspend request 38469633 from :1.11 [1107/101010:INFO:suspend_delay_controller.cc(223)] Notifying observers that suspend is ready [1107/101010:INFO:suspender.cc(470)] Starting suspend [1107/101010:INFO:main.cc(237)] Running "/usr/bin/powerd_setuid_helper --action=suspend --suspend_wakeup_count_valid --suspend_wakeup_count=0" EOF powerd.20171107-103037: ... [1107/103037:INFO:daemon.cc(1212)] Got RequestSuspend message from :1.45 [1107/103037:INFO:suspender.cc(399)] Starting request 223870977 [1107/103037:INFO:daemon.cc(596)] Reading wakeup count from /sys/power/wakeup_count [1107/103037:INFO:daemon.cc(600)] Read wakeup count 0 [1107/103037:INFO:display_power_setter.cc(81)] Asking DisplayService to turn all displays off [1107/103037:INFO:suspend_delay_controller.cc(137)] Announcing suspend request 223870977 with 2 pending delay(s) and 0 outstanding delay(s) from previous request [1107/103037:INFO:suspend_delay_controller.cc(86)] Got notification that delay 223870978 (shill) is ready for suspend request 223870977 from :1.10 [1107/103037:INFO:suspend_delay_controller.cc(86)] Got notification that delay 223870977 (chrome) is ready for suspend request 223870977 from :1.11 [1107/103037:INFO:suspend_delay_controller.cc(223)] Notifying observers that suspend is ready [1107/103037:INFO:suspender.cc(470)] Starting suspend [1107/103037:INFO:main.cc(237)] Running "/usr/bin/powerd_setuid_helper --action=suspend --suspend_wakeup_count_valid --suspend_wakeup_count=0" EOF powerd.20171107-180355: ... [1107/180632:INFO:daemon.cc(1212)] Got RequestSuspend message from :1.45 [1107/180632:INFO:suspender.cc(399)] Starting request 42205185 [1107/180632:INFO:daemon.cc(596)] Reading wakeup count from /sys/power/wakeup_count [1107/180632:INFO:daemon.cc(600)] Read wakeup count 0 [1107/180632:INFO:display_power_setter.cc(81)] Asking DisplayService to turn all displays off [1107/180633:INFO:suspend_delay_controller.cc(137)] Announcing suspend request 42205185 with 2 pending delay(s) and 0 outstanding delay(s) from previous request [1107/180633:INFO:audio_client.cc(125)] Updated audio devices: headphones unplugged, HDMI inactive [1107/180633:INFO:daemon.cc(1412)] Chrome is using normal display mode [1107/180633:INFO:suspend_delay_controller.cc(86)] Got notification that delay 42205185 (shill) is ready for suspend request 42205185 from :1.10 [1107/180633:INFO:suspend_delay_controller.cc(86)] Got notification that delay 42205186 (chrome) is ready for suspend request 42205185 from :1.12 [1107/180633:INFO:suspend_delay_controller.cc(223)] Notifying observers that suspend is ready [1107/180633:INFO:suspender.cc(470)] Starting suspend [1107/180633:INFO:main.cc(237)] Running "/usr/bin/powerd_setuid_helper --action=suspend --suspend_wakeup_count_valid --suspend_wakeup_count=0" EOF I don't know what's going on with the system clock. When I look at the messages file, I can see that the 180355 attempt actually came before the other "earlier" ones. It looks like the system dies right after powerd_suspend logs "Finalizing suspend", and before the kernel has logged anything: ... 2017-11-07T18:05:33.299738+00:00 INFO session_manager[683]: [INFO:policy_store.cc(69)] Persisted policy to disk. 2017-11-07T18:05:33.340363+00:00 INFO session_manager[683]: [INFO:child_exit_handler.cc(77)] Handling 3467 exit. 2017-11-07T18:05:33.372653+00:00 INFO session_manager[683]: [INFO:policy_store.cc(69)] Persisted policy to disk. 2017-11-07T18:05:33.422312+00:00 INFO session_manager[683]: [INFO:child_exit_handler.cc(77)] Handling 3551 exit. 2017-11-07T18:05:43.627531+00:00 WARNING crash_reporter[3706]: Received crash notification for chrome[732] user 1000 (called directly) 2017-11-07T18:05:43.683209+00:00 ERR session_manager[3732]: [ERROR:process.cc(329)] Exec of /usr/bin/vm_launcher failed:: No such file or directory 2017-11-07T18:06:24.206734+00:00 ERR cras_server[1138]: Unable to find the best channel map 2017-11-07T18:06:33.038918+00:00 ERR cras_server[1138]: pcm_avail returned frames larger than buf_size: HDA Intel PCH: HDMI 2:0,8: 16416 > 16384 for 1 times 2017-11-07T18:06:33.101777+00:00 ERR cras_server[1138]: Unable to find the best channel map 2017-11-07T18:06:33.175546+00:00 NOTICE powerd_suspend[4015]: Going to suspend-to-RAM state: args=--suspend_duration=-1 --nosuspend_to_idle --wakeup_count=0 2017-11-07T18:06:33.246612+00:00 NOTICE powerd_suspend[4069]: Finalizing suspend 2017-11-07T10:09:36.254439-08:00 INFO kernel: [ 0.000000] Initializing cgroup subsys cpu 2017-11-07T10:09:36.254481-08:00 NOTICE kernel: [ 0.000000] Linux version 3.8.11 (chrome-bot@cros-beefy173-c2) (gcc version 4.9.x 20150123 (prerelease) (4.9.2_cos_gg_4.9.2-r166-0c5a656a1322e 137fa4a251f2ccc6c4022918c0a_4.9.2-r166) ) #1 SMP Mon Nov 6 22:08:11 PST 2017 2017-11-07T10:09:36.254491-08:00 INFO kernel: [ 0.000000] Command line: cros_secure console= loglevel=7 init=/sbin/init cros_secure oops=panic panic=-1 root=/dev/dm-0 rootwait ro dm_verity. error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 dm="1 vroot none ro 1,0 2539520 verity payload=PARTUUID=51aeba08-f134-0144-bbfa-69972a95d601/PARTNROFF=1 hashtree=PARTUUID=51aeba08-f 134-0144-bbfa-69972a95d601/PARTNROFF=1 hashstart=2539520 alg=sha1 root_hexdigest=62753a6c7c05e6fb9dcc8f73e1350d95aa36f855 salt=bfa45950d45b8d4d92e94e9896f9333cbd69fa6362a4c27254f3c0ffa01d4cd8" noinitrd vt.global_cursor_default=0 kern_guid=51aeba08-f134-0144-bbfa-69972a95d601 add_efi_memmap boot=local noresume noswap i915.modeset=1 tpm_tis.force=1 tpm_tis.interrupts=0 nmi_watchdog=pan ic,lapic iTCO_vendor_support.vendorsupport=3 ... Here's the relevant part of powerd_suspend: log_msg "Finalizing suspend" # Make sure that hwclock is cached for when we wake up, this way we don't # wait for io and get a more accurate time for wakeup # -V prints a little version information and exits, loading .so but not # making actual RTC accesses. Do *NOT* change this to a normal execution... # it will trigger an RTC interrupt that may count as a wakeup reason, abort # your suspend and cause you hours of pain and confusion! if [ -f "${TIMESTAMP_FILE}" ]; then /sbin/hwclock -V > /dev/null else # We don't want to overwrite the alarm for whatever test might be # running. setup_dark_resume fi # Echo freeze to /sys/power/state on platforms that support suspend to idle. action='mem' if [ "${FLAGS_suspend_to_idle}" -eq 0 ]; then action='freeze' fi # Suspend to RAM. This is piped through cat since we want to determine # the causes of failures -- dash's "echo" command appears to not print # anything in response to write errors. error=$( (printf '%s' "${action}" | cat > /sys/power/state) 2>&1) ...
,
Nov 7 2017
,
Nov 7 2017
I don't know anything about this beyond what I put in #4. Someone on the kernel side should take a look.
,
Nov 7 2017
,
Nov 7 2017
Re: to comment #3 - Not able to reproduce on butterfly with same version.
,
Nov 7 2017
,
Nov 9 2017
Easily reproducible. Occurs 100% of the time on my stumpy. Device appears stuck in suspend or at least early resume as power button led is still flashing Only recovery I've found is hard reset (no USB keyboard, mouse or RTC wake). Nothing interesting in logs except that unclean shutdown occurred on previous suspend. /sys/power/pm_test doesn't expose a failure. Despite having 29 DUTS in the lab it appears the power_Resume test has only run once in 60 days https://wmatrix.googleplex.com/unfiltered?suites=power_build&tests=power_Resume&days_back=60&platforms=stumpy and passed on R64 10041.0.0 So no easy way to identify when regression was introduced unfortunately.
,
Nov 9 2017
At least for my stumpy I can recreate this failure on R64-10041.0.0 R64-10035.0.0 R63-10032.30.0 R62-9901.66.0 So its not isolated to beta, The singular lab pass, https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/149599519-chromeos-test/chromeos4-row2-rack8-host19 Looks legit, seconds_system_resume{perf}=1.73400402069 But I can't repro it pass locally. Host is, chromeos4-row2-rack8-host19 Verifying chromeos-server12.cbf.corp.google.com False None stumpy board:stumpy, pool:suites, bluetooth, hw_video_acc_h264, hw_video_acc_enc_h264, storage:ssd, power:AC_only, audio_loopback_dongle, os:cros, variant:stumpy, sku:stumpy_intel_i5_2450m_2500mhz_4Gb, cts_abi_x86, cts_abi_arm, model:stumpy, cros-version:stumpy-release/R64-10044.0.0
,
Nov 9 2017
,
Nov 29 2017
Note this may also be reproducible on other devices w/ similar SoC running 3.8 kernel. At least one report on link (3.8 ivybridge)
,
Dec 4 2017
Hi, this is tagged as a beta blocker for M64. We're not that far out and this bug hasn't been updated in a while. Ping? Thanks...
,
Dec 6 2017
Still able to reproduce this issue on M64 build 10176.5.0
,
Dec 7 2017
,
Dec 7 2017
M64 beta is targeted for next Tuesday (12-Dec); anything we can do to close this out and/or retag as stable block is appreciated. I'll ping the owner as well. Thanks!
,
Dec 8 2017
Broken in all of these releases as well, R59-9460.73.0 R58-9334.72.0 R57-9202.64.0 R56-9000.56.0 Also note while repro on test image appears to be 100% normal users are succeeding at resume (Power.KernelResumeTimeOnAC UMA shows values) sometimes. Given minimal amount of user feedback regarding this I suspect there may be something specific to test image that has soured this. Going to punt ReleaseBlock down the road to stable & have a look at repro on normal mode.
,
Dec 19 2017
This may be a hardware failure and same issue affecting crbug.com/792562#c45 as I now have a 2nd stumpy device that suspend/resumes succesfully w/ exact same SW that fails on previous unit.
,
Jan 4 2018
Removing as a M64 blocker since clearly not a regression....
,
Jan 4 2018
,
Jan 4 2018
This was a beta blocker in 63 but never got fixed. We should get the fix in 64 as it's a pretty serious issue
,
Jan 4 2018
,
Jan 5 2018
Given, 1) there's evidence that this is a hardware failure on specific units only w/in google (c#19) 2) no field reports of similar failure 3) current repro occurs running test images as far back as R56 I'm not sure the I agree w/ priority bump or re-adding the releaseBlock-Beta. In any case I can't dedicate any more time at this point. Marking available should others have more cycles to continue the effort and happy to offer up the two units (1 pass, 1 fail) if needed.
,
Jan 7 2018
Changing to release block stable. Blocking != priority; it should only be used if we can't release beta. We'll take another look this week.
,
Jan 8 2018
Punting RBS based on same assessment of crbug.com/792562#c48 (HW related) and in fact bugs may be result of same HW failure as unit(s) that fail have both failures.
,
Apr 24 2018
stumpy is EOL now, and it seems like the issues here are hardware-specific rather than common OS software |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by pgangishetty@chromium.org
, Nov 7 2017