Chrome_ChromeOS: Crash Report - [Assert] chromeos::RendererFreezer::OnThawRenderersComplete
Issue description

In top 20 crashes in M-55 dev/beta:

Product name: Chrome_ChromeOS
Magic Signature: [Assert] chromeos::RendererFreezer::OnThawRenderersComplete
Current link: https://crash.corp.google.com/browse?q=product.name%3D'Chrome_ChromeOS'%20AND%20product.version%3D'55.0.2883.29'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'%5BAssert%5D%20chromeos%3A%3ARendererFreezer%3A%3AOnThawRenderersComplete'%20AND%20ReportID%3D'96061a7900000000'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3

Search properties:
product.name: Chrome_ChromeOS
product.version: 55.0.2883.29
custom_data.chromecrashproto.ptype: browser
custom_data.chromecrashproto.magic_signature_1.name: [Assert] chromeos::RendererFreezer::OnThawRenderersComplete
reportid: 96061a7900000000

Metadata:
Product Name: Chrome_ChromeOS
Product Version: 55.0.2883.29
Report ID: 96061a7900000000
Report Time: Tue, 01 Nov 2016 18:52:25 GMT
Uptime: 62520926 ms
Cumulative Uptime: 0 ms
User Email:
OS Name: Linux
OS Version: 0.0.0 Linux 3.8.11 #1 SMP Wed Oct 26 21:49:16 PDT 2016 x86_64
CPU Architecture: amd64
CPU Info: family 6 model 69 stepping 1

Thread 0 CRASHED [SIGABRT @ 0x000003e8000001eb ] MAGIC SIGNATURE THREAD
0x00007fb7e9febb82 (libc-2.19.so -raise.c:56 ) raise
0x00007fb7e9fed89f (libc-2.19.so -abort.c:89 ) abort
0x00007fb7edf3dbb4 (chrome -debugger_posix.cc:249 ) base::debug::BreakDebugger
0x00007fb7edf4da74 (chrome -logging.cc:748 ) logging::LogMessage::~LogMessage
0x00007fb7ed9e8c37 (chrome -renderer_freezer.cc:139 ) chromeos::RendererFreezer::OnThawRenderersComplete
0x00007fb7ecb6a8bd (chrome -callback.h:64 ) base::internal::Invoker<base::internal::BindState<base::Callback<void(bool), (base::internal::CopyMode)1, (base::internal::RepeatMode)1>, bool>, void()>::Run
0x00007fb7ec71954c (chrome -callback.h:64 ) base::debug::TaskAnnotator::RunTask
0x00007fb7ec708616 (chrome -message_loop.cc:405 ) base::MessageLoop::DoWork
0x00007fb7ec708eb2 (chrome -message_pump_libevent.cc:217 ) base::MessagePumpLibevent::Run
0x00007fb7edf6be07 (chrome -run_loop.cc:35 ) base::RunLoop::Run
0x00007fb7edc56a24 (chrome -chrome_browser_main.cc:2116 ) ChromeBrowserMainParts::MainMessageLoopRun
0x00007fb7ed32b34a (chrome -browser_main_loop.cc:981 ) content::BrowserMainLoop::RunMainMessageLoopParts
0x00007fb7ed32d044 (chrome -browser_main_runner.cc:155 ) content::BrowserMainRunnerImpl::Run
0x00007fb7ed327c3b (chrome -browser_main.cc:46 ) content::BrowserMain
0x00007fb7edbf9050 (chrome -content_main_runner.cc:779 ) content::ContentMainRunnerImpl::Run
0x00007fb7edbf7bea (chrome -content_main.cc:20 ) content::ContentMain
0x00007fb7ec97e195 (chrome -chrome_main.cc:97 ) ChromeMain
0x00007fb7e9fd6fb5 (libc-2.19.so -libc-start.c:292 ) __libc_start_main
0x00007fb7ec97dfe4 (chrome + 0x011c6fe4 ) _start
0x00007ffd23328ce7
,
Nov 1 2016
...
[491:1153:1101/131012:ERROR:client_native_pixmap_dmabuf.cc(54)] Failed DMA_BUF_SYNC_END: Invalid argument
[491:1153:1101/131031:ERROR:client_native_pixmap_dmabuf.cc(54)] Failed DMA_BUF_SYNC_END: Invalid argument
[491:1147:1101/131729:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: Bad file descriptor
[491:491:1101/131729:VERBOSE1:display_configurator.cc(877)] SetDisplayPower: power_state=ALL_OFF flags=0, configure timer=Stopped
[491:491:1101/131730:ERROR:device_event_log_impl.cc(140)] [13:17:30.037] Network: device_event_log.cc:117 @@@ Slow method: ../../../../../../../home/chrome-bot/chrome_root/src/chromeos/network/network_state_handler.cc:ManagedStateListChanged: 71ms
[491:491:1101/131730:FATAL:renderer_freezer.cc(139)] Unable to thaw renderers.
,
Nov 1 2016
Some of the reports show that it fails with EBADF and others show that it fails with ENOENT. This appears to be a cgroup problem, though, because the session_manager logs from the same crash reports show that it is also unable to write to the freezer cgroup. I wonder if there was some ARC++ change that caused this. I suspect ARC++ mainly because the session_manager is apparently using the cgroup to manage the ARC++ container and I don't know of anyone else making any changes to this code (in chrome or the kernel).
,
Nov 1 2016
The EBADF errors in particular strongly suggest some kernel bug because chrome opens the file and immediately writes to it, i.e., it doesn't keep a long-lived fd around for the cgroup. So something wonky is happening in the kernel if the write call is returning EBADF immediately after the file was opened.
,
Nov 1 2016
Yep, I just reached the same conclusion. Here's the code in base/files/file_util_posix.cc that's failing and leaving EBADF set:
---
int WriteFile(const FilePath& filename, const char* data, int size) {
  ThreadRestrictions::AssertIOAllowed();
  int fd = HANDLE_EINTR(creat(filename.value().c_str(), 0666));
  if (fd < 0)
    return -1;
  int bytes_written = WriteFileDescriptor(fd, data, size) ? size : -1;
  if (IGNORE_EINTR(close(fd)) < 0)
    return -1;
  return bytes_written;
}

bool WriteFileDescriptor(const int fd, const char* data, int size) {
  // Allow for partial writes.
  ssize_t bytes_written_total = 0;
  for (ssize_t bytes_written_partial = 0; bytes_written_total < size;
       bytes_written_total += bytes_written_partial) {
    bytes_written_partial =
        HANDLE_EINTR(write(fd, data + bytes_written_total,
                           size - bytes_written_total));
    if (bytes_written_partial < 0)
      return false;
  }
  return true;
}
---
I see a few places where EBADF appears to be returned by the kernel's cgroup code: http://lxr.free-electrons.com/source/kernel/cgroup.c
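Since WriteFile() returns -1 whether creat(), write(), or close() fails, the logged errno alone does not say which call set it. One way to narrow that down would be a variant that records the failing step. This is only a minimal sketch using plain POSIX calls: `WriteStep`, `WriteResult`, and `WriteFileReportingStep` are hypothetical names that do not exist in base/, and the retry loops are a simplified stand-in for HANDLE_EINTR.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical types for illustration only; none of these exist in Chromium.
enum class WriteStep { kNone, kOpen, kWrite, kClose };

struct WriteResult {
  WriteStep failed_step;  // kNone on success.
  int saved_errno;        // errno captured right after the failing call.
};

WriteResult WriteFileReportingStep(const std::string& path,
                                   const std::string& data) {
  // creat(path, mode) is equivalent to open() with these three flags.
  int fd;
  do {
    fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0666);
  } while (fd < 0 && errno == EINTR);
  if (fd < 0)
    return {WriteStep::kOpen, errno};

  // Allow for partial writes, as the original helper does.
  size_t total = 0;
  while (total < data.size()) {
    ssize_t n = write(fd, data.data() + total, data.size() - total);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      int err = errno;  // Save before close() can overwrite it.
      close(fd);
      return {WriteStep::kWrite, err};
    }
    total += static_cast<size_t>(n);
  }

  if (close(fd) < 0)
    return {WriteStep::kClose, errno};
  return {WriteStep::kNone, 0};
}
```

With something like this, an ENOENT from the open step and an EBADF from the write step would show up as distinct cases in the log instead of both surfacing as a generic write failure.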
,
Nov 1 2016
+dtor, who recently backported a number of cgroup related changes for arc++
,
Nov 1 2016
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_ChromeOS%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BAssert%5D%20chromeos%3A%3ARendererFreezer%3A%3AOnThawRenderersComplete%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=2bda7e1b00000000
Seems like these crashes started happening in 53 and are spread out across all the kernel versions: 3.8, 3.10, 3.14, etc. So it's unlikely that it's a kernel change since the earlier kernel versions didn't get the backports. However, it does seem more likely that it's an ARC++ change. Maybe something in session_manager that's not dependent on the kernel version.
,
Nov 1 2016
So the session_manager failures appear to be unrelated to this crash. The path to the session_manager cgroup was wrong: https://chromium-review.googlesource.com/c/399965/ I'm still leaning towards this being an ARC++ related change since we're seeing these crashes on 3.8 kernels and the writes are failing with EBADF.
,
Nov 4 2016
,
Nov 17 2016
The code that's crashing hasn't changed in a long time while these crashes appear to be very recent and happen to coincide with the release of ARC++. Any thoughts on what might have changed to start causing this?
,
Nov 17 2016
Same here using a custom Chromium OS build from yesterday's trunk. It sometimes crashes when the lid is opened, which has been happening since 55 with official builds.
Platform: 8996.0.2016_11_16_1937 (Developer Build - zhaofeng) developer-build veyron
[670:670:1117/101444:VERBOSE2:webui_screen_locker.cc(237)] Lock screen signin screen is ready
[670:989:1117/101444:ERROR:freezer_cgroup_process_manager.cc(108)] Writing 11847 to /sys/fs/cgroup/freezer/chrome_renderers/cgroup.procs failed: No such file or directory
[670:670:1117/101444:VERBOSE1:signin_screen_handler.cc(1255)] Login WebUI >> loginVisible, src: account-picker, webui_visible_: 1
[670:670:1117/101444:VERBOSE1:gaia_screen_handler.cc(392)] OnPortalDetectionCompleted Online
[670:670:1117/101444:VERBOSE1:lock_state_controller.cc(507)] PostLockAnimationFinished
[670:989:1117/101445:ERROR:freezer_cgroup_process_manager.cc(108)] Writing FROZEN to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
[670:670:1117/110546:VERBOSE1:drm_display_host_manager.cc(243)] Got display event CHANGE for /dev/dri/card1
[670:670:1117/110546:VERBOSE1:display_configurator.cc(922)] Displays are currently suspended. Not attempting to reconfigure them.
[670:989:1117/110546:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
[670:670:1117/110546:VERBOSE1:display_configurator.cc(887)] SetDisplayPower: power_state=ALL_ON flags=0, configure timer=Stopped
[670:670:1117/110546:FATAL:renderer_freezer.cc(139)] Unable to thaw renderers.
#0 0x0000b24ab7ce <unknown>
#1 0x0000b24bea22 <unknown>
#2 0x0000b1f044ae <unknown>
#3 0x0000b0f2e81e <unknown>
#4 0x0000b251ef1e <unknown>
#5 0x0000b24c5f8e <unknown>
#6 0x0000b24c6296 <unknown>
#7 0x0000b24c7762 <unknown>
#8 0x0000b24c7c72 <unknown>
#9 0x0000b24c58d8 <unknown>
#10 0x0000b24e22d0 <unknown>
#11 0x0000b21a09b4 <unknown>
#12 0x0000b183a200 <unknown>
#13 0x0000b183c42a <unknown>
#14 0x0000b18369ec <unknown>
#15 0x0000b2142b86 <unknown>
#16 0x0000b2142212 <unknown>
#17 0x0000b0d89c24 <unknown>
#18 0x0000b025a308 __libc_start_main
,
Nov 17 2016
By the way, the device is veyron_speedy and CrOS was built with the veyron overlay as there isn't a public overlay for speedy yet ( Bug 525815 ). The crash only happens when the lid is opened.
uname -a: Linux localhost 3.14.0 #1 SMP PREEMPT Wed Nov 16 15:49:38 PST 2016 armv7l ARMv7 Processor rev 1 (v7l) Rockchip (Device Tree) GNU/Linux
Possibly related user reports:
https://www.reddit.com/r/chromeos/comments/56pe2f/chromebook_keeps_crashing_and_rebooting_when/
https://www.reddit.com/r/chromeos/comments/561a57/acer_r11_keeps_crashing_everytime_i_shut_the_lid/
,
Nov 17 2016
#11: Your error looks different: [670:989:1117/110546:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
,
Nov 17 2016
+derat, um, interesting. But as mentioned in #3, there are reports with ENOENT, too.
,
Nov 17 2016
Good point. Sameer, can you conscript someone to look at this from the kernel side? If this is causing a bunch of crashes, I can probably disable this code on systems that aren't using dark resume (which is maybe supposed to be all systems?), as described in issue 649350. That's just papering over the fact that cgroup freezers aren't working the way we expect, though... which makes me uneasy.
,
Nov 18 2016
+few kernel folks in the hopes that someone has cycles to look into this.
,
Dec 14 2016
I'm going to check in a workaround so Chrome doesn't abort on failing to thaw if it previously failed to freeze. Someone still needs to figure out the underlying cgroup issue.
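The idea behind the workaround can be sketched as follows: remember whether the last freeze attempt failed, and only treat a thaw failure as fatal when the renderers were actually frozen. This is an illustrative sketch, not the change itself; `ThawGuard` and its members are hypothetical names, and the injected `on_fatal` callback stands in for LOG(FATAL) so the logic can be exercised without aborting.

```cpp
#include <functional>
#include <utility>

// Hypothetical class for illustration; it is not part of Chromium.
class ThawGuard {
 public:
  // |on_fatal| stands in for LOG(FATAL) in this sketch.
  explicit ThawGuard(std::function<void()> on_fatal)
      : on_fatal_(std::move(on_fatal)) {}

  // Records whether the most recent freeze attempt failed.
  void OnFreezeComplete(bool success) { last_freeze_failed_ = !success; }

  // Escalates only if the thaw failed after a freeze that succeeded.
  void OnThawComplete(bool success) {
    if (success)
      return;
    // If freezing failed earlier, the renderers were never frozen, so a
    // failed thaw does not leave any tabs unresponsive.
    if (last_freeze_failed_)
      return;
    on_fatal_();
  }

 private:
  std::function<void()> on_fatal_;
  bool last_freeze_failed_ = false;
};
```

A failed freeze followed by a failed thaw is then ignored, while a failed thaw after a successful freeze still escalates, because in that case the renderers really are stuck.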
,
Dec 14 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d66436c2b7a48ae82eda1af894fa79b81a5ffaad

commit d66436c2b7a48ae82eda1af894fa79b81a5ffaad
Author: derat <derat@chromium.org>
Date: Wed Dec 14 21:19:09 2016

chromeos: Avoid crash on thaw failure if freezing failed.

Avoid an "Unable to thaw renderers." LOG(FATAL) in RendererFreezer if freezing the renderers also failed earlier. This is a workaround for unexplained EBADF and ENOENT errors seen when writing to cgroups.

BUG= chromium:661310
TEST=manual: log in and chmod freezer.state in /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen to 444; suspend and resume and check that chrome doesn't abort

Review-Url: https://codereview.chromium.org/2575933002
Cr-Commit-Position: refs/heads/master@{#438625}

[modify] https://crrev.com/d66436c2b7a48ae82eda1af894fa79b81a5ffaad/chrome/browser/chromeos/power/freezer_cgroup_process_manager.cc
,
Dec 14 2016
We should consider merging the workaround from #18 to M56. It's low-risk (since it just makes us skip a LOG(FATAL)) and should prevent some user-visible crashes on resume until someone figures out what the underlying cgroup issue is.
,
Dec 14 2016
,
Dec 15 2016
,
Dec 15 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f0bf44baaa4b984dd3512d4f9565232919c69637

commit f0bf44baaa4b984dd3512d4f9565232919c69637
Author: Daniel Erat <derat@chromium.org>
Date: Thu Dec 15 18:59:48 2016

chromeos: Avoid crash on thaw failure if freezing failed.

Avoid an "Unable to thaw renderers." LOG(FATAL) in RendererFreezer if freezing the renderers also failed earlier. This is a workaround for unexplained EBADF and ENOENT errors seen when writing to cgroups.

BUG= chromium:661310
TEST=manual: log in and chmod freezer.state in /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen to 444; suspend and resume and check that chrome doesn't abort

Review-Url: https://codereview.chromium.org/2575933002
Cr-Commit-Position: refs/heads/master@{#438625}
(cherry picked from commit d66436c2b7a48ae82eda1af894fa79b81a5ffaad)

Review-Url: https://codereview.chromium.org/2585473002 .
Cr-Commit-Position: refs/branch-heads/2924@{#514}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/f0bf44baaa4b984dd3512d4f9565232919c69637/chrome/browser/chromeos/power/freezer_cgroup_process_manager.cc
,
Dec 15 2016
(Note that this bug should remain open until the underlying cgroup issue that's causing the write errors is found and fixed.)
,
Dec 16 2016
[Automated comment] There appears to be on-going work (i.e. bugroid changes), needs manual review.
,
Dec 16 2016
Removing the merge tags to avoid incorrect automated comments. :-)
,
Jan 26 2017
Issue 672766 has been merged into this issue.
,
Jan 18
(5 days ago)
Assuming these are obsolete given their idle status. Re-open if you disagree.
Comment 1 by derat@chromium.org
, Nov 1 2016
132 void RendererFreezer::OnThawRenderersComplete(bool success) {
133   if (success)
134     return;
135
136   // We failed to write the thaw command and the renderers are still frozen. We
137   // are in big trouble because none of the tabs will be responsive so let's
138   // crash the browser instead.
139   LOG(FATAL) << "Unable to thaw renderers.";
140 }
I'll look at some crash reports.