Issue 661310

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
Closed: Jan 18
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Chrome_ChromeOS: Crash Report - [Assert] chromeos::RendererFreezer::OnThawRenderersComplete

Project Member Reported by zelidrag@chromium.org, Nov 1 2016

Issue description

In top 20 crashes in M-55 dev/beta:

Product name: Chrome_ChromeOS
Magic Signature: [Assert] chromeos::RendererFreezer::OnThawRenderersComplete

Current link:
https://crash.corp.google.com/browse?q=product.name%3D'Chrome_ChromeOS'%20AND%20product.version%3D'55.0.2883.29'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'%5BAssert%5D%20chromeos%3A%3ARendererFreezer%3A%3AOnThawRenderersComplete'%20AND%20ReportID%3D'96061a7900000000'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3


Search properties:
product.name: Chrome_ChromeOS
product.version: 55.0.2883.29
custom_data.chromecrashproto.ptype: browser
custom_data.chromecrashproto.magic_signature_1.name: [Assert] chromeos::RendererFreezer::OnThawRenderersComplete
reportid: 96061a7900000000

Metadata :
Product Name: Chrome_ChromeOS
Product Version: 55.0.2883.29
Report ID: 96061a7900000000
Report Time: Tue, 01 Nov 2016 18:52:25 GMT
Uptime: 62520926 ms
Cumulative Uptime: 0 ms
User Email: 
OS Name: Linux
OS Version: 0.0.0 Linux 3.8.11 #1 SMP Wed Oct 26 21:49:16 PDT 2016 x86_64
CPU Architecture: amd64
CPU Info: family 6 model 69 stepping 1

Thread 0 CRASHED [SIGABRT @ 0x000003e8000001eb ] MAGIC SIGNATURE THREAD
0x00007fb7e9febb82	(libc-2.19.so -raise.c:56 )	raise
0x00007fb7e9fed89f	(libc-2.19.so -abort.c:89 )	abort
0x00007fb7edf3dbb4	(chrome -debugger_posix.cc:249 )	base::debug::BreakDebugger
0x00007fb7edf4da74	(chrome -logging.cc:748 )	logging::LogMessage::~LogMessage
0x00007fb7ed9e8c37	(chrome -renderer_freezer.cc:139 )	chromeos::RendererFreezer::OnThawRenderersComplete
0x00007fb7ecb6a8bd	(chrome -callback.h:64 )	base::internal::Invoker<base::internal::BindState<base::Callback<void(bool), (base::internal::CopyMode)1, (base::internal::RepeatMode)1>, bool>, void()>::Run
0x00007fb7ec71954c	(chrome -callback.h:64 )	base::debug::TaskAnnotator::RunTask
0x00007fb7ec708616	(chrome -message_loop.cc:405 )	base::MessageLoop::DoWork
0x00007fb7ec708eb2	(chrome -message_pump_libevent.cc:217 )	base::MessagePumpLibevent::Run
0x00007fb7edf6be07	(chrome -run_loop.cc:35 )	base::RunLoop::Run
0x00007fb7edc56a24	(chrome -chrome_browser_main.cc:2116 )	ChromeBrowserMainParts::MainMessageLoopRun
0x00007fb7ed32b34a	(chrome -browser_main_loop.cc:981 )	content::BrowserMainLoop::RunMainMessageLoopParts
0x00007fb7ed32d044	(chrome -browser_main_runner.cc:155 )	content::BrowserMainRunnerImpl::Run
0x00007fb7ed327c3b	(chrome -browser_main.cc:46 )	content::BrowserMain
0x00007fb7edbf9050	(chrome -content_main_runner.cc:779 )	content::ContentMainRunnerImpl::Run
0x00007fb7edbf7bea	(chrome -content_main.cc:20 )	content::ContentMain
0x00007fb7ec97e195	(chrome -chrome_main.cc:97 )	ChromeMain
0x00007fb7e9fd6fb5	(libc-2.19.so -libc-start.c:292 )	__libc_start_main
0x00007fb7ec97dfe4	(chrome + 0x011c6fe4 )	_start
0x00007ffd23328ce7	
 

Comment 1 by derat@chromium.org, Nov 1 2016

132 void RendererFreezer::OnThawRenderersComplete(bool success) {
133   if (success)
134     return;
135 
136   // We failed to write the thaw command and the renderers are still frozen. We
137   // are in big trouble because none of the tabs will be responsive so let's
138   // crash the browser instead.
139   LOG(FATAL) << "Unable to thaw renderers.";
140 }

I'll look at some crash reports.

Comment 2 by derat@chromium.org, Nov 1 2016

...
[491:1153:1101/131012:ERROR:client_native_pixmap_dmabuf.cc(54)] Failed DMA_BUF_SYNC_END: Invalid argument
[491:1153:1101/131031:ERROR:client_native_pixmap_dmabuf.cc(54)] Failed DMA_BUF_SYNC_END: Invalid argument
[491:1147:1101/131729:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: Bad file descriptor
[491:491:1101/131729:VERBOSE1:display_configurator.cc(877)] SetDisplayPower: power_state=ALL_OFF flags=0, configure timer=Stopped
[491:491:1101/131730:ERROR:device_event_log_impl.cc(140)] [13:17:30.037] Network: device_event_log.cc:117 @@@ Slow method: ../../../../../../../home/chrome-bot/chrome_root/src/chromeos/network/network_state_handler.cc:ManagedStateListChanged: 71ms
[491:491:1101/131730:FATAL:renderer_freezer.cc(139)] Unable to thaw renderers.
Cc: dgreid@chromium.org derat@chromium.org
Some of the reports show that it fails with EBADF and others show that it fails with ENOENT.  This appears to be a cgroup problem though because the session_manager logs from the same crash reports show that it is also unable to write to the freezer cgroup.

I wonder if there was some ARC++ change that caused this.  This is mainly because the session_manager is apparently using the cgroup to manage the ARC++ container and I don't know of anyone else making any changes to this code (in chrome or the kernel).
The EBADF errors in particular strongly suggest some kernel bug because chrome opens the file and immediately writes to it, i.e., it doesn't keep a long-lived fd around for the cgroup.  So something wonky is happening in the kernel if the write call is returning EBADF immediately after the file was opened.

Comment 5 by derat@chromium.org, Nov 1 2016

Yep, I just reached the same conclusion. Here's the code in base/files/file_util_posix.cc that's failing and leaving EBADF set:

---

int WriteFile(const FilePath& filename, const char* data, int size) {
  ThreadRestrictions::AssertIOAllowed();
  int fd = HANDLE_EINTR(creat(filename.value().c_str(), 0666));
  if (fd < 0)
    return -1;

  int bytes_written = WriteFileDescriptor(fd, data, size) ? size : -1;
  if (IGNORE_EINTR(close(fd)) < 0)
    return -1;
  return bytes_written;
}

bool WriteFileDescriptor(const int fd, const char* data, int size) {
  // Allow for partial writes.
  ssize_t bytes_written_total = 0;
  for (ssize_t bytes_written_partial = 0; bytes_written_total < size;
       bytes_written_total += bytes_written_partial) {
    bytes_written_partial =
        HANDLE_EINTR(write(fd, data + bytes_written_total,
                           size - bytes_written_total));
    if (bytes_written_partial < 0)
      return false;
  }

  return true;
}

---

I see a few places where EBADF appears to be returned by the kernel's cgroup code: http://lxr.free-electrons.com/source/kernel/cgroup.c
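
For anyone trying to reproduce this outside of Chrome: WriteFile() above collapses every failure into -1, so the log can't tell whether the open, the write, or the close failed. Here is a minimal standalone sketch that records the failing step and its errno separately. All names in it (WriteStage, WriteResult, WriteControlFile) are hypothetical and not part of base/; it also opens with O_WRONLY rather than creat(), on the assumption that a cgroup control file should already exist.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical names throughout; none of this is Chromium code.
enum class WriteStage { kNone, kOpen, kWrite, kClose };

struct WriteResult {
  WriteStage failed_stage;
  int saved_errno;
  bool ok() const { return failed_stage == WriteStage::kNone; }
};

// Writes `data` to an existing file, recording which syscall failed.
// Assumption: O_WRONLY without O_CREAT, since a cgroup control file
// should already exist; base::WriteFile() uses creat() instead.
WriteResult WriteControlFile(const std::string& path,
                             const std::string& data) {
  WriteResult result = {WriteStage::kNone, 0};

  int fd;
  do {
    fd = open(path.c_str(), O_WRONLY | O_CLOEXEC);
  } while (fd < 0 && errno == EINTR);
  if (fd < 0) {
    result.failed_stage = WriteStage::kOpen;
    result.saved_errno = errno;
    return result;
  }

  size_t total = 0;
  while (total < data.size()) {
    ssize_t n = write(fd, data.data() + total, data.size() - total);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      // Save errno now, before close() has a chance to overwrite it.
      result.failed_stage = WriteStage::kWrite;
      result.saved_errno = errno;
      break;
    }
    total += static_cast<size_t>(n);
  }

  // Like IGNORE_EINTR(close(fd)): an interrupted close is not an error.
  if (close(fd) < 0 && errno != EINTR && result.ok()) {
    result.failed_stage = WriteStage::kClose;
    result.saved_errno = errno;
  }
  return result;
}

// Usage: a missing cgroup directory shows up as ENOENT at the open stage.
void DemoMissingCgroupDir() {
  WriteResult r =
      WriteControlFile("/no_such_dir_for_demo/freezer.state", "THAWED");
  assert(r.failed_stage == WriteStage::kOpen);
  assert(r.saved_errno == ENOENT);
}
```

With something like this, an ENOENT from the open step (directory gone) would be distinguishable from an EBADF from the write step, which is the distinction the crash logs currently blur.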
Cc: dtor@chromium.org snanda@chromium.org
+dtor, who recently backported a number of cgroup related changes for arc++
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_ChromeOS%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BAssert%5D%20chromeos%3A%3ARendererFreezer%3A%3AOnThawRenderersComplete%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=2bda7e1b00000000

Seems like these crashes started happening in 53 and are spread out across all the kernel versions: 3.8, 3.10, 3.14, etc.  So it's unlikely that it's a kernel change since the earlier kernel versions didn't get the backports.  However, it does seem more likely that it's an ARC++ change.  Maybe something in session_manager that's not dependent on the kernel version.
So the session_manager failures appear to be unrelated to this crash.  The path to the session_manager cgroup was wrong: https://chromium-review.googlesource.com/c/399965/

I'm still leaning towards this being an ARC++ related change since we're seeing these crashes on 3.8 kernels and the writes are failing with EBADF.

Comment 9 by derat@chromium.org, Nov 4 2016

Components: -UI>Browser>Sessions
Labels: -Restrict-View-Google
The code that's crashing hasn't changed in a long time while these crashes appear to be very recent and happen to coincide with the release of ARC++.  Any thoughts on what might have changed to start causing this?
Same here using a custom Chromium OS build from yesterday's trunk. It sometimes crashes when the lid is opened, which has been happening since 55 with official builds.

Platform: 8996.0.2016_11_16_1937 (Developer Build - zhaofeng) developer-build veyron

[670:670:1117/101444:VERBOSE2:webui_screen_locker.cc(237)] Lock screen signin screen is ready
[670:989:1117/101444:ERROR:freezer_cgroup_process_manager.cc(108)] Writing 11847 to /sys/fs/cgroup/freezer/chrome_renderers/cgroup.procs failed: No such file or directory
[670:670:1117/101444:VERBOSE1:signin_screen_handler.cc(1255)] Login WebUI >> loginVisible, src: account-picker, webui_visible_: 1
[670:670:1117/101444:VERBOSE1:gaia_screen_handler.cc(392)] OnPortalDetectionCompleted Online
[670:670:1117/101444:VERBOSE1:lock_state_controller.cc(507)] PostLockAnimationFinished
[670:989:1117/101445:ERROR:freezer_cgroup_process_manager.cc(108)] Writing FROZEN to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
[670:670:1117/110546:VERBOSE1:drm_display_host_manager.cc(243)] Got display event CHANGE for /dev/dri/card1
[670:670:1117/110546:VERBOSE1:display_configurator.cc(922)] Displays are currently suspended.  Not attempting to reconfigure them.
[670:989:1117/110546:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
[670:670:1117/110546:VERBOSE1:display_configurator.cc(887)] SetDisplayPower: power_state=ALL_ON flags=0, configure timer=Stopped
[670:670:1117/110546:FATAL:renderer_freezer.cc(139)] Unable to thaw renderers.
#0 0x0000b24ab7ce <unknown>
#1 0x0000b24bea22 <unknown>
#2 0x0000b1f044ae <unknown>
#3 0x0000b0f2e81e <unknown>
#4 0x0000b251ef1e <unknown>
#5 0x0000b24c5f8e <unknown>
#6 0x0000b24c6296 <unknown>
#7 0x0000b24c7762 <unknown>
#8 0x0000b24c7c72 <unknown>
#9 0x0000b24c58d8 <unknown>
#10 0x0000b24e22d0 <unknown>
#11 0x0000b21a09b4 <unknown>
#12 0x0000b183a200 <unknown>
#13 0x0000b183c42a <unknown>
#14 0x0000b18369ec <unknown>
#15 0x0000b2142b86 <unknown>
#16 0x0000b2142212 <unknown>
#17 0x0000b0d89c24 <unknown>
#18 0x0000b025a308 __libc_start_main
By the way, the device is veyron_speedy and CrOS was built with the veyron overlay as there isn't a public overlay for speedy yet (Bug 525815). The crash only happens when the lid is opened.

uname -a:
Linux localhost 3.14.0 #1 SMP PREEMPT Wed Nov 16 15:49:38 PST 2016 armv7l ARMv7 Processor rev 1 (v7l) Rockchip (Device Tree) GNU/Linux

Possibly related user reports:
https://www.reddit.com/r/chromeos/comments/56pe2f/chromebook_keeps_crashing_and_rebooting_when/
https://www.reddit.com/r/chromeos/comments/561a57/acer_r11_keeps_crashing_everytime_i_shut_the_lid/

Comment 13 by derat@chromium.org, Nov 17 2016

#11: Your error looks different:

[670:989:1117/110546:ERROR:freezer_cgroup_process_manager.cc(108)] Writing THAWED to /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen/freezer.state failed: No such file or directory
+derat, um, interesting. But as mentioned in #3, there are reports with ENOENT, too.

Comment 15 by derat@chromium.org, Nov 17 2016

Cc: chirantan@chromium.org
Owner: snanda@chromium.org
Good point.

Sameer, can you conscript someone to look at this from the kernel side?

If this is causing a bunch of crashes, I can probably disable this code on systems that aren't using dark resume (which is maybe supposed to be all systems?), as described in issue 649350. That's just papering over the fact that cgroup freezers aren't working the way we expect, though... which makes me uneasy.
Cc: sonnyrao@chromium.org groeck@chromium.org
+a few kernel folks in the hopes that someone has cycles to look into this.

Comment 17 by derat@chromium.org, Dec 14 2016

I'm going to check in a workaround so Chrome doesn't abort on failing to thaw if it previously failed to freeze. Someone still needs to figure out the underlying cgroup issue.
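
Roughly, the idea is to remember whether the last freeze actually succeeded and only treat a thaw failure as fatal when it did. The following is a standalone sketch of that decision, not the code being checked in; FreezeTracker and ThawAction are hypothetical names made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the freeze/thaw bookkeeping; this is not
// the actual RendererFreezer or FreezerCgroupProcessManager class.
class FreezeTracker {
 public:
  enum class ThawAction { kNone, kLogOnly, kAbort };

  // Records whether the write of FROZEN succeeded.
  void OnFreezeComplete(bool success) { frozen_ = success; }

  // Decides how to react once the write of THAWED has completed.
  ThawAction OnThawComplete(bool success) {
    if (success) {
      frozen_ = false;
      return ThawAction::kNone;
    }
    // The thaw write failed. If the earlier freeze write also failed,
    // the renderers were never frozen, so there is nothing to recover
    // from and aborting the browser would only hurt the user.
    return frozen_ ? ThawAction::kAbort : ThawAction::kLogOnly;
  }

 private:
  bool frozen_ = false;
};

// Usage: the failure pattern from the logs (freeze fails, then thaw
// fails on resume) ends in a log message instead of an abort.
void DemoResumeAfterFailedFreeze() {
  FreezeTracker tracker;
  tracker.OnFreezeComplete(false);
  assert(tracker.OnThawComplete(false) ==
         FreezeTracker::ThawAction::kLogOnly);
}
```

A caller would still LOG(FATAL) on kAbort, since in that case the renderers really are stuck frozen.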

Comment 18 by bugdroid1@chromium.org, Dec 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d66436c2b7a48ae82eda1af894fa79b81a5ffaad

commit d66436c2b7a48ae82eda1af894fa79b81a5ffaad
Author: derat <derat@chromium.org>
Date: Wed Dec 14 21:19:09 2016

chromeos: Avoid crash on thaw failure if freezing failed.

Avoid an "Unable to thaw renderers." LOG(FATAL) in
RendererFreezer if freezing the renderers also failed
earlier. This is a workaround for unexplained EBADF and
ENOENT errors seen when writing to cgroups.

BUG= chromium:661310 
TEST=manual: log in and chmod freezer.state in
     /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen
     to 444; suspend and resume and check that chrome
     doesn't abort

Review-Url: https://codereview.chromium.org/2575933002
Cr-Commit-Position: refs/heads/master@{#438625}

[modify] https://crrev.com/d66436c2b7a48ae82eda1af894fa79b81a5ffaad/chrome/browser/chromeos/power/freezer_cgroup_process_manager.cc

Comment 19 by derat@chromium.org, Dec 14 2016

Labels: -M-55 M-56 Merge-Request-56
We should consider merging the workaround from #18 to M56. It's low-risk (since it just makes us skip a LOG(FATAL)) and should prevent some user-visible crashes on resume until someone figures out what the underlying cgroup issue is.
Labels: OS-Chrome
Labels: Merge-Approved-56

Comment 22 by bugdroid1@chromium.org, Dec 15 2016

Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f0bf44baaa4b984dd3512d4f9565232919c69637

commit f0bf44baaa4b984dd3512d4f9565232919c69637
Author: Daniel Erat <derat@chromium.org>
Date: Thu Dec 15 18:59:48 2016

chromeos: Avoid crash on thaw failure if freezing failed.

Avoid an "Unable to thaw renderers." LOG(FATAL) in
RendererFreezer if freezing the renderers also failed
earlier. This is a workaround for unexplained EBADF and
ENOENT errors seen when writing to cgroups.

BUG= chromium:661310 
TEST=manual: log in and chmod freezer.state in
     /sys/fs/cgroup/freezer/chrome_renderers/to_be_frozen
     to 444; suspend and resume and check that chrome
     doesn't abort

Review-Url: https://codereview.chromium.org/2575933002
Cr-Commit-Position: refs/heads/master@{#438625}
(cherry picked from commit d66436c2b7a48ae82eda1af894fa79b81a5ffaad)

Review-Url: https://codereview.chromium.org/2585473002 .
Cr-Commit-Position: refs/branch-heads/2924@{#514}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/f0bf44baaa4b984dd3512d4f9565232919c69637/chrome/browser/chromeos/power/freezer_cgroup_process_manager.cc

Comment 23 by derat@chromium.org, Dec 15 2016

(Note that this bug should remain open until the underlying cgroup issue that's causing the write errors is found and fixed.)

Comment 24 by dimu@chromium.org, Dec 16 2016

Labels: -Merge-Request-56 Merge-Review-56 Hotlist-Merge-Review
[Automated comment] There appears to be ongoing work (i.e. bugdroid changes); this needs manual review.

Comment 25 by derat@chromium.org, Dec 16 2016

Labels: -Hotlist-Merge-Review -Merge-Review-56 -merge-merged-2924
Removing the merge tags to avoid incorrect automated comments. :-)
Issue 672766 has been merged into this issue.

Comment 27 by tbroch@chromium.org, Jan 18

Status: WontFix (was: Assigned)
Assuming these are obsolete given their idle status.  Re-open if you disagree.
