Issue 668761

Starred by 7 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression



Chromebook Pixel lockups.

Project Member Reported by sh...@chromium.org, Nov 26 2016

Issue description

I have a Chromebook Pixel, first gen, and I periodically get lockups where everything freezes, the fan goes on, and a while later (30 seconds, or a minute, some watchdog-sounding length) it reboots.  On reboot, I can click "Restore" and get everything back fine, so it's not entirely a factor of how much is going on or what specific tabs are open.

I don't know if crash reports are expected in all cases like this (for instance, it happened today, I see no crashes for today).  But I do see crashes like:
   go/crash/2bcc4caf00000000
SIGABRT on main thread in pthread_cond_wait, which is unexpected.  Thread 12 looks like 1000 frames of _L_lock_1109, which seems odd.  But maybe it's a red herring.

USUALLY this happens when I do something like open-in-tab on a dozen or so pages.  It works along then hangs.  It works fine with all those pages when I restore.  Today it happened with a single page loading, when I hit Alt-Tab to switch to a different window, and I can't remember another time where it locked up with such low load.  But today didn't generate a crash dump.

If you're looking through crash dumps from this clientid, note that I basically never reboot or log out of this Chromebook.  AFAICT, if I let it hang until the watchdog gets it, it comes back without needing to log in again, so I think Chrome is restarted, not the OS.  I have sometimes done a long press of the power button to power down when this happens.  I don't know which case generates crash dumps.  At any given time I might have 20 to 30 tabs open across multiple windows.
 

Comment 1 by ajha@chromium.org, Nov 27 2016

Labels: OS-Chrome

Comment 2 by sh...@chromium.org, Dec 5 2016

go/crash/cd2457df00000000

browser process, main thread:

0x00007f2f810db17c	(libpthread-2.19.so + 0x0000c17c )	pthread_cond_wait
0x00007f2f849b900f	(chrome -waiter.cc:64 )	mojo::edk::Waiter::Wait
0x00007f2f825363b0	(chrome -core.cc:1169 )	mojo::edk::Core::Wait
0x00007f2f824cd19e	(chrome -handle.h:191 )	mojo::SyncHandleRegistry::WatchAllHandles
0x00007f2f84554f96	(chrome -ipc_sync_channel.cc:620 )	IPC::SyncChannel::WaitForReply
0x00007f2f824f4f88	(chrome -ipc_sync_channel.cc:584 )	IPC::SyncChannel::Send
0x00007f2f828f6501	(chrome -command_buffer_proxy_impl.cc:701 )	gpu::CommandBufferProxyImpl::Send
0x00007f2f828f7397	(chrome -command_buffer_proxy_impl.cc:363 )	gpu::CommandBufferProxyImpl::WaitForGetOffsetInRange
0x00007f2f828f2577	(chrome -cmd_buffer_helper.cc:168 )	gpu::CommandBufferHelper::WaitForGetOffsetInRange
0x00007f2f828f2731	(chrome -cmd_buffer_helper.cc:222 )	gpu::CommandBufferHelper::Finish
0x00007f2f84963aa4	(chrome -gles2_implementation.cc:464 )	gpu::gles2::GLES2Implementation::WaitForCmd
0x00007f2f84966d40	(chrome -gles2_implementation.cc:369 )	gpu::gles2::GLES2Implementation::FreeEverything
0x00007f2f8495ffeb	(chrome -gles2_implementation.cc:428 )	gpu::gles2::GLES2Implementation::SetAggressivelyFreeResources
0x00007f2f845f031e	(chrome -context_cache_controller.cc:166 )	cc::ContextCacheController::OnIdle
0x00007f2f8247fe6c	(chrome -callback.h:64 )	base::debug::TaskAnnotator::RunTask

Maybe I'm reading that wrong, but sync on main thread?  Well, I guess Chrome OS uses the same compositing engine for UI, so maybe it makes sense.

Again, there's a thread (13 this time) which is endless:

0x00007f2f810ddd84	(libpthread-2.19.so + 0x0000ed84 )	__lll_lock_wait
0x00007f2f810d8fe5	(libpthread-2.19.so + 0x00009fe5 )	_L_lock_1109
0x00007f2f810d8fe5	(libpthread-2.19.so + 0x00009fe5 )	_L_lock_1109
0x00007f2f810d8fe5	(libpthread-2.19.so + 0x00009fe5 )	_L_lock_1109
0x00007f2f810d8fe5	(libpthread-2.19.so + 0x00009fe5 )	_L_lock_1109
<more than 1000 of these before it gives up>

Possibly a red herring, but seems suspicious.
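
When a dump contains a run like the one above, it can help to collapse identical consecutive frames before reading it. This is a minimal sketch, assuming the frames have already been extracted as a list of symbol names; the function name and input format are hypothetical and not part of any crash tool:

```python
from itertools import groupby


def collapse_repeated_frames(frames):
    """Collapse runs of identical consecutive frames into (frame, count) pairs."""
    return [(frame, sum(1 for _ in run)) for frame, run in groupby(frames)]


# Illustrative input shaped like the thread dump above (count is made up).
frames = ["__lll_lock_wait"] + ["_L_lock_1109"] * 1000
for name, count in collapse_repeated_frames(frames):
    print(f"{name} x{count}" if count > 1 else name)
```

Only adjacent duplicates are merged, so frame order is preserved and a symbol that recurs later in the stack still appears separately.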

Comment 3 by w...@chromium.org, Dec 6 2016

Cc: dah...@chromium.org domlasko...@chromium.org gkihumba@chromium.org w...@chromium.org
Components: Internals>Mojo Internals>GPU
Labels: -Type-Bug -Pri-3 Stability-Crash M-54 M-55 Pri-1 Type-Bug-Regression
+dahlke, who has also been seeing this with Pixel v1, running M54.

I've seen similar-sounding hangs on Chromebook Flip, FWIW, but running dev-channel.

Adding some tags & CC'ing some folks, since IIUC this kind of hang could be responsible for the increase in issue 594764, if we're seeing an increase in reboots, leading to crash reports in the user partition being inaccessible to the uploader when it next starts up.

Comment 4 by w...@chromium.org, Dec 6 2016

To clarify #3: I'd expect that crash reports following the user doing a hard-reset due to a hung device would lead to empty report uploads; not so if the user waits for the watchdog to restart Chrome.

Comment 5 by sh...@chromium.org, Dec 6 2016

In light of #3, my report isn't about something new, this has been happening for a few months, I just decided I needed to close the loop.  If I had to guess, I'd say that it wasn't really happening before June or July timeframe, though it could have been a month or two before that (a year ago, having my Pixel hang was a surprising once-a-quarter thing, now it's a couple times a week).

New one at:
   go/crash/abd5d23f00000000
There is a thread in __libc_send under ChannelPosix::Write, which seems interesting.  It's GpuProcessHost, and a gpu dump was also uploaded:
   go/crash/99fc9fa300000000
but the gpu dump just looks like "I'm waiting for work", so maybe a red herring.

[I'm wondering if I should go home and force-crash that Pixel, just to see what a "normal" crashdump looks like.]
Regarding #4, both cases wouldn't trigger b/31248243, because the following sequence of events must occur:

1. User is logged in.
2. Uploader starts running, and sleeps before each upload.
3. User logs out to login screen, or switches accounts.
4. Uploader wakes up and uploads an empty report.

The first condition is not satisfied after a hard reset, so the uploader would only consider reports stored outside the cryptohome.

Comment 7 by ajuma@chromium.org, Dec 9 2016

Components: Internals>GPU>Internals
My chromebook crashed again yesterday when I was using the account switcher. Here is the Crash Server ID: 67b8c79880000000.

Comment 9 by w...@chromium.org, Dec 9 2016

Re #8: The account switcher crash is issue 672344 again; it's a crasher that's been around for ~2 years but will trigger only if you have error bubbles visible, so it may be that it is happening more often recently because of some other factor. :-/
Cc: zmo@chromium.org vmi...@chromium.org
Trying to understand this better - the crashes in #2 (go/crash/cd2457df00000000) and #5 (go/crash/abd5d23f00000000) both look like browser hangs waiting for the GPU process to respond.

What kind of timeout exists on the browser proc before we crash / generate a dump like this? The GPU process should be killed by the watchdog in ~15s if it hangs, so it seems like the browser should wait at least this long.

If the browser doesn't wait long enough, it may be killed before the GPU proc has time to be killed/recreated.
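
The timing argument can be written as a one-line check. This is only a sketch of the reasoning: the 15-second figure is the "~15s" quoted above, while the browser-side timeout and the GPU restart time are unknowns in this thread, so they are left as parameters with hypothetical names:

```python
GPU_WATCHDOG_TIMEOUT_S = 15.0  # the "~15s" figure quoted above


def browser_waits_long_enough(browser_hang_timeout_s, gpu_restart_s=0.0):
    """True if the browser would still be alive once the GPU process is back.

    browser_hang_timeout_s: how long the browser may block before it is
        killed (unknown here, hence a parameter).
    gpu_restart_s: assumed extra time to recreate the GPU process.
    """
    return browser_hang_timeout_s >= GPU_WATCHDOG_TIMEOUT_S + gpu_restart_s


# Illustrative values only; neither number comes from the thread.
print(browser_waits_long_enough(10.0))
print(browser_waits_long_enough(30.0, gpu_restart_s=5.0))
```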

Adding some GPU folks who may have more thoughts
I have also been experiencing this 1-2 times per day for the past 3 days.

Feedback reports:
http://feedback.corp.google.com/#/Report/51526867373
http://feedback.corp.google.com/#/Report/51574115968

Crash IDs:
ce21bb3a80000000
194bc0b080000000
3970bb3a80000000

Comment 12 by ajuma@chromium.org, Jan 19 2017

Crash stack from comment 11 (https://crash.corp.google.com/browse?stbtiq=194bc0b080000000):
0x00007fde8706f17c	(libpthread-2.19.so + 0x0000c17c )	pthread_cond_wait
0x00007fde8a97106f	(chrome -waiter.cc:64 )	mojo::edk::Waiter::Wait
0x00007fde884d8d66	(chrome -core.cc:1169 )	mojo::edk::Core::Wait
0x00007fde884718be	(chrome -handle.h:191 )	mojo::SyncHandleRegistry::WatchAllHandles
0x00007fde8849a7af	(chrome -ipc_sync_channel.cc:620 )	IPC::SyncChannel::WaitForReply
0x00007fde8849ab54	(chrome -ipc_sync_channel.cc:584 )	IPC::SyncChannel::Send
0x00007fde888b3d11	(chrome -command_buffer_proxy_impl.cc:701 )	gpu::CommandBufferProxyImpl::Send
0x00007fde888b4eb7	(chrome -command_buffer_proxy_impl.cc:343 )	gpu::CommandBufferProxyImpl::WaitForTokenInRange
0x00007fde888b2289	(chrome -ring_buffer.cc:44 )	gpu::RingBuffer::FreeOldestBlock
0x00007fde888b28f7	(chrome -ring_buffer.cc:71 )	gpu::RingBuffer::Alloc
0x00007fde888b2bc2	(chrome -transfer_buffer.cc:221 )	gpu::ScopedTransferBufferPtr::Reset
0x00007fde8a9280eb	(chrome -transfer_buffer.h:151 )	gpu::gles2::GLES2Implementation::TexSubImage2D
0x00007fde8a609768	(chrome -resource_provider.cc:893 )	cc::ResourceProvider::CopyToResource
0x00007fde8a5c5155	(chrome -layer_tree_host_impl.cc:3693 )	cc::LayerTreeHostImpl::CreateUIResource
0x00007fde8a5c7de6	(chrome -layer_tree_impl.cc:1579 )	cc::LayerTreeImpl::ProcessUIResourceRequestQueue
0x00007fde8a5bf9d6	(chrome -layer_tree_host_impl.cc:2005 )	cc::LayerTreeHostImpl::ActivateSyncTree
0x00007fde8865405d	(chrome -scheduler.cc:649 )	cc::Scheduler::ProcessScheduledActions
0x00007fde8cb23d60	(chrome -scheduler.cc:158 )	cc::Scheduler::NotifyReadyToCommit
0x00007fde8cb1633f	(chrome -single_thread_proxy.cc:675 )	cc::SingleThreadProxy::DoBeginMainFrame
0x00007fde8cb164e6	(chrome -single_thread_proxy.cc:648 )	cc::SingleThreadProxy::BeginMainFrame
0x00007fde8842a04c	(chrome -callback.h:64 )	base::debug::TaskAnnotator::RunTask
0x00007fde884180fd	(chrome -message_loop.cc:405 )	base::MessageLoop::DoWork
0x00007fde88418a72	(chrome -message_pump_libevent.cc:217 )	base::MessagePumpLibevent::Run
0x00007fde89c7c177	(chrome -run_loop.cc:35 )	base::RunLoop::Run
0x00007fde89967a04	(chrome -chrome_browser_main.cc:2116 )	ChromeBrowserMainParts::MainMessageLoopRun
0x00007fde8903c58a	(chrome -browser_main_loop.cc:981 )	content::BrowserMainLoop::RunMainMessageLoopParts
0x00007fde8903e164	(chrome -browser_main_runner.cc:155 )	content::BrowserMainRunnerImpl::Run
0x00007fde89038e7b	(chrome -browser_main.cc:46 )	content::BrowserMain
0x00007fde8990a170	(chrome -content_main_runner.cc:779 )	content::ContentMainRunnerImpl::Run
0x00007fde89908d0a	(chrome -content_main.cc:20 )	content::ContentMain
0x00007fde88696015	(chrome -chrome_main.cc:97 )	ChromeMain
0x00007fde85cc4fb5	(libc-2.19.so -libc-start.c:292 )	__libc_start_main
0x00007fde88695e54	(chrome + 0x011f0e54 )	_start

Comment 13 by enne@chromium.org, Feb 1 2017

This looks similar to issue 683902, where the renderer appears to be waiting on the gpu.

Comment 14 by w...@chromium.org, Feb 25 2017

FWIW the symptoms would also fit with issue 690517 (browser process memory leak) - are you still seeing this problem, shess@?

Comment 15 by sh...@chromium.org, Feb 28 2017

I think I'm still seeing it, but these days my Chromebook's utility is feeling pretty marginal.  So maybe I'm seeing multiple things happening.
Cc: roc...@chromium.org piman@chromium.org yzshen@chromium.org jbau...@chromium.org sunn...@chromium.org
CC'ing some mojo folks in case this is a mojo issue as well as additional folks familiar with command buffer code.
I went and checked a couple of my crash-server uploads, and I think I'm still seeing this a lot.  Maybe my Chromebook is feeling marginal because this is happening, not because things are just generally tending towards entropy.  If you have something you think a knowledgeable engineer not specifically familiar with chromeos internals could do to help debug, feel free to ping me on it before I rage-quit this device.

I am no longer convinced that this is anything like "Opening too many tabs overwhelms something", I think that it's just that that causes a lot of the kind of activity which causes the problems.  I don't think I've had it happen while just idly reading, but I've definitely had it happen for modest actions which don't feel heavyweight from a user perspective.

I also am noticing a lot of behavior where the system janks pretty hard (even cursor frozen) and then recovers, sometimes for a second or two, sometimes for more like 30s.  Is there a clear way to determine if I'm seeing gpu-process restarts in such cases?  I'd guess that kind of thing would probably be lost in a full browser crash/restart, but for the ones where it recovers I could probably verify what just happened.
The chrome://gpu page shows "GPU process crash count".
chrome://gpu should also show log messages at the bottom if the GPU process is crashing and restarting.

There are two cases I can think of where the GPU process can hang but the watchdog won't trigger. One is if the X server isn't responding to messages, but I think we're not using X11 on the chromebook pixel anymore.

Another possibility is that the command buffer is descheduled but nothing wakes it up. On Chrome OS I think the only deschedule is caused by WaitSyncToken, so I think there'd have to be a bug in that logic to cause it to be descheduled forever.

OK, at the device in question, there are no extensions installed containing the words "offline" or "docs" or "drive", which I disabled long ago and deleted more recently (weeks or months ago).  There is "Google Sheets", so I'm deleting that (where do these things even come from?  Admin domain installing them?) and "Google Slides" (was disabled, now deleted).

"GPU process crash count" is currently 0, but I don't know the state of the machine WRT user-visible issues since the last time something happened, so I'll watch that for next time.
It might be a lockup in the browser compositor. Tracing with the cc.debug.scheduler category might help.
Cc: acolwell@chromium.org
I've been experiencing similar behavior several times a day for the last few days on both my Pixel at work and the one I have at home. In fact it happened while I was trying to look at this bug report and only had chrome://gpu and chrome://crashes open in other tabs.

c2d9acf580000000 is the only recent crash ID I seem to have even though I've dealt with 15+ "cursor bogs down, screen freezes, goes black, restarts showing me a restore button" cycles. 

Owner: jbau...@chromium.org
Status: Assigned (was: Untriaged)
John, do you have any idea what could be going on here? Something like the GPU watchdog not kicking in, or not kicking in right?

Comment 24 by piman@chromium.org, Mar 22 2017

Looking at crash from comment 11, I think this is a /possible/ dup of crbug.com/661306 with the UI thread waiting on the IO thread that is busy, and the session manager killing it. Though in this case it's busy on waitpid, because it believes a child process (not sure we can know which one) is dead. Either the child is indeed dead, and waitpid takes forever - which would be surprising, unless the machine is under heavy load - or the child is not dead, only its socket is... e.g. the child's socket was closed because of a double-close issue somewhere (like there is evidence of in crbug.com/661306).
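
For readers unfamiliar with the double-close failure mode, here is a generic illustration of how closing a descriptor twice can break an unrelated one once the number has been reused. It is a sketch of the general hazard, not Chrome's code; the function name is hypothetical, and `os.dup2` is used only to force the number reuse so the demonstration is deterministic:

```python
import os


def double_close_kills_unrelated_descriptor():
    """Return True if a stale second close() broke an unrelated descriptor."""
    r, w = os.pipe()
    stale = w          # the buggy owner keeps this number after closing it
    os.close(w)        # first, legitimate close

    # An unrelated part of the program opens something new. The OS normally
    # hands out the lowest free number, so it will often be `stale` again.
    victim = os.open(os.devnull, os.O_WRONLY)
    if victim != stale:
        # Force the reuse so the demonstration does not depend on luck.
        os.dup2(victim, stale)
        os.close(victim)
        victim = stale

    os.close(stale)    # the double close: silently closes the victim

    try:
        os.write(victim, b"x")
        broken = False
    except OSError:
        broken = True  # the victim's descriptor is no longer valid

    os.close(r)
    return broken


print(double_close_kills_unrelated_descriptor())
```

The second `close` raises no error because the number is valid again by then, which is why this class of bug tends to surface far from its cause.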

Comment 25 by sh...@chromium.org, Mar 22 2017

It could be.  Definitely no shutdown involved, here, just browsing around.  I can experiment with finding log files this evening, so I can watch for "Too many open files" being logged.
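
A small helper for that kind of log watching might look like the following. It is a sketch that assumes the log is plain text and has already been opened or read into lines; the function name is hypothetical, and no particular log path is implied:

```python
def find_fd_exhaustion(lines, needle="Too many open files"):
    """Return (line_number, text) for every line containing the needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


# Made-up sample lines, only to show the shape of the result.
sample = [
    "startup complete\n",
    "accept failed: Too many open files\n",
    "shutting down\n",
]
for number, text in find_fd_exhaustion(sample):
    print(f"line {number}: {text}")
```

An open file object can be passed in place of the list, since the helper only iterates over lines.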

Comment 26 by sh...@chromium.org, Mar 24 2017

I just now had another crash while using Google Spreadsheets, where it did a long-hang a couple times then crashed with "Restore" infobar at startup.  No crash dump in about:crashes or on the crash server that I can see.  My tabs and windows were pretty stable in this case, whereas most past cases were in transitions where I was opening or closing tabs or moving between tabs.  [Obviously could be unrelated.  Still have no Google-docs type extensions, though I'm starting to consider which extensions I can disable.]
Owner: ----
Status: Available (was: Assigned)

Comment 28 by enne@chromium.org, Aug 29 2017

acolwell: Is this still happening for you?
Status: WontFix (was: Available)
WontFix for radio silence
