Chromebook Pixel lockups.
Issue description

I have a first-generation Chromebook Pixel, and I periodically get lockups where everything freezes, the fan goes on, and a while later (30 seconds, or a minute, some watchdog-sounding length) it reboots. On reboot, I can click "Restore" and get everything back fine, so it's not entirely a factor of how much is going on or what specific tabs are open.

I don't know if crash reports are expected in all cases like this (for instance, it happened today, and I see no crashes for today). But I do see crashes like go/crash/2bcc4caf00000000: SIGABRT on main thread in pthread_cond_wait, which is unexpected. Thread 12 looks like 1000 frames of _L_lock_1109, which seems odd. But maybe it's a red herring.

USUALLY this happens when I do something like open-in-tab on a dozen or so pages. It works along, then hangs. It works fine with all those pages when I restore. Today it happened with a single page loading, when I hit Alt-Tab to switch to a different window, and I can't remember another time where it locked up with such low load. But today didn't generate a crash dump.

If you are looking through crash dumps from this client ID, note that I basically never reboot or log out of this Chromebook. AFAICT, if I let it hang until the watchdog gets it, it comes back without needing to log in again, so I think Chrome is restarted, not the OS. I have sometimes held the power button to power down when this happens. I don't know which case generates crash dumps. At any given time I might have 20 to 30 tabs open across multiple windows.
,
Dec 5 2016
go/crash/cd2457df00000000 browser process, main thread:
0x00007f2f810db17c (libpthread-2.19.so + 0x0000c17c ) pthread_cond_wait
0x00007f2f849b900f (chrome -waiter.cc:64 ) mojo::edk::Waiter::Wait
0x00007f2f825363b0 (chrome -core.cc:1169 ) mojo::edk::Core::Wait
0x00007f2f824cd19e (chrome -handle.h:191 ) mojo::SyncHandleRegistry::WatchAllHandles
0x00007f2f84554f96 (chrome -ipc_sync_channel.cc:620 ) IPC::SyncChannel::WaitForReply
0x00007f2f824f4f88 (chrome -ipc_sync_channel.cc:584 ) IPC::SyncChannel::Send
0x00007f2f828f6501 (chrome -command_buffer_proxy_impl.cc:701 ) gpu::CommandBufferProxyImpl::Send
0x00007f2f828f7397 (chrome -command_buffer_proxy_impl.cc:363 ) gpu::CommandBufferProxyImpl::WaitForGetOffsetInRange
0x00007f2f828f2577 (chrome -cmd_buffer_helper.cc:168 ) gpu::CommandBufferHelper::WaitForGetOffsetInRange
0x00007f2f828f2731 (chrome -cmd_buffer_helper.cc:222 ) gpu::CommandBufferHelper::Finish
0x00007f2f84963aa4 (chrome -gles2_implementation.cc:464 ) gpu::gles2::GLES2Implementation::WaitForCmd
0x00007f2f84966d40 (chrome -gles2_implementation.cc:369 ) gpu::gles2::GLES2Implementation::FreeEverything
0x00007f2f8495ffeb (chrome -gles2_implementation.cc:428 ) gpu::gles2::GLES2Implementation::SetAggressivelyFreeResources
0x00007f2f845f031e (chrome -context_cache_controller.cc:166 ) cc::ContextCacheController::OnIdle
0x00007f2f8247fe6c (chrome -callback.h:64 ) base::debug::TaskAnnotator::RunTask

Maybe I'm reading that wrong, but sync on main thread? Well, I guess Chrome OS uses the same compositing engine for UI, so maybe it makes sense.
Again, there's a thread (13 this time) which is endless:
0x00007f2f810ddd84 (libpthread-2.19.so + 0x0000ed84 ) __lll_lock_wait
0x00007f2f810d8fe5 (libpthread-2.19.so + 0x00009fe5 ) _L_lock_1109
0x00007f2f810d8fe5 (libpthread-2.19.so + 0x00009fe5 ) _L_lock_1109
0x00007f2f810d8fe5 (libpthread-2.19.so + 0x00009fe5 ) _L_lock_1109
0x00007f2f810d8fe5 (libpthread-2.19.so + 0x00009fe5 ) _L_lock_1109
<more than 1000 of these before it gives up>

Possibly a red herring, but seems suspicious.
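When a dump contains a thread like this, collapsing runs of identical consecutive frames makes the rest of the stack easier to read. The following is a minimal sketch only; `collapse_repeated_frames` is a hypothetical helper, not part of any crash tooling mentioned in this issue, and it assumes the frames have already been split into one string per frame.

```python
from itertools import groupby
from typing import Iterable, List


def collapse_repeated_frames(frames: Iterable[str]) -> List[str]:
    """Collapse each run of identical consecutive frames into one annotated line."""
    collapsed = []
    for frame, run in groupby(frames):
        count = sum(1 for _ in run)  # length of this run of identical frames
        if count == 1:
            collapsed.append(frame)
        else:
            collapsed.append(f"{frame}  [repeated {count} times]")
    return collapsed


if __name__ == "__main__":
    # Illustrative input shaped like the thread above (symbol names only).
    sample = ["__lll_lock_wait"] + ["_L_lock_1109"] * 1000
    for line in collapse_repeated_frames(sample):
        print(line)
```

Only adjacent duplicates are merged, so genuine recursion that alternates between different frames is left untouched.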
,
Dec 6 2016
+dahlke, who has also been seeing this with Pixel v1, running M54. I've seen similar-sounding hangs on Chromebook Flip, FWIW, but running dev-channel. Adding some tags & CC'ing some folks, since IIUC this kind of hang could be responsible for the increase in issue 594764, if we're seeing an increase in reboots, leading to crash reports in the user partition being inaccessible to the uploader when it next starts up.
,
Dec 6 2016
To clarify #3: I'd expect that crash reports following the user doing a hard-reset due to a hung device would lead to empty report uploads; not so if the user waits for the watchdog to restart Chrome.
,
Dec 6 2016
In light of #3, my report isn't about something new, this has been happening for a few months, I just decided I needed to close the loop. If I had to guess, I'd say that it wasn't really happening before June or July timeframe, though it could have been a month or two before that (a year ago, having my Pixel hang was a surprising once-a-quarter thing, now it's a couple times a week). New one at: go/crash/abd5d23f00000000 There is a thread in __libc_send under ChannelPosix::Write, which seems interesting. It's GpuProcessHost, and a gpu dump was also uploaded: go/crash/99fc9fa300000000 but the gpu dump just looks like "I'm waiting for work", so maybe a red herring. [I'm wondering if I should go home and force-crash that Pixel, just to see what a "normal" crashdump looks like.]
,
Dec 8 2016
Regarding #4, both cases wouldn't trigger b/31248243, because the following sequence of events must occur: 1. User is logged in. 2. Uploader starts running, and sleeps before each upload. 3. User logs out to login screen, or switches accounts. 4. Uploader wakes up and uploads an empty report. The first condition is not satisfied after a hard reset, so the uploader would only consider reports stored outside the cryptohome.
,
Dec 9 2016
,
Dec 9 2016
My chromebook crashed again yesterday when I was using the account switcher. Here is the Crash Server ID: 67b8c79880000000.
,
Dec 9 2016
Re #8: The account switcher crash is issue 672344 again; it's a crasher that's been around for ~2 years but will trigger only if you have error bubbles visible, so it may be that it is happening more often recently because of some other factor. :-/
,
Jan 11 2017
Trying to understand this better - the crashes in #2 (go/crash/cd2457df00000000) and #5 (go/crash/abd5d23f00000000) both look like browser hangs waiting for the GPU process to respond. What kind of timeout exists on the browser proc before we crash / generate a dump like this? The GPU process should be killed by the watchdog in ~15s if it hangs, so it seems like the browser should wait at least this long. If the browser doesn't wait long enough, it may be killed before the GPU proc has time to be killed/recreated. Adding some GPU folks who may have more thoughts.
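The timing argument above can be written down as a simple check. This is a sketch under stated assumptions: the ~15s GPU watchdog figure comes from this comment, while the browser hang timeout and the GPU restart time are unknowns in this thread, so both are plain parameters and the function name is hypothetical.

```python
def browser_outlasts_gpu_recovery(
    browser_hang_timeout_s: float,
    gpu_watchdog_timeout_s: float = 15.0,  # "~15s" per the comment above
    gpu_restart_s: float = 0.0,            # assumed; not stated in this issue
) -> bool:
    """Return True if the browser waits long enough for a hung GPU process
    to be killed by its watchdog and recreated before the browser itself
    is considered hung."""
    return browser_hang_timeout_s >= gpu_watchdog_timeout_s + gpu_restart_s


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(browser_outlasts_gpu_recovery(10.0))
    print(browser_outlasts_gpu_recovery(20.0, gpu_restart_s=2.0))
```

If the check is false for the real values, a browser dump like the ones in this issue could be produced even though the GPU process would have recovered on its own.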
,
Jan 13 2017
I have also been experiencing this 1-2 times per day for the past 3 days. Feedback reports: http://feedback.corp.google.com/#/Report/51526867373 http://feedback.corp.google.com/#/Report/51574115968 Crash IDs: ce21bb3a80000000 194bc0b080000000 3970bb3a80000000
,
Jan 19 2017
Crash stack from comment 11 (https://crash.corp.google.com/browse?stbtiq=194bc0b080000000):
0x00007fde8706f17c (libpthread-2.19.so + 0x0000c17c ) pthread_cond_wait
0x00007fde8a97106f (chrome -waiter.cc:64 ) mojo::edk::Waiter::Wait
0x00007fde884d8d66 (chrome -core.cc:1169 ) mojo::edk::Core::Wait
0x00007fde884718be (chrome -handle.h:191 ) mojo::SyncHandleRegistry::WatchAllHandles
0x00007fde8849a7af (chrome -ipc_sync_channel.cc:620 ) IPC::SyncChannel::WaitForReply
0x00007fde8849ab54 (chrome -ipc_sync_channel.cc:584 ) IPC::SyncChannel::Send
0x00007fde888b3d11 (chrome -command_buffer_proxy_impl.cc:701 ) gpu::CommandBufferProxyImpl::Send
0x00007fde888b4eb7 (chrome -command_buffer_proxy_impl.cc:343 ) gpu::CommandBufferProxyImpl::WaitForTokenInRange
0x00007fde888b2289 (chrome -ring_buffer.cc:44 ) gpu::RingBuffer::FreeOldestBlock
0x00007fde888b28f7 (chrome -ring_buffer.cc:71 ) gpu::RingBuffer::Alloc
0x00007fde888b2bc2 (chrome -transfer_buffer.cc:221 ) gpu::ScopedTransferBufferPtr::Reset
0x00007fde8a9280eb (chrome -transfer_buffer.h:151 ) gpu::gles2::GLES2Implementation::TexSubImage2D
0x00007fde8a609768 (chrome -resource_provider.cc:893 ) cc::ResourceProvider::CopyToResource
0x00007fde8a5c5155 (chrome -layer_tree_host_impl.cc:3693 ) cc::LayerTreeHostImpl::CreateUIResource
0x00007fde8a5c7de6 (chrome -layer_tree_impl.cc:1579 ) cc::LayerTreeImpl::ProcessUIResourceRequestQueue
0x00007fde8a5bf9d6 (chrome -layer_tree_host_impl.cc:2005 ) cc::LayerTreeHostImpl::ActivateSyncTree
0x00007fde8865405d (chrome -scheduler.cc:649 ) cc::Scheduler::ProcessScheduledActions
0x00007fde8cb23d60 (chrome -scheduler.cc:158 ) cc::Scheduler::NotifyReadyToCommit
0x00007fde8cb1633f (chrome -single_thread_proxy.cc:675 ) cc::SingleThreadProxy::DoBeginMainFrame
0x00007fde8cb164e6 (chrome -single_thread_proxy.cc:648 ) cc::SingleThreadProxy::BeginMainFrame
0x00007fde8842a04c (chrome -callback.h:64 ) base::debug::TaskAnnotator::RunTask
0x00007fde884180fd (chrome -message_loop.cc:405 ) base::MessageLoop::DoWork
0x00007fde88418a72 (chrome -message_pump_libevent.cc:217 ) base::MessagePumpLibevent::Run
0x00007fde89c7c177 (chrome -run_loop.cc:35 ) base::RunLoop::Run
0x00007fde89967a04 (chrome -chrome_browser_main.cc:2116 ) ChromeBrowserMainParts::MainMessageLoopRun
0x00007fde8903c58a (chrome -browser_main_loop.cc:981 ) content::BrowserMainLoop::RunMainMessageLoopParts
0x00007fde8903e164 (chrome -browser_main_runner.cc:155 ) content::BrowserMainRunnerImpl::Run
0x00007fde89038e7b (chrome -browser_main.cc:46 ) content::BrowserMain
0x00007fde8990a170 (chrome -content_main_runner.cc:779 ) content::ContentMainRunnerImpl::Run
0x00007fde89908d0a (chrome -content_main.cc:20 ) content::ContentMain
0x00007fde88696015 (chrome -chrome_main.cc:97 ) ChromeMain
0x00007fde85cc4fb5 (libc-2.19.so -libc-start.c:292 ) __libc_start_main
0x00007fde88695e54 (chrome + 0x011f0e54 ) _start
,
Feb 1 2017
This looks similar to issue 683902, where the renderer appears to be waiting on the gpu.
,
Feb 25 2017
FWIW the symptoms would also fit with issue 690517 (browser process memory leak) - are you still seeing this problem, shess@?
,
Feb 28 2017
I think I'm still seeing it, but these days my Chromebook's utility is feeling pretty marginal. So maybe I'm seeing multiple things happening.
,
Mar 8 2017
CC'ing some mojo folks in case this is a mojo issue as well as additional folks familiar with command buffer code.
,
Mar 8 2017
I went and checked a couple of my crash-server uploads, and I think I'm still seeing this a lot. Maybe my Chromebook is feeling marginal because this is happening, not because things are just generally tending towards entropy. If you have something you think a knowledgeable engineer not specifically familiar with chromeos internals could do to help debug, feel free to ping me on it before I rage-quit this device. I am no longer convinced that this is anything like "Opening too many tabs overwhelms something", I think that it's just that that causes a lot of the kind of activity which causes the problems. I don't think I've had it happen while just idly reading, but I've definitely had it happen for modest actions which don't feel heavyweight from a user perspective. I also am noticing a lot of behavior where the system janks pretty hard (even cursor frozen) and then recovers, sometimes for a second or two, sometimes for more like 30s. Is there a clear way to determine if I'm seeing gpu-process restarts in such cases? I'd guess that kind of thing would probably be lost in a full browser crash/restart, but for the ones where it recovers I could probably verify what just happened.
,
Mar 8 2017
The chrome://gpu page shows "GPU process crash count".
,
Mar 8 2017
chrome://gpu should also show log messages at the bottom if the GPU process is crashing and restarting. There are two cases I can think of where the GPU process can hang but the watchdog won't trigger. One is if the X server isn't responding to messages, but I think we're not using X11 on the chromebook pixel anymore. Another possibility is that the command buffer is descheduled but nothing wakes it up. On Chrome OS I think the only deschedule is caused by WaitSyncToken, so I think there'd have to be a bug in that logic to cause it to be descheduled forever.
,
Mar 9 2017
OK, at the device in question, there are no extensions installed containing the words "offline" or "docs" or "drive", which I disabled long ago and deleted more recently (weeks or months ago). There is "Google Sheets", so I'm deleting that (where do these things even come from? Admin domain installing them?) and "Google Slides" (was disabled, now deleted). "GPU process crash count" is currently 0, but I don't know the state of the machine WRT user-visible issues since the last time something happened, so I'll watch that for next time.
,
Mar 9 2017
It might be a lockup in the browser compositor. Tracing with the cc.debug.scheduler category might help.
,
Mar 11 2017
I've been experiencing similar behavior several times a day for the last few days on both my Pixel at work and the one I have at home. In fact it happened while I was trying to look at this bug report and only had chrome://gpu and chrome://crashes open in other tabs. c2d9acf580000000 is the only recent crash ID I seem to have even though I've dealt with 15+ "cursor bogs down, screen freezes, goes black, restarts showing me a restore button" cycles.
,
Mar 17 2017
John, do you have any idea what could be going on here? Something like the GPU watchdog not kicking in, or not kicking in right?
,
Mar 22 2017
Looking at crash from comment 11, I think this is a /possible/ dup of crbug.com/661306 with the UI thread waiting on the IO thread that is busy, and the session manager killing it. Though in this case it's busy on waitpid, because it believes a child process (not sure we can know which one) is dead. Either the child is indeed dead, and waitpid takes forever - which would be surprising, unless the machine is under heavy load - or the child is not dead, only its socket is... e.g. the child's socket was closed because of a double-close issue somewhere (like there is evidence of in crbug.com/661306).
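To illustrate why a double close can make a healthy peer look dead: file descriptor numbers are reused, so a second, stale close can land on an unrelated descriptor that was opened in between. The sketch below demonstrates the general hazard with pipes on a POSIX system; it is not taken from Chrome's code, and the function name is hypothetical.

```python
import os


def double_close_hits_unrelated_fd() -> bool:
    """Show that closing the same fd number twice can close a newer,
    unrelated descriptor that happened to reuse that number."""
    r1, w1 = os.pipe()
    os.close(w1)                 # first, legitimate close

    # New descriptors get the lowest free numbers, so one end of this
    # second pipe normally reuses the number that w1 just released.
    r2, w2 = os.pipe()
    reused = w1 in (r2, w2)

    victim_broken = False
    if reused:
        os.close(w1)             # buggy second close: hits the new pipe instead
        try:
            os.fstat(w1)         # the unrelated descriptor is now invalid
        except OSError:
            victim_broken = True

    # Clean up whatever is still open; ignore the one already closed.
    for fd in (r1, r2, w2):
        try:
            os.close(fd)
        except OSError:
            pass
    return reused and victim_broken


if __name__ == "__main__":
    print(double_close_hits_unrelated_fd())
```

In a multi-threaded process the victim could be any descriptor opened by another thread between the two closes, which is what would make a child's socket appear closed while the child itself is still alive.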
,
Mar 22 2017
It could be. Definitely no shutdown involved, here, just browsing around. I can experiment with finding log files this evening, so I can watch for "Too many open files" being logged.
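A minimal sketch of that kind of log watching, assuming only that the log is plain text with one message per line. The log file location is not given in this thread, so the function takes any iterable of lines; `find_fd_exhaustion` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

EMFILE_MESSAGE = "Too many open files"


def find_fd_exhaustion(
    lines: Iterable[str], needle: str = EMFILE_MESSAGE
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every log line containing the needle."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    sample_log = [
        "starting up\n",
        "accept failed: Too many open files\n",
        "shutting down\n",
    ]
    for number, line in find_fd_exhaustion(sample_log):
        print(f"{number}: {line}")
```

To scan a real file, pass an open file object, for example `find_fd_exhaustion(open(path, errors="replace"))`, where `path` is whichever log file is being checked.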
,
Mar 24 2017
I just now had another crash while using Google Spreadsheets, where it did a long-hang a couple times then crashed with "Restore" infobar at startup. No crash dump in about:crashes or on the crash server that I can see. My tabs and windows were pretty stable in this case, whereas most past cases were in transitions where I was opening or closing tabs or moving between tabs. [Obviously could be unrelated. Still have no Google-docs type extensions, though I'm starting to consider which extensions I can disable.]
,
Aug 3 2017
,
Aug 29 2017
acolwell: Is this still happening for you?
,
Aug 29 2017
I'm not seeing crashes in IPC::SyncChannel::WaitForReply since M59. They were predominantly in M53. https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_ChromeOS%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27IPC%3A%3ASyncChannel%3A%3AWaitForReply%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest= Is this possibly resolved, or has the signature just changed?
,
Dec 1 2017
WontFix for radio silence
Comment 1 by ajha@chromium.org
, Nov 27 2016