Tracing stalls page. Tracing.tracingCompleted event never fired
Reported by
da...@davidjfox.com,
Apr 27 2016
|
||||||
Issue description
UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2716.0 Safari/537.36
Steps to reproduce the problem:
Use the extension I have created to speed up this process:
1. Install the extension (preferably in a new browser user, as it will delete all your browser history etc...)
2. Open up the background page for the extension
3. Run "start('https://dribbble.com')" in the console
4. You should see a new tab open up, start loading, and eventually see the console messages (and page) freeze up without ever displaying a Tracing.tracingComplete message
== If you'd like to use DevTools instead: ==
1. Turn on mobile emulation (I use iPhone5)
2. Use an extension to delete all history (so as to simulate a first time viewer load test)
3. Set the network tab to Disable cache and throttle to "Good 2G" speed
4. Set the Timeline to capture "Paint" events and nothing else
5. Start loading "https://dribbble.com" and hit the record button at the same time
6. When the page has loaded, or a sufficient amount of time has passed (e.g., 15 seconds), stop he trace
7. Observe how we are stuck at "Loading timeline..."
What is the expected behavior?
The Tracing.dataCollected events fire as normal, but after a certain amount of them we stall and never receive the Tracing.tracingComplete event. This is needed for the Timeline to finish generating the frames
What went wrong?
The issue seems to only occur when "disabled-by-default-devtools.timeline.picture" is enabled. Not sure why though... as the buffer usage at most hits ~50%.
Did this work before? Yes Version 49.0.2623.112 m - STABLE
Chrome version: 52.0.2716.0 Channel: canary
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 21.0 r0
,
Apr 27 2016
It isn't running out of memory, thought. If you track the buffer usage it never reaches 50%
,
Apr 27 2016
You can even track the memory usage of chrome via the task manager. 500MB is not even taken up. I can't believe this is an OOM error.
,
Apr 27 2016
> It isn't running out of memory, thought. If you track the buffer usage it never reaches 50% 50% is a guesstimate, it does not mean anything. In the pain mode, it is pretty much certain that you run out of memory serializing such buffer. > You can even track the memory usage of chrome via the task manager. 500MB is not even taken up. I can't believe this is an OOM error. This one is a better signal though. Could you open DevTools on DevTools and see if it failed while loading (look for errors in console).?
,
Apr 27 2016
1. DevTools for the tab is not stalling but the timeline is never able to complete, it just sits there waiting for the Tracing.tracingCompleted event (http://puu.sh/oy74z/fd19801b75.png). You can still navigate through all the panels in DevTools, though. 2. DevTools for DevTools shows no messages at all 3. Here is the memory usage of Chrome and all the tabs: http://puu.sh/oy7dL/00a6bc2021.png
,
Apr 28 2016
Was analyzing some sites at work today and found the issue is much more widespread than I thought. Looks like all you need to do is analyze a page load with "disable cache" enabled in the network tab, and Tracing.tracingComplete never gets sent. These traces only last 1 second as well (no network throttling used), so I think we can say for sure this is not a OOM issue.
,
Apr 28 2016
@caseq: seems like paint kills tracing...
,
Apr 28 2016
Doesn't seem likely... but this could be related to the other bug which showed up at the same time: Skia creating blank duplicate skps of forced layers: https://crbug.com/607720
,
Jun 28 2016
Pretty hard to repro, but I was able to lock it a couple of times in release mode. After a few tries, I hit the following assert with a debug build: [103577:103578:0628/180219:FATAL:ipc_sync_message_filter.cc(50)] Check failed: base::ThreadTaskRunnerHandle::Get() != io_task_runner_. #0 0x7fb737f41a7e base::debug::StackTrace::StackTrace() #1 0x7fb737fa5faf logging::LogMessage::~LogMessage() #2 0x7fb72f9a65c1 IPC::SyncMessageFilter::Send() #3 0x7fb730bd0460 content::ThreadSafeSender::Send() #4 0x7fb730abd9cb content::ChildDiscardableSharedMemoryManager::AllocateLockedDiscardableSharedMemory() #5 0x7fb730abd371 content::ChildDiscardableSharedMemoryManager::AllocateLockedDiscardableMemory() #6 0x7fb735e34027 SkDiscardableMemory::Create() #7 0x7fb736048a30 SkResourceCacheDiscardableAllocator::allocPixelRef() #8 0x7fb735e73776 SkBitmap::tryAllocPixels() #9 0x7fb735f29797 SkImageGenerator::tryGenerateBitmap() #10 0x7fb735f285ad SkImageGenerator::tryGenerateBitmap() #11 0x7fb735f27481 SkImageCacherator::generateBitmap() #12 0x7fb735f2798e SkImageCacherator::tryLockAsBitmap() #13 0x7fb735f27a70 SkImageCacherator::lockAsBitmap() #14 0x7fb7360a3e42 SkImage_Generator::getROPixels() #15 0x7fb7360a26b8 SkImage::encode() #16 0x7fb7360923c6 SkBinaryWriteBuffer::writeImage() #17 0x7fb736003cca SkPictureData::flattenToBuffer() #18 0x7fb736003efa SkPictureData::serialize() #19 0x7fb736000403 SkPicture::serialize() #20 0x7fb736003f5f SkPictureData::serialize() #21 0x7fb736000403 SkPicture::serialize() #22 0x7fb736000339 SkPicture::serialize() #23 0x7fb72f448119 cc::PictureDebugUtil::SerializeAsBase64() #24 0x7fb72f563906 cc::DisplayItemList::AsValue() #25 0x7fb72f45227d cc::TracedDisplayItemList::AppendAsTraceFormat() #26 0x7fb73816e83f base::trace_event::TraceEvent::AppendAsJSON() #27 0x7fb73817ce54 base::trace_event::TraceLog::ConvertTraceEventsToTraceFormat() #28 0x7fb73817cc49 base::trace_event::TraceLog::FinishFlush() #29 0x7fb7381900e6 _ZNK4base8internal15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEE3RunIPS3_JRKiRKbEEEvOT_DpOT0_ #30 0x7fb73818ffcc _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKNS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJPS6_RKiRKbEEEvOT_DpOT0_ #31 0x7fb73818ff64 _ZN4base8internal7InvokerINS0_9BindStateINS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJNS0_17UnretainedWrapperIS5_EERiRbEEEFvvEE7RunImplIRKS8_RKSt5tupleIJSA_ibEEJLm0ELm1ELm2EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE #32 0x7fb73818fd8c _ZN4base8internal7InvokerINS0_9BindStateINS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJNS0_17UnretainedWrapperIS5_EERiRbEEEFvvEE3RunEPNS0_13BindStateBaseE #33 0x7fb737f21c6e base::Callback<>::Run() #34 0x7fb737f475ee base::debug::TaskAnnotator::RunTask() #35 0x7fb737fc3cfc base::MessageLoop::RunTask() #36 0x7fb737fc3f98 base::MessageLoop::DeferOrRunPendingTask() #37 0x7fb737fc41f2 base::MessageLoop::DoWork() #38 0x7fb737fde4de base::MessagePumpLibevent::Run() #39 0x7fb737fc36ef base::MessageLoop::RunHandler() #40 0x7fb73806c234 base::RunLoop::Run() #41 0x7fb737fc25e4 base::MessageLoop::Run() #42 0x7fb738105449 base::Thread::Run() #43 0x7fb73810577d base::Thread::ThreadMain() #44 0x7fb7380f083a base::(anonymous namespace)::ThreadFunc() #45 0x7fb7257c3182 start_thread #46 0x7fb723b6a47d clone Based on comments, it sounds like this could explain the deadlock: https://cs.chromium.org/chromium/src/ipc/ipc_sync_message_filter.cc?rcl=0&l=46 @scroggo did we change anything related to lazy-decoded image serialization? I see we're trying to re-encode rather than reuse the encoded data, and it seems unsafe to allocate DM on this thread.
,
Jun 28 2016
I am able to get a similar stack trace (consistently) by following the repro steps in issue 623475 . > @scroggo did we change anything related to lazy-decoded image serialization? > I see we're trying to re-encode rather than reuse the encoded data, and it > seems unsafe to allocate DM on this thread. A couple of things changed in https://codereview.chromium.org/1812273003: 1) The code that retrieves the encoded data has moved from the ImageFrameGenerator to the DecodingImageGenerator. It follows a similar pattern: - if all the data has been received, return it - otherwise, return null 2) In the old code, the ImageFrameGenerator locked a mutex to check for more data. In the new code, the DecodingImageGenerator no longer checks for more data (it only has as much data as it was created with). With or without the above CL, NULL will be returned instead of the encoded data if the full file has not yet been received, so we are be forced into the re-encode path. But I can believe that (2) makes it more likely. If a DecodingImageGenerator is created with only part of the file received, then the rest of the file is received, and then we try to serialize, the old code would pick up all of the data and return it, whereas the new code would return NULL, since it did not have all the data. (FWIW, the new code is more correct - it eliminates a way to make a DecodingImageGenerator draw something different on two different attempts to draw.)
,
Jun 28 2016
One way to fix this would be to return the encoded data even if !m_allDataReceived. For some callers - in particular serializers, this is better, functionally. (It saves the work of reencoding, and it stores the original image, albeit incomplete.) The reason we do not is because creating an SkData from the encoded data can be expensive. (The data may not be contiguous so it needs to be copied. This can add up e.g. when it happens each time we need to draw an image due to receiving more data. See issue 568016 and issue 476531 .) FWIW, we've removed a caller of refEncodedData (in https://codereview.chromium.org/1970773002), although we might like to bring it back in the future. (There is another caller in SkImage_Gpu [1], although I don't think that would ever call into DecodingImageGenerator.) The remaining callers appear to all be for serialization. So I think it would be safe to remove the check and always return the data - for now. We would need to address this when we enable SK_SUPPORT_COMPRESSED_TEXTURES_IN_CACHERATOR and when anyone else tries to call the method. (FWIW, I've tested this and it fixes issue 623475 .) A better solution might be to create another interface for serialization, which does not care whether the data is contiguous. [1] https://cs.chromium.org/chromium/src/third_party/skia/src/image/SkImage_Gpu.cpp?cl=GROK&gsn=refEncoded&rcl=1467112798&l=433
,
Jun 28 2016
Issue 623475 has been merged into this issue.
,
Jun 28 2016
,
Jun 29 2016
,
Jun 29 2016
From https://codereview.chromium.org/2108163003/#msg5: Should DevTools do this serialization in a background thread anyway? That will prevent similar issues from popping up if other serialization tasks require allocating discardable memory. Further, it looks like serialization is being done on the IO thread, which I could imagine might make the page unresponsive?
,
Jun 29 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7ba0aa90179d42ac4806386a88d50ed3778ae30b commit 7ba0aa90179d42ac4806386a88d50ed3778ae30b Author: scroggo <scroggo@google.com> Date: Wed Jun 29 22:36:08 2016 Fix dead-lock issue attempting to decode During serialization, if DecodingImageGenerator::onRefEncodedData() returns nullptr, this may require the serialization code to decode so it can have something to then encode and serialize. This can cause a deadlock if the DecodingImageGenerator is using discardable memory, which cannot be used from certain threads (where serialization may be occurring). Currently, onRefEncodedData() returns nullptr if !m_allDataReceived. This was to fix crbug.com/568016 , but that has been further fixed by crrev.com/1970773002, which avoids calling onRefEncodedData() to pass it to the GPU, since even if m_allDataReceived is true, we do not yet support any GPU texture formats. This change returns the encoded data even if the data has not all been received, except if the GrContext argument is not NULL for the future when the GPU starts calling it again. This will result in serializing the existing data and avoid the deadlock. BUG= 607227 , 623475 Review-Url: https://codereview.chromium.org/2108163003 Cr-Commit-Position: refs/heads/master@{#402963} [modify] https://crrev.com/7ba0aa90179d42ac4806386a88d50ed3778ae30b/third_party/WebKit/Source/platform/graphics/DecodingImageGenerator.cpp
,
Jun 30 2016
,
Jun 30 2016
Just tested it out and everything seems to work. Thanks guys :) |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by pfeldman@chromium.org
, Apr 27 2016