New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 607227 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

Tracing stalls page. Tracing.tracingCompleted event never fired

Reported by da...@davidjfox.com, Apr 27 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2716.0 Safari/537.36

Steps to reproduce the problem:
Use the extension I have created to speed up this process:
1. Install the extension (preferably in a new browser user, as it will delete all your browser history etc...)
2. Open up the background page for the extension
3. Run "start('https://dribbble.com')" in the console
4. You should see a new tab open up, start loading, and eventually see the console messages (and page) freeze up without ever displaying a Tracing.tracingComplete message

== If you'd like to use DevTools instead: ==
1. Turn on mobile emulation (I use iPhone5)
2. Use an extension to delete all history (so as to simulate a first time viewer load test)
3. Set the network tab to Disable cache and throttle to "Good 2G" speed
4. Set the Timeline to capture "Paint" events and nothing else
5. Start loading "https://dribbble.com" and hit the record button at the same time

6. When the page has loaded, or a sufficient amount of time has passed (e.g., 15 seconds), stop he trace
7. Observe how we are stuck at "Loading timeline..."

What is the expected behavior?
The Tracing.dataCollected events fire as normal, but after a certain amount of them we stall and never receive the Tracing.tracingComplete event. This is needed for the Timeline to finish generating the frames

What went wrong?
The issue seems to only occur when "disabled-by-default-devtools.timeline.picture" is enabled. Not sure why though... as the buffer usage at most hits ~50%.

Did this work before? Yes Version 49.0.2623.112 m - STABLE

Chrome version: 52.0.2716.0  Channel: canary
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 21.0 r0
 
Status: WontFix (was: Unconfirmed)
Timeline is not intended to be recorded for more than 1-2 seconds while Paint category or more than 5-15 seconds without it. We do a best effort there, but if we run out of memory, we stall or crash.

Comment 2 by da...@davidjfox.com, Apr 27 2016

It isn't running out of memory, thought. If you track the buffer usage it never reaches 50%

Comment 3 by da...@davidjfox.com, Apr 27 2016

You can even track the memory usage of chrome via the task manager. 500MB is not even taken up. I can't believe this is an OOM error.
> It isn't running out of memory, thought. If you track the buffer usage it never reaches 50%

50% is a guesstimate, it does not mean anything. In the pain mode, it is pretty much certain that you run out of memory serializing such buffer.

> You can even track the memory usage of chrome via the task manager. 500MB is not even taken up. I can't believe this is an OOM error.

This one is a better signal though. Could you open DevTools on DevTools and see if it failed while loading (look for errors in console).?

Comment 5 by da...@davidjfox.com, Apr 27 2016

1. DevTools for the tab is not stalling but the timeline is never able to complete, it just sits there waiting for the Tracing.tracingCompleted event (http://puu.sh/oy74z/fd19801b75.png). You can still navigate through all the panels in DevTools, though.

2. DevTools for DevTools shows no messages at all

3. Here is the memory usage of Chrome and all the tabs: http://puu.sh/oy7dL/00a6bc2021.png

Comment 6 by da...@davidjfox.com, Apr 28 2016

Was analyzing some sites at work today and found the issue is much more widespread than I thought. Looks like all you need to do is analyze a page load with "disable cache" enabled in the network tab, and Tracing.tracingComplete never gets sent.

These traces only last 1 second as well (no network throttling used), so I think we can say for sure this is not a OOM issue.
Owner: caseq@chromium.org
Status: Assigned (was: WontFix)
@caseq: seems like paint kills tracing...

Comment 8 by da...@davidjfox.com, Apr 28 2016

Doesn't seem likely... but this could be related to the other bug which showed up at the same time: Skia creating blank duplicate skps of forced layers:  https://crbug.com/607720 
Cc: scroggo@chromium.org fmalita@chromium.org reed@google.com
Pretty hard to repro, but I was able to lock it a couple of times in release mode.  After a few tries, I hit the following assert with a debug build:

[103577:103578:0628/180219:FATAL:ipc_sync_message_filter.cc(50)] Check failed: base::ThreadTaskRunnerHandle::Get() != io_task_runner_. 
#0 0x7fb737f41a7e base::debug::StackTrace::StackTrace()
#1 0x7fb737fa5faf logging::LogMessage::~LogMessage()
#2 0x7fb72f9a65c1 IPC::SyncMessageFilter::Send()
#3 0x7fb730bd0460 content::ThreadSafeSender::Send()
#4 0x7fb730abd9cb content::ChildDiscardableSharedMemoryManager::AllocateLockedDiscardableSharedMemory()
#5 0x7fb730abd371 content::ChildDiscardableSharedMemoryManager::AllocateLockedDiscardableMemory()
#6 0x7fb735e34027 SkDiscardableMemory::Create()
#7 0x7fb736048a30 SkResourceCacheDiscardableAllocator::allocPixelRef()
#8 0x7fb735e73776 SkBitmap::tryAllocPixels()
#9 0x7fb735f29797 SkImageGenerator::tryGenerateBitmap()
#10 0x7fb735f285ad SkImageGenerator::tryGenerateBitmap()
#11 0x7fb735f27481 SkImageCacherator::generateBitmap()
#12 0x7fb735f2798e SkImageCacherator::tryLockAsBitmap()
#13 0x7fb735f27a70 SkImageCacherator::lockAsBitmap()
#14 0x7fb7360a3e42 SkImage_Generator::getROPixels()
#15 0x7fb7360a26b8 SkImage::encode()
#16 0x7fb7360923c6 SkBinaryWriteBuffer::writeImage()
#17 0x7fb736003cca SkPictureData::flattenToBuffer()
#18 0x7fb736003efa SkPictureData::serialize()
#19 0x7fb736000403 SkPicture::serialize()
#20 0x7fb736003f5f SkPictureData::serialize()
#21 0x7fb736000403 SkPicture::serialize()
#22 0x7fb736000339 SkPicture::serialize()
#23 0x7fb72f448119 cc::PictureDebugUtil::SerializeAsBase64()
#24 0x7fb72f563906 cc::DisplayItemList::AsValue()
#25 0x7fb72f45227d cc::TracedDisplayItemList::AppendAsTraceFormat()
#26 0x7fb73816e83f base::trace_event::TraceEvent::AppendAsJSON()
#27 0x7fb73817ce54 base::trace_event::TraceLog::ConvertTraceEventsToTraceFormat()
#28 0x7fb73817cc49 base::trace_event::TraceLog::FinishFlush()
#29 0x7fb7381900e6 _ZNK4base8internal15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEE3RunIPS3_JRKiRKbEEEvOT_DpOT0_
#30 0x7fb73818ffcc _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKNS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJPS6_RKiRKbEEEvOT_DpOT0_
#31 0x7fb73818ff64 _ZN4base8internal7InvokerINS0_9BindStateINS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJNS0_17UnretainedWrapperIS5_EERiRbEEEFvvEE7RunImplIRKS8_RKSt5tupleIJSA_ibEEJLm0ELm1ELm2EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#32 0x7fb73818fd8c _ZN4base8internal7InvokerINS0_9BindStateINS0_15RunnableAdapterIMNS_11trace_event8TraceLogEFvibEEEJNS0_17UnretainedWrapperIS5_EERiRbEEEFvvEE3RunEPNS0_13BindStateBaseE
#33 0x7fb737f21c6e base::Callback<>::Run()
#34 0x7fb737f475ee base::debug::TaskAnnotator::RunTask()
#35 0x7fb737fc3cfc base::MessageLoop::RunTask()
#36 0x7fb737fc3f98 base::MessageLoop::DeferOrRunPendingTask()
#37 0x7fb737fc41f2 base::MessageLoop::DoWork()
#38 0x7fb737fde4de base::MessagePumpLibevent::Run()
#39 0x7fb737fc36ef base::MessageLoop::RunHandler()
#40 0x7fb73806c234 base::RunLoop::Run()
#41 0x7fb737fc25e4 base::MessageLoop::Run()
#42 0x7fb738105449 base::Thread::Run()
#43 0x7fb73810577d base::Thread::ThreadMain()
#44 0x7fb7380f083a base::(anonymous namespace)::ThreadFunc()
#45 0x7fb7257c3182 start_thread
#46 0x7fb723b6a47d clone

Based on comments, it sounds like this could explain the deadlock: https://cs.chromium.org/chromium/src/ipc/ipc_sync_message_filter.cc?rcl=0&l=46

@scroggo did we change anything related to lazy-decoded image serialization?  I see we're trying to re-encode rather than reuse the encoded data, and it seems unsafe to allocate DM on this thread.

I am able to get a similar stack trace (consistently) by following the repro steps in  issue 623475 .

> @scroggo did we change anything related to lazy-decoded image serialization? 
> I see we're trying to re-encode rather than reuse the encoded data, and it
> seems unsafe to allocate DM on this thread.

A couple of things changed in https://codereview.chromium.org/1812273003:
1) The code that retrieves the encoded data has moved from the ImageFrameGenerator to the DecodingImageGenerator. It follows a similar pattern:
- if all the data has been received, return it
- otherwise, return null
2) In the old code, the ImageFrameGenerator locked a mutex to check for more data. In the new code, the DecodingImageGenerator no longer checks for more data (it only has as much data as it was created with).

With or without the above CL, NULL will be returned instead of the encoded data if the full file has not yet been received, so we are be forced into the re-encode path.

But I can believe that (2) makes it more likely. If a DecodingImageGenerator is created with only part of the file received, then the rest of the file is received, and then we try to serialize, the old code would pick up all of the data and return it, whereas the new code would return NULL, since it did not have all the data. (FWIW, the new code is more correct - it eliminates a way to make a DecodingImageGenerator draw something different on two different attempts to draw.)
One way to fix this would be to return the encoded data even if !m_allDataReceived. For some callers - in particular serializers, this is better, functionally. (It saves the work of reencoding, and it stores the original image, albeit incomplete.) The reason we do not is because creating an SkData from the encoded data can be expensive. (The data may not be contiguous so it needs to be copied. This can add up e.g. when it happens each time we need to draw an image due to receiving more data. See  issue 568016  and  issue 476531 .)

FWIW, we've removed a caller of refEncodedData (in https://codereview.chromium.org/1970773002), although we might like to bring it back in the future. (There is another caller in SkImage_Gpu [1], although I don't think that would ever call into DecodingImageGenerator.) The remaining callers appear to all be for serialization. So I think it would be safe to remove the check and always return the data - for now. We would need to address this when we enable SK_SUPPORT_COMPRESSED_TEXTURES_IN_CACHERATOR and when anyone else tries to call the method. (FWIW, I've tested this and it fixes  issue 623475 .)

A better solution might be to create another interface for serialization, which does not care whether the data is contiguous.

[1] https://cs.chromium.org/chromium/src/third_party/skia/src/image/SkImage_Gpu.cpp?cl=GROK&gsn=refEncoded&rcl=1467112798&l=433
Cc: jaydasika@chromium.org tkonch...@chromium.org dcheng@chromium.org nyerramilli@chromium.org
 Issue 623475  has been merged into this issue.
Cc: caseq@chromium.org
Owner: scroggo@chromium.org
Status: Started (was: Assigned)
https://codereview.chromium.org/2108163003/ is out for review.
From https://codereview.chromium.org/2108163003/#msg5:

Should DevTools do this serialization in a background thread anyway? That will prevent similar issues from popping up if other serialization tasks require allocating discardable memory. Further, it looks like serialization is being done on the IO thread, which I could imagine might make the page unresponsive?
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7ba0aa90179d42ac4806386a88d50ed3778ae30b

commit 7ba0aa90179d42ac4806386a88d50ed3778ae30b
Author: scroggo <scroggo@google.com>
Date: Wed Jun 29 22:36:08 2016

Fix dead-lock issue attempting to decode

During serialization, if DecodingImageGenerator::onRefEncodedData()
returns nullptr, this may require the serialization code to decode so
it can have something to then encode and serialize.

This can cause a deadlock if the DecodingImageGenerator is using
discardable memory, which cannot be used from certain threads (where
serialization may be occurring).

Currently, onRefEncodedData() returns nullptr if !m_allDataReceived.
This was to fix  crbug.com/568016 , but that has been further fixed by
crrev.com/1970773002, which avoids calling onRefEncodedData() to pass
it to the GPU, since even if m_allDataReceived is true, we do not yet
support any GPU texture formats.

This change returns the encoded data even if the data has not all been
received, except if the GrContext argument is not NULL for the future
when the GPU starts calling it again.

This will result in serializing the existing data and avoid the
deadlock.

BUG= 607227 , 623475 

Review-Url: https://codereview.chromium.org/2108163003
Cr-Commit-Position: refs/heads/master@{#402963}

[modify] https://crrev.com/7ba0aa90179d42ac4806386a88d50ed3778ae30b/third_party/WebKit/Source/platform/graphics/DecodingImageGenerator.cpp

Status: Fixed (was: Started)
Just tested it out and everything seems to work. Thanks guys :)

Sign in to add a comment