New issue
Advanced search Search tips

Issue 604008 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression



Sign in to add a comment

3.2% regression in page_cycler.top_10_mobile at 387300:387345

Project Member Reported by rsch...@chromium.org, Apr 15 2016

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=604008

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICghLDqowoM


Bot(s) for this bug's original alert(s):

android-galaxy-s5
Project Member

Comment 2 by 42576172...@developer.gserviceaccount.com, Apr 16 2016

Cc: scroggo@chromium.org
Owner: scroggo@chromium.org

=== Auto-CCing suspected CL author scroggo@chromium.org ===

Hi scroggo@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Eliminate copies of encoded image data
Author  : scroggo
Commit description:
  
Remove the ThreadSafeDataTransport, which required making two copies of
the encoded data. Instead, copy the data (once) into an SkRWBuffer,
which allows creating read only snapshots that can be safely read in
another thread.

Design doc can be found at https://docs.google.com/document/d/1aNmMaWfjkSSkdnVkAJoBUN7mNwdCXT1wLghrhDhhSUw/edit#heading=h.hvbfr8rfyj9k
Benchmark results can be found at https://docs.google.com/spreadsheets/d/1Y_9UbzBvyg1HRs2uieeV2Twb7seJFKZulRVg7OQOqEo/

DeferredImageDecoder:
Copy into an SkRWBuffer, and create snapshots to pass to each
DecodingImageGenerator.

DecodingImageGenerator:
Store a snapshot of the data, along with whether the snapshot contains
all of the data. Pass this to the ImageFrameGenerator when it is time
to decode. Since it already knows whether it has received all the data,
never attempt to YUV decode if not.
When created from an SkData, skip the copy into a SharedBuffer. Instead,
wrap in a SegmentReader.

ImageFrameGenerator:
Its decoding APIs now take the encoded data and whether it's complete as
parameters. Does not own the encoded data, and no longer has the
implementation of onRefEncodedData.
Make the object thread-safe where possible, so we can only lock the
mutex during operations that truly need it.

SegmentReader:
Add a new interface that looks like a read only SharedBuffer. Add three
implementations:
- SharedBufferSegmentReader:
Used when we already have a SharedBuffer (e.g. in the main thread).
- ROBufferSegmentReader:
Facilitates decoding in a worker thread without requiring mutex locks/
an extra copy of the data. (Note - this still requires a mutex for using
the SkROBuffer::Iter. But in practice, there is no contention, unlike
the old implementation, which had to lock in order to copy into the
reader thread.)
- DataSegmentReader:
Allows using an SkData directly without copying.

ImageDecoders/FastSharedBufferReader:
Read data from a SegmentReader instead of directly from a SharedBuffer.
(Add an ImageDecoder helper that takes a SharedBuffer parameter, reducing
the need to change code that already has a SharedBuffer.)
Since the API is the same, the changes here are small, except for
WEBPImageDecoder, which no longer calls SharedBuffer::data(), and
instead calls SegmentReader::refAsSkData().
Use PassRefPtr instead of raw pointers for parameter passing in setData.

Tests:
Adapt where a SharedBuffer is no longer used.

PictureSnapshot.cpp
Wrap SkData in SegmentReader.

BUG= 467772 
BUG= 483369 

BUG= 335694 
This may also help with crrbug.com/335694. The DecodingImageGenerator
will no longer attempt to read beyond the data that had been received
when it was created.

e.g

DeferredImageDecoder did = ...
did.setData(...);

SkImage image = did.createFrameAtIndex(0);
did.setData(...); // More data

image will not attempt to decode the extra data provided in the last
call to setData.

However, it could use a decoder that had previously been used to
decode more data than it was created with.

e.g.

DeferredImageDecoder did = ...
did.setData(...);
SkImage imageA = did.createFrameAtIndex(0); // DecodingImageGenerator

did.setData(...);   // More data
SkImage imageB = did.createFrameAtIndex(0);

// draw imageB - this decodes to the end of data provided in the second
//               call to setData

// draw imageA - this can use the same ImageDecoder retrieved from the
//               ImageDecodingStore, meaning it will have already
//               decoded beyond the data supplied to imageA

imageA *may* look like imageB, even though it had less data.

Review URL: https://codereview.chromium.org/1812273003

Cr-Commit-Position: refs/heads/master@{#387344}
Commit  : d2234904faee943bd987bd38d620096db808efca
Date    : Thu Apr 14 17:05:12 2016


===== TESTED REVISIONS =====
Revision                Mean Value  Std. Dev.   Num Values  Good?
chromium@387299         27431.5     133.712752  6           good
chromium@387322         27363.8     193.312441  5           good
chromium@387334         27508.6     190.410872  5           good
chromium@387340         27349.4     179.217187  5           good
chromium@387343         27482.0     221.286014  5           good
chromium@387344         28347.6     147.23213   5           bad         <-
chromium@387345         28205.4     215.779517  5           bad

Bisect job ran on: android_s5_perf_bisect
Bug ID: 604008

Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --also-run-disabled-tests page_cycler.top_10_mobile
Test Metric: vm_private_dirty_final_renderer/vm_private_dirty_final_renderer
Relative Change: 3.08%
Score: 99.8

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_s5_perf_bisect/builds/603
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9015284054956236992


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=604008

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Status: Started (was: Assigned)
I expected an improvement (less copies, less mutex contention), and if I look at https://chromeperf.appspot.com//group_report?rev=387344 (improvements and regressions around the time of my change), I see many improvements. In particular, I see a couple of improvements (and no regressions) for the image_decoding.image_decoding_measurement Test Suite (2.0% on chromium-rel-win7-gpu-intel and 18.6% on android-nexus9). That test suite is specifically for image decoding, and my CL should only have affected image decoding. That said, most sites have images (citation needed), and this test suite only times a few images, so it's possible that it misses interesting cases. Looking further in that list, I see several large performance differences (some improvements and some regressions) that may be worth investigating to see whether they're related to my CL.
As stated before, my CL should actually improve image decoding (and does for the image decoding specific tests) and should not affect other things.

But there is a case where I could speculate that my CL would cause a slow down. The change is in DecodingImageGenerator::onRefEncodedData:

Prior to my CL, onRefEncodedData did the following:
1. lock a mutex and copy encoded data from the writer thread to the reader thread (in ThreadSafeDataTransport::data)
2. a) If the data is not all received, return nullptr
   b) If onRefEncodedData has been called before, return the existing SkData object
   c) Call SharedBuffer::data() to copy all the data into a contiguous buffer (throwing away the original segments).
      Create an SkData object wrapping it. Return the SkData, and keep a pointer (ref) to it, for use next time.

With my CL:
- we skip 1, meaning that during load (not all data has been received) we go straight to 2a and return nullptr. This is faster than the old code.
- Once all the data is loaded, we call SegmentReader::refAsSkData, which copies all the data into into a contiguous buffer *without* deleting the old data.[1]
- Here's the potentially bad part. On future calls to onRefEncodedData, we do the copy again, since we did not hang on to the contiguous buffer.

So I expect this method to be faster during load, but slower when called after all the data has been received.

So the question in my mind is, are the cases where we slow down a result of calling this method when all data has been received? The primary caller is inside SkImageCacherator::lockTexture [2]. It only calls onRefEncodedData if the image is not already in the cache, though based on https://codereview.chromium.org/1925533003/, it sounds like we are not using the cache properly. For an animated GIF, I could see how this would be slower. (This assumes we're still calling lockTexture in the new ImageDecodeController world.)

We already have a bug around the fact that onRefEncodedData is so expensive to begin with ( issue 568016 ). Before and after my patch, we should be able to skip this, due to the fact that we are only calling onRefEncodedData to allow the GPU to directly support files like pkm and ktx. DecodingImageGenerator should never have to copy all the data, since it never holds one of those types of files internally.

I have a CL in progress for  issue 568016  in https://codereview.chromium.org/1862133002/. That should also address this regression, if it is due to my CL.

[1] Typically. We do have a fast path for when the data is already contiguous in an SkROBuffer, but this won't be the case unless all the data was loaded initially. (Since we're talking about a DecodingImageGenerator, the SegmentReader must be an ROBufferSegmentReader, so we can ignore the fact that DataSegmentReader also need not do a copy, and that we *could* add a fast path for SharedBufferSegmentReader.)
[2] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/skia/src/core/SkImageCacherator.cpp&sq=package:chromium&l=280&rcl=1449620985
This is still pending:
https://codereview.chromium.org/1862133002/
That CL has been replaced by https://codereview.chromium.org/1970773002/ in Skia. Once that merges into Chromium (https://codereview.chromium.org/1970803002/), this should improve.
That merge CL got replaced with https://codereview.chromium.org/1971543003/ (the first one failed.)
Project Member

Comment 8 by sheriffbot@chromium.org, Jun 1 2016

Labels: -M-52 M-53 MovedFrom-52
Moving this nonessential bug to the next milestone.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -performance-sheriff Performance-Sheriff
Status: Fixed (was: Started)
The metrics have recovered, I think we can close this.

Sign in to add a comment