SkRWBuffer wastes memory on allocator overhead |
||||||
Issue descriptionSkBufferBlock and SkBufferHead can both waste large amounts of memory on allocation overhead on some memory allocators. The problem is that those classes allocate "length + sizeof(Self)" bytes, and sometimes that value is just tad above 4096, causing allocator to select next size bucket, which for jemalloc is 5120 (and for TCMalloc is 4608). Since the header is small (3-4 ptrs), most of allocation overhead is wasted. To see how much is wasted I implemented crude SkRWBuffer instrumentation, see http://crrev.com/2374383004. My test was to visit TheVerge.com and slowly scroll (to load all images, especially animated ones) to the bottom. Linux results: ... SkBufferBlock: allocating 4096 wasted 0 (10799 / 875 KiB allocated / wasted across 2685 blocks) SkBufferBlock: allocating 4136 wasted 472 (10803 / 876 KiB allocated / wasted across 2686 blocks) SkBufferBlock: allocating 4128 wasted 480 (10807 / 876 KiB allocated / wasted across 2687 blocks) SkBufferBlock: allocating 4128 wasted 480 (10811 / 877 KiB allocated / wasted across 2688 blocks) SkBufferBlock: allocating 4128 wasted 480 (10815 / 877 KiB allocated / wasted across 2689 blocks) SkBufferBlock: allocating 4128 wasted 480 (10819 / 878 KiB allocated / wasted across 2690 blocks) SkBufferBlock: allocating 4096 wasted 0 (10823 / 878 KiB allocated / wasted across 2691 blocks) Android results: ... SkBufferBlock: allocating 4116 wasted 1004 (4286 / 926 KiB allocated / wasted across 1068 blocks) SkBufferBlock: allocating 4112 wasted 1008 (4290 / 927 KiB allocated / wasted across 1069 blocks) SkBufferBlock: allocating 4096 wasted 0 (4294 / 927 KiB allocated / wasted across 1070 blocks) SkBufferBlock: allocating 4116 wasted 1004 (4298 / 928 KiB allocated / wasted across 1071 blocks) SkBufferBlock: allocating 4112 wasted 1008 (4302 / 929 KiB allocated / wasted across 1072 blocks) SkBufferBlock: allocating 4112 wasted 1008 (4306 / 930 KiB allocated / wasted across 1073 blocks) SkBufferBlock: allocating 4112 wasted 1008 (4310 / 931 KiB allocated / wasted across 1074 blocks) SkBufferBlock: allocating 4096 wasted 0 (4322 / 931 KiB allocated / wasted across 1077 blocks) Note that TheVerge.com is the worst in this regard, other websites that I tried produced far less waste. I also noticed that numbers jump once animated image loads. I also compiled spreadsheet of requested/allocated sizes for different allocators: https://docs.google.com/spreadsheets/d/1zRdgXSgbxQDUoHoO1iiv0KTG8fBESeWHj-BUOZYqZ40/edit?usp=sharing
,
Sep 30 2016
,
Sep 30 2016
I suspect #1 may introduce internal fragmentation in SharedBuffer itself, and the overhead of an extra allocation in #2 is non-trivial. Ideally, if we knew the total size of the data we're going to receive, we could preallocate a single SkRWBuffer block. Not sure whether that's feasible. But looking at SkRWBuffer, it seems we currently ignore the initialCapacity hint, so even if all the data is passed in a single call we're still going to grow the storage one segment at a time: https://cs.chromium.org/chromium/src/third_party/skia/src/core/SkRWBuffer.cpp?rcl=0&l=184 Fixing the SkRWBuffer initialCapacity allocation seems like a good place to start.
,
Sep 30 2016
3 CLs in flight which should drastically reduce allocator fragmentation: 1) preallocate in the SkRWBuffer ctor: https://codereview.chromium.org/2384763002/ 2) add a reserve hint to SkRWBuffer::append(): https://codereview.chromium.org/2385803002/ 3) use the reserve hint in DeferredImageDecoder: https://codereview.chromium.org/2385823002
,
Sep 30 2016
That was quick, thanks! Let me check how those CLs affect things. jemalloc is pretty coarse grained, for example allocating 120 KiB + 1 byte wastes 32 KiB (see issue 651872 I just filed), so preallocation might actually worsen things if we end up preallocating say 8 KiB + delta wasting 2 KiB.
,
Sep 30 2016
BTW, in suggestion #1 I was thinking of something like:
const char* segment = 0;
for (size_t length = data->getSomeData(segment, m_rwBuffer->size());
length; length = data->getSomeData(segment, m_rwBuffer->size())) {
size_t half_length = length / 2;
m_rwBuffer->append(segment, half_length);
m_rwBuffer->append(segment + half_length, length - half_length);
}
,
Sep 30 2016
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/2e36e88f40c8a37043d5b9ef17bc72b69a394b95 commit 2e36e88f40c8a37043d5b9ef17bc72b69a394b95 Author: fmalita <fmalita@chromium.org> Date: Fri Sep 30 17:52:08 2016 SkRWBuffer: preallocate 'initialCapacity' We're currently ignoring the hint, resulting in multiple unneeded allocations later. BUG= chromium:651698 R=scroggo@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2384763002 Review-Url: https://codereview.chromium.org/2384763002 [modify] https://crrev.com/2e36e88f40c8a37043d5b9ef17bc72b69a394b95/src/core/SkRWBuffer.cpp
,
Sep 30 2016
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/2e36e88f40c8a37043d5b9ef17bc72b69a394b95 commit 2e36e88f40c8a37043d5b9ef17bc72b69a394b95 Author: fmalita <fmalita@chromium.org> Date: Fri Sep 30 17:52:08 2016 SkRWBuffer: preallocate 'initialCapacity' We're currently ignoring the hint, resulting in multiple unneeded allocations later. BUG= chromium:651698 R=scroggo@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2384763002 Review-Url: https://codereview.chromium.org/2384763002 [modify] https://crrev.com/2e36e88f40c8a37043d5b9ef17bc72b69a394b95/src/core/SkRWBuffer.cpp
,
Sep 30 2016
With the fixes above we now allocate bigger blocks, but the proportion of waste stayed the same: Android: SkBufferBlock: allocating 16391 wasted 4089 (2213 / 364 KiB allocated / wasted across 134 blocks) SkBufferBlock: allocating 7111 wasted 57 (2220 / 364 KiB allocated / wasted across 135 blocks) SkBufferBlock: allocating 32788 wasted 8172 (2252 / 372 KiB allocated / wasted across 136 blocks) SkBufferBlock: allocating 10168 wasted 72 (2262 / 372 KiB allocated / wasted across 137 blocks) SkBufferBlock: allocating 12272 wasted 16 (2274 / 372 KiB allocated / wasted across 138 blocks) SkBufferBlock: allocating 24436 wasted 140 (2298 / 372 KiB allocated / wasted across 139 blocks) SkBufferBlock: allocating 32784 wasted 8176 (2330 / 380 KiB allocated / wasted across 140 blocks) SkBufferBlock: allocating 8105 wasted 87 (2338 / 380 KiB allocated / wasted across 141 blocks) SkBufferBlock: allocating 26451 wasted 2221 (2363 / 383 KiB allocated / wasted across 142 blocks) SkBufferBlock: allocating 8865 wasted 1375 (2372 / 384 KiB allocated / wasted across 143 blocks) SkBufferBlock: allocating 9050 wasted 1190 (2381 / 385 KiB allocated / wasted across 144 blocks) SkBufferBlock: allocating 4096 wasted 0 (2385 / 385 KiB allocated / wasted across 145 blocks) SkBufferBlock: allocating 32788 wasted 8172 (2417 / 393 KiB allocated / wasted across 146 blocks) SkBufferBlock: allocating 8466 wasted 1774 (2425 / 395 KiB allocated / wasted across 147 blocks) SkBufferBlock: allocating 16140 wasted 244 (2441 / 395 KiB allocated / wasted across 148 blocks) SkBufferBlock: allocating 18860 wasted 1620 (2459 / 397 KiB allocated / wasted across 149 blocks) SkBufferBlock: allocating 11208 wasted 1080 (2470 / 398 KiB allocated / wasted across 150 blocks) SkBufferBlock: allocating 9927 wasted 313 (2480 / 398 KiB allocated / wasted across 151 blocks) SkBufferBlock: allocating 14723 wasted 1661 (2494 / 400 KiB allocated / wasted across 152 blocks) SkBufferBlock: allocating 4096 wasted 0 (2498 / 400 KiB allocated / wasted across 153 blocks) SkBufferBlock: allocating 32788 wasted 8172 (2530 / 408 KiB allocated / wasted across 154 blocks) SkBufferBlock: allocating 21841 wasted 2735 (2552 / 410 KiB allocated / wasted across 155 blocks) SkBufferBlock: allocating 32784 wasted 8176 (2584 / 418 KiB allocated / wasted across 156 blocks) SkBufferBlock: allocating 9259 wasted 981 (2593 / 419 KiB allocated / wasted across 157 blocks) SkBufferBlock: allocating 32784 wasted 8176 (2625 / 427 KiB allocated / wasted across 158 blocks) SkBufferBlock: allocating 6547 wasted 621 (2631 / 428 KiB allocated / wasted across 159 blocks) SkBufferBlock: allocating 13713 wasted 623 (2645 / 428 KiB allocated / wasted across 160 blocks) SkBufferBlock: allocating 4096 wasted 0 (2649 / 428 KiB allocated / wasted across 161 blocks) I.e. we ended up with 2649 / 428, while previously it was 4322 / 931 (today TheVerge doesn't have animated images on the front page).
,
Sep 30 2016
That's disappointing, but presumably a jemalloc anomaly? I would expect other allocators to handle consolidated allocations more efficiently. Minimizing the number of allocations has benefits beyond just internal fragmentation though, and Skia generally bends over backwards to consolidate. So I think these changes make sense regardless of jemalloc results - but yeah, it's sad they don't show an impact there.
,
Sep 30 2016
On Linux situation improved 2x: SkBufferBlock: allocating 5938 wasted 206 (1887 / 234 KiB allocated / wasted across 194 blocks) SkBufferBlock: allocating 7152 wasted 1040 (1894 / 235 KiB allocated / wasted across 195 blocks) SkBufferBlock: allocating 21156 wasted 3420 (1914 / 239 KiB allocated / wasted across 196 blocks) SkBufferBlock: allocating 5232 wasted 912 (1920 / 239 KiB allocated / wasted across 197 blocks) SkBufferBlock: allocating 29648 wasted 3120 (1949 / 242 KiB allocated / wasted across 198 blocks) SkBufferBlock: allocating 4096 wasted 0 (1953 / 242 KiB allocated / wasted across 199 blocks) SkBufferBlock: allocating 4096 wasted 0 (1957 / 242 KiB allocated / wasted across 200 blocks) SkBufferBlock: allocating 4096 wasted 0 (1961 / 242 KiB allocated / wasted across 201 blocks) SkBufferBlock: allocating 8608 wasted 1632 (1969 / 244 KiB allocated / wasted across 202 blocks) SkBufferBlock: allocating 5470 wasted 674 (1974 / 245 KiB allocated / wasted across 203 blocks) SkBufferBlock: allocating 7208 wasted 984 (1981 / 246 KiB allocated / wasted across 204 blocks) SkBufferBlock: allocating 11986 wasted 302 (1993 / 246 KiB allocated / wasted across 205 blocks) SkBufferBlock: allocating 23806 wasted 770 (2016 / 247 KiB allocated / wasted across 206 blocks) SkBufferBlock: allocating 6776 wasted 1416 (2023 / 248 KiB allocated / wasted across 207 blocks) SkBufferBlock: allocating 17252 wasted 3228 (2040 / 251 KiB allocated / wasted across 208 blocks) SkBufferBlock: allocating 21074 wasted 3502 (2060 / 255 KiB allocated / wasted across 209 blocks) SkBufferBlock: allocating 4096 wasted 0 (2064 / 255 KiB allocated / wasted across 210 blocks) SkBufferBlock: allocating 3473448 wasted 4056 (5456 / 259 KiB allocated / wasted across 211 blocks) SkBufferBlock: allocating 4096 wasted 0 (5460 / 259 KiB allocated / wasted across 212 blocks) SkBufferBlock: allocating 4096 wasted 0 (5464 / 259 KiB allocated / wasted across 213 blocks) SkBufferBlock: allocating 11674 wasted 614 (5476 / 259 KiB allocated / wasted across 214 blocks) SkBufferBlock: allocating 19014 wasted 1466 (5494 / 261 KiB allocated / wasted across 215 blocks) SkBufferBlock: allocating 13256 wasted 56 (5507 / 261 KiB allocated / wasted across 216 blocks) SkBufferBlock: allocating 8424 wasted 1816 (5516 / 262 KiB allocated / wasted across 217 blocks) SkBufferBlock: allocating 21296 wasted 3280 (5536 / 266 KiB allocated / wasted across 218 blocks) SkBufferBlock: allocating 31270 wasted 1498 (5567 / 267 KiB allocated / wasted across 219 blocks) SkBufferBlock: allocating 12596 wasted 716 (5579 / 268 KiB allocated / wasted across 220 blocks) SkBufferBlock: allocating 10666 wasted 1622 (5590 / 269 KiB allocated / wasted across 221 blocks) SkBufferBlock: allocating 11692 wasted 596 (5601 / 270 KiB allocated / wasted across 222 blocks) SkBufferBlock: allocating 30090 wasted 2678 (5630 / 273 KiB allocated / wasted across 223 blocks) SkBufferBlock: allocating 12340 wasted 972 (5642 / 274 KiB allocated / wasted across 224 blocks) SkBufferBlock: allocating 13652 wasted 2732 (5656 / 276 KiB allocated / wasted across 225 blocks) SkBufferBlock: allocating 4096 wasted 0 (5660 / 276 KiB allocated / wasted across 226 blocks) SkBufferBlock: allocating 4096 wasted 0 (5664 / 276 KiB allocated / wasted across 227 blocks) SkBufferBlock: allocating 4096 wasted 0 (5668 / 276 KiB allocated / wasted across 228 blocks) SkBufferBlock: allocating 4096 wasted 0 (5672 / 276 KiB allocated / wasted across 229 blocks) SkBufferBlock: allocating 4096 wasted 0 (5676 / 276 KiB allocated / wasted across 230 blocks) SkBufferBlock: allocating 4096 wasted 0 (5680 / 276 KiB allocated / wasted across 231 blocks) SkBufferBlock: allocating 4096 wasted 0 (5684 / 276 KiB allocated / wasted across 232 blocks) SkBufferBlock: allocating 4096 wasted 0 (5688 / 276 KiB allocated / wasted across 233 blocks) SkBufferBlock: allocating 7638 wasted 554 (5695 / 277 KiB allocated / wasted across 234 blocks) SkBufferBlock: allocating 5870 wasted 274 (5701 / 277 KiB allocated / wasted across 235 blocks) SkBufferBlock: allocating 7410 wasted 782 (5708 / 278 KiB allocated / wasted across 236 blocks) SkBufferBlock: allocating 6316 wasted 340 (5714 / 278 KiB allocated / wasted across 237 blocks) SkBufferBlock: allocating 11266 wasted 1022 (5725 / 279 KiB allocated / wasted across 238 blocks) SkBufferBlock: allocating 9040 wasted 1200 (5734 / 280 KiB allocated / wasted across 239 blocks) SkBufferBlock: allocating 9400 wasted 840 (5743 / 281 KiB allocated / wasted across 240 blocks) SkBufferBlock: allocating 17114 wasted 3366 (5760 / 284 KiB allocated / wasted across 241 blocks) SkBufferBlock: allocating 5156 wasted 988 (5765 / 285 KiB allocated / wasted across 242 blocks) SkBufferBlock: allocating 4096 wasted 0 (5769 / 285 KiB allocated / wasted across 243 blocks) So we ended up with 5769 / 285, while previously it was 10823 / 878. Also notice giant 3473448 byte block we allocated once. It's good that we optimized number of allocations, but on Android/jemalloc that block would waste 190 KiB. So while this is (mostly) jemalloc-specific anomaly, it would be nice to avoid it.
,
Sep 30 2016
Nice, glad it makes some difference. I think the next thing to try, after landing these CLs, is to tweak the prealloc size internally such that the block size is allocator-friendly. My best guess ATM is to use power-of-two blocks. E.g. * client has ~90K worth of data in 10K segments * client appends 10K with a 80K reserve => we allocate a 64K block * client keeps appending 10K until it exhausts the 64K block (adjusting the reserve as it goes, which we currently ignore) * client appends another 10K with a ~16K reserve => we allocate another 16K block * ... * eventually we allocate the final/unaligned block and we're done So we end up allocating: 64K + 16K + 8K + 2K + something. Which is more allocations than when consolidating everything, but a lot fewer than when not consolidating at all. And since the only unaligned block the smallest, at the end, I expect this to minimize the RAM overhead. WDYT?
,
Sep 30 2016
Looks good! So the smallest block will be 2 KiB instead of current 4 KiB? And the logic is triggered only when 'remaining' is set? What if 'MinBlockSize + delta' is requested - do we allocate exactly that, or somehow round it up? (to the nearest KiB?)
,
Sep 30 2016
That's a good question. On one hand, since the client is always passing reserve hints, it kinda makes sense to do away with that minimum 4K allocation. OTOH, we still have some fragmentation due to networking/other external factors -- so we're not guaranteed to see the full data at once => then that last block is not necessarily final. E.g. the data may very well arrive in packets of 0x1337, 0x1337, 0x1337 bytes. What's the optimal allocation strategy, considering we only see one of these at a time? I think we do get a signal from blink for the final buffer - so maybe enforce a minimum allocation size, except for the last buffer?
,
Sep 30 2016
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/5776508126534db2af97d560588e7046e745df65 commit 5776508126534db2af97d560588e7046e745df65 Author: fmalita <fmalita@chromium.org> Date: Fri Sep 30 20:34:19 2016 Add a SkRWBuffer reserve mechanism Currently, Chromium stores segmented data in a SharedBuffer and appends to SkRWBuffer one segment at a time: const char* segment = 0; for (size_t length = data->getSomeData(segment, m_rwBuffer->size()); length; length = data->getSomeData(segment, m_rwBuffer->size())) { m_rwBuffer->append(segment, length, remaining); } This can yield a bunch of just-above-4k allocations => wasted RAM due to internal fragmentation. Ideally, we'd want a SkRWBuffer::reserve(size_t bytes) API, but the current internals don't support that trivially. Alternatively, the caller can pass a reserve hint at append() time. BUG= chromium:651698 R=scroggo@google.com,reed@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2385803002 Review-Url: https://codereview.chromium.org/2385803002 [modify] https://crrev.com/5776508126534db2af97d560588e7046e745df65/include/core/SkRWBuffer.h [modify] https://crrev.com/5776508126534db2af97d560588e7046e745df65/src/core/SkRWBuffer.cpp
,
Sep 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f6760c777c4dacf3b432fc5471c1de87d5cc8579 commit f6760c777c4dacf3b432fc5471c1de87d5cc8579 Author: skia-deps-roller <skia-deps-roller@chromium.org> Date: Fri Sep 30 21:17:58 2016 Roll src/third_party/skia/ d921dbb9b..789e25ea7 (4 commits). https://chromium.googlesource.com/skia.git/+log/d921dbb9b883..789e25ea7d0e $ git log d921dbb9b..789e25ea7 --date=short --no-merges --format='%ad %ae %s' 2016-09-30 bsalomon Add gn option to set the location of the Vulkan SDK 2016-09-30 fmalita SkRWBuffer: preallocate 'initialCapacity' 2016-09-30 mtklein Clean up public.bzl after G3-side CL. 2016-09-30 brianosman Helper functions to do SkColor -> GrColor4f BUG= 651698 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel TBR=fmalita@google.com Review-Url: https://codereview.chromium.org/2388453003 Cr-Commit-Position: refs/heads/master@{#422223} [modify] https://crrev.com/f6760c777c4dacf3b432fc5471c1de87d5cc8579/DEPS
,
Sep 30 2016
Hmm, actually I don't understand why DeferredImageDecoder uses SkRWBuffer and not SharedBuffer. 'std::unique_ptr<SkRWBuffer> m_rwBuffer' member is used as follows: 1. To accumulate data that comes to DeferredImageDecoder::setDataInternal() 2. To create SkROBuffer and to convert it to SharedBuffer in DeferredImageDecoder::data() 3. To create SkROBuffer and to convert it to SegmentReader in DeferredImageDecoder::createFrameImageAtIndex() (SegmentReader can also be created from SharedBuffer). So my question is: why not just use SharedBuffer instead of SkRWBuffer? Since SharedBuffer stores everything in 4 KiB segments, using it would also avoid memory waste issue. Paging hajimehoshi@ and haraken@.
,
Sep 30 2016
SharedBuffer is a Blink data structure, but the encoded data needs to be passed to Skia for decoding and rasterization -- hence this conversion. The original Blink data gets discarded at some point to avoid duplication IIRC.
,
Sep 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f95ccc684935379db911ea4d1e88c408b9215937 commit f95ccc684935379db911ea4d1e88c408b9215937 Author: skia-deps-roller <skia-deps-roller@chromium.org> Date: Fri Sep 30 22:35:07 2016 Roll src/third_party/skia/ 789e25ea7..7a02f019f (10 commits). https://chromium.googlesource.com/skia.git/+log/789e25ea7d0e..7a02f019f25c $ git log 789e25ea7..7a02f019f --date=short --no-merges --format='%ad %ae %s' 2016-09-30 mtklein Delete unused avx2_sources. 2016-09-30 fmalita Add a SkRWBuffer reserve mechanism 2016-09-30 msarett Fix nanobench crashes 2016-09-30 borenet [task scheduler] Add gen_tasks.go 2016-09-30 reed fix mac build 2016-09-30 msarett Add a src rect to drawImageLattice() API 2016-09-30 borenet Stop uploading nanobench results on Valgrind bot 2016-09-30 borenet upload_dm_results: Remove JSON validation step 2016-09-30 halcanary SkPDF: de-duplicate text-as-paths 2016-09-30 benjaminwagner Unomit SkSL in public.bzl. BUG= 651698 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel TBR=fmalita@google.com Review-Url: https://codereview.chromium.org/2388513002 Cr-Commit-Position: refs/heads/master@{#422248} [modify] https://crrev.com/f95ccc684935379db911ea4d1e88c408b9215937/DEPS
,
Sep 30 2016
Re #18: I don't actually see where SkRWBuffer matters - we create DecodingImageGenerator (which is SkImageGenerator) from SegmentedReader, and the fact that we used SkRWBuffer to create SegmentedReader is buried deep inside it. SkImageGenerator also doesn't mention SkRWBuffer in its API. Am I missing something?
,
Oct 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/502fceb9cf8e8c70ba2f90c5dac08cc62ef385d3 commit 502fceb9cf8e8c70ba2f90c5dac08cc62ef385d3 Author: fmalita <fmalita@chromium.org> Date: Sat Oct 01 23:17:02 2016 Pass preallocation hints to SkRWBuffer Compute how much more data we're going to append, and pass that info to SkRWBuffer::append() to minimize the number of internal allocations. BUG= 651698 R=scroggo@chromium.org,reed@google.com Review-Url: https://codereview.chromium.org/2385823002 Cr-Commit-Position: refs/heads/master@{#422333} [modify] https://crrev.com/502fceb9cf8e8c70ba2f90c5dac08cc62ef385d3/third_party/WebKit/Source/platform/graphics/DeferredImageDecoder.cpp
,
Oct 3 2016
Re #18 and #20: I'm not familiar with the details, but Skia needs to access the underlying buffer from the main thread and the decoder thread, right? SkRWBuffer is an infrastructure to handle the concurrent accesses smartly. SharedBuffer doesn't have the infrastructure.
,
Oct 3 2016
Re #22: Right. SkRWBuffer can be accessed from any other threads as read-only buffer without any locks. As fmalita@ said, originally there was duplication of encoded data as SkRWBuffer and ShaerdBuffer, and I've removed the latter recently (https://codereview.chromium.org/2054643003/).
,
Oct 4 2016
Re #17: > So my question is: why not just use SharedBuffer instead of SkRWBuffer? Since > SharedBuffer stores everything in 4 KiB segments, using it would also avoid > memory waste issue. The SegmentReader (and the underlying SkRWBuffer) was introduced [1] because SharedBuffer is not threadsafe. Prior to the introduction of the SegmentReader, we copied from one SharedBuffer to another (in the ThreadSafeDataTransport), so that one thread could continue to append data while another thread read it. So we already had duplication of data, and in the old method there would be no way to remove the duplication (https://codereview.chromium.org/2054643003/ relies on the fact that it can recreate the original SharedBuffer from the SkRWBuffer if needed). [1] https://chromium.googlesource.com/chromium/src/+/d2234904faee943bd987bd38d620096db808efca
,
Oct 4 2016
Thanks for the explanation guys! The part that I was missing was how carefully SkROBuffer/SkRWBuffer are designed to be thread safe. Re #14: enforcing min allocation size except for the last block sounds good! Although it will probably complicate API more, as we'll need 'bool finalBlock' somewhere...
,
Oct 24 2016
fmalita@, we would like to fix this in Q4 - will you have time to do the fix you proposed (allocate in 2^n blocks)?
,
Nov 14 2016
What if we introduce 'sk_malloc_atleast(size)' API, that allocates at least 'size' bytes, and then (potentially) updates 'size' with actual allocated amount? I.e.
void* sk_malloc_atleast_flags(size_t* size, unsigned flags);
Default implementation would just route to sk_malloc_flags().
Android-specific implementation would call sk_malloc_flags() and then update |size| argument with malloc_usable_size():
void* sk_malloc_atleast_flags(size_t* size, unsigned flags) {
void* ptr = sk_malloc_flags(*size, flags);
if (ptr)
*size = malloc_usable_size(ptr);
return ptr;
}
Finally, SkRWBuffer would call it and recalculate block's size from returned size.
I guess we'll have to guarantee that malloc_usable_size() is sufficiently fast on Android. Still, this path may be worthwhile exploring.
,
Nov 14 2016
That sounds like a really handy call to have. I'd use it all the time.
,
Aug 2 2017
An available bug that is a P3 and hasn't been updated in 180 days. I'm going to archive this. If you disagree with this action, please feel free to reopen and do anything necessary to get someone's attention to the fact that this is still a bug we care about. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by dskiba@chromium.org
, Sep 30 2016Components: Internals>Skia Blink
Labels: Performance-Memory
I think I know what is going on. Here is a stack trace that causes 4096+ allocations: blink::ResourceLoader::didReceiveData(blink::WebURLLoader*, char const*, int, int, int) blink::ImageResource::updateImage(bool) blink::BitmapImage::setData(WTF::PassRefPtr<blink::SharedBuffer>, bool) blink::ImageSource::setData(WTF::PassRefPtr<blink::SharedBuffer>, bool) blink::DeferredImageDecoder::create(WTF::PassRefPtr<blink::SharedBuffer>, bool, blink::ImageDecoder::AlphaOption, blink::ImageDecoder::GammaAndColorProfileOption) blink::DeferredImageDecoder::setDataInternal(WTF::PassRefPtr<blink::SharedBuffer>, bool, bool) SkRWBuffer::append(void const*, unsigned int) DeferredImageDecoder::setDataInternal() copies data from blink's SharedBuffer: const char* segment = 0; for (size_t length = data->getSomeData(segment, m_rwBuffer->size()); length; length = data->getSomeData(segment, m_rwBuffer->size())) { m_rwBuffer->append(segment, length); } Since SharedBuffer allocates data in 4096 byte segments (kSegmentSize), it always calls SkRWBuffer::append() with 4096 byte blocks. SkBufferBlock then adds sizeof(SkBufferBlock) to 4096 and ends with 4112 (4096 + sizeof(SkBufferBlock) + 4 bytes that my instrumentation added). And 4116 allocations are 4096 + sizeof(SkBufferHead). Same goes for Linux numbers. I'm not sure what is the proper way to fix this. I see several ways: 1. Blink can feed smaller chunks to SkRWBuffer::append(), say not more than 3 KiB. That would cause SkRWBuffer to allocate exactly 4096 bytes, since that's the minimum allocation size. That feels dirty because we're relying on SkRWBuffer's implementation details. 2. SkRWBuffer can stop storing SkBufferBlock/SkBufferHead inline with the data, and allocate them separately. That way when DeferredImageDecoder feeds 4096 byte chunk, SkRWBuffer would also allocate 4096 byte chunk (plus SkBufferBlock/SkBufferHead in separate allocation).