Issue metadata
Sign in to add a comment
|
v59 performance 2 times slower
Reported by
kinos...@gmail.com,
Jul 5 2017
|
||||||||||||||||||||||
Issue descriptionSteps to reproduce the problem: 1. can be seen on site: https://kinoseed.com/ 2. load (click or drop-in) any image in "target image" 3. load color-palette / sample image from thumbs / or example image in "source image" box 4. press "download" in "target image" box What is the expected behavior? Rendering and downloading a color-manipulated image. What went wrong? Significant drop in javascript performance. It takes 2 times longer on v59 compared to v58. Did this work before? Yes v58 Chrome version: 59 Channel: stable OS Version: 4.1+ Flash Version: 1. Similar test was performed on a x86 Canary-v61-64bit, takes 25% longer compared to v58. 2. Android devices with significant more hardware power seem to perform just as poorly as 6+yr old devices. 3. Site currently downsizes the original image to order to deal with the decreased performance issues, which become a real problem with original-size images.
,
Jul 13 2017
@reporter: Is there a way to access the performance numbers? Measuring by hand is quite of not-optimal. There is also an error in the javascript console that a worker could not be created, maybe that is the problem? @test: please bisect
,
Jul 13 2017
,
Jul 13 2017
,
Jul 13 2017
@hablich: There was no dedicated "performance testing", and the rough observation was about of factor of 2. It may not be optimal, but I think it is irrelevant whether it is 1.8 or 2.2 times slower. The code was not modified between v58 and v59, when the performance drop occurred. Drop of performance (although not that significant) is also seen in x86 (pc-versions). The service worker should not impact performance, as it only delivers app-files, but it should also be deployed properly (unless there is another bug). I can remove the image-downsizing for you, and the difference of performance will be evident with normal/large images.
,
Jul 18 2017
Can be reproduced by setting chrome://flags/#disable-v8-ignition-turbo. With I+TF it indeed takes longer. mvstanton@/rmcilroy@ can you or one on your team please investigate this?
,
Jul 18 2017
In order to reproduce it, make sure to load a big picture (multiple MB) as target.
,
Jul 18 2017
I had a brief look a this with the DevTools profiler. It looks like the function c20 is taking about 3x longer with I+TF than it did with Crankshaft. The function is optimized so it looks like a missing optimization in TurboFan. Profiles attached. I've not yet had time to print the generated code to see what the difference is - mvstanton could you or someone on your team have a look?
,
Jul 19 2017
,
Jul 19 2017
For easier reference, this is the function mentioned in the previous comment:
function c20(_0x28188b, _0x33a6f2, _0x4713bc, _0x551d09=0x1) {
var _0x4475af = document['\x63\x72\x65\x61\x74\x65\x45\x6c\x65\x6d\x65\x6e\x74']('\x63\x61\x6e\x76\x61\x73');
_0x4475af['\x77\x69\x64\x74\x68'] = _0x4713bc;
_0x4475af['\x68\x65\x69\x67\x68\x74'] = _0x28188b['\x68\x65\x69\x67\x68\x74'] * _0x4713bc / _0x28188b['\x77\x69\x64\x74\x68'];
var _0x371bc9 = _0x4475af['\x67\x65\x74\x43\x6f\x6e\x74\x65\x78\x74']('\x32\x64');
_0x371bc9['\x64\x72\x61\x77\x49\x6d\x61\x67\x65'](_0x28188b, 0x0, 0x0, _0x4475af['\x77\x69\x64\x74\x68'], _0x4475af['\x68\x65\x69\x67\x68\x74']);
var _0x875b4d = _0x371bc9['\x67\x65\x74\x49\x6d\x61\x67\x65\x44\x61\x74\x61'](0x0, 0x0, _0x4475af['\x77\x69\x64\x74\x68'], _0x4475af['\x68\x65\x69\x67\x68\x74']);
var _0x67e338, _0x404570;
var _0xa13a6e = _0x4475af['\x77\x69\x64\x74\x68'];
var _0xf9b53e = _0x4475af['\x68\x65\x69\x67\x68\x74'];
var _0x2eacbe = new Uint32Array(_0x875b4d['\x64\x61\x74\x61']['\x62\x75\x66\x66\x65\x72']);
var _0x1afdd2 = 0xff0000
, _0x384328 = 0xff00
, _0x520aea = 0xff;
var _0x45ca5f, _0x159859, _0x19113f, _0x4fe4c9;
var _0x67e338, _0x404570, _0x224c04, _0x33420a, _0x19f95a, _0xa202de, _0x168ac1, _0x4a292f, _0x10aa60;
var _0x129933 = 0x1 - _0x551d09;
function _0x2fbde5(_0x176b7d) {
return _0x33a6f2[_0x33420a][_0xa202de][_0x4a292f][_0x176b7d] * (0x1 - _0x67e338) * (0x1 - _0x404570) * (0x1 - _0x224c04) + _0x33a6f2[_0x19f95a][_0xa202de][_0x4a292f][_0x176b7d] * _0x67e338 * (0x1 - _0x404570) * (0x1 - _0x224c04) + _0x33a6f2[_0x33420a][_0x168ac1][_0x4a292f][_0x176b7d] * (0x1 - _0x67e338) * _0x404570 * (0x1 - _0x224c04) + _0x33a6f2[_0x33420a][_0xa202de][_0x10aa60][_0x176b7d] * (0x1 - _0x67e338) * (0x1 - _0x404570) * _0x224c04 + _0x33a6f2[_0x19f95a][_0xa202de][_0x10aa60][_0x176b7d] * _0x67e338 * (0x1 - _0x404570) * _0x224c04 + _0x33a6f2[_0x33420a][_0x168ac1][_0x10aa60][_0x176b7d] * (0x1 - _0x67e338) * _0x404570 * _0x224c04 + _0x33a6f2[_0x19f95a][_0x168ac1][_0x4a292f][_0x176b7d] * _0x67e338 * _0x404570 * (0x1 - _0x224c04) + _0x33a6f2[_0x19f95a][_0x168ac1][_0x10aa60][_0x176b7d] * _0x67e338 * _0x404570 * _0x224c04;
}
var _0x5c7d78 = g = b = 0x0;
for (var _0x50fcbe = 0x0; _0x50fcbe < _0xf9b53e; ++_0x50fcbe) {
for (var _0x3ad786 = 0x0; _0x3ad786 < _0xa13a6e; ++_0x3ad786) {
_0x45ca5f = _0x2eacbe[_0x50fcbe * _0xa13a6e + _0x3ad786];
r_o = _0x45ca5f & _0x520aea;
g_o = (_0x45ca5f & _0x384328) >> 0x8;
b_o = (_0x45ca5f & _0x1afdd2) >> 0x10;
_0x159859 = 0x1f * r_o / 0xff;
_0x19113f = 0x1f * g_o / 0xff;
_0x4fe4c9 = 0x1f * b_o / 0xff;
_0x33420a = Math['\x66\x6c\x6f\x6f\x72'](_0x159859);
_0xa202de = Math['\x66\x6c\x6f\x6f\x72'](_0x19113f);
_0x4a292f = Math['\x66\x6c\x6f\x6f\x72'](_0x4fe4c9);
_0x19f95a = Math['\x6d\x69\x6e'](_0x33420a + 0x1, 0x1f);
_0x168ac1 = Math['\x6d\x69\x6e'](_0xa202de + 0x1, 0x1f);
_0x10aa60 = Math['\x6d\x69\x6e'](_0x4a292f + 0x1, 0x1f);
_0x67e338 = _0x159859 - _0x33420a;
_0x404570 = _0x19113f - _0xa202de;
_0x224c04 = _0x4fe4c9 - _0x4a292f;
_0x159859 = Math['\x72\x6f\x75\x6e\x64'](_0x551d09 * _0x2fbde5(0x0) + _0x129933 * r_o);
_0x19113f = Math['\x72\x6f\x75\x6e\x64'](_0x551d09 * _0x2fbde5(0x1) + _0x129933 * g_o);
_0x4fe4c9 = Math['\x72\x6f\x75\x6e\x64'](_0x551d09 * _0x2fbde5(0x2) + _0x129933 * b_o);
_0x2eacbe[_0x50fcbe * _0xa13a6e + _0x3ad786] = 0xff << 0x18 | _0x4fe4c9 << 0x10 | _0x19113f << 0x8 | _0x159859;
}
}
_0x371bc9['\x70\x75\x74\x49\x6d\x61\x67\x65\x44\x61\x74\x61'](_0x875b4d, 0x0, 0x0);
return _0x4475af;
}
I'll dig into this now.
,
Jul 19 2017
Ok, quick update on this one... digging deeper, TurboFan is a lot slower on the inner function
function _0x2fbde5(_0x176b7d) {
return _0x33a6f2[_0x33420a][_0xa202de][_0x4a292f][_0x176b7d] * (0x1 - _0x67e338) * (0x1 - _0x404570) * (0x1 - _0x224c04) + _0x33a6f2[_0x19f95a][_0xa202de][_0x4a292f][_0x176b7d] * _0x67e338 * (0x1 - _0x404570) * (0x1 - _0x224c04) + _0x33a6f2[_0x33420a][_0x168ac1][_0x4a292f][_0x176b7d] * (0x1 - _0x67e338) * _0x404570 * (0x1 - _0x224c04) + _0x33a6f2[_0x33420a][_0xa202de][_0x10aa60][_0x176b7d] * (0x1 - _0x67e338) * (0x1 - _0x404570) * _0x224c04 + _0x33a6f2[_0x19f95a][_0xa202de][_0x10aa60][_0x176b7d] * _0x67e338 * (0x1 - _0x404570) * _0x224c04 + _0x33a6f2[_0x33420a][_0x168ac1][_0x10aa60][_0x176b7d] * (0x1 - _0x67e338) * _0x404570 * _0x224c04 + _0x33a6f2[_0x19f95a][_0x168ac1][_0x4a292f][_0x176b7d] * _0x67e338 * _0x404570 * (0x1 - _0x224c04) + _0x33a6f2[_0x19f95a][_0x168ac1][_0x10aa60][_0x176b7d] * _0x67e338 * _0x404570 * _0x224c04;
}
Specifically the hole check on the _0x33a6f2 seems to result in a diamond, which flushes the state of the LoadElimination table (due to the runtime function on one side). Thus Crankshaft loads the _0x33a6f2 only once and does all the checks on it only once, whereas TurboFan does it eight times, i.e. it's always the sequence of
...
0x276fc5891974 fd4 488b4217 REX.W movq rax,[rdx+0x17]
0x276fc5891978 fd8 488b402f REX.W movq rax,[rax+0x2f]
0x276fc589197c fdc c5fb1145e8 vmovsd [rbp-0x18],xmm0
0x276fc5891981 fe1 493945a8 REX.W cmpq [r13-0x58],rax
0x276fc5891985 fe5 0f8522000000 jnz 0x276fc58919ad <+0x100d>
-- B283 start --
-- B284 start --
0x276fc589198b feb 48b889b9cac3653f0000 REX.W movq rax,0x3f65c3cab989 ;; object: 0x3f65c3cab989 <String[9]: _0x33a6f2>
0x276fc5891995 ff5 50 push rax
-- <not inlined:38261> --
0x276fc5891996 ff6 48bbf07335b5187f0000 REX.W movq rbx,0x7f18b53573f0
0x276fc58919a0 1000 b801000000 movl rax,0x1
0x276fc58919a5 1005 488bf2 REX.W movq rsi,rdx
0x276fc58919a8 1008 e8f329e7ff call 0x276fc57043a0 ;; code: STUB, CEntryStub, minor: 8
-- B285 start --
0x276fc58919ad 100d 488b5df8 REX.W movq rbx,[rbp-0x8]
0x276fc58919b1 1011 488b534f REX.W movq rdx,[rbx+0x4f]
-- <not inlined:38271> --
0x276fc58919b5 1015 a801 test al,0x1
0x276fc58919b7 1017 0f84670f0000 jz 0x276fc5892924 <+0x1f84>
0x276fc58919bd 101d 48b91145c087db000000 REX.W movq rcx,0xdb87c04511 ;; object: 0xdb87c04511 <Map(FAST_ELEMENTS)>
0x276fc58919c7 1027 483948ff REX.W cmpq [rax-0x1],rcx
0x276fc58919cb 102b 0f85580f0000 jnz 0x276fc5892929 <+0x1f89>
0x276fc58919d1 1031 488b700f REX.W movq rsi,[rax+0xf]
0x276fc58919d5 1035 488b7817 REX.W movq rdi,[rax+0x17]
0x276fc58919d9 1039 f6c201 testb rdx,0x1
0x276fc58919dc 103c 0f85ae0b0000 jnz 0x276fc5892590 <+0x1bf0>
...
in the generated code.
,
Jul 19 2017
Performance is restored completely on ToT (probably already on M60), due to the hole check machinery by mythria@ and the function context specialization by jarin@. mythria@ can you verify that the investigation makes sense?
,
Jul 19 2017
Yes, I think hole check optimization does help. When I revert that change it regresses C20 by around ~30% and _0x2fbde5 by ~40%. So, it did help recover some of the regression.
,
Jul 19 2017
This change went into M61, so M60 may still have the performance regression.
,
Jul 19 2017
Should we back-merge the performance fix to M60?
,
Jul 20 2017
Yes, please do merge it back today.
,
Jul 20 2017
It turns out that the fix is quite involved and involves multiple commits: aed96e7b049ca41108bc9d69aad5e764a66ee9f3 c360c6a1d01135298aa9b6508f8367db322ce107 66218e4efa7ec0272bb707f21cc793da65954308 There is no new beta push for 60 planned so let's not merge it to 60.
,
Jul 24 2017
Btw, is this bug "Fixed"?
,
Jul 24 2017
I guess we won't know until M61 gets the light of day. If you think it's fixed, mark it so. If the "fix" does not solve the performance issue, I'll refile the bug-report with v61 when the time comes.
,
Sep 26 2017
I am marking this as fixed, since the work is done. Please feel free to reopen this bug if the issue is not fixed in M61. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by schenney@chromium.org
, Jul 6 2017