53.4% regression in loading.desktop at 540494:540624
Issue description: See the link to graphs below.
,
Mar 13 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14c03eee440000
,
Mar 13 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/14c03eee440000 Media remoting cleanup: Remove codes that support encrypted contents. by xjz@chromium.org https://chromium.googlesource.com/chromium/src/+/acebf6504d2e1d1b051f6f8e9fb128d4c7e404c2 Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Mar 13 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1486215e440000
,
Mar 13 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/1486215e440000 Media remoting cleanup: Remove codes that support encrypted contents. by xjz@chromium.org https://chromium.googlesource.com/chromium/src/+/acebf6504d2e1d1b051f6f8e9fb128d4c7e404c2 Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Mar 14 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14d13c0e440000
,
Mar 14 2018
,
Mar 14 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/14d13c0e440000
,
Mar 14 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/11908d76440000
,
Mar 15 2018
Issue 821387 has been merged into this issue.
,
Mar 15 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/11908d76440000
,
Mar 15 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1303551e440000
,
Mar 15 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/1303551e440000
,
Mar 15 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1421fe36440000
,
Mar 16 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/1421fe36440000
,
Mar 16 2018
Sent this back for triage. My change mainly removes unused code, and Pinpoint couldn't reproduce a difference even with that CL completely reverted (see Comment 15).
,
Mar 28 2018
I'm really sorry the messaging on the bug is unclear. +dtu is working on improving the UI and bug messaging for Pinpoint tryjobs. In this case, it's actually using "success rate" (percentage of times the benchmark passed) as the metric with "no difference", which is super confusing. Currently, you need to click on "Analyze benchmark results" to see all the metrics for the benchmark. Example from #15:
* Analyze Benchmark Results points here: https://pinpoint-dot-chromeperf.appspot.com/results2/1421fe36440000
* Clicking on "timeToFirstContentfulPaint" in the table, you see 114.779ms at head vs 111.667ms with your CL reverted.
,
Mar 28 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/107a438b440000
,
Mar 28 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/107a438b440000 [NOT FOR REVIEW] Revert "Media remoting cleanup: Remove codes that support encrypted contents." by xjz@chromium.org https://chromium-review.googlesource.com/c/chromium/src/+/963852/3 Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Mar 28 2018
Sorry, the docs may be out of date. If you use the "+" on the Job results page (the one with the graph) it will run a try job with the right metric.
,
Mar 29 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14bf8297440000
,
Mar 29 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/14bf8297440000
,
Mar 29 2018
Pinpoint tests indicate that the CL causes a slight regression on one metric but a larger improvement on another. I ran Pinpoint with the revert CL twice, once in #15 and once in #22. "timeToFirstContentfulPaint" is 114.779ms vs 111.667ms in #15 and 130.049ms vs 130.338ms in #25, which suggests at most a slight regression. However, "timeToFirstMeaningfulPaint" is 142.996ms vs 160.390ms in #15 and 236.113ms vs 241.098ms in #25, which indicates a larger improvement.
sullivan@: It seems that my CL doesn't cause a regression as significant as the one reported (53.4%), which makes sense since the CL mostly removed code that was never called. One change might affect timing a little when loading a video element. However, I tried Pinpoint with that feature disabled in #8, and the result didn't indicate an improvement either.
,
Mar 29 2018
tdresser: As metric owner, can you take a look? This is pretty confusing: pinpoint bisect clearly reproduces a large shift, but the perf try job doesn't (see #23). Any idea what's happening here?
,
Mar 29 2018
There are a few additional confounding factors here. The original regression report and bisects were on just two stories (webpages), PremierLeague and ja.wikipedia, whereas the perf try job results are across all stories in the benchmark. When I break down the try job results by story and cache temperature, I see 122.577 ms to 157.103 ms for PremierLeague / warm. I also want to note that the bisects show the median, while the perf try job results show the mean. The two can differ significantly for bimodal results like these.
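To illustrate the last point, here is a minimal sketch of how the median and the mean can disagree on a bimodal metric. The sample values are hypothetical and were not taken from any Pinpoint job; they only assume that each run lands in either a fast mode or a slow mode, and that the change moves a few runs from one mode to the other.

```python
from statistics import mean, median

# Hypothetical samples in ms (NOT from the Pinpoint jobs): every run lands
# in either a fast mode (100 ms) or a slow mode (160 ms).
before = [100] * 6 + [160] * 4   # 6 of 10 runs in the fast mode
after = [100] * 4 + [160] * 6    # 4 of 10 runs in the fast mode


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# The median jumps from one mode to the other once the majority flips.
median_shift = pct_change(median(before), median(after))

# The mean only moves in proportion to the number of runs that switched mode.
mean_shift = pct_change(mean(before), mean(after))

print(f"median: {median(before)} -> {median(after)} ms ({median_shift:+.1f}%)")
print(f"mean:   {mean(before)} -> {mean(after)} ms ({mean_shift:+.1f}%)")
```

With these made-up numbers only two runs out of ten change mode, yet the median reports a much larger shift than the mean, which is one way a bisect and a try job could appear to contradict each other.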
,
Mar 29 2018
Woah. Why do we use different statistics between bisects and tryjobs?
,
Mar 29 2018
,
Mar 29 2018
I would say it has to do with the level of detail you want to show. Internally, bisects use the full distribution, so we don't use any of the summary statistics like mean/median/etc. except for a few edge cases like frame_times. If you want to reduce the distribution to a single number, like we show in results2, the mean (or trimmed mean) is a better choice, since it's resistant to the aliasing effects of bimodal distributions. The Pinpoint chart (for display only) is somewhere in between, so the intent is to show the five-number summary (min, Q1, median, Q3, max), though Q1 and Q3 aren't visible on the chart yet (go/catabug/3877).
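For readers unfamiliar with the summaries named above, this is a rough sketch of a five-number summary and a trimmed mean using only the Python standard library. The samples are hypothetical, the quartile method is the `statistics.quantiles` default, and the 20% trim is an arbitrary choice; none of this is claimed to match what Pinpoint or results2 actually compute.

```python
from statistics import mean, median, quantiles

# Hypothetical bimodal samples in ms with one outlier (NOT real job data).
samples = [100, 102, 104, 106, 160, 162, 164, 166, 400]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return min(values), q1, median(values), q3, max(values)


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping int(len * proportion) values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


summary = five_number_summary(samples)
plain = mean(samples)
trimmed = trimmed_mean(samples)

print("five-number summary:", summary)
print(f"mean: {plain:.1f} ms, trimmed mean: {trimmed:.1f} ms")
```

The trimmed mean here discards the single lowest and highest sample, so the 400 ms outlier no longer pulls the single-number summary upward.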
,
Mar 29 2018
Gotcha, thanks for clarifying. I think the next step here is probably to try repro'ing locally. The data is consistent enough that it seems likely there's a real problem here. xjz@, are you able to try reproducing this regression locally?
,
Mar 29 2018
Thanks for the explanation. I re-checked the tests in #15 and #22 and broke down the results by story and cache temperature as mentioned in #25. They still don't indicate a significant regression. For PremierLeague / warm, the result (head vs head + revert CL) is 108.810ms vs 153.498ms in #15 and 122.577ms vs 157.103ms in #22, both of which indicate a >20% improvement from my CL. For PremierLeague / cold, there is no result in #15, and the result in #22 is 164.249ms vs 156.442ms, which indicates a ~5% regression. For ja.wikipedia, the results are not consistent. #15 shows a slight regression: 727.231ms vs 712.028ms for cold, and 87.100ms vs 86.644ms for warm. However, #22 shows a slight improvement: 729.404ms vs 747.102ms for cold, and 91.183ms vs 91.234ms for warm. Please let me know if I still didn't analyze the results properly.
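The percentages quoted above can be checked with a short calculation. The sketch below takes the revert build as the baseline, which is an assumption about how the comparison was meant; the helper name `cl_effect_pct` is made up for this example, and the timings are copied from the comment.

```python
def cl_effect_pct(head_ms, reverted_ms):
    """Percent change of head relative to head + revert CL.

    Positive means head (with the CL) is slower, i.e. a regression.
    """
    return (head_ms - reverted_ms) / reverted_ms * 100.0


# (head, head + revert CL) in ms, copied from the comment above.
results = {
    "PremierLeague / warm, #15": (108.810, 153.498),
    "PremierLeague / warm, #22": (122.577, 157.103),
    "PremierLeague / cold, #22": (164.249, 156.442),
    "ja.wikipedia / cold, #15": (727.231, 712.028),
    "ja.wikipedia / warm, #15": (87.100, 86.644),
    "ja.wikipedia / cold, #22": (729.404, 747.102),
    "ja.wikipedia / warm, #22": (91.183, 91.234),
}

effects = {name: cl_effect_pct(h, r) for name, (h, r) in results.items()}

for name, pct in effects.items():
    print(f"{name}: {pct:+.2f}%")
```

Under that baseline, both PremierLeague / warm runs come out more than 20% faster with the CL, PremierLeague / cold comes out about 5% slower, and every ja.wikipedia difference stays within a few percent in either direction.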
,
Apr 11 2018
Sent this back for triage. I was trying to find the root cause. However, according to the experimental results mentioned in #30, my CL introduces a slight regression (<5%) in some cases but also improvements (>20%) in other cases. Neither experiment reproduced the significant regression that was reported. My CL is a refactoring CL that just removed unused code. It might change the timing of some other unrelated thing, but in that case it is hard for me to track down. For now I don't have any further actionable plan other than closing this issue.
,
May 25 2018
This looks like it just shifts the timing around from bimodal low to bimodal high.
Comment 1 by 42576172...@developer.gserviceaccount.com
, Mar 13 2018