70.9% regression in system_health.common_desktop at 507399:507489
Issue description: See the link to graphs below.
Oct 19 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8965278252256498720
Oct 19 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark:     system_health.common_desktop
  Metric:        total:500ms_window:renderer_eqt_max/browse_news/browse_news_reddit

  Revision          Result              N
  chromium@507398   760.735 +- 1530.22  21  good
  chromium@507489   649.677 +- 795.286  21  bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.reddit system_health.common_desktop

More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions
Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8965278252256498720
For feedback, file a bug with component Speed>Bisection
Oct 20 2017
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14b28517780000
Oct 20 2017
Retrying on a less noisy metric.
Oct 20 2017
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/14b28517780000
Oct 23 2017
Marking as won't fix since the bisect cannot find the culprit. CCing simonhatch@ to check whether this is normal. The graph shows a clear ~70% regression, yet the bisect cannot find a culprit. Maybe the regression is caused by a change on the bots...
Oct 23 2017
+perezju, sullivan: I'm not aware of any change on the bots that coincides with that; plus, it jumped on all bots. There doesn't seem to be any ref data available anywhere. I'm not sure if this is a known issue; otherwise +annie should look into it.
Oct 23 2017
+tdresser: Any thoughts on how much to investigate? We haven't been monitoring this metric for long, so I'm not sure how confident we are in it.
Oct 24 2017
+charliea, +nednguyen, who own system_health.common_*
Oct 24 2017
Presumably we'll start getting ref data automatically eventually? Do the points drawn on the dashboard come from a single run, or from multiple runs? This metric does have a fair bit of noise in some cases, though Reddit looks pretty rock solid. The next step would be to take a look at a trace. Unfortunately, I'm getting an "An error occurred: Error uploading patch to rietveld_service" message when trying to get a debug trace.
Oct 24 2017
Note that this happens on multiple different bots: chromium-rel-win-7, chromium-rel-win-7-gpu-nvidia, chromium-rel-mac-12, chromium-rel-win7-x64-dual, linux-release, chromium-rel-win-7-gpu-ati, so it's very unlikely to be a simple bot hardware issue. Besides looking at the trace as Tim mentioned, we should also look at the similar metric on other benchmarks (which I suspect didn't alert because those other benchmarks have a ref build). +Vince/Peter: are you aware of any major changes in the lab recently?
Oct 24 2017
Before & after traces on ChromiumPerf/chromium-rel-win7-dual/system_health.common_desktop / total:500ms_window:renderer_eqt_max / browse_news / browse_news_reddit
Before: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/browse_news_reddit_2017-10-09_09-46-29_70033.html
After: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/browse_news_reddit_2017-10-10_12-33-17_58751.html
Oct 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8964843805262352400
Oct 24 2017
Thanks for the traces. I'm seeing absolutely massive |RenderAccessibilityImpl::SendPendingAccessibilityEvents| slices. They're significantly larger (the longest is 400ms) in the "after" trace, but still scary (200ms) in the "before" trace. EQT is pretty sensitive to changes in the duration of the longest task, so I suspect this is the source of the regression. Any idea why this is taking so long?
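For context on why EQT reacts so strongly to the longest task: under the usual expected-queueing-time model, a task's contribution to the metric grows with the square of its duration. The sketch below is a simplified illustration of that quadratic weighting (assuming each task fits entirely within the window), not the actual dashboard implementation.

```python
def expected_queueing_time_ms(task_durations_ms, window_ms=500.0):
    """Simplified EQT over one window, assuming every task lies
    entirely inside the window.

    A task of duration d delays a randomly arriving input event by
    d^2 / (2 * window) on average, so long tasks dominate the sum.
    """
    return sum(d * d / (2.0 * window_ms) for d in task_durations_ms)

# Doubling the longest task from 200ms to 400ms quadruples its
# contribution, which matches the metric jumping on a single slice:
before = expected_queueing_time_ms([200.0])  # 40.0 ms
after = expected_queueing_time_ms([400.0])   # 160.0 ms
```

This is why a single 400ms SendPendingAccessibilityEvents slice can move renderer_eqt_max far more than its extra 200ms of wall time would suggest.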
Oct 24 2017
Re: #12. Can you give me the actual builder/bot names?
Oct 24 2017
+dmazzoni: any ideas why RenderAccessibilityImpl::SendPendingAccessibilityEvents could take so long? Does this do any communication with the server or anything else that could behave strangely on bots?
Oct 24 2017
I don't understand why you should be seeing any traces at all for RenderAccessibilityImpl::SendPendingAccessibilityEvents when running browse_news_reddit. Accessibility is supposed to be off by default, so you shouldn't get any traces for that function except for the three stories where we added a command-line flag to enable accessibility.

Independent of that, the first call to SendPendingAccessibilityEvents after page load scales with the number of DOM nodes, so it wouldn't surprise me that a large Reddit page with a lot of comments would be slow. That's something we'd like to fix, and tracking it here is an important first step. Still, the most important question is why we're seeing these traces for stories that shouldn't be enabling accessibility.
Oct 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark:     system_health.common_desktop
  Metric:        total:500ms_window:renderer_eqt_max/browse_news/browse_news_reddit

  Revision          Result              N
  chromium@507335   718.532 +- 1100.95  21  good
  chromium@507624   639.183 +- 741.934  21  bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.reddit system_health.common_desktop

More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions
Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8964843805262352400
For feedback, file a bug with component Speed>Bisection
Oct 25 2017
Who is the right person to follow up on why these trace points are showing up?
Oct 25 2017
Has anyone tried to reproduce this locally? If it reproduces locally (i.e. you can see the traces), I can debug it from there, maybe next week. If you run it live, try visiting chrome://accessibility in a tab and see if any checkboxes are checked. When I added the accessibility stories, I manually checked that as a sanity check: chrome://accessibility showed accessibility enabled for those stories but not for others, so I'm not sure what's going on here.
Oct 26 2017
dmazzoni@: you can click the "M" button in the trace to see the command-line flags used. Can you check whether accessibility is accidentally enabled?
Oct 26 2017
Yep, that's it: --force-renderer-accessibility
It looks like the extra browser args from tools/perf/page_sets/system_health/accessibility_stories.py are somehow leaking into other stories. @crouleau, could you look at https://chromiumcodereview.appspot.com/3011293002 again?
Oct 26 2017
This is a fairly serious bug, so I am raising it to P1.
Oct 26 2017
finder_options is a mess 😢
I think the bug is here: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py?q=_StartBrowser
We should not be mutating the finder options there, as this is a global object shared by all stories in the set. The fix is probably to make a copy of the object there, mutate the copy, and pass that to possible_browser.Create(..)
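The copy-before-mutate fix described above can be sketched as follows. The classes and the `start_browser` helper here are illustrative stand-ins for Telemetry's real shared_page_state code, not the actual API:

```python
import copy

class BrowserOptions:
    def __init__(self):
        self.extra_browser_args = []

class FinderOptions:
    """Stand-in for the finder options object shared by all stories."""
    def __init__(self):
        self.browser_options = BrowserOptions()

def start_browser(shared_finder_options, story_extra_args, create_browser):
    # Copy first: shared_finder_options is shared by every story in the
    # set, so mutating it in place would leak one story's flags (e.g.
    # --force-renderer-accessibility) into every later story.
    options = copy.deepcopy(shared_finder_options)
    options.browser_options.extra_browser_args.extend(story_extra_args)
    return create_browser(options)

shared = FinderOptions()
start_browser(shared, ['--force-renderer-accessibility'], lambda opts: opts)
# The shared object is untouched, so the next story starts clean:
assert shared.browser_options.extra_browser_args == []
```

The key design point is that the per-story flags only ever live on the copy handed to the browser factory; the shared object is never written to.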
Oct 26 2017
Hi, I will fix this. It seems like the fix is fairly straightforward.
Oct 26 2017
Update: I verified that perezju@ is correct in both his diagnosis and his suggested fix. I added log statements, verified that extra flags from a previous page were carried over into the next page, and then verified that this was no longer the case after the fix. A CL is on its way. I figure we will need to add unit tests afterwards, since we want to get this fixed ASAP.
Oct 27 2017
This should be fixed now. See https://github.com/catapult-project/catapult/issues/4000
Oct 27 2017
Caleb: can you add a test that would catch this bug as well?
Oct 27 2017
This looks like verification to me! https://chromeperf.appspot.com/report?sid=ab5e0bb2af880e89b741c5a9ff4c8d3c52961e84c90b74f03303679e23466504
How do I find a new trace like the links above (e.g. After: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/browse_news_reddit_2017-10-10_12-33-17_58751.html)?
Oct 27 2017
dmazzoni@: you can get the trace link by clicking on the graph point: https://screenshot.googleplex.com/1chA0UeMoQA.png
Oct 27 2017
Sure, I can add a test. I will add it as a task for https://github.com/catapult-project/catapult/issues/4000, and still mark this bug fixed.
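A test for this kind of leak would assert that one story's extra browser args never survive into the next story's options. A minimal sketch of such a test follows; `FakeOptions` and `options_for_story` are illustrative stand-ins for the real Telemetry objects, not the actual catapult test that was eventually landed:

```python
import copy
import unittest

class FakeOptions:
    """Stand-in for the shared finder options object."""
    def __init__(self):
        self.extra_browser_args = []

def options_for_story(shared_options, story_args):
    # The behavior under test: per-story args go onto a copy,
    # never onto the shared options object.
    options = copy.deepcopy(shared_options)
    options.extra_browser_args.extend(story_args)
    return options

class ExtraBrowserArgsTest(unittest.TestCase):
    def test_story_args_do_not_leak_to_next_story(self):
        shared = FakeOptions()
        # First story enables accessibility; second story passes no args.
        first = options_for_story(shared, ['--force-renderer-accessibility'])
        second = options_for_story(shared, [])
        self.assertIn('--force-renderer-accessibility',
                      first.extra_browser_args)
        # With the pre-fix in-place mutation, this assertion would fail,
        # because the flag would still be on the shared object.
        self.assertNotIn('--force-renderer-accessibility',
                         second.extra_browser_args)
```

A test shaped like this fails against the old in-place mutation and passes with the copy-based fix, which is exactly the regression coverage asked for above.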
Oct 30 2017
Re #32: Awesome! Thanks folks!
Nov 1 2017
Comment 1 by 42576172...@developer.gserviceaccount.com, Oct 19 2017