system_health.common_desktop/long_running:tools:gmail-foreground failing on mac-10_12_laptop_low_end-perf |
||
Issue descriptionLink to most recent failing build: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/mac-10_12_laptop_low_end-perf/2019 Link to logs for this shard: https://chrome-isolated.appspot.com/browse?namespace=default-gzip&digest=93c4f0631e7c15a0d70ad6cbd87b898d1dadd35f&as=benchmark_log.txt Snippet from the logs. Looks like we are exceeding some call stack size. (INFO) 2018-11-08 06:54:07,955 trace_data.Serialize:201 trace2html finished in 25.05 seconds. (INFO) 2018-11-08 06:54:07,957 timeline_based_measurement._ComputeTimelineBasedMetrics:317 Starting to compute metrics on trace (INFO) 2018-11-08 06:55:11,670 timeline_based_measurement._ComputeTimelineBasedMetrics:323 Processing resulting traces took 63.713 seconds (ERROR) 2018-11-08 06:55:11,672 page_test_results.Fail:540 Failure recorded: RangeError: Maximum call stack size exceeded at Function.findToplevelSchedulerTasks (/tracing/extras/chrome/event_finder_utils.html:221:13) at getTasks (/tracing/metrics/system_health/expected_queueing_time_metric.html:121:38) at addExpectedQueueingTimeMetric_ (/tracing/metrics/system_health/expected_queueing_time_metric.html:144:21) at RegisteredTypeInfo.expectedQueueingTimeMetric [as constructor] (/tracing/metrics/system_health/expected_queueing_time_metric.html:86:5) at runMetrics (/b/s/w/ir/third_party/catapult/tracing/tracing/metrics/metric_map_function.html:61:16) at metricMapFunction (/b/s/w/ir/third_party/catapult/tracing/tracing/metrics/metric_map_function.html:190:24) at Object.mapSingleTrace (/tracing/mre/map_single_trace.html:39:7) at eval (/b/s/w/ir/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:61:18) at Object.runAndConvertErrorsToFailures (/tracing/mre/map_single_trace.html:24:10) at mapSingleTraceWithResult (/b/s/w/ir/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:52:12) (INFO) 2018-11-08 06:55:11,783 browser.Close:209 Closing browser (pid=4027) ... (WARNING) 2018-11-08 06:55:17,464 desktop_browser_backend.Close:583 Failed to gracefully shutdown. (WARNING) 2018-11-08 06:55:17,465 desktop_browser_backend.Close:587 Proceed to kill the browser. (INFO) 2018-11-08 06:55:17,478 browser.Close:223 Browser is closed. [ FAILED ] system_health.common_desktop/long_running:tools:gmail-foreground@{'case': 'long_running', 'group': 'tools'} (289942 ms) Looks like this has been failing for a while, I will disable.
,
Nov 15
,
Nov 15
The URL of the generated trace file is: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2018-11-08_06-50-27_42952.html
,
Nov 15
After more investigation, I think this failure really means "this trace is bigger than what we can handle with trace viewer". I loaded up the trace in the URL, allowed it to import for a while, then paused execution and grabbed the stack trace. This is what I got: https://paste.googleplex.com/6452927187124224 The fact that this is surfacing as a max call stack exceeded instead of just a very slow import is, I believe, a byproduct of how trace viewer imports traces. At some point, there was logic added to do the import as a series of tasks posted via RequestAnimationFrame and RequestIdleCallback. I believe the rationale was that this would allow UI inputs to periodically be handled during the import. You can see this by searching for "(async)" in the gpaste that I linked to. Each RAF posts future work through RIC, and each RIC posts future work through RAF. The ultimate outcome of this is that we have a growing stack of asynchronous calls as work proceeds. Eventually, this stack exceeds what's allowed by Javascript and trace viewer crashes. I do believe there's an underlying architectural flaw, here. I'm not sure if it's worth digging into, or if we're comfortable just disabling this story given that this issue doesn't happen on the vast majority of traces and Perfetto is coming down the pipeline. tdresser@, what do you think?
,
Nov 21
My intuition is that punting until we have perfetto is likely fine. There might also be a short term batching solution, where we don't yield quite as often.
,
Nov 23
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d976aac9c5c356e97270771976a981b284a01e6a commit d976aac9c5c356e97270771976a981b284a01e6a Author: Sami Kyostila <skyostil@chromium.org> Date: Fri Nov 23 17:44:38 2018 Disable a failing system health story on Windows This story is also failing on Windows. It was previously disabled on Mac. NOTRY=True TBR=charliea@chromium.org Bug: 903417 Change-Id: I23d63c76f313dd18413d7d112a5fbfd44546ffc2 Reviewed-on: https://chromium-review.googlesource.com/c/1349326 Reviewed-by: Sami Kyöstilä <skyostil@chromium.org> Commit-Queue: Sami Kyöstilä <skyostil@chromium.org> Cr-Commit-Position: refs/heads/master@{#610645} [modify] https://crrev.com/d976aac9c5c356e97270771976a981b284a01e6a/tools/perf/expectations.config |
||
►
Sign in to add a comment |
||
Comment 1 by bugdroid1@chromium.org
, Nov 8