...... too many results, data snipped.... and 40 other(s) in performance_test_suite failing on chromium.perf/Android Nexus5 Perf
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of crouleau@google.com
...... too many results, data snipped.... and 40 other(s) in performance_test_suite failing on chromium.perf/Android Nexus5 Perf
Builders failed on:
- Android Nexus5 Perf: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Android%20Nexus5%20Perf
,
Nov 5
For both v8.browsing_mobile-future/browse:news:toi and memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/ error is
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 128, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 268, in Measure
    trace_result, _ = platform.tracing_controller.StopTracing()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 53, in StopTracing
    return self._tracing_controller_backend.StopTracing()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 157, in StopTracing
    '\n'.join(raised_exception_messages))
TracingException: Exceptions raised when trying to stop tracing:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 145, in StopTracing
    agent.CollectAgentTraceData(builder)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 248, in CollectAgentTraceData
    '\n'.join(raised_exception_messages))
ChromeTracingStoppedError: Exceptions raised when trying to collect Chrome devtool tracing:
Error when collecting Chrome tracing on devtools at port localabstract:chrome_devtools_remote:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 237, in CollectAgentTraceData
    client.CollectChromeTracingData(trace_data_builder)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py", line 489, in CollectChromeTracingData
    self._tracing_backend.CollectTraceData(trace_data_builder, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 237, in CollectTraceData
    self._CollectTracingData(trace_data_builder, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 265, in _CollectTracingData
    traceback.format_exc())
TracingUnrecoverableException: Exception raised while collecting tracing data:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 258, in _CollectTracingData
    self._inspector_websocket.DispatchNotifications(timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 156, in DispatchNotifications
    self._Receive(timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 181, in _Receive
    raise WebSocketException(err)
WebSocketException: WebSocketException of type <class 'websocket._exceptions.WebSocketConnectionClosedException'>. Error message: Connection is already closed.
,
Nov 5
,
Nov 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ec9be0fda85723660320ec76e428f88f3686635f commit ec9be0fda85723660320ec76e428f88f3686635f Author: Caleb Rouleau <crouleau@chromium.org> Date: Mon Nov 05 22:16:09 2018 Disable failing perf tests on Nexus_5. TBR=nednguyen@google.com NOTRY=true Bug: 902064 Change-Id: Ic9f91ba4b263fb4ed435ac7a9dc1a096aa452516 Reviewed-on: https://chromium-review.googlesource.com/c/1318586 Reviewed-by: Caleb Rouleau <crouleau@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#605491} [modify] https://crrev.com/ec9be0fda85723660320ec76e428f88f3686635f/tools/perf/expectations.config
,
Nov 6
📍 Pinpoint job started.
https://pinpoint-dot-chromeperf.appspot.com/job/11c81fcde40000
,
Nov 6
I triggered a functional bisect to see if we find any suspect. I will try to reproduce this locally.
,
Nov 7
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/11c81fcde40000
Reland "Enable Perfetto by default for all telemetry tests" by oysteine@chromium.org
https://chromium.googlesource.com/chromium/src/+/3bb24369bef996b43b2d2711f0033ba472f779ea
Failure rate: 0 → 0.9 (+0.9)
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Benchmark documentation link: None
,
Nov 7
,
Nov 7
,
Nov 8
(Also trying to reproduce this locally)
,
Nov 8
eseckler: I suspect this is due to massive trace sizes, and Perfetto is pushing it over the top (due to the large global ringbuffer rather than per-thread ones, perhaps?)
mythria: Can we reduce the number of enabled categories for these metrics? Currently this is the list, which is pretty heavy:
"trace_event_overhead",
"loading",
"benchmark",
"blink_gc",
"webkit.console",
"rail",
"toplevel",
"renderer.scheduler",
"v8.console",
"v8",
"blink.user_timing",
"navigation",
"blink.console",
"disabled-by-default-memory-infra.v8.code_stats",
"disabled-by-default-v8.gc",
"disabled-by-default-memory-infra",
"disabled-by-default-v8.runtime_stats"
,
Nov 9
I am afraid we need most of those categories.
The following metrics are monitored closely so we need them:
"v8",
"disabled-by-default-v8.gc",
"disabled-by-default-memory-infra",
"disabled-by-default-v8.runtime_stats"
"blink_gc",
This is important as well, especially given some of the optimizations we want to do around the bytecode size.
"disabled-by-default-memory-infra.v8.code_stats",
The following are needed for EQT (expected queuing time) to measure Jank. We monitor jank closely so these are needed. I am not sure if all of these categories are needed. Ulan@ might know more.
"toplevel",
"renderer.scheduler",
"blink.user_timing",
"navigation",
The following we added recently to measure the errors to see if the page is functioning properly or not. So it is useful to keep them
"v8.console",
"blink.console",
"webkit.console",
We need this to get UE and we use load UE to measure loading performance.
"rail",
I am not sure about these categories:
"trace_event_overhead",
"loading",
"benchmark",
ulan@, cbruni@ what do you think about reducing the number of categories?
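To make the trade-off concrete, here is a minimal Python sketch that groups the categories by the reasons given above and builds a comma-separated filter string with and without the "not sure" group. The group labels and the `build_filter` helper are hypothetical names used only for illustration, and the grouping reflects one reading of the comment; this is not the Telemetry API.

```python
# Grouping follows the comment above; the dict keys are illustrative labels only.
CATEGORY_GROUPS = {
    "closely_monitored": [
        "v8",
        "disabled-by-default-v8.gc",
        "disabled-by-default-memory-infra",
        "disabled-by-default-v8.runtime_stats",
        "blink_gc",
    ],
    "bytecode_size": ["disabled-by-default-memory-infra.v8.code_stats"],
    "eqt_jank": ["toplevel", "renderer.scheduler", "blink.user_timing", "navigation"],
    "console_errors": ["v8.console", "blink.console", "webkit.console"],
    "loading_ue": ["rail"],
    "unsure": ["trace_event_overhead", "loading", "benchmark"],
}


def build_filter(groups, drop=()):
    """Return a comma-separated category string, skipping the groups named in `drop`."""
    kept = []
    for name, categories in groups.items():
        if name not in drop:
            kept.extend(categories)
    return ",".join(kept)


full = build_filter(CATEGORY_GROUPS)
trimmed = build_filter(CATEGORY_GROUPS, drop=("unsure",))
print(len(full.split(",")), "categories in total")
print(len(trimmed.split(",")), "categories without the 'unsure' group")
```

Dropping only the "unsure" group removes three of the seventeen categories, so the size reduction depends on how many events those three actually emit.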
,
Nov 9
My guess would be that most of the events are generated by the "disabled-by-default-v8.runtime_stats" category and removing other categories would not help.
,
Nov 9
Right, runtime_stats tables are probably quite large in comparison to other traces.
,
Nov 10
perezju: I know increasing the trace download timeout from 60s to 120s didn't help for crbug.com/900920 (a different issue, it looks like), but in this case the traces are massive (400mb+), and in local testing the longer timeout seems to fix this issue. Are there any downsides to re-landing that CL?
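As a rough back-of-the-envelope check, the sketch below computes the sustained transfer rate a 400 MB trace would need in order to finish within each timeout. The 400 MB figure is the lower bound quoted above; the helper name is hypothetical, and the actual device-to-host throughput is not stated in this thread.

```python
def required_throughput_mb_per_s(trace_size_mb, timeout_s):
    """Minimum sustained rate needed to download the whole trace before the timeout."""
    return trace_size_mb / timeout_s


TRACE_SIZE_MB = 400  # lower bound quoted in the comment ("400mb+")

for timeout_s in (60, 120):
    rate = required_throughput_mb_per_s(TRACE_SIZE_MB, timeout_s)
    print(f"{timeout_s}s timeout: at least {rate:.1f} MB/s")
```

Doubling the timeout halves the required rate, which is consistent with the longer timeout helping for very large traces.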
,
Nov 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5 commit d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5 Author: Oystein Eftevaag <oysteine@chromium.org> Date: Sat Nov 10 01:02:52 2018 Perfetto: Chunk JSON output Pass JSON strings to the callback (and hence Mojo) once they reach 100kb, rather than passing one giant string all at once. BUG= 902064 Change-Id: If95811d3ac907d23d2f882e869d88e933ff5d7b0 Reviewed-on: https://chromium-review.googlesource.com/c/1330068 Commit-Queue: oysteine <oysteine@chromium.org> Reviewed-by: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#607060} [modify] https://crrev.com/d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5/services/tracing/perfetto/json_trace_exporter.cc
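The chunking idea in this commit can be sketched in Python as follows. This is an illustration of the general technique only, not the C++ code in json_trace_exporter.cc: the class name, the threshold constant, and the use of character counts instead of bytes are all assumptions.

```python
CHUNK_THRESHOLD = 100 * 1024  # "100kb" from the commit message; exact constant is assumed


class ChunkedJsonWriter:
    """Buffers JSON fragments and hands them to a callback in bounded chunks."""

    def __init__(self, callback, threshold=CHUNK_THRESHOLD):
        self._callback = callback
        self._threshold = threshold
        self._parts = []
        self._size = 0

    def write(self, fragment):
        # Accumulate the fragment and flush once the buffered size reaches the threshold.
        self._parts.append(fragment)
        self._size += len(fragment)
        if self._size >= self._threshold:
            self.flush()

    def flush(self):
        # Emit whatever is buffered as one chunk; do nothing if the buffer is empty.
        if self._parts:
            self._callback("".join(self._parts))
            self._parts = []
            self._size = 0


chunks = []
writer = ChunkedJsonWriter(chunks.append, threshold=10)  # tiny threshold for the demo
for fragment in ['{"a":1}', '{"b":2}', '{"c":3}']:
    writer.write(fragment)
writer.flush()  # emit the remainder at the end of the trace
print(len(chunks), "chunks")
```

The receiver sees the same bytes in the same order, but no single message has to hold the whole trace.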
,
Nov 12
Re #16: Sure, I think it's fine to reland that CL if it helps for this issue.
,
Nov 13
Issue 901967 has been merged into this issue.
,
Nov 13
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/08081e7b7e2d692933fa276ff2cfaae194956d3a commit 08081e7b7e2d692933fa276ff2cfaae194956d3a Author: Juan Antonio Navarro Perez <perezju@chromium.org> Date: Tue Nov 13 19:07:56 2018 Reland "[Telemetry] Increase timeout on CollectChromeTracingData" This is a reland of cc9857f9b84d78b5083df6cd28a62564fc70f7b4 Relanding as this should solve crbug.com/902064 Bug: chromium:900920 Original change's description: > [Telemetry] Increase timeout on CollectChromeTracingData > > Some perf bots running tests are having trouble reading the entire > trace data. > > Bug: chromium:900920 > Change-Id: Ia15f90a24819679f5fcd639b97019ca32569472d > Reviewed-on: https://chromium-review.googlesource.com/c/1312889 > Reviewed-by: Ned Nguyen <nednguyen@google.com> > Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> Bug: chromium:902064 Change-Id: Ib07ab1aab8df7c5d2133dd8876b462aa1875d512 Reviewed-on: https://chromium-review.googlesource.com/c/1334070 Reviewed-by: Ned Nguyen <nednguyen@google.com> Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> [modify] https://crrev.com/08081e7b7e2d692933fa276ff2cfaae194956d3a/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py
,
Nov 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1f613b78ef25828747990f10ffc871812db717f3 commit 1f613b78ef25828747990f10ffc871812db717f3 Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Date: Tue Nov 13 21:25:57 2018 Roll src/third_party/catapult c14a383e61b4..6d64a5e5e40b (2 commits) https://chromium.googlesource.com/catapult.git/+log/c14a383e61b4..6d64a5e5e40b git log c14a383e61b4..6d64a5e5e40b --date=short --no-merges --format='%ad %ae %s' 2018-11-13 benjhayden@chromium.org Add alerts-table to v2spa. 2018-11-13 perezju@chromium.org Reland "[Telemetry] Increase timeout on CollectChromeTracingData" Created with: gclient setdep -r src/third_party/catapult@6d64a5e5e40b The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel BUG=chromium:900920, chromium:902064 TBR=sullivan@chromium.org Change-Id: Ib20da9ac3441be3a3952d0812f9e9d918865e234 Reviewed-on: https://chromium-review.googlesource.com/c/1334248 Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#607741} [modify] https://crrev.com/1f613b78ef25828747990f10ffc871812db717f3/DEPS
,
Nov 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/622107104815cb5dea41fdac324da6eee55fd53c commit 622107104815cb5dea41fdac324da6eee55fd53c Author: oysteine <oysteine@chromium.org> Date: Tue Nov 13 23:32:29 2018 Revert "Disable failing perf tests on Nexus_5." This reverts commit ec9be0fda85723660320ec76e428f88f3686635f. Reason for revert: Relanding after increasing trace download timeout to 120s Original change's description: > Disable failing perf tests on Nexus_5. > > TBR=nednguyen@google.com > NOTRY=true > > Bug: 902064 > Change-Id: Ic9f91ba4b263fb4ed435ac7a9dc1a096aa452516 > Reviewed-on: https://chromium-review.googlesource.com/c/1318586 > Reviewed-by: Caleb Rouleau <crouleau@chromium.org> > Reviewed-by: Ned Nguyen <nednguyen@google.com> > Commit-Queue: Ned Nguyen <nednguyen@google.com> > Cr-Commit-Position: refs/heads/master@{#605491} TBR=nednguyen@google.com,crouleau@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 902064 Change-Id: I03804b1a557226498df63eb8a7c0dea31e3f227f Reviewed-on: https://chromium-review.googlesource.com/c/1334335 Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#607803} [modify] https://crrev.com/622107104815cb5dea41fdac324da6eee55fd53c/tools/perf/expectations.config
,
Nov 16
v8.browsing_mobile-future/browse:news:toi got fixed, but memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/ is still broken. eseckler: the latter gets "fixed" with a smaller Perfetto buffer size; with the current size, the browser OOM-crashes. I suspect the memory-infra dumps are blowing out the buffer. I'm going to put up a couple of CLs to reduce the buffer size and make tracing more resilient to chunks dropping out of the ringbuffer (max event count per message), but that might make some of these failing tests flaky if they depend on early trace events (not sure if they do or not).
,
Nov 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/044129ce84b4e0ea59aaab9398fc9c76b51ae8d2 commit 044129ce84b4e0ea59aaab9398fc9c76b51ae8d2 Author: Oystein Eftevaag <oysteine@google.com> Date: Fri Nov 16 08:34:55 2018 Perfetto: Add an upper bound to number of events per proto message R=eseckler@chromium.org Bug: 902064 Change-Id: I4b113c69df1192b4d0d05c0a5b9107b84b9cae2e Reviewed-on: https://chromium-review.googlesource.com/c/1338879 Commit-Queue: Eric Seckler <eseckler@chromium.org> Reviewed-by: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#608697} [modify] https://crrev.com/044129ce84b4e0ea59aaab9398fc9c76b51ae8d2/services/tracing/public/cpp/perfetto/trace_event_data_source.cc
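One way to picture an upper bound on events per message is a simple batching loop, sketched here in Python. The function name and the limit of 3 are hypothetical; the real change lives in trace_event_data_source.cc, and its actual limit is not given in this thread.

```python
def batch_events(events, max_events_per_message):
    """Yield lists of at most `max_events_per_message` events, preserving order."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == max_events_per_message:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


messages = list(batch_events(range(7), max_events_per_message=3))
print(messages)
```

With bounded messages, losing one chunk from the ring buffer costs at most that many events rather than an arbitrarily large run.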
,
Nov 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d66855ad14ef4a2e59aceb9da7ad422a06d42703 commit d66855ad14ef4a2e59aceb9da7ad422a06d42703 Author: Oystein Eftevaag <oysteine@google.com> Date: Fri Nov 16 08:36:25 2018 Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs Specifically memory.long_running_idle_gmail_background_tbmv2 will OOM on Nexus5 devices with this buffer size. R=eseckler@chromium.org Bug: 902064 Change-Id: If8e5529f3000b9a2f3b86abaecb1f219df356575 Reviewed-on: https://chromium-review.googlesource.com/c/1338889 Commit-Queue: Eric Seckler <eseckler@chromium.org> Reviewed-by: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#608698} [modify] https://crrev.com/d66855ad14ef4a2e59aceb9da7ad422a06d42703/services/tracing/perfetto/json_trace_exporter.cc
,
Nov 16
You could have also increased the sampling interval here: https://cs.chromium.org/chromium/src/tools/perf/page_sets/long_running_idle_google_cases.py?rcl=0bdd840823ae18097e3488071b4405b902898e59&l=10 So we get fewer memory dumps in the trace?
,
Nov 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/934830c7c9e8c468085c6f27316848bf96f98bd6 commit 934830c7c9e8c468085c6f27316848bf96f98bd6 Author: Oystein Eftevaag <oysteine@google.com> Date: Fri Nov 16 21:35:56 2018 Reduce size of libevent notification trace event This event is extremely frequent and contributes heavily to trace size bloat, which is negatively affecting a lot of the testing infrastructure. Reducing the size of the event and removing the args which are not useful in this case. R=gab@chromium.org BUG= 902064 Change-Id: Id7d0ecba3ed084f5ef30d7a29554b14c6a68b963 Reviewed-on: https://chromium-review.googlesource.com/c/1330067 Commit-Queue: Gabriel Charette <gab@chromium.org> Reviewed-by: Gabriel Charette <gab@chromium.org> Reviewed-by: Etienne Bergeron <etienneb@chromium.org> Reviewed-by: ssid <ssid@chromium.org> Cr-Commit-Position: refs/heads/master@{#608958} [modify] https://crrev.com/934830c7c9e8c468085c6f27316848bf96f98bd6/base/message_loop/message_pump_libevent.cc
,
Nov 26
Remaining issue:
(ERROR) 2018-11-26 21:10:59,967 page_test_results.Fail:545 Failure recorded: TraceImportError: Unable to select a master clock domain because no path can be found from "TELEMETRY" to "LINUX_CLOCK_MONOTONIC".
at ClockSyncManager.ensureAllDomainsAreConnected_ (/tracing/model/clock_sync_manager.html:283:17)
at ClockSyncManager.selectModelDomainId_ (/tracing/model/clock_sync_manager.html:254:12)
at ClockSyncManager.getModelTimeTransformer (/tracing/model/clock_sync_manager.html:156:14)
at TraceEventImporter.toModelTimeFromUs_ (/tracing/extras/importer/trace_event_importer.html:3368:42)
at TraceEventImporter.processDurationEvent (/tracing/extras/importer/trace_event_importer.html:487:23)
at TraceEventImporter.processInstantEvent (/tracing/extras/importer/trace_event_importer.html:704:14)
at TraceEventImporter.processEvent_ (/tracing/extras/importer/trace_event_importer.html:1274:16)
at TraceEventImporter.importEvents (/tracing/extras/importer/trace_event_importer.html:1148:16)
at importer (/tracing/importer/import.html:198:65)
at task.subTask (/tracing/importer/import.html:145:32)
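The error message describes a connectivity problem: the importer needs a chain of clock sync markers linking every clock domain. A minimal Python sketch of that check, assuming each sync marker is modelled as an undirected edge between two domains, could look like this. The function and the edge representation are illustrative and are not taken from clock_sync_manager.html.

```python
from collections import deque


def domains_connected(sync_edges, start, goal):
    """Breadth-first search: is there a chain of sync markers from start to goal?"""
    graph = {}
    for a, b in sync_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


with_marker = [("TELEMETRY", "LINUX_CLOCK_MONOTONIC")]
print(domains_connected(with_marker, "TELEMETRY", "LINUX_CLOCK_MONOTONIC"))
print(domains_connected([], "TELEMETRY", "LINUX_CLOCK_MONOTONIC"))
```

If the sync marker never makes it into the trace, the second case applies and no master clock domain can be chosen.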
,
Nov 27
We've seen this error before when we were using a log buffer that was too small, see e.g. https://crbug.com/888222 or the test failure linked from https://bugs.chromium.org/p/chromium/issues/detail?id=839071#c3. Might have something to do with decreasing the log buffer size in #25?
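A small Python sketch shows how a buffer that is too small could lose an early clock sync event. It assumes a ring buffer that overwrites its oldest entries, modelled with `collections.deque(maxlen=...)`; the event names and capacities are made up, and whether this is the actual cause here is only the hypothesis raised in this comment.

```python
from collections import deque


def record(events, capacity):
    """Record events into a ring buffer that silently drops the oldest entries when full."""
    buffer = deque(maxlen=capacity)
    for event in events:
        buffer.append(event)
    return list(buffer)


# A sync marker emitted at the start of tracing, followed by ordinary events.
events = ["clock_sync"] + [f"event_{i}" for i in range(5)]

large = record(events, capacity=10)
small = record(events, capacity=3)
print("clock_sync" in large, "clock_sync" in small)
```

In the small buffer the marker is overwritten before the trace is collected, which would leave the clock domains unconnected at import time.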
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/728c481838a43d261c41581a1672de057b121854 commit 728c481838a43d261c41581a1672de057b121854 Author: oysteine <oysteine@chromium.org> Date: Fri Nov 30 22:36:22 2018 Revert "Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs" This reverts commit d66855ad14ef4a2e59aceb9da7ad422a06d42703. Reason for revert: crbug.com/902064 , reverting to see if this decreases test failures. Original change's description: > Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs > > Specifically memory.long_running_idle_gmail_background_tbmv2 will OOM > on Nexus5 devices with this buffer size. > > R=eseckler@chromium.org > > Bug: 902064 > Change-Id: If8e5529f3000b9a2f3b86abaecb1f219df356575 > Reviewed-on: https://chromium-review.googlesource.com/c/1338889 > Commit-Queue: Eric Seckler <eseckler@chromium.org> > Reviewed-by: Eric Seckler <eseckler@chromium.org> > Cr-Commit-Position: refs/heads/master@{#608698} TBR=oysteine@chromium.org,eseckler@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 902064 Change-Id: I0528cc367fb44cc7814fc3b72b5727a25c04450b Reviewed-on: https://chromium-review.googlesource.com/c/1357403 Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#612824} [modify] https://crrev.com/728c481838a43d261c41581a1672de057b121854/services/tracing/perfetto/json_trace_exporter.cc
,
Dec 10
After crrev.com/614540 both v8.browsing_mobile-future/browse:news:toi and memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/ seem to be working consistently again.
Comment 1 by crouleau@chromium.org
, Nov 5