
Issue 902064

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: ----




...... too many results, data snipped.... and 40 other(s) in performance_test_suite failing on chromium.perf/Android Nexus5 Perf

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Nov 5

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of crouleau@google.com

...... too many results, data snipped.... and 40 other(s) in performance_test_suite failing on chromium.perf/Android Nexus5 Perf

Builders failed on: 
- Android Nexus5 Perf: 
  https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Android%20Nexus5%20Perf


 
For both v8.browsing_mobile-future/browse:news:toi and memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/, the error is:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 128, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 268, in Measure
    trace_result, _ = platform.tracing_controller.StopTracing()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 53, in StopTracing
    return self._tracing_controller_backend.StopTracing()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 157, in StopTracing
    '\n'.join(raised_exception_messages))
TracingException: Exceptions raised when trying to stop tracing:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 145, in StopTracing
    agent.CollectAgentTraceData(builder)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 248, in CollectAgentTraceData
    '\n'.join(raised_exception_messages))
ChromeTracingStoppedError: Exceptions raised when trying to collect Chrome devtool tracing:
Error when collecting Chrome tracing on devtools at port localabstract:chrome_devtools_remote:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 237, in CollectAgentTraceData
    client.CollectChromeTracingData(trace_data_builder)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py", line 489, in CollectChromeTracingData
    self._tracing_backend.CollectTraceData(trace_data_builder, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 237, in CollectTraceData
    self._CollectTracingData(trace_data_builder, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 265, in _CollectTracingData
    traceback.format_exc())
TracingUnrecoverableException: Exception raised while collecting tracing data:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py", line 258, in _CollectTracingData
    self._inspector_websocket.DispatchNotifications(timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 156, in DispatchNotifications
    self._Receive(timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 181, in _Receive
    raise WebSocketException(err)
WebSocketException: WebSocketException of type <class 'websocket._exceptions.WebSocketConnectionClosedException'>. Error message: Connection is already closed.

Cc: u...@chromium.org mythria@chromium.org
Project Member

Comment 5 by bugdroid1@chromium.org, Nov 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ec9be0fda85723660320ec76e428f88f3686635f

commit ec9be0fda85723660320ec76e428f88f3686635f
Author: Caleb Rouleau <crouleau@chromium.org>
Date: Mon Nov 05 22:16:09 2018

Disable failing perf tests on Nexus_5.

TBR=nednguyen@google.com
NOTRY=true

Bug:  902064 
Change-Id: Ic9f91ba4b263fb4ed435ac7a9dc1a096aa452516
Reviewed-on: https://chromium-review.googlesource.com/c/1318586
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#605491}
[modify] https://crrev.com/ec9be0fda85723660320ec76e428f88f3686635f/tools/perf/expectations.config

I triggered a functional bisect to see if it finds any suspects. I will also try to reproduce this locally.
Cc: oysteine@chromium.org
Owner: oysteine@chromium.org
Status: Assigned (was: Available)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/11c81fcde40000

Reland "Enable Perfetto by default for all telemetry tests" by oysteine@chromium.org
https://chromium.googlesource.com/chromium/src/+/3bb24369bef996b43b2d2711f0033ba472f779ea
Failure rate: 0 → 0.9 (+0.9)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Benchmark documentation link:
  None
Cc: skyos...@chromium.org eseckler@chromium.org
Components: Speed>Tracing
Labels: Perfetto
Status: Started (was: Assigned)
(Also trying to reproduce this locally)
eseckler: I suspect this is due to massive trace sizes, with Perfetto pushing them over the top (perhaps because it uses one large global ring buffer rather than per-thread ones?)

mythria: Can we reduce the number of enabled categories for these metrics? Currently this is the list, which is pretty heavy:

"trace_event_overhead",
                                      "loading",
                                      "benchmark",
                                      "blink_gc",
                                      "webkit.console",
                                      "rail",
                                      "toplevel",
                                      "renderer.scheduler",
                                      "v8.console",
                                      "v8",
                                      "blink.user_timing",
                                      "navigation",
                                      "blink.console",
                                      "disabled-by-default-memory-infra.v8.code_stats",
                                      "disabled-by-default-v8.gc",
                                      "disabled-by-default-memory-infra",
                                      "disabled-by-default-v8.runtime_stats"
Cc: cbruni@chromium.org
I am afraid we need most of those categories.

The following metrics are monitored closely so we need them:
"v8",
"disabled-by-default-v8.gc",
"disabled-by-default-memory-infra",
"disabled-by-default-v8.runtime_stats"
"blink_gc",

This is important as well, especially given some of the optimizations we want to do around the bytecode size.
"disabled-by-default-memory-infra.v8.code_stats",

The following are needed for EQT (expected queuing time), which we use to measure jank. We monitor jank closely, so these are needed, though I am not sure that every one of these categories is required. Ulan@ might know more.
"toplevel",
"renderer.scheduler",
"blink.user_timing",
"navigation",

We added the following recently to record errors, so that we can tell whether the page is functioning properly. It is useful to keep them.
"v8.console",
"blink.console",
"webkit.console",

We need this to get UE and we use load UE to measure loading performance. 
"rail", 

I am not sure about these categories:
"trace_event_overhead",
"loading",
"benchmark",

ulan@, cbruni@ what do you think about reducing the number of categories?
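
For reference, the breakdown above can be checked mechanically. The following is only a sketch: the category names are copied from the comments in this thread, while the grouping labels and variable names are made up for illustration and are not part of Telemetry. It also assumes the usual comma-separated form of a Chrome trace category filter.

```python
# Categories currently enabled, as listed earlier in this thread.
ENABLED = [
    "trace_event_overhead", "loading", "benchmark", "blink_gc",
    "webkit.console", "rail", "toplevel", "renderer.scheduler",
    "v8.console", "v8", "blink.user_timing", "navigation", "blink.console",
    "disabled-by-default-memory-infra.v8.code_stats",
    "disabled-by-default-v8.gc",
    "disabled-by-default-memory-infra",
    "disabled-by-default-v8.runtime_stats",
]

# Justifications given in the comment above (labels are illustrative only).
NEEDED_FOR = {
    "closely monitored metrics": [
        "v8", "disabled-by-default-v8.gc", "disabled-by-default-memory-infra",
        "disabled-by-default-v8.runtime_stats", "blink_gc",
    ],
    "bytecode size work": ["disabled-by-default-memory-infra.v8.code_stats"],
    "EQT / jank": ["toplevel", "renderer.scheduler", "blink.user_timing",
                   "navigation"],
    "page error checks": ["v8.console", "blink.console", "webkit.console"],
    "load UE": ["rail"],
}

needed = {cat for cats in NEEDED_FOR.values() for cat in cats}

# Categories nobody has claimed yet, in their original order.
unsure = [cat for cat in ENABLED if cat not in needed]

# A reduced filter string, if the unclaimed categories were dropped.
reduced_filter = ",".join(cat for cat in ENABLED if cat in needed)

print(unsure)
```

Only three of the seventeen categories come out as unclaimed, so dropping them alone may not shrink the trace much.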
My guess would be that most of the events are generated by the "disabled-by-default-v8.runtime_stats" category and removing other categories would not help.


Right, runtime_stats tables are probably quite large in comparison to other traces.
Cc: perezju@chromium.org
perezju: I know increasing the trace download timeout from 60s to 120s didn't help for crbug.com/900920 (which looks like a different issue), but in this case the traces are massive (400mb+), and in local testing the longer timeout seems to fix this issue. Are there any downsides to relanding that CL?
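
As a rough back-of-the-envelope check of why the timeout matters at this trace size, the sketch below computes the sustained transfer rate needed to finish within a given timeout. It assumes "400mb" means 400 megabytes and that the download proceeds at a steady rate; the function name is made up for illustration.

```python
def required_mb_per_s(trace_size_mb, timeout_s):
    """Minimum sustained throughput needed to download a trace before the timeout."""
    return trace_size_mb / timeout_s


for timeout_s in (60, 120):
    rate = required_mb_per_s(400, timeout_s)
    print("timeout %ds -> at least %.2f MB/s" % (timeout_s, rate))
```

Doubling the timeout halves the throughput the device has to sustain, from about 6.67 MB/s to about 3.33 MB/s for a 400 MB trace.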
Project Member

Comment 17 by bugdroid1@chromium.org, Nov 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5

commit d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5
Author: Oystein Eftevaag <oysteine@chromium.org>
Date: Sat Nov 10 01:02:52 2018

Perfetto: Chunk JSON output

Pass JSON strings to the callback (and hence Mojo) once they reach 100kb,
rather than passing one giant string all at once.

BUG= 902064 

Change-Id: If95811d3ac907d23d2f882e869d88e933ff5d7b0
Reviewed-on: https://chromium-review.googlesource.com/c/1330068
Commit-Queue: oysteine <oysteine@chromium.org>
Reviewed-by: Eric Seckler <eseckler@chromium.org>
Cr-Commit-Position: refs/heads/master@{#607060}
[modify] https://crrev.com/d1ed027ee25d2b4f4de591bdce93bcf8dc651ba5/services/tracing/perfetto/json_trace_exporter.cc
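
The idea behind the chunking CL can be sketched as follows. This is a simplified Python illustration, not the C++ code in json_trace_exporter.cc: the function names, the event shape, and the way chunks are joined are assumptions. Only the threshold of roughly 100kb comes from the commit message.

```python
import json

CHUNK_THRESHOLD = 100 * 1024  # "100kb", per the commit message


def export_events_chunked(events, on_chunk, threshold=CHUNK_THRESHOLD):
    """Serialize events and hand them to on_chunk in pieces of about `threshold` characters."""
    pieces = []
    size = 0
    for event in events:
        piece = json.dumps(event)
        pieces.append(piece)
        size += len(piece) + 1  # +1 for the separating comma
        if size >= threshold:
            on_chunk(",".join(pieces))
            pieces, size = [], 0
    if pieces:
        # Flush whatever is left once the input is exhausted.
        on_chunk(",".join(pieces))


def reassemble(chunks):
    """Join the chunks back into one JSON array string (receiver side)."""
    return "[" + ",".join(chunks) + "]"


if __name__ == "__main__":
    events = [{"name": "event%d" % i, "ts": i} for i in range(10000)]
    chunks = []
    export_events_chunked(events, chunks.append)
    print(len(chunks), "chunks")
```

The receiver never has to hold more than one chunk plus its own accumulated output, instead of one giant string crossing the callback boundary all at once.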

Re #16: Sure, I think it's fine to reland that CL if it helps for this issue.
 Issue 901967  has been merged into this issue.
Project Member

Comment 20 by bugdroid1@chromium.org, Nov 13

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/08081e7b7e2d692933fa276ff2cfaae194956d3a

commit 08081e7b7e2d692933fa276ff2cfaae194956d3a
Author: Juan Antonio Navarro Perez <perezju@chromium.org>
Date: Tue Nov 13 19:07:56 2018

Reland "[Telemetry] Increase timeout on CollectChromeTracingData"

This is a reland of cc9857f9b84d78b5083df6cd28a62564fc70f7b4

Relanding as this should solve  crbug.com/902064 

Bug: chromium:900920
Original change's description:
> [Telemetry] Increase timeout on CollectChromeTracingData
>
> Some perf bots running tests are having trouble reading the entire
> trace data.
>
> Bug: chromium:900920
> Change-Id: Ia15f90a24819679f5fcd639b97019ca32569472d
> Reviewed-on: https://chromium-review.googlesource.com/c/1312889
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>

Bug:  chromium:902064 

Change-Id: Ib07ab1aab8df7c5d2133dd8876b462aa1875d512
Reviewed-on: https://chromium-review.googlesource.com/c/1334070
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>

[modify] https://crrev.com/08081e7b7e2d692933fa276ff2cfaae194956d3a/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py

Project Member

Comment 21 by bugdroid1@chromium.org, Nov 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1f613b78ef25828747990f10ffc871812db717f3

commit 1f613b78ef25828747990f10ffc871812db717f3
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Tue Nov 13 21:25:57 2018

Roll src/third_party/catapult c14a383e61b4..6d64a5e5e40b (2 commits)

https://chromium.googlesource.com/catapult.git/+log/c14a383e61b4..6d64a5e5e40b


git log c14a383e61b4..6d64a5e5e40b --date=short --no-merges --format='%ad %ae %s'
2018-11-13 benjhayden@chromium.org Add alerts-table to v2spa.
2018-11-13 perezju@chromium.org Reland "[Telemetry] Increase timeout on CollectChromeTracingData"


Created with:
  gclient setdep -r src/third_party/catapult@6d64a5e5e40b

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:900920, chromium:902064 
TBR=sullivan@chromium.org

Change-Id: Ib20da9ac3441be3a3952d0812f9e9d918865e234
Reviewed-on: https://chromium-review.googlesource.com/c/1334248
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#607741}
[modify] https://crrev.com/1f613b78ef25828747990f10ffc871812db717f3/DEPS

Project Member

Comment 22 by bugdroid1@chromium.org, Nov 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/622107104815cb5dea41fdac324da6eee55fd53c

commit 622107104815cb5dea41fdac324da6eee55fd53c
Author: oysteine <oysteine@chromium.org>
Date: Tue Nov 13 23:32:29 2018

Revert "Disable failing perf tests on Nexus_5."

This reverts commit ec9be0fda85723660320ec76e428f88f3686635f.

Reason for revert: Relanding after increasing trace download timeout to 120s

Original change's description:
> Disable failing perf tests on Nexus_5.
> 
> TBR=nednguyen@google.com
> NOTRY=true
> 
> Bug:  902064 
> Change-Id: Ic9f91ba4b263fb4ed435ac7a9dc1a096aa452516
> Reviewed-on: https://chromium-review.googlesource.com/c/1318586
> Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Cr-Commit-Position: refs/heads/master@{#605491}

TBR=nednguyen@google.com,crouleau@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  902064 
Change-Id: I03804b1a557226498df63eb8a7c0dea31e3f227f
Reviewed-on: https://chromium-review.googlesource.com/c/1334335
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#607803}
[modify] https://crrev.com/622107104815cb5dea41fdac324da6eee55fd53c/tools/perf/expectations.config

v8.browsing_mobile-future/browse:news:toi got fixed, but memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/ is still broken.

eseckler: the latter gets "fixed" with a smaller Perfetto buffer size; with the current size the browser OOM-crashes. I suspect the memory-infra dumps are what is blowing out the buffer. I'm going to put up a couple of CLs to reduce the buffer size and to make tracing more resilient to chunks dropping out of the ring buffer (max event count per message), but that might make some of these failing tests flaky if they depend on early trace events (not sure whether they do).
Project Member

Comment 24 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/044129ce84b4e0ea59aaab9398fc9c76b51ae8d2

commit 044129ce84b4e0ea59aaab9398fc9c76b51ae8d2
Author: Oystein Eftevaag <oysteine@google.com>
Date: Fri Nov 16 08:34:55 2018

Perfetto: Add an upper bound to number of events per proto message

R=eseckler@chromium.org

Bug:  902064 
Change-Id: I4b113c69df1192b4d0d05c0a5b9107b84b9cae2e
Reviewed-on: https://chromium-review.googlesource.com/c/1338879
Commit-Queue: Eric Seckler <eseckler@chromium.org>
Reviewed-by: Eric Seckler <eseckler@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608697}
[modify] https://crrev.com/044129ce84b4e0ea59aaab9398fc9c76b51ae8d2/services/tracing/public/cpp/perfetto/trace_event_data_source.cc
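
A toy model of why a per-message event cap helps when data falls out of a ring buffer is sketched below. This is not how Perfetto's buffer is implemented: capacity is counted in events rather than bytes, whole messages are evicted oldest-first, and all names are made up for illustration.

```python
from collections import deque


def pack_events(events, max_events_per_message):
    """Split a flat list of events into messages holding at most the given count."""
    return [events[i:i + max_events_per_message]
            for i in range(0, len(events), max_events_per_message)]


def surviving_events(messages, capacity_events):
    """Push messages through a toy ring buffer that evicts whole messages, oldest first."""
    ring = deque()
    used = 0
    for message in messages:
        ring.append(message)
        used += len(message)
        while used > capacity_events and ring:
            used -= len(ring.popleft())
    return [event for message in ring for event in message]


if __name__ == "__main__":
    events = list(range(10))
    # One oversized message does not fit at all, so every event is lost.
    print(surviving_events(pack_events(events, 10), capacity_events=6))
    # Capped messages are evicted in small steps, so the newest events survive.
    print(surviving_events(pack_events(events, 2), capacity_events=6))
```

In this model the cap bounds how much is lost per eviction, but the events that do get dropped are always the earliest ones, which matches the flakiness concern raised above for tests that depend on early trace events.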

Project Member

Comment 25 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d66855ad14ef4a2e59aceb9da7ad422a06d42703

commit d66855ad14ef4a2e59aceb9da7ad422a06d42703
Author: Oystein Eftevaag <oysteine@google.com>
Date: Fri Nov 16 08:36:25 2018

Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs

Specifically memory.long_running_idle_gmail_background_tbmv2 will OOM
on Nexus5 devices with this buffer size.

R=eseckler@chromium.org

Bug:  902064 
Change-Id: If8e5529f3000b9a2f3b86abaecb1f219df356575
Reviewed-on: https://chromium-review.googlesource.com/c/1338889
Commit-Queue: Eric Seckler <eseckler@chromium.org>
Reviewed-by: Eric Seckler <eseckler@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608698}
[modify] https://crrev.com/d66855ad14ef4a2e59aceb9da7ad422a06d42703/services/tracing/perfetto/json_trace_exporter.cc

You could have also increased the sampling interval here:
https://cs.chromium.org/chromium/src/tools/perf/page_sets/long_running_idle_google_cases.py?rcl=0bdd840823ae18097e3488071b4405b902898e59&l=10

So we get fewer memory dumps in the trace?
Project Member

Comment 27 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/934830c7c9e8c468085c6f27316848bf96f98bd6

commit 934830c7c9e8c468085c6f27316848bf96f98bd6
Author: Oystein Eftevaag <oysteine@google.com>
Date: Fri Nov 16 21:35:56 2018

Reduce size of libevent notification trace event

This event is extremely frequent and contributes heavily to trace
size bloat, which is negatively affecting a lot of the testing
infrastructure. Reducing the size of the event and removing the args
which are not useful in this case.

R=gab@chromium.org
BUG= 902064 

Change-Id: Id7d0ecba3ed084f5ef30d7a29554b14c6a68b963
Reviewed-on: https://chromium-review.googlesource.com/c/1330067
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: Etienne Bergeron <etienneb@chromium.org>
Reviewed-by: ssid <ssid@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608958}
[modify] https://crrev.com/934830c7c9e8c468085c6f27316848bf96f98bd6/base/message_loop/message_pump_libevent.cc

Remaining issue:

(ERROR) 2018-11-26 21:10:59,967 page_test_results.Fail:545  Failure recorded: TraceImportError: Unable to select a master clock domain because no path can be found from "TELEMETRY" to "LINUX_CLOCK_MONOTONIC".
    at ClockSyncManager.ensureAllDomainsAreConnected_ (/tracing/model/clock_sync_manager.html:283:17)
    at ClockSyncManager.selectModelDomainId_ (/tracing/model/clock_sync_manager.html:254:12)
    at ClockSyncManager.getModelTimeTransformer (/tracing/model/clock_sync_manager.html:156:14)
    at TraceEventImporter.toModelTimeFromUs_ (/tracing/extras/importer/trace_event_importer.html:3368:42)
    at TraceEventImporter.processDurationEvent (/tracing/extras/importer/trace_event_importer.html:487:23)
    at TraceEventImporter.processInstantEvent (/tracing/extras/importer/trace_event_importer.html:704:14)
    at TraceEventImporter.processEvent_ (/tracing/extras/importer/trace_event_importer.html:1274:16)
    at TraceEventImporter.importEvents (/tracing/extras/importer/trace_event_importer.html:1148:16)
    at importer (/tracing/importer/import.html:198:65)
    at task.subTask (/tracing/importer/import.html:145:32)
We've seen this error before when we were using a log buffer that was too small, see e.g.  https://crbug.com/888222  or the test failure linked from https://bugs.chromium.org/p/chromium/issues/detail?id=839071#c3.

Might have something to do with decreasing the log buffer size in #25?
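
To illustrate the failure mode: the importer needs a chain of clock sync markers connecting every clock domain, and if the marker linking two domains was dropped from the buffer, no path exists. The sketch below models this as graph reachability. It is a Python illustration only and not the catapult ClockSyncManager (which is JavaScript); the marker representation and function name are assumptions, and only the two domain names come from the error message above.

```python
from collections import deque


def has_sync_path(sync_markers, source, target):
    """Return True if clock sync markers connect `source` to `target`.

    Each marker is a pair of clock domain names that were synchronized.
    """
    graph = {}
    for a, b in sync_markers:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search over the undirected sync graph.
    seen = {source}
    queue = deque([source])
    while queue:
        domain = queue.popleft()
        if domain == target:
            return True
        for neighbor in graph.get(domain, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


if __name__ == "__main__":
    markers = [("TELEMETRY", "LINUX_CLOCK_MONOTONIC")]
    print(has_sync_path(markers, "TELEMETRY", "LINUX_CLOCK_MONOTONIC"))
    # If the marker was emitted early and later dropped from the buffer:
    print(has_sync_path([], "TELEMETRY", "LINUX_CLOCK_MONOTONIC"))
```

If the sync marker is written near the start of tracing, a buffer that is too small would evict it first, which would be consistent with the earlier bugs linked above.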
Project Member

Comment 30 by bugdroid1@chromium.org, Nov 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/728c481838a43d261c41581a1672de057b121854

commit 728c481838a43d261c41581a1672de057b121854
Author: oysteine <oysteine@chromium.org>
Date: Fri Nov 30 22:36:22 2018

Revert "Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs"

This reverts commit d66855ad14ef4a2e59aceb9da7ad422a06d42703.

Reason for revert:  crbug.com/902064 , reverting to see if this decreases test failures.

Original change's description:
> Perfetto: Reduce max trace buffer size to 300mb to avoid Android browser OOMs
> 
> Specifically memory.long_running_idle_gmail_background_tbmv2 will OOM
> on Nexus5 devices with this buffer size.
> 
> R=eseckler@chromium.org
> 
> Bug:  902064 
> Change-Id: If8e5529f3000b9a2f3b86abaecb1f219df356575
> Reviewed-on: https://chromium-review.googlesource.com/c/1338889
> Commit-Queue: Eric Seckler <eseckler@chromium.org>
> Reviewed-by: Eric Seckler <eseckler@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#608698}

TBR=oysteine@chromium.org,eseckler@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  902064 
Change-Id: I0528cc367fb44cc7814fc3b72b5727a25c04450b
Reviewed-on: https://chromium-review.googlesource.com/c/1357403
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#612824}
[modify] https://crrev.com/728c481838a43d261c41581a1672de057b121854/services/tracing/perfetto/json_trace_exporter.cc

Status: Fixed (was: Started)
After crrev.com/614540 both

v8.browsing_mobile-future/browse:news:toi
and
memory.long_running_idle_gmail_background_tbmv2/https://mail.google.com/mail/

seem to be working consistently again.
