
Issue 713844


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression




system_health.common_mobile/benchmark_duration fluctuates a lot on android-nexus6

Project Member Reported by martiniss@chromium.org, Apr 20 2017

Issue description

Performance dashboard identified a 123.3% regression in system_health.common_mobile/benchmark_duration on android-nexus6 at revision range 464923:464949. Graph: https://chromeperf.appspot.com/report?masters=ChromiumPerf&bots=android-nexus6&tests=system_health.common_mobile%2Fbenchmark_duration&checked=benchmark_duration%2Cbenchmark_duration_ref%2Cref&rev=464949

It looks really noisy now. I figure something happened on the infrastructure side.

rnephew@ or mikecase@, know anything?
 
The increase was caused by:
https://codereview.chromium.org/2787103003

But that doesn't explain the noise.
Hmm. Something strange is going on:

Run 5601:
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus6_Perf__2_%2F5601%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_mobile.reference%2F0%2Fstdout

[  PASSED  ] 69 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ]  browse:chrome:newtab@{'case': 'browse', 'group': 'chrome'}


Run 5602:
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus6_Perf__2_%2F5602%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_mobile.reference%2F0%2Fstdout

[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ]  browse:shopping:avito@{'case': 'browse', 'group': 'shopping'}



Avito is one of the newly added stories. It's throwing an exception that isn't being caught, which causes the whole run to fail out early.
Re #2 & #3: we should catch it! Though we need to figure out carefully whether these are "recoverable" exceptions (like a browser crash, or a metric bug in an edge case). In the past, we tried catching all exceptions, and that made Telemetry keep running tests in cases like device problems, a WPR server in a bad state, etc.
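A minimal sketch of the selective-catch approach described above, assuming a hypothetical `RecoverableStoryError` base class for per-story failures (browser crash, metric edge case) and a hypothetical `DeviceError` for failures that should still abort the whole run. `run_stories` and both class names are illustrative only, not Telemetry's actual API.

```python
class RecoverableStoryError(Exception):
    """Hypothetical base for failures scoped to one story (e.g. a browser crash)."""


class DeviceError(Exception):
    """Hypothetical non-recoverable failure (e.g. device offline, WPR server in a bad state)."""


def run_stories(stories, run_one):
    """Run each story, recording recoverable failures and moving on.

    Any exception that does not derive from RecoverableStoryError (such as
    DeviceError) propagates and aborts the run, instead of catching *all*
    exceptions and continuing on a broken device.
    """
    passed, failed = [], []
    for story in stories:
        try:
            run_one(story)
        except RecoverableStoryError as e:
            # Only this story failed; keep going with the rest.
            failed.append((story, str(e)))
            continue
        passed.append(story)
    return passed, failed


def fake_run(story):
    # Simulate one story crashing the browser.
    if story == "browse:shopping:avito":
        raise RecoverableStoryError("browser crashed")


passed, failed = run_stories(["a", "browse:shopping:avito", "b"], fake_run)
# passed == ["a", "b"]
# failed == [("browse:shopping:avito", "browser crashed")]
```

The design choice is to whitelist by base class rather than catch `Exception`: a story-level failure is contained, while anything outside that hierarchy still stops the run as before.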
From the error:
WebSocketConnectionClosedException: Connection is already closed.


I suspect that it is recoverable. It seems like something crashed the browser. It is raised when we try to issue a clock sync marker:
ChromeClockSyncError: Cannot issue clock sync. No devtools clients

That seems eligible. Do you want to file a catapult bug on that & own it?
Ah, I think the real problem is that the chrome_tracing_agent.py exceptions are not inheriting from the Telemetry exception base class when they should.

https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py?l=47
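The inheritance problem can be sketched as follows, assuming `TelemetryError` stands in for Telemetry's exception base class and that the story runner's handler catches that base class. Apart from `ChromeClockSyncError`, the class and function names here are hypothetical. The two error classes differ only in their parent, and that alone decides whether the handler sees the failure.

```python
class TelemetryError(Exception):
    """Stand-in for Telemetry's exception base class (name assumed)."""


class ChromeClockSyncErrorBefore(Exception):
    """Derives from plain Exception, so an `except TelemetryError` handler misses it."""


class ChromeClockSyncError(TelemetryError):
    """Re-parented under the Telemetry base so the story-level handler catches it."""


def issue_clock_sync_marker(devtools_clients, error_cls):
    """Hypothetical helper: fail if no DevTools clients remain (e.g. the browser crashed)."""
    if not devtools_clients:
        raise error_cls("Cannot issue clock sync. No devtools clients")
    return len(devtools_clients)  # number of clients that would receive the marker


def run_story_safely(error_cls):
    """Return a status if the handler catches the error; otherwise it escapes."""
    try:
        issue_clock_sync_marker([], error_cls)
    except TelemetryError:
        return "story failed, continuing"
    return "ok"


# Caught: the run continues with the next story.
status = run_story_safely(ChromeClockSyncError)
# Calling run_story_safely(ChromeClockSyncErrorBefore) instead raises
# ChromeClockSyncErrorBefore past the handler, aborting the whole run.
```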

Bug filed and self assigned:
https://github.com/catapult-project/catapult/issues/3509
Cc: perezju@chromium.org
Labels: -Pri-2 Pri-1
Owner: rnep...@chromium.org
Status: Assigned (was: Available)
Summary: system_health.common_mobile/benchmark_duration fluctuates a lot on android-nexus6 (was: 123.3% regression in system_health.common_mobile/benchmark_duration on android-nexus6 at 464923:464949)
Retitling the bug to be about fixing the problem of failed system health stories killing all subsequent stories.
Status: Fixed (was: Assigned)
I think this is fixed.
Owner: ----
Status: Available (was: Started)
