New issue
Advanced search Search tips
Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 14
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment
link

Issue 900909: browse:chrome:* stories failing and taking down entire benchmarks

Reported by perezju@chromium.org, Nov 1 Project Member

Issue description

browse:chrome:omnibox and browse:chrome:newtab have recently started failing with:

    app_ui.WaitForUiNode(resource_id='search_box').Tap()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper
    retry_if_func=retry_if_func)
[...]
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 111, in WaitFor
    suffix=' waiting for condition %r' % condition_name)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 54, in GetRemainingTime
    raise reraiser_thread.TimeoutError(msg)
CommandTimeoutError: Timeout of 30.0 secs expired waiting for condition 'node_found'

Which is treated as an unhandleable error and brings down the entire benchmark.

In addotion to finding the root cause, the error should be made handleable. For the moment I'll just disable the stories to let the rest of the benchmark run.
 

Comment 1 by nedngu...@google.com, Nov 1

Cc: eyaich@chromium.org crouleau@chromium.org
Components: Speed>Telemetry

Comment 2 by bugdroid1@chromium.org, Nov 1

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b1c0b554e0ff982245a77644057aec7581269487

commit b1c0b554e0ff982245a77644057aec7581269487
Author: Juan Antonio Navarro Perez <perezju@chromium.org>
Date: Thu Nov 01 12:35:43 2018

[tools/perf] Disable browse:chrome:* stories

These stories ara failing with an unhandleable error, preventing the
rest of the benchmark to run.

Disable these as a short term quick fix.

NOTRY=true
TBR=nednguyen@google.com

Bug:  900909 
Change-Id: I8e51ced1644853469ff8310de03cb232abba9253
Reviewed-on: https://chromium-review.googlesource.com/c/1311924
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#604564}
[modify] https://crrev.com/b1c0b554e0ff982245a77644057aec7581269487/tools/perf/expectations.config

Comment 3 by nedngu...@google.com, Nov 1

I think might be we should consider changing the error catching part in story_runner to be a blacklist rather than whitelist, e.g: all error of types X, Y, Z will crash the whole benchmarks, the rest will lead to retry.

Few errors that we allow to crash the whole benchmark:
KeyboardInterrupt
SystemExit
EnvironmentError
ImportError
MemoryError

Comment 4 by nedngu...@google.com, Nov 1

*also some DeviceError type from devil probably also lead to crashing the whole benchmark.

Comment 6 by perezju@chromium.org, Nov 2

Cc: mheikal@chromium.org perezju@chromium.org
Owner: ----
Status: Available (was: Started)
Pinpoint job still running, reduced range so far is:
http://test-results.appspot.com/revision_range?start=604425&end=604432&n=1000

From there the following sounds suspicious:
https://chromium.googlesource.com/chromium/src/+/d3fd7a3e0505a4ca842d5c74cdfa125a8a381ef5

+mheikal could your change be preventing us from finding items in the Chrome menu now?

Comment 7 by mheikal@chromium.org, Nov 2

Yes this is my change. The change has been rolled back temporarily for breaking franky tests.

So as to make sure I dont break this again on the reland. Can you give me more information about these benchmarks, how are they run (automatically for each commit, once per official build or manually via developers/testers requesting it?). Which target are these run on (chromium/chrome?, monochrome/chromemodern/chrome?)

Comment 8 by perezju@chromium.org, Nov 2

The tests that broke are defined here:
https://cs.chromium.org/chromium/src/tools/perf/page_sets/system_health/chrome_stories.py?rcl=81bbcaf91131c5ffbfb08de123fa1e372b25c5b8&l=49

They are run continuously on perf waterfall every few commits.

The tests interact with the browser UI find (via uiautomator) elements to "tap" on; e.g. try to find the "url_bar", the "menu_button", etc.

You can run these with e.g.:

src$ ./tools/perf/run_benchmark system_health.common_mobile --browser android-chromium --story-filter browse:chrome:newtab

Comment 9 by 42576172...@developer.gserviceaccount.com, Nov 2

Project Member
Owner: mheikal@chromium.org
Status: Assigned (was: Available)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/10b4b886e40000

Reland "[Android] Remove most resource names in resources.arsc" by mheikal@chromium.org
https://chromium.googlesource.com/chromium/src/+/d3fd7a3e0505a4ca842d5c74cdfa125a8a381ef5
Failure rate: 0 → 1 (+1)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Comment 10 by bugdroid1@chromium.org, Nov 5

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e734bc5eb8b08668196b95a9e38e6bea0773748c

commit e734bc5eb8b08668196b95a9e38e6bea0773748c
Author: Mohamed Heikal <mheikal@chromium.org>
Date: Mon Nov 05 20:48:41 2018

Reland x2 "[Android] Remove most resource names in resources.arsc"

This reverts commit 65f0e20ac217ae4cfbfec49fa98d73c004690148.

Reason for reland: Fixes  crbug.com/900909  and crbug.com/900993

TBR=torne@chromium.org (No changes in webview section)

Original change's description:
> [Android] Remove most resource names in resources.arsc
>
> resource names are stored in the resources.arsc file to allow for
> getIdentifier with the string name of the resource to work. This cl
> obfuscates all of those resources to one name (and thus stored only
> once) for all resources that are not accessed via getIdentifier.
>
> This improves binary size by about 200KB.
>
> Bug: 894208
>
> Change-Id: I28c440c22b90cd045f53017073fdb88c7410d530
> Reviewed-on: https://chromium-review.googlesource.com/c/1265897
> Commit-Queue: Mohamed Heikal <mheikal@chromium.org>
> Reviewed-by: Richard Coles <torne@chromium.org>
> Reviewed-by: agrieve <agrieve@chromium.org>
> Reviewed-by: Ted Choc <tedchoc@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#603229}
Bug: 894208,  900909 , 900993

Change-Id: Ie1e8790b392b5cbfff6d207c11e12bed6b8cc765
Reviewed-on: https://chromium-review.googlesource.com/c/1316092
Reviewed-by: Mohamed Heikal <mheikal@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Commit-Queue: Mohamed Heikal <mheikal@chromium.org>
Cr-Commit-Position: refs/heads/master@{#605451}
[add] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/android_webview/aapt2.config
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/android_webview/apk/java/src/com/android/webview/chromium/LicenseActivity.java
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/build/android/gyp/apkbuilder.py
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/build/android/gyp/compile_resources.py
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/build/config/android/internal_rules.gni
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/build/config/android/rules.gni
[modify] https://crrev.com/e734bc5eb8b08668196b95a9e38e6bea0773748c/chrome/android/chrome_public_apk_tmpl.gni

Comment 11 by mheikal@google.com, Nov 5

Status: Fixed (was: Assigned)
This should be fixed now, the new reland should keep all id resources that are used by automation tools.

Comment 12 by perezju@chromium.org, Nov 5

Owner: perezju@chromium.org
Status: Assigned (was: Fixed)
Thanks!

Re-opening and assigning to self to:
- Re-enable the disabled tests.
- Make sure these kinds of failures are no longer considered as "fatal", other tests should be able to run just fine.

Comment 13 by bugdroid1@chromium.org, Nov 8

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/001618fc4a619a3037f0542ae980922226f89cfe

commit 001618fc4a619a3037f0542ae980922226f89cfe
Author: Juan Antonio Navarro Pérez <perezju@chromium.org>
Date: Thu Nov 08 21:14:32 2018

Revert "[tools/perf] Disable browse:chrome:* stories"

This reverts commit b1c0b554e0ff982245a77644057aec7581269487.

Reason for revert: Stories show work now after fix of underlying issue.

Original change's description:
> [tools/perf] Disable browse:chrome:* stories
> 
> These stories ara failing with an unhandleable error, preventing the
> rest of the benchmark to run.
> 
> Disable these as a short term quick fix.
> 
> NOTRY=true
> TBR=nednguyen@google.com
> 
> Bug:  900909 
> Change-Id: I8e51ced1644853469ff8310de03cb232abba9253
> Reviewed-on: https://chromium-review.googlesource.com/c/1311924
> Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#604564}

TBR=perezju@chromium.org,nednguyen@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  900909 
Change-Id: Ia698d28b432cc0606e0bb157124c2dd39ca3eff0
Reviewed-on: https://chromium-review.googlesource.com/c/1327561
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#606602}
[modify] https://crrev.com/001618fc4a619a3037f0542ae980922226f89cfe/tools/perf/expectations.config

Comment 14 by bugdroid1@chromium.org, Nov 13

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/78869faa9193eea919200caed5c17410c6779558

commit 78869faa9193eea919200caed5c17410c6779558
Author: Juan Antonio Navarro Perez <perezju@chromium.org>
Date: Tue Nov 13 17:33:08 2018

[telemetry] Treat most exceptions as "possibly handleable"

Telemetry classifies errors as "handleable" and "unhandleable" errors.
Handleable errors, defined by a short explicit list, are displayed and
reported as an individual story failure. Unhandleable errors, any
error not in the list, are treated as fatal errors and stop the entire
benchmark execution.

This is often too drastic, as many errors raised by Telemetry clients
are in fact "handleable" and should not stop the execution of the
entire benchmark due to an error in one of its stories.

In this CL we define instead a small set of explicitly "unhandleable"
errors (KeyboardInterrupt, SystemExit, etc) which should definitely
halt the benchmark execution, while most other Exception's are
treated as "possibly handleable", allowing other stories to continue
running but restarting the shared state to be safe.

Bug:  chromium:900909 
Change-Id: I5f23d5c33319b04f7631f0f3e77fe2a2bb62c131
Reviewed-on: https://chromium-review.googlesource.com/c/1326943
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>

[modify] https://crrev.com/78869faa9193eea919200caed5c17410c6779558/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/78869faa9193eea919200caed5c17410c6779558/telemetry/telemetry/internal/story_runner_unittest.py

Comment 15 by bugdroid1@chromium.org, Nov 13

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2aaa8a58d23ae9fc7b6716cd13d2014b1200b691

commit 2aaa8a58d23ae9fc7b6716cd13d2014b1200b691
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Tue Nov 13 18:36:19 2018

Roll src/third_party/catapult 84f62b6f08e8..78869faa9193 (1 commits)

https://chromium.googlesource.com/catapult.git/+log/84f62b6f08e8..78869faa9193


git log 84f62b6f08e8..78869faa9193 --date=short --no-merges --format='%ad %ae %s'
2018-11-13 perezju@chromium.org [telemetry] Treat most exceptions as "possibly handleable"


Created with:
  gclient setdep -r src/third_party/catapult@78869faa9193

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:900909 
TBR=sullivan@chromium.org

Change-Id: I17117fac12c5ef19aab8edfd9080d031dccaa087
Reviewed-on: https://chromium-review.googlesource.com/c/1333858
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#607666}
[modify] https://crrev.com/2aaa8a58d23ae9fc7b6716cd13d2014b1200b691/DEPS

Comment 16 by perezju@chromium.org, Nov 14

Status: Fixed (was: Assigned)

Comment 17 by benhenry@google.com, Jan 16

Components: Test>Telemetry

Comment 18 by benhenry@google.com, Jan 16

Components: -Speed>Telemetry

Sign in to add a comment