Issue 793050

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 882323



webgl_conformance_tests fails in android_optional_gpu_tests_rel, blocking catapult roll

Reported by nedngu...@google.com, Dec 7 2017

Issue description

https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14793

Ken: can you please take a look at this soon? IMO, the failure doesn't seem to be caused by the Telemetry change.


 

Comment 1 by kbr@chromium.org, Dec 7 2017

Cc: zmo@chromium.org sugoi@chromium.org kainino@chromium.org ynovikov@chromium.org
Components: Internals>GPU>Testing Infra>Client>Android Blink>WebGL Blink>JavaScript
Labels: Hotlist-PixelWrangler
These failures are very strange. They just started showing up on the tryserver:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/?limit=200

Specifically:

https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14793
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14789
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14779
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14772

but not on the waterfall bot:

https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/

These two tests are failing:

WebglConformance_deqp_data_gles2_shaders_conversions
WebglConformance_deqp_data_gles2_shaders_swizzles

Here are two failing shards:

https://chromium-swarm.appspot.com/task?id=3a4acd2a84e54610&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=3a4acd2b16883c10&refresh=10&show_raw=1

The failures are timeouts and no stack traces are produced.

Without knowing more, I would suspect one of the most recent V8 autorolls, which just landed:

https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14786
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14767
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14758
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14757

I'll try to suppress these failures.

Comment 2 by bugdroid1@chromium.org, Dec 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/707cae04617a6adc748cb4e67736c582694c2a16

commit 707cae04617a6adc748cb4e67736c582694c2a16
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Dec 07 21:27:26 2017

Suppress data/gles2/shaders/{conversions,swizzles} on Nexus 5X.

These just started timing out randomly on the tryserver with no
apparent root cause.

BUG= 793050 
TBR=zmo@chromium.org
NOTRY=true

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ia25b9f72c0d039e19be2f4d63b3d82de7fd01390
Reviewed-on: https://chromium-review.googlesource.com/815378
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522549}
[modify] https://crrev.com/707cae04617a6adc748cb4e67736c582694c2a16/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
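
The suppression itself is an edit to webgl_conformance_expectations.py; Yuly's later comment refers to these as "the tests marked Flaky in #2". The real file uses its own helpers, so the sketch below is only a self-contained, hypothetical model of what such an entry does: match a test path plus a set of bot tags and return the expected result. The Expectation class, the lookup function, the tag names and the .html paths (inferred from the test names) are all assumptions for illustration.

```python
import fnmatch
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Expectation:
    """A hypothetical suppression entry (not the real file's format)."""
    pattern: str           # glob matched against the test's path
    tags: FrozenSet[str]   # every tag must be present on the bot
    result: str            # e.g. 'Flaky', 'Fail' or 'Skip'
    bug: int               # tracking bug number


def lookup(expectations: List[Expectation], test_path: str,
           bot_tags: FrozenSet[str]) -> Optional[Expectation]:
    """Return the first entry whose pattern and tags match, else None."""
    for exp in expectations:
        if exp.tags <= bot_tags and fnmatch.fnmatchcase(test_path, exp.pattern):
            return exp
    return None


# Paths are inferred from the test names; tags are made up for illustration.
EXPECTATIONS = [
    Expectation('deqp/data/gles2/shaders/conversions.html',
                frozenset({'android', 'nexus5x'}), 'Flaky', 793050),
    Expectation('deqp/data/gles2/shaders/swizzles.html',
                frozenset({'android', 'nexus5x'}), 'Flaky', 793050),
]
```

Requiring the entry's tags to be a subset of the bot's tags is what keeps the suppression scoped to Nexus 5X: the same test on a Linux bot finds no entry and runs with its normal expectation.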

Comment 3 by kbr@chromium.org, Dec 7 2017

Cc: kbr@chromium.org
Labels: -Pri-1 Pri-2
Owner: ----
Status: Available (was: Assigned)
I think the failures will be suppressed by the above CL. Please reassign this to me and upgrade it if not.

Is it possible that there is actually a problem in the catapult roll https://chromium-review.googlesource.com/c/chromium/src/+/814454 and the bot is catching it?
It doesn't seem flaky to me; it always fails for that CL.
BTW, there are also real flakes on that bot:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14795
WebglConformance_conformance_ogles_GL_all_all_001_to_004
WebglConformance_conformance_ogles_GL_cos_cos_001_to_006
WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_041_to_048

https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14773
WebglConformance_conformance_textures_misc_tex_video_using_tex_unit_non_zero

Both of those passed on retry.
I did find one similar failure on the Nexus 5X bot:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13826
WebglConformance_conformance_ogles_GL_clamp_clamp_001_to_006
WebglConformance_conformance_ogles_GL_degrees_degrees_001_to_006
WebglConformance_conformance_ogles_GL_sin_sin_001_to_006
WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_057_to_064
WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_089_to_096
WebglConformance_deqp_data_gles2_shaders_conversions

Comment 7 by kbr@chromium.org, Dec 7 2017

Owner: bmeu...@chromium.org
Status: Assigned (was: Available)
I saw a couple of those. At least one of those other failures looked like a crash, but didn't have a native stack trace.

The others look to me like possible intermittent code generation bugs on ARM, given that the failures are happening all over the place. Can the V8 team try to reproduce these?

Yuly, could you try to track down which other tests are flaking and mark them flaky too?

OK, I'm going to mark the ones in #5 as Flaky, as well as WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_luminance_luminance_unsigned_byte, which I see flaking on the N5X bot:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/14012
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13939
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13926
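
Tracking down flaking tests across several bot runs, as done above, boils down to finding tests that were seen both failing and passing. A minimal sketch, assuming the per-attempt results have already been collected into a dict; the layout and the 'PASS'/'FAIL' strings are hypothetical and are not the bots' actual results format:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical layout: build number -> test name -> one result per attempt
# (the first run followed by any retries).
BuildResults = Dict[int, Dict[str, List[str]]]


def find_flaky(builds: BuildResults) -> Set[str]:
    """Return tests seen both passing and failing across the given builds."""
    outcomes: Dict[str, Set[str]] = defaultdict(set)
    for per_test in builds.values():
        for test, attempts in per_test.items():
            outcomes[test].update(attempts)
    return {test for test, seen in outcomes.items() if {'PASS', 'FAIL'} <= seen}
```

A test that fails on every attempt is deliberately not reported here: it is a consistent failure rather than a flake.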

Comment 9 by bugdroid1@chromium.org, Dec 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2750ab4af9c336b5d20183f1c6aa18295c34d051

commit 2750ab4af9c336b5d20183f1c6aa18295c34d051
Author: Yuly Novikov <ynovikov@chromium.org>
Date: Fri Dec 08 00:51:00 2017

Mark Flaky WebGL CTS on Nexus 5X

BUG= 793050 
TBR=kbr@chromium.org

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I069f24a44eb490e10f9ffa101cb0db4e8fdbcbdd
Reviewed-on: https://chromium-review.googlesource.com/815814
Commit-Queue: Yuly Novikov <ynovikov@chromium.org>
Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522653}
[modify] https://crrev.com/2750ab4af9c336b5d20183f1c6aa18295c34d051/content/test/gpu/gpu_tests/webgl_conformance_expectations.py

Given that https://ci.chromium.org/buildbot/chromium.gpu/Android%20Release%20%28Nexus%205X%29/ is green, and that the tests marked Flaky in #2 failed 3 retries here: https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14823, I strongly suspect that the catapult roll exposes some problem.
I see several options to get catapult rolling again:
1. Revert https://chromium-review.googlesource.com/c/chromium/src/+/814454
2. Mark the tests in #2 Fail instead of Flaky.
3. Investigate why the tests time out.

Ned, Ken, what would you prefer?
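
The difference between leaving the tests Flaky and switching them to Fail comes down to retry semantics. The following is a simplified, hypothetical model, not the harness's actual logic: the retry count of 3 follows the "3 retries" mentioned above, and real harnesses may report an unexpected pass of a Fail test differently.

```python
from typing import Callable


def outcome_is_acceptable(run_once: Callable[[], bool], expected: str,
                          max_retries: int = 3) -> bool:
    """Simplified model of how an expectation gates a single test.

    'Pass'  -> one attempt, which must pass.
    'Flaky' -> up to 1 + max_retries attempts; any passing attempt is enough.
    'Fail'  -> one attempt; failure is expected, so any outcome is accepted
               (real harnesses may treat an unexpected pass differently).
    """
    attempts = 1 + (max_retries if expected == 'Flaky' else 0)
    for _ in range(attempts):
        if run_once():
            return True
    return expected == 'Fail'
```

Under this model a test that times out on every attempt stays red even when marked Flaky, which is why option 2 would unblock the roll, at the cost of masking the failure until option 3 is done.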

Comment 12 by kbr@chromium.org, Dec 8 2017

Components: -Blink>JavaScript Tests>Telemetry
Labels: -Pri-2 Pri-1
Owner: nedngu...@google.com
Thanks Yuly for tracking that down.

If it's really that Telemetry change which has caused a change in behavior, that's a serious issue that needs to be investigated. It could be changing the behavior on any platform.

Looking into the logs, I don't see the browser being started or restarted just before running WebglConformance_deqp_data_gles2_shaders_swizzles in shard 0, for example. So it doesn't necessarily seem to be something like a change in the management of user-data-dir.

I compared the browser's command lines for shard #0 of webgl_conformance_tests between the last successful catapult roll:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14755

and this attempt:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14793

The only difference is that the command line argument:
--proxy-server=socks://localhost:[PORT NUMBER]

shows up in a different place in the command line.

Here's the command line from the good run:
INFO:root:Browser command line: _ --no-default-browser-check --disable-external-intent-requests --enable-gpu-benchmarking --disable-search-geolocation-disclosure --use-cmd-decoder=validating --metrics-recording-only --disable-gpu-watchdog --proxy-server=socks://localhost:42697 --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I= --disable-domain-blocking-for-3d-apis --disable-component-extensions-with-background-pages --disable-gpu-process-crash-limit --user-data-dir=/data/data/org.chromium.chrome/ --disable-default-apps --ignore-autoplay-restrictions --disable-fre --enable-net-benchmarking --js-flags=--expose-gc --no-first-run --test-type=gpu --enable-experimental-canvas-features --enable-logging=stderr --enable-remote-debugging --disable-background-networking --use-mobile-user-agent --top-controls-show-threshold=0.5 --top-controls-hide-threshold=0.5 --use-mobile-user-agent --enable-pinch --enable-viewport --validate-input-event-stream --enable-longpress-drag-selection --touch-selection-strategy=direction --main-frame-resizes-are-orientation-changes --disable-composited-antialiasing --enable-dom-distiller --flag-switches-begin --flag-switches-end


Here's the command line from the bad run:
INFO:root:Browser command line: _ --no-default-browser-check --disable-external-intent-requests --enable-gpu-benchmarking --proxy-server=socks://localhost:47830 --disable-search-geolocation-disclosure --use-cmd-decoder=validating --metrics-recording-only --disable-gpu-watchdog --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I= --disable-domain-blocking-for-3d-apis --disable-component-extensions-with-background-pages --disable-gpu-process-crash-limit --user-data-dir=/data/data/org.chromium.chrome/ --disable-default-apps --ignore-autoplay-restrictions --disable-fre --enable-net-benchmarking --js-flags=--expose-gc --no-first-run --test-type=gpu --enable-experimental-canvas-features --enable-logging=stderr --enable-remote-debugging --disable-background-networking --use-mobile-user-agent --top-controls-show-threshold=0.5 --top-controls-hide-threshold=0.5 --use-mobile-user-agent --enable-pinch --enable-viewport --validate-input-event-stream --enable-longpress-drag-selection --touch-selection-strategy=direction --main-frame-resizes-are-orientation-changes --disable-composited-antialiasing --enable-dom-distiller --flag-switches-begin --flag-switches-end
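
The comparison above can be automated by treating each logged command line as an order-insensitive multiset of flags, with the per-run SOCKS port masked out. A minimal sketch; the helper name and the 'PORT' placeholder are hypothetical, and it assumes the log lines look exactly like the two quoted above:

```python
import re
import shlex
from collections import Counter

_LOG_PREFIX = 'Browser command line:'
_PROXY_PORT = re.compile(r'^(--proxy-server=socks://localhost:)\d+$')


def normalized_flags(log_line: str) -> Counter:
    """Turn a logged browser command line into an order-insensitive multiset.

    The log prefix and the leading program placeholder ('_') are dropped, and
    the SOCKS proxy port, which changes on every run, is replaced by 'PORT'.
    A Counter rather than a set is used because a flag such as
    --use-mobile-user-agent can legitimately appear more than once.
    """
    command = log_line.split(_LOG_PREFIX, 1)[-1]
    args = shlex.split(command)[1:]  # skip the program placeholder
    return Counter(_PROXY_PORT.sub(r'\1PORT', arg) for arg in args)
```

Applied to the good and bad lines above, the two multisets should compare equal, consistent with the observation that only the position of --proxy-server differs.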


Ned, I don't know why your patch would have affected the SeriallyExecutedBrowserTestCase and associated runner but it seems to have. Can you please investigate this further on your side?

Comment 13 by kbr@chromium.org, Dec 8 2017

Aha. One big hint might be that these are long-running tests. From the passing run:

(shard #0)
[147/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_swizzles passed 193.6088s

(shard #1)
[146/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_conversions passed 205.4912s

Could some of Telemetry's bookkeeping about waiting for JavaScript results have been subtly changed?

The failures:

[146/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_conversions failed unexpectedly 322.2581s:

[147/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_swizzles failed unexpectedly 325.2269s:
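
Pulling durations out of the shard logs makes the long-running tests easy to spot. A sketch, assuming result lines have exactly the shape of the four quoted above; the regular expression was written against those lines only, and the helper names and any threshold passed to long_running are hypothetical:

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines such as:
#   [147/151] gpu_tests.<...>.WebglConformance_x passed 193.6088s
#   [146/151] gpu_tests.<...>.WebglConformance_y failed unexpectedly 322.2581s:
_RESULT_LINE = re.compile(
    r'^\[\d+/\d+\]\s+(?P<test>\S+)\s+(?P<status>[a-z ]+?)\s+'
    r'(?P<secs>\d+(?:\.\d+)?)s:?$')


class RunResult(NamedTuple):
    test: str       # last dotted component of the full test id
    status: str     # e.g. 'passed' or 'failed unexpectedly'
    seconds: float


def parse_results(lines: Iterable[str]) -> List[RunResult]:
    """Extract (test, status, duration) from shard log lines; skip the rest."""
    results = []
    for line in lines:
        match = _RESULT_LINE.match(line.strip())
        if match:
            name = match.group('test').rsplit('.', 1)[-1]
            results.append(RunResult(name, match.group('status'),
                                     float(match.group('secs'))))
    return results


def long_running(results: List[RunResult], threshold_secs: float) -> List[RunResult]:
    """Return results whose duration is at least threshold_secs."""
    return [r for r in results if r.seconds >= threshold_secs]
```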

Ned, while the investigation continues, can we try reverting your CL to see if that unblocks the roll?
Reverting my CL did fix the problem. I am really baffled, because my CL mostly changes how Telemetry invokes ClearCaches() and SetFullPerformanceModeEnabled(), and the GPU tests should use neither of those.

Comment 17 by kbr@chromium.org, Dec 8 2017

Darn. I'm sorry that happened and am as confused as you from reading your patch.

Comment 18 by bugdroid1@chromium.org, Dec 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9

commit f9663d1bd9be20b9277dfda0b5b23d6dda6951c9
Author: Nghia Nguyen <nednguyen@google.com>
Date: Tue Dec 19 12:47:49 2017

Remove platform_backend's DidStartBrowser and WillCloseBrowser hooks

SetFullPerformanceModeEnabled(False) is now called after all the tests have run
in SharedPageState.TearDownState() instead of relying on tracking browser objects,
which is fragile & complex.


This is a partial reland of https://chromium-review.googlesource.com/c/catapult/+/814214


Bug:  chromium:792860 
Bug: chromium:792357
Bug:  chromium:793050 
Change-Id: I0e4ec50230ccd32c47c34b92cd30edeae7322edf
Reviewed-on: https://chromium-review.googlesource.com/833179
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/core/platform.py
[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/platform/platform_backend.py
[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/page/shared_page_state.py
[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/browser/browser.py
[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/testing/fakes/__init__.py
[modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/browser/browser_unittest.py

Given that the partial reland in #18 landed smoothly, I now realize that the GPU tests have always been running with full_performance_mode enabled (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/browser/browser_options.py?rcl=7365f02611830723d60963f0619d5bab45060849&l=285).

So when the original patch landed, we accidentally disabled full_performance_mode in the correctness tests, which made those GPU tests fail with timeouts.

I will continue the refactoring by also making the call that enables performance mode explicit in the correctness test framework.
Status: Fixed (was: Assigned)
Blocking: 882323
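
The refactoring Ned describes, enabling performance mode explicitly in the correctness test framework rather than through browser-start hooks, could look roughly like the following. This is a sketch only: FakePlatform is a hypothetical stand-in whose method is merely named after SetFullPerformanceModeEnabled from this thread, and performance_mode_enabled and run_correctness_tests are not part of Telemetry or gpu_tests.

```python
import contextlib
from typing import Callable, Iterable, Iterator, List


class FakePlatform:
    """Hypothetical stand-in for a platform object; records mode changes."""

    def __init__(self) -> None:
        self.full_performance_mode = False
        self.calls: List[bool] = []

    def SetFullPerformanceModeEnabled(self, enabled: bool) -> None:
        self.calls.append(enabled)
        self.full_performance_mode = enabled


@contextlib.contextmanager
def performance_mode_enabled(platform: FakePlatform) -> Iterator[None]:
    """Enable full performance mode for the body, then always disable it."""
    platform.SetFullPerformanceModeEnabled(True)
    try:
        yield
    finally:
        platform.SetFullPerformanceModeEnabled(False)


def run_correctness_tests(
        platform: FakePlatform,
        tests: Iterable[Callable[[FakePlatform], bool]]) -> List[bool]:
    """Run every test with performance mode enabled explicitly up front."""
    with performance_mode_enabled(platform):
        return [test(platform) for test in tests]
```

Tying the enable/disable pair to the test run itself, rather than to browser start and close, means restarting or not restarting a browser can no longer silently change the mode the tests run in.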
