webgl_conformance_tests fails in android_optional_gpu_tests_rel, blocking catapult roll
Issue description: https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14793 Ken: can you please take a look at this soon? The failure doesn't seem to be caused by the Telemetry change, IMO.
,
Dec 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/707cae04617a6adc748cb4e67736c582694c2a16 commit 707cae04617a6adc748cb4e67736c582694c2a16 Author: Kenneth Russell <kbr@chromium.org> Date: Thu Dec 07 21:27:26 2017 Suppress data/gles2/shaders/{conversions,swizzles} on Nexus 5X. These just started timing out randomly on the tryserver with no apparent root cause. BUG= 793050 TBR=zmo@chromium.org NOTRY=true Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: Ia25b9f72c0d039e19be2f4d63b3d82de7fd01390 Reviewed-on: https://chromium-review.googlesource.com/815378 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#522549} [modify] https://crrev.com/707cae04617a6adc748cb4e67736c582694c2a16/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
,
Dec 7 2017
I think the failures will be suppressed by the above CL. Please reassign this to me and upgrade it if not.
,
Dec 7 2017
Is it possible that there is actually a problem in the catapult roll https://chromium-review.googlesource.com/c/chromium/src/+/814454 and the bot is catching it? It doesn't seem flaky to me; it always fails for that CL.
,
Dec 7 2017
BTW, there are also real flakes on that bot: https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14795 WebglConformance_conformance_ogles_GL_all_all_001_to_004 WebglConformance_conformance_ogles_GL_cos_cos_001_to_006 WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_041_to_048 https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14773 WebglConformance_conformance_textures_misc_tex_video_using_tex_unit_non_zero Those two passed in the retry.
,
Dec 7 2017
I did find one similar failure on Nexus 5X bot: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13826 WebglConformance_conformance_ogles_GL_clamp_clamp_001_to_006 WebglConformance_conformance_ogles_GL_degrees_degrees_001_to_006 WebglConformance_conformance_ogles_GL_sin_sin_001_to_006 WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_057_to_064 WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_089_to_096 WebglConformance_deqp_data_gles2_shaders_conversions
,
Dec 7 2017
I saw a couple of those. At least one of those other failures looked like a crash, but didn't have a native stack trace. The others look to me like possible intermittent code generation bugs on ARM, given that the failures are happening all over the place. Can the V8 team try to reproduce these? Yuly, could you try to track down which other tests are flaking and mark them flaky too?
,
Dec 7 2017
OK, going to mark as Flaky the ones in #5, and also WebglConformance_conformance_textures_image_bitmap_from_video_tex_2d_luminance_luminance_unsigned_byte, which I see flaking on the N5X bot: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/14012 https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13939 https://ci.chromium.org/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/13926
,
Dec 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2750ab4af9c336b5d20183f1c6aa18295c34d051 commit 2750ab4af9c336b5d20183f1c6aa18295c34d051 Author: Yuly Novikov <ynovikov@chromium.org> Date: Fri Dec 08 00:51:00 2017 Mark Flaky WebGL CTS on Nexus 5X BUG= 793050 TBR=kbr@chromium.org Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: I069f24a44eb490e10f9ffa101cb0db4e8fdbcbdd Reviewed-on: https://chromium-review.googlesource.com/815814 Commit-Queue: Yuly Novikov <ynovikov@chromium.org> Reviewed-by: Yuly Novikov <ynovikov@chromium.org> Cr-Commit-Position: refs/heads/master@{#522653} [modify] https://crrev.com/2750ab4af9c336b5d20183f1c6aa18295c34d051/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
,
Dec 8 2017
Given that https://ci.chromium.org/buildbot/chromium.gpu/Android%20Release%20%28Nexus%205X%29/ is green, and the tests marked Flaky in #2 failed 3 retries here: https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14823 I strongly suspect that some problem is exposed by the catapult roll.
,
Dec 8 2017
I see several options to get catapult rolling again:
1. Revert https://chromium-review.googlesource.com/c/chromium/src/+/814454
2. Mark the tests in #2 Fail instead of Flaky.
3. Investigate why the tests time out.
Ned, Ken, what would you prefer?
,
Dec 8 2017
Thanks Yuly for tracking that down. If it's really that Telemetry change which has caused a change in behavior, that's a serious issue that needs to be investigated. It could be changing the behavior on any platform.

Looking into the logs, I don't see the browser being started or restarted just before running WebglConformance_deqp_data_gles2_shaders_swizzles in shard 0, for example. So it doesn't necessarily seem to be something like a change in the management of user-data-dir.

I compared the browser's command lines for shard #0 of webgl_conformance_tests between the last successful catapult roll:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14755
and this attempt:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_optional_gpu_tests_rel/14793

The only difference is that the command line argument:
--proxy-server=socks://localhost:[PORT NUMBER]
shows up in a different place in the command line.

Here's the command line from the good run:
INFO:root:Browser command line: _ --no-default-browser-check --disable-external-intent-requests --enable-gpu-benchmarking --disable-search-geolocation-disclosure --use-cmd-decoder=validating --metrics-recording-only --disable-gpu-watchdog --proxy-server=socks://localhost:42697 --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I= --disable-domain-blocking-for-3d-apis --disable-component-extensions-with-background-pages --disable-gpu-process-crash-limit --user-data-dir=/data/data/org.chromium.chrome/ --disable-default-apps --ignore-autoplay-restrictions --disable-fre --enable-net-benchmarking --js-flags=--expose-gc --no-first-run --test-type=gpu --enable-experimental-canvas-features --enable-logging=stderr --enable-remote-debugging --disable-background-networking --use-mobile-user-agent --top-controls-show-threshold=0.5 --top-controls-hide-threshold=0.5 --use-mobile-user-agent --enable-pinch --enable-viewport --validate-input-event-stream --enable-longpress-drag-selection --touch-selection-strategy=direction --main-frame-resizes-are-orientation-changes --disable-composited-antialiasing --enable-dom-distiller --flag-switches-begin --flag-switches-end

Here's the command line from the bad run:
INFO:root:Browser command line: _ --no-default-browser-check --disable-external-intent-requests --enable-gpu-benchmarking --proxy-server=socks://localhost:47830 --disable-search-geolocation-disclosure --use-cmd-decoder=validating --metrics-recording-only --disable-gpu-watchdog --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I= --disable-domain-blocking-for-3d-apis --disable-component-extensions-with-background-pages --disable-gpu-process-crash-limit --user-data-dir=/data/data/org.chromium.chrome/ --disable-default-apps --ignore-autoplay-restrictions --disable-fre --enable-net-benchmarking --js-flags=--expose-gc --no-first-run --test-type=gpu --enable-experimental-canvas-features --enable-logging=stderr --enable-remote-debugging --disable-background-networking --use-mobile-user-agent --top-controls-show-threshold=0.5 --top-controls-hide-threshold=0.5 --use-mobile-user-agent --enable-pinch --enable-viewport --validate-input-event-stream --enable-longpress-drag-selection --touch-selection-strategy=direction --main-frame-resizes-are-orientation-changes --disable-composited-antialiasing --enable-dom-distiller --flag-switches-begin --flag-switches-end

Ned, I don't know why your patch would have affected the SeriallyExecutedBrowserTestCase and associated runner, but it seems to have. Can you please investigate this further on your side?
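
The command-line comparison described above can be repeated mechanically. The following is a minimal sketch, not part of Telemetry or the GPU test harness: `compare_flags` is a hypothetical helper name, and it assumes the two command lines have been copied out of the bot logs as plain strings. It ignores the value of `--proxy-server` (the port differs on every run) and reports flags that exist in only one run, plus flags that merely changed position.

```python
import difflib


def compare_flags(good_cmd, bad_cmd, ignore_values=("--proxy-server",)):
    """Compare two browser command lines flag by flag.

    Hypothetical helper for illustration only. Values of flags listed in
    ignore_values are dropped before comparing, since they are expected to
    differ between runs (for example, a randomly chosen port).
    """
    def normalize(cmd):
        tokens = []
        for token in cmd.split():
            name = token.partition("=")[0]
            tokens.append(name if name in ignore_values else token)
        return tokens

    good, bad = normalize(good_cmd), normalize(bad_cmd)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, good, bad, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(good[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(bad[j1:j2])

    return {
        # Present in the good run but missing from the bad run.
        "only_in_good": sorted(set(removed) - set(added)),
        # Present in the bad run but missing from the good run.
        "only_in_bad": sorted(set(added) - set(removed)),
        # Present in both runs, but at a different position.
        "moved": sorted(set(removed) & set(added)),
    }
```

Applied to the two command lines quoted in this comment, a helper like this should report `--proxy-server` as moved and nothing as added or removed, which matches the manual comparison.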
,
Dec 8 2017
Aha. One big hint might be that these are long-running tests. From the passing run:

(shard #0) [147/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_swizzles passed 193.6088s
(shard #1) [146/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_conversions passed 205.4912s

Could some of Telemetry's bookkeeping about waiting for JavaScript results have been subtly changed? The failures:

[146/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_conversions failed unexpectedly 322.2581s:
[147/151] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_data_gles2_shaders_swizzles failed unexpectedly 325.2269s:
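
To put numbers on the timing hint above, here is a rough sketch that extracts per-test durations from result lines in the format quoted in this comment and computes how much slower each test ran in the failing attempt. The function names and the regular expression are assumptions made for illustration; they are not part of the test runner.

```python
import re

# Matches result lines such as
# "...WebglConformance_deqp_data_gles2_shaders_swizzles passed 193.6088s".
RESULT_RE = re.compile(
    r"WebglConformance_(?P<name>\S+)\s+"
    r"(?P<status>passed|failed unexpectedly)\s+"
    r"(?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_results(log_text):
    """Map test name -> (status, duration in seconds)."""
    return {
        m.group("name"): (m.group("status"), float(m.group("seconds")))
        for m in RESULT_RE.finditer(log_text)
    }


def slowdowns(good_log, bad_log):
    """Map test name -> bad duration / good duration, for tests in both logs."""
    good, bad = parse_results(good_log), parse_results(bad_log)
    return {
        name: bad[name][1] / good[name][1]
        for name in good.keys() & bad.keys()
    }
```

With the four lines quoted here, the two tests come out roughly 1.57x (conversions) and 1.68x (swizzles) slower in the failing run, which is consistent with a slowdown pushing them past a time limit rather than with a logic error in the tests themselves.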
,
Dec 8 2017
Ned, while the investigation continues, can we try reverting your CL to see if that unblocks the roll?
,
Dec 8 2017
I reverted it in https://chromium-review.googlesource.com/c/catapult/+/816659
,
Dec 8 2017
Reverting my CL did fix the problem. I am really baffled, because my CL mostly changes how Telemetry invokes ClearCaches() and SetFullPerformanceModeEnabled(), and gpu_test should use neither of those.
,
Dec 8 2017
Darn. I'm sorry that happened and am as confused as you from reading your patch.
,
Dec 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9 commit f9663d1bd9be20b9277dfda0b5b23d6dda6951c9 Author: Nghia Nguyen <nednguyen@google.com> Date: Tue Dec 19 12:47:49 2017 Remove platform_backend's DidStartBrowser and WillCloseBrowser hooks SetFullPerformanceModeEnabled(False) is now called after all the test has run in SharedPageState.TearDownState() instead of relying browser objects tracking which is fragile & complex. This is a partial reland of https://chromium-review.googlesource.com/c/catapult/+/814214 Bug: chromium:792860 Bug: chromium:792357 Bug: chromium:793050 Change-Id: I0e4ec50230ccd32c47c34b92cd30edeae7322edf Reviewed-on: https://chromium-review.googlesource.com/833179 Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> Commit-Queue: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/core/platform.py [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/platform/platform_backend.py [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/page/shared_page_state.py [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/browser/browser.py [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/testing/fakes/__init__.py [modify] https://crrev.com/f9663d1bd9be20b9277dfda0b5b23d6dda6951c9/telemetry/telemetry/internal/browser/browser_unittest.py
,
Jan 3 2018
Given that the partial reland in #18 committed smoothly, I now recognize that the GPU tests have always been using full_performance_mode when running (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/browser/browser_options.py?rcl=7365f02611830723d60963f0619d5bab45060849&l=285). So when the original patch landed, we accidentally disabled full_performance_mode in the correctness tests, making those GPU tests fail due to timeouts. I will continue with the refactoring by also making the call to enable performance mode explicit in the correctness test framework.
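
As an illustration of what "explicit" could look like, here is a minimal sketch. It is not the actual Telemetry change: `FakePlatform` is a hypothetical stand-in for a platform object, and `full_performance_mode` is an invented helper. Only the method name `SetFullPerformanceModeEnabled` comes from this thread. The idea is that the test framework turns performance mode on before the tests run and off afterwards, instead of depending on browser start and close hooks to do it as a side effect.

```python
import contextlib


class FakePlatform:
    """Hypothetical stand-in for a platform object (illustration only)."""

    def __init__(self):
        self.full_performance_mode = False
        self.calls = []

    def SetFullPerformanceModeEnabled(self, enabled):
        self.full_performance_mode = enabled
        self.calls.append(enabled)


@contextlib.contextmanager
def full_performance_mode(platform):
    """Enable full performance mode for the duration of a test run."""
    platform.SetFullPerformanceModeEnabled(True)
    try:
        yield platform
    finally:
        # Always restore the default, even if a test raises.
        platform.SetFullPerformanceModeEnabled(False)
```

A test harness could wrap its whole run in `with full_performance_mode(platform): ...`, so the setting no longer silently changes when unrelated browser lifecycle code is refactored.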
,
Jan 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/77863be41ffde6f6d149fdd1bf6ce68122288541 commit 77863be41ffde6f6d149fdd1bf6ce68122288541 Author: Nghia Nguyen <nednguyen@google.com> Date: Wed Jan 03 14:45:34 2018 Move the call to set platform performance mode out of platform_backend.DidCreateBrowser and remove that API Bug: chromium:793050 Change-Id: I97f6e672359188bc4391a961587c00210d670609 Reviewed-on: https://chromium-review.googlesource.com/848075 Commit-Queue: Ned Nguyen <nednguyen@google.com> Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> [modify] https://crrev.com/77863be41ffde6f6d149fdd1bf6ce68122288541/telemetry/telemetry/internal/platform/platform_backend.py [modify] https://crrev.com/77863be41ffde6f6d149fdd1bf6ce68122288541/telemetry/telemetry/internal/browser/browser.py [modify] https://crrev.com/77863be41ffde6f6d149fdd1bf6ce68122288541/telemetry/telemetry/internal/browser/browser_options.py [modify] https://crrev.com/77863be41ffde6f6d149fdd1bf6ce68122288541/telemetry/telemetry/page/shared_page_state.py [modify] https://crrev.com/77863be41ffde6f6d149fdd1bf6ce68122288541/telemetry/telemetry/testing/serially_executed_browser_test_case.py
,
Jan 3 2018
,
Sep 10
Comment 1 by kbr@chromium.org, Dec 7 2017
Components: Internals>GPU>Testing Infra>Client>Android Blink>WebGL Blink>JavaScript
Labels: Hotlist-PixelWrangler