
Issue 840988

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows , Mac
Pri: 1
Type: Bug-Regression

Blocking:
issue 843338
issue 842019
issue 845411




Timeout failures in GPU FYI bots

Project Member Reported by rjkroege@chromium.org, May 8 2018

Issue description

Multiple FYI GPU bots are having errors:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Release%20%28Intel%29/2089
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28NVIDIA%29/1026
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20x64%20Release%20%28NVIDIA%29/952
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Release%20%28Intel%29/2089
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/1495
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29/1335
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20GPU%20ASAN%20Release/801

In each case, webgl2_conformance_tests are failing on a different test across a range of devices:


[111/138] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_functional_gles3_shadermatrix_add_assign failed unexpectedly 618.6246s:

[133/139] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_functional_gles3_texturespecification_basic_teximage3d_2d_array_02 failed unexpectedly 322.8495s:
  
[424/462] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_more_conformance_constants failed unexpectedly 324.2782s:

[60/136] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_glsl_functions_glsl_function_clamp_float failed unexpectedly 317.1177s:
 
[113/140] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_functional_gles3_shaderoperator_angle_and_trigonometry_03 failed unexpectedly 320.3420s:

[109/138] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_deqp_functional_gles3_shadermatrix_add_const failed unexpectedly 317.7220s:
 
[60/136] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_glsl_functions_glsl_function_clamp_float failed unexpectedly 317.1177s:


Each such failure looks like this:
  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:188
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:202
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:196
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:302
      e.message + '\n' + debug_message)
  TimeoutException: Timed out while waiting 300s for IsJavaScriptExpressionTrue.
  Console output:
  
  Locals:
    IsJavaScriptExpressionTrue : <function IsJavaScriptExpressionTrue at 0x105997cf8>
    condition                  : 'webglTestHarness._finished'
    context_id                 : None
    debug_message              : 'Console output:\n'
    e                          : TimeoutException('Timed out while waiting 300s for IsJavaScriptExpressionTrue.',)
    kwargs                     : {}
    timeout                    : 300
  
  Found crashpad_database_util
  No minidump found via crashpad_database_util
  No minidump paths to symbolize
  Found crashpad_database_util
  No minidump found via crashpad_database_util
  Restarting browser due to unexpected test failure
  Closing browser (pid=8299) ...
  Browser is closed.
  Starting Chrome ['/b/s/w/ir/out/Release/Chromium.app/Contents/MacOS/Chromium', '--disable-gpu-watchdog', '--enable-experimental-web-platform-features', '--test-type=gpu', '--disable-domain-blocking-for-3d-apis', '--disable-gpu-process-crash-limit', '--disable-blink-features=WebXR', '--js-flags=--expose-gc', '--enable-logging=stderr', '--autoplay-policy=no-user-gesture-required', '--use-cmd-decoder=validating', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--deny-permission-prompts', '--autoplay-policy=no-user-gesture-required', '--disable-background-networking', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--disable-search-geolocation-disclosure', '--proxy-server=socks://localhost:53758', '--ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--disable-component-update', '--window-size=1280,1024', '--user-data-dir=/b/s/w/itZPMhfX/tmpqw7awd', 'about:blank']
  DoNothingForwarder started between 127.0.0.1:53901 and 53901
  Got devtools config: ws://127.0.0.1:53901/devtools/browser/90db85be-1408-40ba-aa82-a1bde1b5e56f
  Browser started (pid=8379).
  OS: mac highsierra
  Detailed OS version: 10.13.4 17E139j
  Model: Macmini 7.1
  Browser command line: /b/s/w/ir/out/Release/Chromium.app/Contents/MacOS/Chromium --disable-gpu-watchdog --enable-experimental-web-platform-features --test-type=gpu --disable-domain-blocking-for-3d-apis --disable-gpu-process-crash-limit --disable-blink-features=WebXR --js-flags=--expose-gc --enable-logging=stderr --autoplay-policy=no-user-gesture-required --use-cmd-decoder=validating --enable-net-benchmarking --metrics-recording-only --no-default-browser-check --no-first-run --enable-gpu-benchmarking --deny-permission-prompts --autoplay-policy=no-user-gesture-required --disable-background-networking --disable-component-extensions-with-background-pages --disable-default-apps --disable-search-geolocation-disclosure --proxy-server=socks://localhost:53758 --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I= --remote-debugging-port=0 --enable-crash-reporter-for-testing --disable-component-update --window-size=1280,1024 --user-data-dir=/b/s/w/itZPMhfX/tmpqw7awd --flag-switches-begin --flag-switches-end about:blank
  GPU device 0: VENDOR = 0x8086 (Intel), DEVICE = 0xa2e
  GPU Attributes:
    amd_switchable      : False
    can_support_threaded_texture_mailbox: False
    direct_composition  : False
    direct_rendering    : True
    driver_date         : 
    driver_vendor       : 
    driver_version      : 
    encrypted_only      : False
    gl_extensions       : 
    gl_renderer         : 
    gl_reset_notification_strategy: 0
    gl_vendor           : 
    gl_version          : 
    gl_ws_extensions    : 
    gl_ws_vendor        : 
    gl_ws_version       : 
    in_process_gpu      : False
    initialization_time : 0.0902
    jpeg_decode_accelerator_supported: False
    max_framerate_denominator: 1
    max_framerate_numerator: 30
    max_msaa_samples    : 
    max_resolution_height: 2160
    max_resolution_width: 4096
    min_resolution_height: 16
    min_resolution_width: 16
    optimus             : False
    passthrough_cmd_decoder: False
    pixel_shader_version: 
    process_crash_count : 0
    profile             : 3
    sandboxed           : True
    software_rendering  : False
    supports_overlays   : False
    vertex_shader_version: 
    video_decode_accelerator_flags: 0
  Feature Status:
    2d_canvas           : enabled
    flash_3d            : enabled
    flash_stage3d       : enabled
    flash_stage3d_baseline: enabled
    gpu_compositing     : enabled
    multiple_raster_threads: enabled_on
    native_gpu_memory_buffers: enabled
    rasterization       : enabled
    surface_synchronization: enabled_on
    video_decode        : enabled
    viz_display_compositor: disabled_off
    webgl               : enabled
    webgl2              : enabled
  Driver Bug Workarounds:
    add_and_true_to_loop_condition
    adjust_src_dst_region_for_blitframebuffer
    avoid_stencil_buffers
    decode_encode_srgb_for_generatemipmap
    disable_framebuffer_cmaa
    disable_webgl_rgb_multisampling_usage
    dont_use_loops_to_initialize_variables
    emulate_abs_int_function
    get_frag_data_info_bug
    init_two_cube_map_levels_before_copyteximage
    max_msaa_sample_count_4
    msaa_is_slow
    pack_parameters_workaround_with_pack_buffer
    rebind_transform_feedback_before_resume
    regenerate_struct_names
    remove_invariant_and_centroid_for_essl3
    reset_teximage2d_base_level
    rewrite_texelfetchoffset_to_texelfetch
    scalarize_vec_and_mat_constructor_args
    set_zero_level_before_generating_mipmap
    unfold_short_circuit_as_ternary_operation
    unpack_alignment_workaround_with_unpack_buffer
    unpack_image_height_workaround_with_unpack_buffer
    use_intermediary_for_copy_texture_image
    use_unused_standard_shared_blocks
  Traceback (most recent call last):
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 206, in <lambda>
      return lambda self: based_method(self, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 132, in _RunGpuTest
      self.RunActualGpuTest(url, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 188, in RunActualGpuTest
      getattr(self, test_name)(test_path, *args[1:])
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 202, in _RunConformanceTest
      self._CheckTestCompletion()
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 196, in _CheckTestCompletion
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 261, in WaitForJavaScriptCondition
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 302, in WaitForJavaScriptCondition
      e.message + '\n' + debug_message)
  TimeoutException: Timed out while waiting 300s for IsJavaScriptExpressionTrue.
  Console output:
  
[8379:775:0508/111601.508531:WARNING:gaia_auth_fetcher.cc(902)] Could not reach Google Accounts servers: errno -120

A catapult issue perhaps? Suggestions welcome.
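
For readers unfamiliar with the failure mode: the TimeoutException in the traceback above is what a poll-until-true wait raises when the page never sets 'webglTestHarness._finished'. The following is a minimal sketch of that general pattern, not Telemetry's actual implementation; the function name wait_for and its parameters are hypothetical.

```python
import time


class TimeoutException(Exception):
    """Raised when the condition never becomes true (hypothetical stand-in)."""


def wait_for(condition, timeout, poll_interval=0.1):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    This is only an illustration of the wait pattern; the real
    WaitForJavaScriptCondition evaluates a JavaScript expression in the tab.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutException(
                'Timed out while waiting %gs for %s.'
                % (timeout, getattr(condition, '__name__', 'condition')))
        time.sleep(poll_interval)
```

With this shape, a test that hangs in the renderer and a test that is merely slow look identical to the harness: both surface as the same 300s timeout, which is why the failing test name alone says little about the cause.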
 

Comment 1 by kbr@chromium.org, May 8 2018

Components: Blink>MemoryAllocator Blink>JavaScript
Labels: -Type-Bug -Pri-3 Pri-1 Type-Bug-Regression
Owner: rjkroege@chromium.org
On this bot:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28NVIDIA%29

This is a good build, two builds before the first bad one:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28NVIDIA%29/1024

and here's the first bad build:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28NVIDIA%29/1026

Given this, here's a regression range:
http://crrev.com/f1b4c43b85e7cec2a61da121b9244deffec38d03..22f7bd8b0541ed47a6e906c1d79d12f7eb86660f

Rob, can you please help go through the regression range and see if there is anything suspicious? I wonder whether this V8 roll:
https://chromium.googlesource.com/chromium/src/+/c0e585261f693123f3560c63dd6e254e55a83374

or this change to Blink to infrequently purge unused memory:
https://chromium.googlesource.com/chromium/src/+/f4544427fd7e40abba353339146780a2a2b90051

could be related.

Comment 2 by kbr@chromium.org, May 8 2018

Cc: haraken@chromium.org bmeu...@chromium.org gyuyoung...@lge.com

Comment 3 by kbr@chromium.org, May 8 2018

Cc: mvstan...@chromium.org
Note: the V8 changes seem innocuous, aside from possibly mvstanton's change. However, some change has definitely been made recently which has destabilized these tests and caused them to intermittently time out. It's possible the regression range is not correct.

Summary of flaking:

Builder Win7 FYI x64 Release (NVIDIA): 
897  failed refs/heads/master@{#556792}
896 succeeded refs/heads/master@{#556761}
895 succeeded refs/heads/master@{#556737}

Builder Mac FYI Retina Release (NVIDIA)  (flaking)
1333 success refs/heads/master@{#556761}
1332 fail refs/heads/master@{#556746}
1331 success refs/heads/master@{#556735}
1330 success refs/heads/master@{#556716}

Builder Mac FYI GPU ASAN Release
803 fail refs/heads/master@{#556770}
801 fail refs/heads/master@{#556716}
800 success refs/heads/master@{#556687}
799 success refs/heads/master@{#556686}

Builder Mac FYI Experimental Release
2090 success refs/heads/master@{#556858}
2089 a refs/heads/master@{#556835}
2088 success refs/heads/master@{#556813}
2087 success refs/heads/master@{#556806}

Windows 10
refs/heads/master@{#556_760}, refs/heads/master@{#556_720}

Ignoring windows 10, flake would seem to start after  556_806 556_686 556_716 556_737
and be present by 556_835 556_716 556_746 556_792? 

That suggests the flake started by 556_686 and was already in place by 556_716, but the CLs between 556_760 and 556_686 don't seem suspicious other than the ones pointed out by kbr@. Maybe devtools changes are interfering with telemetry operation?
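
The hand analysis above can be sketched in code. This is a rough illustration only: it uses the three builders from the summary whose statuses are unambiguous (Mac FYI Experimental Release is left out because build 2089's status is unclear), and the function names are hypothetical. Because the failures are flaky, a green build does not prove the regression is absent, so only the upper bound (the earliest first failure) is firm.

```python
from typing import Dict, List, Optional, Tuple

Build = Tuple[int, str, int]  # (build number, status, commit position)

# Copied from the flake summary in this comment.
HISTORY: Dict[str, List[Build]] = {
    "Win7 FYI x64 Release (NVIDIA)": [
        (895, "success", 556737), (896, "success", 556761),
        (897, "fail", 556792)],
    "Mac FYI Retina Release (NVIDIA)": [
        (1330, "success", 556716), (1331, "success", 556735),
        (1332, "fail", 556746), (1333, "success", 556761)],
    "Mac FYI GPU ASAN Release": [
        (799, "success", 556686), (800, "success", 556687),
        (801, "fail", 556716), (803, "fail", 556770)],
}


def first_failure(builds: List[Build]) -> Optional[Tuple[Optional[int], int]]:
    """Return (previous build's commit position, first failing commit position)."""
    ordered = sorted(builds)  # oldest build number first
    for i, (_, status, position) in enumerate(ordered):
        if status == "fail":
            previous = ordered[i - 1][2] if i > 0 else None
            return previous, position
    return None


def earliest_first_failure(history: Dict[str, List[Build]]):
    """Pick the builder whose first failure has the lowest commit position."""
    found = {name: first_failure(b) for name, b in history.items()}
    found = {name: rng for name, rng in found.items() if rng is not None}
    name = min(found, key=lambda n: found[n][1])
    return name, found[name]


builder, (last_green, first_red) = earliest_first_failure(HISTORY)
```

On this data the earliest first failure is on Mac FYI GPU ASAN Release at 556716, with the preceding green build at 556687, which agrees with the flake being in place by 556_716.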

Comment 5 by kbr@chromium.org, May 9 2018

Cc: zmo@chromium.org rjkroege@chromium.org ynovikov@chromium.org jmad...@chromium.org fjhenigman@chromium.org
 Issue 841388  has been merged into this issue.

Comment 6 by kbr@chromium.org, May 9 2018

Status: Assigned (was: Untriaged)
There is a clear signal of the regression on this particular bot:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29?limit=200

We urgently need to speculatively revert any suspect CLs to get this bot reliably green again. This may involve pausing the V8 autoroller and reverting back to an earlier version.

Screen Shot 2018-05-09 at 9.41.12 AM.png

Comment 7 by kbr@chromium.org, May 9 2018

Cc: machenb...@chromium.org hablich@chromium.org
The first uptick in the timeout error rate in the WebGL tests on Mac Retina NVIDIA (https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20(NVIDIA)?limit=200) is at build 1322.

Average size of success window is ~2 so looking at 4 previous builds: 1321, 1320, 1319, 1318.
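
One way to make the "success window" heuristic concrete is to measure the mean length of consecutive green runs and look back a multiple of that from the first uptick. The sketch below is only an illustration: the status sequence is made up, and the factor of 2 is an assumption chosen to match "window of ~2, look at 4 builds".

```python
from itertools import groupby
from typing import List


def success_runs(statuses: List[str]) -> List[int]:
    """Lengths of each run of consecutive 'success' builds."""
    return [len(list(group))
            for is_green, group in groupby(statuses, key=lambda s: s == "success")
            if is_green]


def lookback_builds(statuses: List[str], factor: int = 2) -> int:
    """How many builds before the first uptick to inspect (assumed heuristic)."""
    runs = success_runs(statuses)
    if not runs:
        return 0
    mean_run = sum(runs) / len(runs)
    return int(round(factor * mean_run))


# Hypothetical flaky history, oldest first; not taken from any real bot.
example = ["fail", "success", "success", "fail",
           "success", "success", "success", "fail", "success"]
window = lookback_builds(example)
```

Here the green runs have lengths 2, 3 and 1, so the mean window is 2 and the lookback is 4 builds.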

Besides:

v8 roll: https://chromium.googlesource.com/chromium/src/+/c0e585261f693123f3560c63dd6e254e55a83374
blink memory management change: https://chromium.googlesource.com/chromium/src/+/f4544427fd7e40abba353339146780a2a2b90051

Maybe these CLs deserve a once-over?

* maybe an infra change: https://chromium-review.googlesource.com/c/chromium/src/+/1043002?
* really reaching here but could perturb timings in image code: https://chromium-review.googlesource.com/c/chromium/src/+/1045729?
* devtools change might perturb how telemetry works https://chromium-review.googlesource.com/c/chromium/src/+/1045180?


Reviewing Mac FYI ASAN (https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20GPU%20ASAN%20Release)

More frequent timeout failures start at build 801

suspects on build 801 itself from this builder are exactly the same as #8


Reviewing Linux NVIDIA (https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28NVIDIA%29?limit=200)

More frequent timeout failures start at 1965.
1963 features the v8 roll mentioned above.
1962 has dev tool change mentioned above and blink memory change.




blink roll revert https://chromium-review.googlesource.com/c/chromium/src/+/1052527 landed as 

Change-Id: I7108d2aecebef1e7ca0355bca571e85d1cc01499
Cr-Commit-Position: refs/heads/master@{#557306}

Revert of the blink memory management change https://chromium-review.googlesource.com/c/chromium/src/+/1052987 landed:

Bug: None
Change-Id: I26910af177818011bf1aa00efeacb9bb8058f108
Reviewed-on: https://chromium-review.googlesource.com/1052987
Reviewed-by: Robert Kroeger <rjkroege@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Commit-Queue: Robert Kroeger <rjkroege@chromium.org>
Cr-Commit-Position: refs/heads/master@{#557322}

Comment 14 by kozy@chromium.org, May 10 2018

It looks like the Mac bots recovered and the Windows GPU bot failures look unrelated. Can we try rolling V8 again?

Comment 15 by kbr@chromium.org, May 10 2018

Yes, please go ahead and try rolling V8 again.

This particular bot was most severely affected:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29?limit=200

From the attached screenshot this is the build where the bot became green again:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29/1355

that build contains the revert of the Blink memory management change https://chromium-review.googlesource.com/1052987 .

The previous build https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29/1354 contained the revert of the most recent V8 roll, and that build was still red.

So it looks like the culprit was the Blink memory management change.

Screen Shot 2018-05-10 at 12.02.09 AM.png
> So it looks like the culprit was the Blink memory management change.

I'm sorry. I couldn't guess that the CL could influence on the webgl tests. Let me try to check what was a problem. Thanks.
>> So it looks like the culprit was the Blink memory management change.

>I'm sorry. I couldn't guess that the CL could influence on the webgl tests. Let me try to check what was a problem. Thanks.

BTW, though my CL was reverted, it looks like the timeout failures occurred again.
- https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28NVIDIA%29/1360

Comment 18 by kbr@chromium.org, May 10 2018

gyuyoung.kim@: I think that failure is just one particular flaky test which we need to suppress. We will confirm by letting things run overnight. Please do not reland your CL. Also, please file a bug and refer to it from your CL when landing changes like yours which have potentially significant impact. Thanks.

>gyuyoung.kim@: I think that failure is just one particular flaky test which we need to suppress. We will confirm by letting things run overnight. Please do not reland your CL. Also, please file a bug and refer to it from your CL when landing changes like yours which have potentially significant impact. Thanks.

ok, let me do that next time. Thanks too.
Status: Fixed (was: Assigned)
All builder issues are tracked in other bugs. This particular bug seems to have been resolved. Closing.

Comment 21 by kbr@chromium.org, May 11 2018

Blocking: 842019

Comment 22 by kbr@chromium.org, May 11 2018

Robert, *thank you* for tracking down the cause of this flakiness! This was an important regression to find and fix.

Gyuyoung: I've filed follow-on  Issue 842019  for you to track down the reason why your CL caused these failures. It seems suspicious that it did, and we should get to the bottom of it. Also, I've submitted https://chromium-review.googlesource.com/1054773 to suppress the flakes you observed on the Mac NVIDIA bot in the WebGL dEQP shaderoperator tests.

kbr@, thank you for filing a bug. ok, let me investigate why my CL caused the flaky test there.

Comment 24 by bugdroid1@chromium.org, May 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d5efb3fb1f0451424f6233de328906f7002465f3

commit d5efb3fb1f0451424f6233de328906f7002465f3
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri May 11 01:40:45 2018

Add link to Blink MemoryCoordinator bug to flakiness examples.

This was a difficult bug to track down and is worth mentioning as a
reason to prioritize stamping out flakiness on the waterfall.

Bug:  840988 
Change-Id: If2c56a80b02178ddc28efd44d73ec440a2aa2e0c
Tbr: rjkroege@chromium.org
Reviewed-on: https://chromium-review.googlesource.com/1054585
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#557758}
[modify] https://crrev.com/d5efb3fb1f0451424f6233de328906f7002465f3/docs/gpu/gpu_testing.md

Comment 25 by kbr@chromium.org, May 15 2018

Cc: lijeffrey@chromium.org st...@chromium.org
Components: Tools>Test>FindIt
stgao, lijeffrey: this bug and the associated flaky failures would be a good use case for FindIt. The problematic CL caused flakes in random tests, but all in the same test suite. It would be ideal if the flake detection could handle this case.
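
To illustrate why per-test flake detection struggles here, the sketch below groups the seven failure lines quoted in the issue description (one test appears twice) by test and by suite. It is a toy aggregation, not how FindIt works; the helper name suite_of is hypothetical.

```python
from collections import Counter

PREFIX = ("gpu_tests.webgl_conformance_integration_test."
          "WebGLConformanceIntegrationTest.")

# Failing tests copied from the issue description.
FAILURES = [
    PREFIX + "WebglConformance_deqp_functional_gles3_shadermatrix_add_assign",
    PREFIX + "WebglConformance_deqp_functional_gles3_texturespecification_basic_teximage3d_2d_array_02",
    PREFIX + "WebglConformance_conformance_more_conformance_constants",
    PREFIX + "WebglConformance_conformance_glsl_functions_glsl_function_clamp_float",
    PREFIX + "WebglConformance_deqp_functional_gles3_shaderoperator_angle_and_trigonometry_03",
    PREFIX + "WebglConformance_deqp_functional_gles3_shadermatrix_add_const",
    PREFIX + "WebglConformance_conformance_glsl_functions_glsl_function_clamp_float",
]


def suite_of(test_id: str) -> str:
    """Drop the final dotted component (the individual test name)."""
    return test_id.rsplit(".", 1)[0]


per_test = Counter(FAILURES)
per_suite = Counter(suite_of(t) for t in FAILURES)
```

Counted per test, no single test fails more than twice, which looks like background noise. Counted per suite, all seven failures land on one key, which is the signal that a suite-level detector could pick up.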

Comment 26 by st...@chromium.org, May 15 2018

Cc: liaoyuke@chromium.org
As it is for flake detection, liaoyuke@ could follow up here. Jeff focuses on flake analysis -- find the culprit.

Comment 27 by kbr@chromium.org, May 15 2018

Blocking: 843338

Comment 28 by kbr@chromium.org, May 22 2018

Blocking: 845411
