Issue 628022 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocked on:
issue 628713
issue 634999

Blocking:
issue 352807
issue 633617
issue 653870
issue 682819




Occasional failure of an entire shard with the new browser_test_runner harness

Project Member Reported by kbr@chromium.org, Jul 13 2016

Issue description

Occasionally one failed test will cause all subsequent tests to fail with the new browser_test_runner harness. I'm not sure what is causing this. The initial thought was that a renderer or GPU process crash was causing a minidump to be generated, and that restarting the browser was causing Telemetry's internal symbolization code to run, and to fail. This doesn't immediately seem to be the cause based on initial testing, but it may be necessary to do more failure injection.

I'll upload a CL which provokes a failure of the first test in the suite. Either a GPU or renderer process crash may be simulated. Despite best efforts (including modifying code in Catapult to do failure injection) only the first test in the run actually fails.

Attached is the full (large) failure log from this try job:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel/builds/2151

from this CL:
https://codereview.chromium.org/2138673002

All of the tests in shard 10 (search the text for "Shard 10") failed. It's not clear why.

 
stdout.txt.gz
690 KB Download

Comment 1 by kbr@chromium.org, Jul 13 2016

Here's a CL which uses failure injection to fail the first test in the WebGL conformance suite:
https://codereview.chromium.org/2149823002

It can be used by patching it in and then running:

python content\test\gpu\run_gpu_integration_test.py webgl_conformance --browser=canary --test-filter=conformance_attribs

Unfortunately, it doesn't reproduce the problem reported above. The first test fails as expected, but the second and subsequent ones recover well.

Comment 2 by kbr@chromium.org, Jul 13 2016

Here's another similar kind of failure, this time on Mac OS:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1839

It's from this CL:
https://codereview.chromium.org/2133673003

Attached is the gzipped stdio.html. One failing test was WebglConformance_conformance2_glsl3_forbidden_operators. Search through the file in a text editor (it seems to be too large for Chrome to load) for "Shard 14". The log shows that the GPU process hung at the beginning of that shard and then crashed, that the crash was symbolized, and that all subsequent tests then failed with these error messages:

WebglConformance_deqp_functional_gles3_fbocolorbuffer_tex2darray_00 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0712/121837:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp
INFO:root:Dumping breakpad symbols.
Can't get standard output with --show-stdout
ERROR
WebglConformance_deqp_functional_gles3_fragmentoutput_random_02 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0712/121841:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp
INFO:root:Dumping breakpad symbols.
Can't get standard output with --show-stdout
ERROR

It looks like if browser startup fails at some key points, the test fails.

stdio.html.gz
4.3 MB Download
In https://chromium-swarm.appspot.com/user/task/2ff6de5d924f7c10, it looks like the harness did not try to restart the browser. If it had, we would have seen multiple "INFO:root:Starting Chrome.." log lines.

Ctrl+F for "Starting Chrome" shows only 1 result.
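
The same check can be scripted for logs that are too large to open comfortably. This is only a sketch: the function names are made up for illustration, and the marker string is taken from the log line quoted above.

```python
import gzip


def count_browser_starts(log_lines, marker="Starting Chrome"):
    """Count log lines that record a browser launch."""
    return sum(1 for line in log_lines if marker in line)


def count_browser_starts_in_gzip(path, marker="Starting Chrome"):
    """Same count, reading a gzipped log such as stdout.txt.gz line by line."""
    with gzip.open(path, "rt", errors="replace") as log_file:
        return count_browser_starts(log_file, marker)


# Tiny in-memory sample; the paths are placeholders, not real bot output.
sample = [
    "INFO:root:Starting Chrome ['/path/to/Chromium', 'about:blank']",
    "WARNING:root:Restarting browser due to unexpected test failure",
    "INFO:root:Starting Chrome ['/path/to/Chromium', 'about:blank']",
]
starts = count_browser_starts(sample)
```

A count of 1 for a shard with many failures would suggest that the harness never relaunched the browser.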

Looking at the stack trace, the browser fails at the setUp method of gpu_integration_test.py (https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_integration_test.py?rcl=0&l=128).

I guess this is happening because we don't have logic to restart the browser if an exception is raised in setUp/tearDown. Maybe you can try to reproduce this problem by injecting a browser crash in gpu_integration_test.GpuIntegrationTest.setUp?

Comment 5 by kbr@chromium.org, Jul 13 2016

#3: I think it's hitting a buildbot limitation. You can use src/content/test/gpu/gather_swarming_json_results.py to scrape and assemble those JSON results.

#4: thanks, that's a good idea. I'm really swamped at this point but if you or someone else could verify that idea, I'd appreciate it.

Sure, the problem can be reproduced with https://codereview.chromium.org/2147133002/

Comment 7 by eyaich@chromium.org, Jul 14 2016

Owner: eyaich@chromium.org
Status: Assigned (was: Untriaged)
It looks like Ned has done the heavy lifting here.  I have confirmed that catching the exception in setUp does enable the rest of the tests to execute and pass.  I will write a unittest for this fix and get a cl out for review.

Comment 8 by kbr@chromium.org, Jul 15 2016

Thanks Ned and Emily for reproducing the issue. I wonder whether the issue is that the renderer process is crashing (which is what https://codereview.chromium.org/2147133002/ provokes) or whether the browser's failing to launch. The current code in setUp calls:

  self.browser.tabs[0]

and one of those dereferences is failing. It's not clear why.

It's more likely a renderer crash. Here is the code in inspector_backend_list.py
>  def __getitem__(self, index):
>    self._Update()
>    if index >= len(self._filtered_context_ids):
>      raise exceptions.DevtoolsTargetCrashException(     <--- Exception thrown here
>          self.app,
>          'Web content with index %s may have crashed. '
>          'filtered_context_ids = %s' % (
>              index, repr(self._filtered_context_ids)))

Since the exception is raised after self._Update() returns, that method must be running fine. It issues a request to Chrome DevTools, which indicates that the browser process is still alive at that point.

Comment 10 by kbr@chromium.org, Jul 15 2016

Thanks for your analysis.

Given this I think we should get rid of GpuIntegrationTest.setUp and merge its code into _RunGpuTest, since doing so will catch this exception in the same way that others are caught by the harness, so we'll have a chance at diagnosing these failures.

Comment 11 by kbr@chromium.org, Jul 15 2016

Cc: cwallez@chromium.org
Note: here's a slightly different instance of the same problem:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1904

from CL:
https://codereview.chromium.org/2144293002

Shard 4 failed:
https://chromium-swarm.appspot.com/user/task/30030b427d36bd10

It looks like the browser failed to start:

ERROR:root:Failure while starting browser backend.
Traceback (most recent call last):
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
	No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************
ERROR

======================================================================
ERROR: setUpClass (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/swarm_slave/work/isolated/ir0Uihr1/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 77, in setUpClass
    cls.StartBrowser()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 83, in StartBrowser
    cls.browser = cls._browser_to_create.Create(cls._browser_options)
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_finder.py", line 68, in Create
    browser_backend, self._platform_backend, self._credentials_path)
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
	No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************


This is pretty unfortunate and I wonder what kind of additional instrumentation could be added to figure out what happened. (Specifically, does it seem likely the first renderer process which the browser launched crashed upon startup, like the other failure modes?)

Comment 12 by kbr@chromium.org, Jul 15 2016

Blockedon: 628713

Comment 13 by kbr@chromium.org, Jul 18 2016

Cc: ccameron@chromium.org
Another problem where one failure left a corrupted minidump that caused catastrophic failure of the harness:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2034

Shard #3 failed:
https://chromium-swarm.appspot.com/user/task/3018ab1fad251110

WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_25 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0718/142618:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/142636:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout
ERROR
WebglConformance_deqp_functional_gles3_fboinvalidate_default (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0718/142636:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/142655:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout
ERROR
...

It looks like this last one on the 18th might be slightly different, since it doesn't appear to have anything to do with the state of the browser. It seems that it is trying to parse the same corrupt metadata each time and is failing. I wonder if that should be a separate bug.

Comment 15 by kbr@chromium.org, Jul 21 2016

You're right that it's not the same problem as the browser crashing at the beginning of the run, but it's a similar kind of failure which has been seen before on Mac OS. It'd be fine to file a separate bug, but do you think you could try to figure out how to mock that failure in the harness, and work around it by not trying to re-symbolize the corrupted minidump each time?
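
One possible shape for that workaround, sketched under assumptions: MinidumpSymbolizer and its methods are hypothetical names, and the symbolize callable stands in for whatever the harness actually uses to turn a minidump path into a stack trace.

```python
class MinidumpSymbolizer(object):
    """Hypothetical wrapper that remembers minidumps it could not symbolize.

    `symbolize` is any callable taking a minidump path and returning a
    stack trace string, or raising on failure.
    """

    def __init__(self, symbolize):
        self._symbolize = symbolize
        self._failed_paths = set()
        self.attempts = 0

    def stack_trace_for(self, path):
        if path in self._failed_paths:
            # Already known to be unreadable; do not retry on every test.
            return None
        self.attempts += 1
        try:
            return self._symbolize(path)
        except Exception:
            self._failed_paths.add(path)
            return None


def fake_symbolize(path):
    # Stand-in for the real symbolization step; fails for one "corrupt" dump.
    if path.endswith("corrupt.dmp"):
        raise ValueError("Failed to read report metadata")
    return "stack for " + path


symbolizer = MinidumpSymbolizer(fake_symbolize)
traces = [symbolizer.stack_trace_for("completed/corrupt.dmp") for _ in range(3)]
good_trace = symbolizer.stack_trace_for("completed/good.dmp")
```

The corrupt dump is attempted once and skipped afterwards, so a single bad file cannot add a symbolization failure to every subsequent test.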

Comment 16 by kbr@chromium.org, Jul 21 2016

Here's another failure:

https://build.chromium.org/p/tryserver.chromium.angle/builders/mac_angle_rel_ng/builds/1700

Shard 1 failed:
https://chromium-swarm.appspot.com/user/task/30285ede678e5b10

Full stdout's attached, but the symptom is that the browser failed to launch the first time:

----------
WebglConformance_deqp_functional_gles3_vertexarrays_multiple_attributes_output (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... [298:92183:0721/152808:WARNING:simple_synchronous_entry.cc(1054)] Could not open platform files for entry.

Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:49
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:48
    'webglTestHarness._finished', timeout_in_seconds=300)
  WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:187
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  WaitForJavaScriptExpression at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:136
    e.message + '\n' + debug_message)
TimeoutException: Timed out while waiting 300s for IsJavaScriptExpressionTrue.
Console output:

Locals:
  IsJavaScriptExpressionTrue : <function IsJavaScriptExpressionTrue at 0x1060a8500>
  debug_message              : 'Console output:\n'
  e                          : TimeoutException('Timed out while waiting 300s for IsJavaScriptExpressionTrue.',)
  expr                       : 'webglTestHarness._finished'
  timeout                    : 300

WARNING:root:Restarting browser due to unexpected test failure
[307:20227:0721/153322:ERROR:node_controller.cc(1099)] Could not be introduced to peer AC2230C166E56678.4BF0E494F7A4559A
[298:1287:0721/153323:WARNING:url_request_context_getter.cc(43)] URLRequestContextGetter leaking due to no owning thread.
WARNING:root:Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
INFO:root:Chose browser: PossibleDesktopBrowser(type=release, executable=/b/swarm_slave/w/ir4r00_T/out/Release/Chromium.app/Contents/MacOS/Chromium, flash=None)
INFO:root:Requested remote debugging port: 0
INFO:root:Starting Chrome ['/b/swarm_slave/w/ir4r00_T/out/Release/Chromium.app/Contents/MacOS/Chromium', '--enable-unsafe-es3-apis', '--test-type=gpu', '--disable-domain-blocking-for-3d-apis', '--disable-gesture-requirement-for-media-playback', '--disable-gpu-process-crash-limit', '--disable-accelerated-video-decode', '--enable-experimental-canvas-features', '--js-flags=--expose-gc', '--enable-logging=stderr', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--no-proxy-server', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=/b/swarm_slave/w/itTQ15AN/tmp0cIGEd', 'about:blank']
INFO:root:Found crashpad_database_util
INFO:root:No minidump found via crashpad_database_util
INFO:root:Found crashpad_database_util
INFO:root:No minidump found via crashpad_database_util
Can't get standard output with --show-stdout
ERROR:root:Failure while starting browser backend.
Traceback (most recent call last):
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
	No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************
ERROR
WebglConformance_deqp_functional_gles3_transformfeedback_array_interleaved_triangles (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... ERROR
WebglConformance_deqp_functional_gles3_vertexarrays_single_attribute_output_type_unsigned_short (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... ERROR
......
----------

This is a really serious problem that's affecting the Mac GPU tryservers badly. I think we should consider trying to work around it by retrying the browser's bringup multiple times inside the harness.

https://build.chromium.org/p/tryserver.chromium.angle/builders/mac_angle_rel_ng?numbuilds=200
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel?numbuilds=200


Fortunately, it's not affecting the main Chromium CQ bots which run webgl_conformance_tests:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng?numbuilds=200

I don't know why there's a behavioral difference between these two sets of bots. They are running on the same hardware in the Swarming pool.

stdout.txt
121 KB View Download
Working around it by retrying the browser sounds good to me. It's really strange that there is no stdout from the browser here.
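
A rough sketch of what retrying the bringup could look like. start_with_retries is a hypothetical helper, not an existing harness function, and start_browser is any callable that launches a browser or raises.

```python
import logging


def start_with_retries(start_browser, max_attempts=3):
    """Call start_browser until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return start_browser()
        except Exception as error:
            last_error = error
            logging.warning("Browser start attempt %d of %d failed: %s",
                            attempt, max_attempts, error)
    # Every attempt failed; surface the last error to the caller.
    raise last_error


launches = []


def flaky_start():
    # Stand-in launcher that times out twice and then succeeds.
    launches.append(1)
    if len(launches) < 3:
        raise RuntimeError(
            "Timed out while waiting 60s for HasBrowserFinishedLaunching.")
    return "browser"


browser = start_with_retries(flaky_start)
```

Each failed attempt is logged, so a shard that eventually recovers still leaves evidence of the flaky launches in its output.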

Comment 18 by kbr@chromium.org, Jul 22 2016

We're also seeing random timeouts in the middle of test runs. See https://bugs.chromium.org/p/chromium/issues/detail?id=619264#c28 and the random failure of WebglConformance_deqp_functional_gles3_shadermatrix_div_uniform in https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2150 .

I wonder whether something's flaky in DevTools' websocket connection to the browser on Mac OS.

Comment 19 by kbr@chromium.org, Jul 22 2016

Cc: -yhirano@chromium.org pfeldman@chromium.org
Components: Platform>DevTools
CC'ing pfeldman@ for discussion about the possibility of a DevTools issue.


Comment 20 by bugdroid1@chromium.org, Jul 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/202a09250ad4141be8a80cd63881998d8e821f09

commit 202a09250ad4141be8a80cd63881998d8e821f09
Author: eyaich <eyaich@google.com>
Date: Fri Jul 22 03:26:28 2016

Adding logic to restart the browser if there is an exception
in the setUp of GpuIntegrationTest

Note: I am not certain there is a good way to test the setup
functionality of GpuIntegrationTest in a unittest.  Given that
we are using a Fakes, I have added a hack to simulate throwing
an error in the setup method, but it is not how it would behave
in practice.  Any suggestions for a better way to unittest this
are appreciated.

Dependent on https://codereview.chromium.org/2148283003 landing in telemetry first

BUG= 628022 
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2151983002
Cr-Commit-Position: refs/heads/master@{#407022}

[modify] https://crrev.com/202a09250ad4141be8a80cd63881998d8e821f09/content/test/gpu/gpu_tests/gpu_integration_test.py
[modify] https://crrev.com/202a09250ad4141be8a80cd63881998d8e821f09/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py


Comment 21 by bugdroid1@chromium.org, Jul 22 2016

Labels: merge-merged-2804

RE comment 16:
A CL is out to restart the browser up to 3 times in the event of a crash on restart.

RE comment 13-15:

Upon closer inspection of the stack traces I don't think that a corrupted minidump is actually what is crashing the browser.  In the shard 3 log that you pointed out (https://chromium-swarm.appspot.com/user/task/3018ab1fad251110) there are two instances of these failed minidump symbolizations: 

The first looks like a DevTools crash:

INFO:root:Found crashpad_database_util
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Downloading gs://chromium-telemetry/binary_dependencies/minidump_stackwalk_76c5983fc9e9316a9d4251ba3e68b955c4fc9bf3 to /b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/minidump_stackwalk
INFO:root:Dumping breakpad symbols.
INFO:root:Downloading gs://chromium-telemetry/binary_dependencies/minidump_dump_c39bd7a3b9fa6279893b2d759045699d79ce4dcb to /b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/minidump_dump
[0718/142540:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Found crashpad_database_util
[0718/142618:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout

Expected exception while running WebglConformance_deqp_functional_gles3_shaderloop_do_while

Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:49
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:48
    'webglTestHarness._finished', timeout_in_seconds=300)
  WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:187
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  WaitForJavaScriptExpression at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:123
    util.WaitFor(IsJavaScriptExpressionTrue, timeout)
  WaitFor at third_party/catapult/telemetry/telemetry/core/util.py:86
    res = condition()
  IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:116
    return bool(self.EvaluateJavaScript(expr))
  EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:187
    expr, context_id=None, timeout=timeout)
  EvaluateJavaScriptInContext at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:215
    expr, context_id=context_id, timeout=timeout)
  inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:32
    return func(inspector_backend, *args, **kwargs)
  EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:203
    return self._runtime.Evaluate(expr, context_id, timeout)
  Evaluate at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:45
    res = self._inspector_websocket.SyncRequest(request, timeout)
  SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:110
    res = self._Receive(timeout)
  _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:166
    self._HandleNotification(result)
  _HandleNotification at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:179
    self._domain_handlers[domain_name](result)
  _HandleInspectorDomainNotification at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:357
    raise exception
DevtoolsTargetCrashException: Devtools target crashed
********************************************************************************
(/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:410 _AddDebuggingInformation) Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
********************************************************************************
(/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:411 _AddDebuggingInformation) Debugger url: ws://127.0.0.1:50705/devtools/page/760a1760-c3ef-48ee-843d-3ec701bf3fcc
Found Minidump: True
Stack Trace:




The second appears to be browser setup crashing again because the browser is down, and my first CL, which restarts on a setup crash (https://codereview.chromium.org/2151983002/), should take care of that:

[0718/145233:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
WebglConformance_conformance2_state_gl_get_calls (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0718/145233:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/145238:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
WebglConformance_conformance_textures_misc_tex_sub_image_2d (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ... INFO:root:Found crashpad_database_util
[0718/145238:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/145243:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
[4192:1287:0718/145244:WARNING:url_request_context_getter.cc(43)] URLRequestContextGetter leaking due to no owning thread.

======================================================================
ERROR: WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_25 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/swarm_slave/w/irFPEqPO/content/test/gpu/gpu_tests/gpu_integration_test.py", line 129, in setUp
    self.tab = self.browser.tabs[0]
  File "/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/browser/tab_list.py", line 18, in __getitem__
    return self._tab_list_backend.__getitem__(index)
  File "/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py", line 62, in __getitem__
    index, repr(self._filtered_context_ids)))
DevtoolsTargetCrashException: Web content with index 0 may have crashed. filtered_context_ids = []
Found Minidump: True
Stack Trace:
*******************************************************************************

Comment 23 by bugdroid1@chromium.org, Jul 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ddb380f8be43fc6f1edef0ddb962403695b96370

commit ddb380f8be43fc6f1edef0ddb962403695b96370
Author: eyaich <eyaich@google.com>
Date: Tue Jul 26 17:54:42 2016

Adding 3 attempts at starting the browser for a gpu integration test.

BUG= 628022 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2181673002
Cr-Commit-Position: refs/heads/master@{#407847}

[modify] https://crrev.com/ddb380f8be43fc6f1edef0ddb962403695b96370/content/test/gpu/gpu_tests/gpu_integration_test.py
[modify] https://crrev.com/ddb380f8be43fc6f1edef0ddb962403695b96370/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py

Comment 24 by kbr@chromium.org, Aug 3 2016

Blocking: 633617

Comment 25 by kbr@chromium.org, Aug 3 2016

Thanks Emily for working on this. Unfortunately, it doesn't look like the 3 restart attempts are working. See Issue 633617 and https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2420 as an example. https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2420/steps/webgl2_conformance_tests%20on%20NVIDIA%20GPU%20on%20Mac%20Retina%20%28with%20patch%29%20on%20Mac/logs/stdio shows shards 4 and 13 failing to launch the browser 3 times:

https://chromium-swarm.appspot.com/user/task/3064d44f1d72fe10
https://chromium-swarm.appspot.com/user/task/3064d465907f3c10

Do you have any ideas? From the stack traces does it look like your restart code is working as expected?

So this isn't actually in the same part of the code.  This failure is in setUpClass, not setUp, which is where I added the 3 restart attempts before.  This is failing the first time the test is trying to run and it can't bring the browser up.
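
To illustrate the distinction in plain unittest terms (a sketch only: the real harness runs these tests through its own runner, so details may differ), when setUpClass raises, setUp is never reached, so retry logic that lives in setUp cannot help with the very first launch. The function and class names below are made up for the example.

```python
import io
import unittest


def run_demo():
    """Runs a tiny suite whose setUpClass always fails and returns counters."""
    setup_calls = []

    class FirstLaunchFails(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # Stand-in for the first browser launch failing.
            raise RuntimeError("browser failed to start")

        def setUp(self):
            # Stand-in for per-test setup that contains the retry logic.
            setup_calls.append(self.id())

        def test_one(self):
            pass

        def test_two(self):
            pass

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FirstLaunchFails)
    # Send the runner's report to a throwaway buffer to keep the demo quiet.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.testsRun, len(result.errors), len(setup_calls)
```

With standard unittest, run_demo() returns (0, 1, 0): no test ran, one class-level error was recorded, and setUp was never called.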

I think the bigger problem is why the browser is not able to be restarted.  Ned has suggested I could add the ability to take a screenshot on failure so we might get a better idea of what is going on when the browser crashes.  Similar to what he did for the benchmarks: https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/page/shared_page_state.py#L154.  I will look at what I can reuse there and see if I can add the same functionality.

Next, I propose that we pull the restart into StartBrowser.  I will override the method in GpuIntegrationTest to do the restart so that other subclasses of SeriallyExecutedBrowserTestCase don't inherit this behavior.  This is where I will do the screenshot on failure as well.  The only concern I have with this is that the retry logic in restart actually wraps more than just StartBrowser, but we have no evidence that the failure happened anywhere else, so I think this is a good approach.
To #26, pull the restart into StartBrowser sgtm.

Comment 28 by kbr@chromium.org, Aug 3 2016

Great analysis. SGTM to refactor StartBrowser as you see fit. Thanks for continuing to work on this.

I agree it's really mysterious why the browser is failing to launch. This is definitely new behavior and I'm concerned that there's a newly-introduced race condition (perhaps limited to Mac OS) affecting the actual product.

Comment 29 by bugdroid1@chromium.org, Aug 3 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1d7d9c3fb80c5aa8b532713176263654e54e3d79

commit 1d7d9c3fb80c5aa8b532713176263654e54e3d79
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Aug 03 19:22:34 2016

Roll src/third_party/catapult/ ff62a5c2f..7d2a597a4 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/ff62a5c2f33c..7d2a597a4cff

$ git log ff62a5c2f..7d2a597a4 --date=short --no-merges --format='%ad %ae %s'

BUG= 628022 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2205313003
Cr-Commit-Position: refs/heads/master@{#409590}

[modify] https://crrev.com/1d7d9c3fb80c5aa8b532713176263654e54e3d79/DEPS

Comment 30 by bugdroid1@chromium.org, Aug 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/085c13919b7a8f7fb77e428a77df45bbc98978ae

commit 085c13919b7a8f7fb77e428a77df45bbc98978ae
Author: nednguyen <nednguyen@google.com>
Date: Fri Aug 05 02:33:01 2016

[content/test/gpu] Pushing the restart logic into start browser.

This patch is done on behalf of eyaich@chromium.org (original work in
https://codereview.chromium.org/2209673003).

For reviewing this CL, the 1st patch set is the patch set 1 of
https://codereview.chromium.org/2209673003.
The 2nd patch set address some nits & add TODO for improve unittest logic.

BUG= chromium:628022 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2219593003
Cr-Commit-Position: refs/heads/master@{#409971}

[modify] https://crrev.com/085c13919b7a8f7fb77e428a77df45bbc98978ae/content/test/gpu/gpu_tests/gpu_integration_test.py
[modify] https://crrev.com/085c13919b7a8f7fb77e428a77df45bbc98978ae/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py

Comment 31 by kbr@chromium.org, Aug 5 2016

Components: Infra>Platform>Swarming
Thanks for continuing to push this forward.

After https://codereview.chromium.org/2219593003/ landed, this failure happened:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release/builds/6146/steps/webgl_conformance_tests%20on%20NVIDIA%20GPU%20on%20Mac%20Retina%20on%20Mac/logs/stdio

Link to Swarming shard:
https://chromium-swarm.appspot.com/user/task/307452fac34aca10

(ran on build101-b1)

Note that the browser restarted 3 times, each time timing out.

Looking at this bot:

https://chromium-swarm.appspot.com/restricted/bot/build101-b1

All of the recent runs of webgl_conformance_tests failed. I think the hardware is failing. I'll file a P0 ticket about removing it from the Swarming pool.

stdout.txt
29.6 KB

Comment 32 by kbr@chromium.org, Aug 5 2016

Blockedon: 634999

Comment 33 by bugdroid1@chromium.org, Aug 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/da89ebc5ae8414fb51713d9aac6a70310869bf83

commit da89ebc5ae8414fb51713d9aac6a70310869bf83
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Mon Aug 08 15:19:11 2016

Roll src/third_party/catapult/ 88c5a34b8..542ff3334 (1 commit).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/88c5a34b8a88..542ff3334ba1

$ git log 88c5a34b8..542ff3334 --date=short --no-merges --format='%ad %ae %s'

BUG= 628022 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2225863002
Cr-Commit-Position: refs/heads/master@{#410361}

[modify] https://crrev.com/da89ebc5ae8414fb51713d9aac6a70310869bf83/DEPS

Comment 34 by bugdroid1@chromium.org, Aug 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6e61cce3dc6538310ef292be09b16593f3b18c6e

commit 6e61cce3dc6538310ef292be09b16593f3b18c6e
Author: eyaich <eyaich@google.com>
Date: Mon Aug 08 20:55:58 2016

Unittest for pushing restart logic into the browser.  Original patch for the restart logic was checked in outside this CL in https://codereview.chromium.org/2219593003/ so it could get in earlier, this is just the follow on CL for the unittest.

BUG= chromium:628022 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2209673003
Cr-Commit-Position: refs/heads/master@{#410454}

[modify] https://crrev.com/6e61cce3dc6538310ef292be09b16593f3b18c6e/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py

Comment 35 by kbr@chromium.org, Aug 8 2016

Status: Started (was: Assigned)
Thanks Emily for your persistence on this. With your unit test in place can we call this fixed?

Ken: did you find another instance in which Chrome crashes on start-up? With the logic that Emily added, we should be able to see the screenshot taken at that time in the log.

Comment 37 by kbr@chromium.org, Aug 8 2016

Cc: -cwallez@chromium.org sunn...@chromium.org
Labels: Hotlist-PixelWrangler
Since Emily added the restart logic, the only failures to start the browser I've seen have been problems with the bots: e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=628022#c31 .

I suspect the problem has been broken bots all along, and this was only really confirmed once we started restarting the browser if it failed to launch the first time.

Comment 38 by kbr@chromium.org, Aug 16 2016

Status: Fixed (was: Started)
Thanks Emily for your work on this. Closing as fixed.

Comment 39 by kbr@chromium.org, Aug 16 2016

Issue 628765 has been merged into this issue.

Comment 40 by kbr@chromium.org, Oct 7 2016

Blocking: 653870

Comment 41 by kbr@chromium.org, Jan 19 2017

Blocking: 682819
