Occasional failure of an entire shard with the new browser_test_runner harness
Issue description

Occasionally one failed test causes all subsequent tests to fail with the new browser_test_runner harness. I'm not sure what is causing this.

The initial thought was that a renderer or GPU process crash was causing a minidump to be generated, and that restarting the browser was causing Telemetry's internal symbolization code to run, and to fail. This doesn't immediately seem to be the cause based on initial testing, but it may be necessary to do more failure injection.

I'll upload a CL which provokes a failure of the first test in the suite. Either a GPU or renderer process crash may be simulated. Despite best efforts (including modifying code in Catapult to do failure injection) only the first test in the run actually fails.

Attached is the full (large) failure log from this try job: https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel/builds/2151
from this CL: https://codereview.chromium.org/2138673002

All of the tests in shard 10 (search the text for "Shard 10") failed. It's not clear why.
,
Jul 13 2016
Here's another similar kind of failure, this time on Mac OS:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1839
It's from this CL: https://codereview.chromium.org/2133673003

Attached is the gzipped stdio.html. One failing test was WebglConformance_conformance2_glsl3_forbidden_operators. Search through the file in a text editor (it seems to be too large for Chrome to load) for "Shard 14". It can be seen that the GPU process hung at the beginning of that shard, it crashed and that crash was symbolized, and then all subsequent tests failed with the error messages:

WebglConformance_deqp_functional_gles3_fbocolorbuffer_tex2darray_00 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0712/121837:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp
INFO:root:Dumping breakpad symbols.
Can't get standard output with --show-stdout
ERROR
WebglConformance_deqp_functional_gles3_fragmentoutput_random_02 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0712/121841:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/work/isolated/isolated_tmptyJSiz/tmp9lirf6/completed/48bc24d7-d8e1-4041-b5c2-dd48d1ecab45.dmp
INFO:root:Dumping breakpad symbols.
Can't get standard output with --show-stdout
ERROR

It looks like if browser startup fails at some key points, the test fails.
,
Jul 13 2016
In https://chromium-swarm.appspot.com/user/task/2ff6de5d924f7c10, it looks like the harness did not try to restart the browser. If it had, we should have seen multiple "INFO:root:Starting Chrome.." log lines, but Ctrl+F for "Starting Chrome" shows only 1 result. Looking at the stack trace, the failure occurs in the setUp method of gpu_integration_test.py (https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_integration_test.py?rcl=0&l=128). I guess this is happening because we don't have logic to restart the browser if an exception is raised in setUp/tearDown. Maybe you can try to reproduce this problem by injecting a browser crash in gpu_integration_test.GpuIntegrationTest.setUp?
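Since the logs are too large to open in Chrome, a short script can do the same "Starting Chrome" count. This is only a rough sketch: the function name count_browser_starts and the file names are made up for illustration, and it assumes the log is either plain text or gzip-compressed.

```python
import gzip


def count_browser_starts(path, marker='Starting Chrome'):
  """Counts the lines containing `marker` in a plain or gzipped log."""
  opener = gzip.open if path.endswith('.gz') else open
  needle = marker.encode('utf-8')
  count = 0
  with opener(path, 'rb') as f:
    # Stream line by line; these logs can be too large to load at once.
    for raw_line in f:
      if needle in raw_line:
        count += 1
  return count
```

A result of 1 for a shard where many tests failed would suggest that the browser was never relaunched after the first failure.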
,
Jul 13 2016
#3: I think it's hitting a buildbot limitation. You can use src/content/test/gpu/gather_swarming_json_results.py to scrape and assemble those JSON results. #4: thanks, that's a good idea. I'm really swamped at this point but if you or someone else could verify that idea, I'd appreciate it.
,
Jul 14 2016
Sure, the problem can be reproduced with https://codereview.chromium.org/2147133002/
,
Jul 14 2016
It looks like Ned has done the heavy lifting here. I have confirmed that catching the exception in setUp does enable the rest of the tests to execute and pass. I will write a unittest for this fix and get a cl out for review.
,
Jul 15 2016
Thanks Ned and Emily for reproducing the issue. I wonder whether the issue is that the renderer process is crashing (which is what https://codereview.chromium.org/2147133002/ provokes) or whether the browser's failing to launch. The current code in setUp calls: self.browser.tabs[0] and one of those dereferences is failing. It's not clear why.
,
Jul 15 2016
It's more likely a renderer crash. Here is the code in inspector_backend_list.py:

> def __getitem__(self, index):
>   self._Update()
>   if index >= len(self._filtered_context_ids):
>     raise exceptions.DevtoolsTargetCrashException(  <--- Exception thrown here
>         self.app,
>         'Web content with index %s may have crashed. '
>         'filtered_context_ids = %s' % (
>             index, repr(self._filtered_context_ids)))

That must mean the self._Update() method is running fine, and that method does issue a request to Chrome DevTools, indicating that the browser process is still alive at that point.
,
Jul 15 2016
Thanks for your analysis. Given this I think we should get rid of GpuIntegrationTest.setUp and merge its code into _RunGpuTest, since doing so will catch this exception in the same way that others are caught by the harness, so we'll have a chance at diagnosing these failures.
,
Jul 15 2016
Note: here's a slightly different instance of the same problem:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/1904
from CL: https://codereview.chromium.org/2144293002
Shard 4 failed: https://chromium-swarm.appspot.com/user/task/30030b427d36bd10

It looks like the browser failed to start:

ERROR:root:Failure while starting browser backend.
Traceback (most recent call last):
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************
ERROR
======================================================================
ERROR: setUpClass (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/swarm_slave/work/isolated/ir0Uihr1/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 77, in setUpClass
    cls.StartBrowser()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 83, in StartBrowser
    cls.browser = cls._browser_to_create.Create(cls._browser_options)
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_finder.py", line 68, in Create
    browser_backend, self._platform_backend, self._credentials_path)
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/work/isolated/ir0Uihr1/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************

This is pretty unfortunate and I wonder what kind of additional instrumentation could be added to figure out what happened. (Specifically, does it seem likely the first renderer process which the browser launched crashed upon startup, like the other failure modes?)
,
Jul 18 2016
Another problem where one failure left a corrupted minidump that caused catastrophic failure of the harness:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2034
Shard #3 failed: https://chromium-swarm.appspot.com/user/task/3018ab1fad251110

WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_25 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0718/142618:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/142636:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout
ERROR
WebglConformance_deqp_functional_gles3_fboinvalidate_default (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0718/142636:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/142655:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout
ERROR
...
,
Jul 21 2016
It looks like this last one on the 18th might be slightly different since it doesn't appear to have anything to do with the state of the browser. It seems that it is trying to parse the same corrupt metadata each time and failing. I wonder if that should be a separate bug.
,
Jul 21 2016
You're right that it's not the same problem as the browser crashing at the beginning of the run, but it's a similar kind of failure which has been seen before on Mac OS. It'd be fine to file a separate bug, but do you think you could try to figure out how to mock that failure in the harness, and work around it by not trying to re-symbolize the corrupted minidump each time?
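To make the workaround concrete, here is one possible shape for it, assuming the harness can route symbolization through a small wrapper. MinidumpSymbolizer, GetStackTrace and symbolize_fn are hypothetical names for this sketch, not Telemetry APIs.

```python
class MinidumpSymbolizer(object):
  """Hypothetical wrapper that never retries a dump that already failed."""

  def __init__(self, symbolize_fn):
    # symbolize_fn is an assumed callable: minidump path -> stack trace.
    self._symbolize_fn = symbolize_fn
    self._failed_paths = set()

  def GetStackTrace(self, minidump_path):
    if minidump_path in self._failed_paths:
      # Known-bad dump; skip the expensive work that failed before.
      return None
    try:
      return self._symbolize_fn(minidump_path)
    except Exception:
      # Remember the path so later tests do not retry the same dump.
      self._failed_paths.add(minidump_path)
      return None
```

The same wrapper could be driven from a unittest by passing a symbolize_fn that raises for one path, which would be a way to mock the corrupted-minidump failure.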
,
Jul 21 2016
Here's another failure:
https://build.chromium.org/p/tryserver.chromium.angle/builders/mac_angle_rel_ng/builds/1700
Shard 1 failed: https://chromium-swarm.appspot.com/user/task/30285ede678e5b10

Full stdout's attached, but the symptom is that the browser failed to launch the first time:

----------
WebglConformance_deqp_functional_gles3_vertexarrays_multiple_attributes_output (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
[298:92183:0721/152808:WARNING:simple_synchronous_entry.cc(1054)] Could not open platform files for entry.
Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:49
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:48
    'webglTestHarness._finished', timeout_in_seconds=300)
  WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:187
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  WaitForJavaScriptExpression at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:136
    e.message + '\n' + debug_message)
TimeoutException: Timed out while waiting 300s for IsJavaScriptExpressionTrue.
Console output:
Locals:
  IsJavaScriptExpressionTrue : <function IsJavaScriptExpressionTrue at 0x1060a8500>
  debug_message : 'Console output:\n'
  e : TimeoutException('Timed out while waiting 300s for IsJavaScriptExpressionTrue.',)
  expr : 'webglTestHarness._finished'
  timeout : 300
WARNING:root:Restarting browser due to unexpected test failure
[307:20227:0721/153322:ERROR:node_controller.cc(1099)] Could not be introduced to peer AC2230C166E56678.4BF0E494F7A4559A
[298:1287:0721/153323:WARNING:url_request_context_getter.cc(43)] URLRequestContextGetter leaking due to no owning thread.
WARNING:root:Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
INFO:root:Chose browser: PossibleDesktopBrowser(type=release, executable=/b/swarm_slave/w/ir4r00_T/out/Release/Chromium.app/Contents/MacOS/Chromium, flash=None)
INFO:root:Requested remote debugging port: 0
INFO:root:Starting Chrome ['/b/swarm_slave/w/ir4r00_T/out/Release/Chromium.app/Contents/MacOS/Chromium', '--enable-unsafe-es3-apis', '--test-type=gpu', '--disable-domain-blocking-for-3d-apis', '--disable-gesture-requirement-for-media-playback', '--disable-gpu-process-crash-limit', '--disable-accelerated-video-decode', '--enable-experimental-canvas-features', '--js-flags=--expose-gc', '--enable-logging=stderr', '--enable-net-benchmarking', '--metrics-recording-only', '--no-default-browser-check', '--no-first-run', '--enable-gpu-benchmarking', '--disable-background-networking', '--no-proxy-server', '--disable-component-extensions-with-background-pages', '--disable-default-apps', '--remote-debugging-port=0', '--enable-crash-reporter-for-testing', '--window-size=1280,1024', '--user-data-dir=/b/swarm_slave/w/itTQ15AN/tmp0cIGEd', 'about:blank']
INFO:root:Found crashpad_database_util
INFO:root:No minidump found via crashpad_database_util
INFO:root:Found crashpad_database_util
INFO:root:No minidump found via crashpad_database_util
Can't get standard output with --show-stdout
ERROR:root:Failure while starting browser backend.
Traceback (most recent call last):
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_backend.py", line 291, in Start
    self._WaitForBrowserToComeUp()
  File "/b/swarm_slave/w/ir4r00_T/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 174, in _WaitForBrowserToComeUp
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 60s for HasBrowserFinishedLaunching.
Found Minidump: False
Stack Trace:
********************************************************************************
No crash dump found.
********************************************************************************
Standard output:
********************************************************************************
********************************************************************************
ERROR
WebglConformance_deqp_functional_gles3_transformfeedback_array_interleaved_triangles (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
ERROR
WebglConformance_deqp_functional_gles3_vertexarrays_single_attribute_output_type_unsigned_short (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
ERROR
......
----------

This is a really serious problem that's affecting the Mac GPU tryservers badly. I think we should consider trying to work around it by retrying the browser's bringup multiple times inside the harness.
https://build.chromium.org/p/tryserver.chromium.angle/builders/mac_angle_rel_ng?numbuilds=200
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel?numbuilds=200

Fortunately, it's not affecting the main Chromium CQ bots which run webgl_conformance_tests:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng?numbuilds=200

I don't know why there's a behavioral difference between these two sets of bots. They are running on the same hardware in the Swarming pool.
,
Jul 22 2016
Working around it by retrying the browser sounds good to me. It's really strange that there is no stdout from the browser here.
,
Jul 22 2016
We're also seeing random timeouts in the middle of test runs. See https://bugs.chromium.org/p/chromium/issues/detail?id=619264#c28 and the random failure of WebglConformance_deqp_functional_gles3_shadermatrix_div_uniform in https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2150 . I wonder whether something's flaky in DevTools' websocket connection to the browser on Mac OS.
,
Jul 22 2016
CC'ing pfeldman@ for discussion about the possibility of a DevTools issue.
,
Jul 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/202a09250ad4141be8a80cd63881998d8e821f09 commit 202a09250ad4141be8a80cd63881998d8e821f09 Author: eyaich <eyaich@google.com> Date: Fri Jul 22 03:26:28 2016 Adding logic to restart the browser if there is an exception in the setUp of GpuIntegrationTest Note: I am not certain there is a good way to test the setup functionality of GpuIntegrationTest in a unittest. Given that we are using a Fakes, I have added a hack to simulate throwing an error in the setup method, but it is not how it would behave in practice. Any suggestions for a better way to unittest this are appreciated. Dependent on https://codereview.chromium.org/2148283003 landing in telemetry first BUG= 628022 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2151983002 Cr-Commit-Position: refs/heads/master@{#407022} [modify] https://crrev.com/202a09250ad4141be8a80cd63881998d8e821f09/content/test/gpu/gpu_tests/gpu_integration_test.py [modify] https://crrev.com/202a09250ad4141be8a80cd63881998d8e821f09/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py
,
Jul 26 2016
RE comment 16: A CL is out to restart the browser up to 3 times in the event of a crash on restart.

RE comment 13-15: Upon closer inspection of the stack traces I don't think that a corrupted minidump is actually what is crashing the browser. In the shard 3 log that you pointed out (https://chromium-swarm.appspot.com/user/task/3018ab1fad251110) there are two instances of these failed minidump symbolizations.

The first looks like a dev tools crash:

INFO:root:Found crashpad_database_util
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Downloading gs://chromium-telemetry/binary_dependencies/minidump_stackwalk_76c5983fc9e9316a9d4251ba3e68b955c4fc9bf3 to /b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/minidump_stackwalk
INFO:root:Dumping breakpad symbols.
INFO:root:Downloading gs://chromium-telemetry/binary_dependencies/minidump_dump_c39bd7a3b9fa6279893b2d759045699d79ce4dcb to /b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/minidump_dump
[0718/142540:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
INFO:root:Found crashpad_database_util
[0718/142618:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
Can't get standard output with --show-stdout
Expected exception while running WebglConformance_deqp_functional_gles3_shaderloop_do_while
Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:49
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:48
    'webglTestHarness._finished', timeout_in_seconds=300)
  WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:187
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  WaitForJavaScriptExpression at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:123
    util.WaitFor(IsJavaScriptExpressionTrue, timeout)
  WaitFor at third_party/catapult/telemetry/telemetry/core/util.py:86
    res = condition()
  IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:116
    return bool(self.EvaluateJavaScript(expr))
  EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:187
    expr, context_id=None, timeout=timeout)
  EvaluateJavaScriptInContext at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:215
    expr, context_id=context_id, timeout=timeout)
  inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:32
    return func(inspector_backend, *args, **kwargs)
  EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:203
    return self._runtime.Evaluate(expr, context_id, timeout)
  Evaluate at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:45
    res = self._inspector_websocket.SyncRequest(request, timeout)
  SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:110
    res = self._Receive(timeout)
  _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:166
    self._HandleNotification(result)
  _HandleNotification at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:179
    self._domain_handlers[domain_name](result)
  _HandleInspectorDomainNotification at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:357
    raise exception
DevtoolsTargetCrashException: Devtools target crashed
********************************************************************************
(/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:410 _AddDebuggingInformation)
Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
********************************************************************************
(/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:411 _AddDebuggingInformation)
Debugger url: ws://127.0.0.1:50705/devtools/page/760a1760-c3ef-48ee-843d-3ec701bf3fcc
Found Minidump: True
Stack Trace:

The second appears to be crashing browser set up again because the browser is down and my first CL that restarts on a setup crash (https://codereview.chromium.org/2151983002/) should take care of that:

[0718/145233:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
WebglConformance_conformance2_state_gl_get_calls (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0718/145233:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/145238:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
WebglConformance_conformance_textures_misc_tex_sub_image_2d (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest) ...
INFO:root:Found crashpad_database_util
[0718/145238:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
INFO:root:Minidump found: /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp
INFO:root:Dumping breakpad symbols.
INFO:root:Found crashpad_database_util
[0718/145243:WARNING:crash_report_database_mac.mm(636)] Failed to read report metadata for /b/swarm_slave/w/itbUmasW/tmpGzJ01O/completed/89fc039b-82cb-45d6-b1f9-1fa9bdbb19f6.dmp.stripped
WARNING:root:Crash dump is older than 5 minutes. May not be correct.
Can't get standard output with --show-stdout
ERROR
[4192:1287:0718/145244:WARNING:url_request_context_getter.cc(43)] URLRequestContextGetter leaking due to no owning thread.
======================================================================
ERROR: WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_25 (gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/swarm_slave/w/irFPEqPO/content/test/gpu/gpu_tests/gpu_integration_test.py", line 129, in setUp
    self.tab = self.browser.tabs[0]
  File "/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/browser/tab_list.py", line 18, in __getitem__
    return self._tab_list_backend.__getitem__(index)
  File "/b/swarm_slave/w/irFPEqPO/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py", line 62, in __getitem__
    index, repr(self._filtered_context_ids)))
DevtoolsTargetCrashException: Web content with index 0 may have crashed. filtered_context_ids = []
Found Minidump: True
Stack Trace:
*******************************************************************************
,
Jul 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ddb380f8be43fc6f1edef0ddb962403695b96370 commit ddb380f8be43fc6f1edef0ddb962403695b96370 Author: eyaich <eyaich@google.com> Date: Tue Jul 26 17:54:42 2016 Adding 3 attempts at starting the browser for a gpu integration test. BUG= 628022 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2181673002 Cr-Commit-Position: refs/heads/master@{#407847} [modify] https://crrev.com/ddb380f8be43fc6f1edef0ddb962403695b96370/content/test/gpu/gpu_tests/gpu_integration_test.py [modify] https://crrev.com/ddb380f8be43fc6f1edef0ddb962403695b96370/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py
,
Aug 3 2016
Thanks Emily for working on this. Unfortunately, it doesn't look like the 3 restart attempts are working. See Issue 633617 and https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2420 as an example. https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2420/steps/webgl2_conformance_tests%20on%20NVIDIA%20GPU%20on%20Mac%20Retina%20%28with%20patch%29%20on%20Mac/logs/stdio shows shards 4 and 13 failing to launch the browser 3 times: https://chromium-swarm.appspot.com/user/task/3064d44f1d72fe10 https://chromium-swarm.appspot.com/user/task/3064d465907f3c10 Do you have any ideas? From the stack traces does it look like your restart code is working as expected?
,
Aug 3 2016
So this isn't actually in the same part of the code. This failure is in setUpClass, not setUp, which is where I added the 3 restart attempts before. It fails the first time the test tries to run, when it can't bring the browser up. I think the bigger problem is why the browser cannot be restarted. Ned has suggested I could add the ability to take a screenshot on failure so we might get a better idea of what is going on when the browser crashes, similar to what he did for the benchmarks: https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/page/shared_page_state.py#L154. I will look at what I can reuse there and see if I can add the same functionality. Next, I propose that we pull the restart into StartBrowser. I will override the method in GpuIntegrationTest to do the restart so that the other subclasses of SeriallyExecutedBrowserTestCase don't inherit this behavior. This is where I will do the screenshot on failure as well. The only concern I have with this is that the retry logic in restart actually wraps more than just StartBrowser, but we have no evidence that the failure happened anywhere else, so I think this is a good approach.
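Roughly, the override could look like the sketch below. This is not the CL itself: SeriallyExecutedBase, GpuLikeTest, BrowserStartError and the _launch_results hook are all hypothetical stand-ins so the example runs on its own, and the screenshot step is only marked with a comment.

```python
import logging


class BrowserStartError(Exception):
  """Hypothetical error type standing in for browser launch failures."""


class SeriallyExecutedBase(object):
  """Hypothetical stand-in for the shared base test case."""

  browser = None
  _launch_results = []  # Illustration hook: queued outcome of each launch.

  @classmethod
  def StartBrowser(cls):
    outcome = cls._launch_results.pop(0)
    if isinstance(outcome, Exception):
      raise outcome
    cls.browser = outcome


class GpuLikeTest(SeriallyExecutedBase):
  """Only this subclass retries; sibling subclasses keep the base behavior."""

  _MAX_START_ATTEMPTS = 3

  @classmethod
  def StartBrowser(cls):
    for attempt in range(1, cls._MAX_START_ATTEMPTS + 1):
      try:
        super(GpuLikeTest, cls).StartBrowser()
        return
      except BrowserStartError:
        logging.warning('Browser start attempt %d of %d failed',
                        attempt, cls._MAX_START_ATTEMPTS)
        # A screenshot or other diagnostics could be captured here.
        if attempt == cls._MAX_START_ATTEMPTS:
          raise
```

Because setUpClass calls StartBrowser, putting the retry there would cover the first launch as well as later restarts, which the earlier retry in setUp did not.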
,
Aug 3 2016
Re #26: pulling the restart into StartBrowser SGTM.
,
Aug 3 2016
Great analysis. SGTM to refactor StartBrowser as you see fit. Thanks for continuing to work on this. I agree it's really mysterious why the browser is failing to launch. This is definitely new behavior and I'm concerned that there's a newly-introduced race condition (perhaps limited to Mac OS) affecting the actual product.
,
Aug 3 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1d7d9c3fb80c5aa8b532713176263654e54e3d79 commit 1d7d9c3fb80c5aa8b532713176263654e54e3d79 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Wed Aug 03 19:22:34 2016 Roll src/third_party/catapult/ ff62a5c2f..7d2a597a4 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/ff62a5c2f33c..7d2a597a4cff $ git log ff62a5c2f..7d2a597a4 --date=short --no-merges --format='%ad %ae %s' BUG= 628022 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2205313003 Cr-Commit-Position: refs/heads/master@{#409590} [modify] https://crrev.com/1d7d9c3fb80c5aa8b532713176263654e54e3d79/DEPS
,
Aug 5 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/085c13919b7a8f7fb77e428a77df45bbc98978ae commit 085c13919b7a8f7fb77e428a77df45bbc98978ae Author: nednguyen <nednguyen@google.com> Date: Fri Aug 05 02:33:01 2016 [content/test/gpu] Pushing the restart logic into start browser. This patch is done on behalf of eyaich@chromium.org (original work in https://codereview.chromium.org/2209673003). For reviewing this CL, the 1st patch set is the patch set 1 of https://codereview.chromium.org/2209673003. The 2nd patch set address some nits & add TODO for improve unittest logic. BUG= chromium:628022 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2219593003 Cr-Commit-Position: refs/heads/master@{#409971} [modify] https://crrev.com/085c13919b7a8f7fb77e428a77df45bbc98978ae/content/test/gpu/gpu_tests/gpu_integration_test.py [modify] https://crrev.com/085c13919b7a8f7fb77e428a77df45bbc98978ae/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py
,
Aug 5 2016
Thanks for continuing to push this forward. After https://codereview.chromium.org/2219593003/ landed, this failure happened: https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%20Retina%20Release/builds/6146/steps/webgl_conformance_tests%20on%20NVIDIA%20GPU%20on%20Mac%20Retina%20on%20Mac/logs/stdio Link to Swarming shard: https://chromium-swarm.appspot.com/user/task/307452fac34aca10 (ran on build101-b1) Note that the browser restarted 3 times, each time timing out. Looking at this bot: https://chromium-swarm.appspot.com/restricted/bot/build101-b1 All of the recent runs of webgl_conformance_tests failed. I think the hardware is failing. I'll file a P0 ticket about removing it from the Swarming pool.
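The check described above (a bot whose recent runs all failed is probably bad hardware rather than a bad test) can be expressed as a small helper. This is a sketch under stated assumptions: the `(bot_id, passed)` input shape and the name `bots_with_only_failures` are hypothetical, and it does not query Swarming; the run history would have to be collected separately.

```python
def bots_with_only_failures(runs, min_runs=3):
    """Return the sorted ids of bots whose recorded runs all failed.

    runs: iterable of (bot_id, passed) pairs, where passed is a bool.
    min_runs: minimum number of runs a bot needs before it is flagged,
        so that a single unlucky run does not condemn a machine.
    """
    results_by_bot = {}
    for bot_id, passed in runs:
        results_by_bot.setdefault(bot_id, []).append(passed)
    return sorted(
        bot_id for bot_id, results in results_by_bot.items()
        if len(results) >= min_runs and not any(results))
```

A bot with mixed results is not flagged, since intermittent failures are more likely to point at the test or the product than at the machine.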
,
Aug 5 2016
,
Aug 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/da89ebc5ae8414fb51713d9aac6a70310869bf83 commit da89ebc5ae8414fb51713d9aac6a70310869bf83 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Mon Aug 08 15:19:11 2016 Roll src/third_party/catapult/ 88c5a34b8..542ff3334 (1 commit). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/88c5a34b8a88..542ff3334ba1 $ git log 88c5a34b8..542ff3334 --date=short --no-merges --format='%ad %ae %s' BUG= 628022 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2225863002 Cr-Commit-Position: refs/heads/master@{#410361} [modify] https://crrev.com/da89ebc5ae8414fb51713d9aac6a70310869bf83/DEPS
,
Aug 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6e61cce3dc6538310ef292be09b16593f3b18c6e commit 6e61cce3dc6538310ef292be09b16593f3b18c6e Author: eyaich <eyaich@google.com> Date: Mon Aug 08 20:55:58 2016 Unittest for pushing restart logic into the browser. Original patch for the restart logic was checked in outside this CL in https://codereview.chromium.org/2219593003/ so it could get in earlier, this is just the follow on CL for the unittest. BUG= chromium:628022 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2209673003 Cr-Commit-Position: refs/heads/master@{#410454} [modify] https://crrev.com/6e61cce3dc6538310ef292be09b16593f3b18c6e/content/test/gpu/gpu_tests/gpu_integration_test_unittest.py
,
Aug 8 2016
Thanks Emily for your persistence on this. With your unit test in place can we call this fixed?
,
Aug 8 2016
Ken: have you found another instance in which Chrome crashes on start-up? With the logic that Emily added, we should be able to see the screenshot taken at that time in the log.
,
Aug 8 2016
Since Emily added the restart logic, the only failures to start the browser I've seen have been problems with the bots: e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=628022#c31 . I suspect the problem has been broken bots all along, and this was only really confirmed once we started restarting the browser if it failed to launch the first time.
,
Aug 16 2016
Thanks Emily for your work on this. Closing as fixed.
,
Aug 16 2016
Issue 628765 has been merged into this issue.
,
Oct 7 2016
,
Jan 19 2017
Comment 1 by kbr@chromium.org
, Jul 13 2016