telemetry_perf_unittests does not print stack trace or upload minidump on failure
Issue description

One of my CLs[1] was failing telemetry_perf_unittests on win7_chromium_rel_ng[2]. The swarming logs[3] show a connection closed error in devtools. This indicates that Chrome crashed but there is no corresponding stack trace. I couldn't reproduce this crash locally, so I created another CL[4] to always dump the stack trace on failure (in _SystemHealthSharedState.DumpStateUponFailure).

It appears that we only collect the minidump and log the stack trace in AppCrashException. InspectorBackend._ConvertExceptionFromInspectorWebsocket converts from devtools exceptions to telemetry exceptions (such as app crash) and if this code fails we don't see a stack trace. kbr@ suggested that we should always collect the minidump and log the stack trace on failure.

[1] https://chromium-review.googlesource.com/c/chromium/src/+/714654
[2] https://build.chromium.org/p/tryserver.chromium.win/builders/win7_chromium_rel_ng/builds/28594
[3] https://chromium-swarm.appspot.com/task?id=3973d8092d12fb10&refresh=10&show_raw=1
    https://chromium-swarm.appspot.com/task?id=3973d80b1eb00e10&refresh=10&show_raw=1
    https://chromium-swarm.appspot.com/task?id=3973d80ba2fd1710&refresh=10&show_raw=1
[4] https://chromium-review.googlesource.com/c/chromium/src/+/742654

Telemetry stack trace:

Traceback (most recent call last):
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 104, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 324, in RunStory
    self._current_page.Run(self)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\page\__init__.py", line 118, in Run
    self.RunPageInteractions(action_runner)
  File "e:\b\swarm_slave\w\ir\tools\perf\page_sets\system_health\system_health_story.py", line 108, in RunPageInteractions
    self._Measure(action_runner)
  File "e:\b\swarm_slave\w\ir\tools\perf\page_sets\system_health\system_health_story.py", line 91, in _Measure
    action_runner.MeasureMemory(deterministic_mode=True)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 161, in MeasureMemory
    dump_id = self.tab.browser.DumpMemory()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\browser\browser.py", line 350, in DumpMemory
    return self._browser_backend.DumpMemory(timeout=timeout)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome\chrome_browser_backend.py", line 322, in DumpMemory
    return self.devtools_client.DumpMemory(timeout=timeout)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\devtools_client_backend.py", line 414, in DumpMemory
    return self._tracing_backend.DumpMemory(timeout=timeout)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend.py", line 214, in DumpMemory
    'request:\n' + traceback.format_exc())
TracingUnrecoverableException: Exception raised while sending a Tracing.requestMemoryDump request:
Traceback (most recent call last):
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend.py", line 205, in DumpMemory
    response = self._inspector_websocket.SyncRequest(request, timeout)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 116, in SyncRequest
    res = self._Receive(timeout)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 155, in _Receive
    data = self._socket.recv()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 293, in recv
    opcode, data = self.recv_data()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 310, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 323, in recv_data_frame
    frame = self.recv_frame()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 357, in recv_frame
    return self.frame_buffer.recv_frame()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 336, in recv_frame
    self.recv_header()
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 286, in recv_header
    header = self.recv_strict(2)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 371, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 427, in _recv
    return recv(self.sock, bufsize)
  File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_socket.py", line 80, in recv
    bytes_ = sock.recv(bufsize)
error: [Errno 10054] An existing connection was forcibly closed by the remote host
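To make the failure mode in the description concrete, here is a minimal, self-contained sketch of how a stack trace can go missing when it is only attached during exception conversion. None of these names come from Telemetry: `AppCrashError`, `convert_socket_error`, `run_step`, and the two callbacks are hypothetical stand-ins, and how the real conversion fails is an assumption made for illustration. The sketch is written for Python 3.

```python
import socket


class AppCrashError(Exception):
    """Hypothetical crash exception; the only place a stack trace is attached."""

    def __init__(self, stack_trace):
        super().__init__('browser crashed')
        self.stack_trace = stack_trace


def convert_socket_error(error, is_browser_alive, get_stack_trace):
    """Convert a socket error to a crash exception only if a crash is detected."""
    if not is_browser_alive():
        return AppCrashError(get_stack_trace())
    # Assumption: when crash detection misses, the original error is passed
    # through unchanged, so nothing ever asks for the stack trace.
    return error


def run_step(step, is_browser_alive, get_stack_trace):
    """Run a step and return the stack trace that would be logged, or None."""
    try:
        step()
    except socket.error as error:
        converted = convert_socket_error(error, is_browser_alive, get_stack_trace)
        if isinstance(converted, AppCrashError):
            return converted.stack_trace
    return None
```

In this sketch, a connection reset while the liveness check still reports the browser as alive yields no stack trace at all, which is the gap a catch-all failure hook is meant to close.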
,
Oct 30 2017
Actually, let's use crbug (since we migrated to Gerrit and use googlesource to host our project).
,
Oct 30 2017
We are working on standardizing the way Telemetry stores these artifacts (see issue 772208), which should address this. If this bug is P1, do you have a workaround, or can you just reproduce the bug locally?
,
Oct 30 2017
Echoing kbr's comment on the github issue: "In talking with @sunnyps offline I suggested adding code to DumpStateUponFailure, in telemetry/telemetry/internal/browser/browser.py, to symbolize all unsymbolized minidumps at that point. This should be a good catch-all location for all Telemetry based benchmarks to ensure that the stacks are printed if the correct sort of Error or Exception isn't propagated out of the various call stacks. What do you think?"
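The suggestion quoted above can be sketched roughly as follows. This is a minimal illustration, not Telemetry's implementation: `FakeBrowser`, `unsymbolized_minidump_paths`, `symbolize_minidump`, and `dump_state_upon_failure` are hypothetical names chosen for the example, and the real browser object may expose different methods. The sketch is written for Python 3.

```python
import logging


class FakeBrowser:
    """Hypothetical stand-in for a browser object; not Telemetry's real API."""

    def __init__(self, minidumps):
        # Maps minidump path -> symbolized stack text (None if nothing is recovered).
        self._minidumps = dict(minidumps)
        self._symbolized = set()

    def unsymbolized_minidump_paths(self):
        """Return the paths of minidumps that have not been symbolized yet."""
        return sorted(p for p in self._minidumps if p not in self._symbolized)

    def symbolize_minidump(self, path):
        """Mark the minidump as handled and return its stack text, if any."""
        self._symbolized.add(path)
        return self._minidumps[path]


def dump_state_upon_failure(browser, log=logging.getLogger(__name__)):
    """Symbolize and log every pending minidump, whatever exception was raised."""
    stacks = []
    for path in browser.unsymbolized_minidump_paths():
        try:
            stack = browser.symbolize_minidump(path)
        except Exception:
            # A symbolization failure must not hide the remaining minidumps.
            log.exception('Failed to symbolize %s', path)
            continue
        if stack is None:
            log.warning('No stack recovered from %s', path)
            continue
        log.error('Stack from %s:\n%s', path, stack)
        stacks.append(stack)
    return stacks
```

Because the hook does not depend on which exception type reached it, the stacks get logged even when the crash was never converted into the expected exception. Calling it twice on the same browser logs each minidump only once, since handled minidumps are no longer reported as unsymbolized.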
,
Oct 30 2017
This sgtm. We should add a test to ensure that this works the way we expect.
,
Oct 30 2017
,
Oct 31
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 16
,
Jan 16
Comment 1 by sunn...@chromium.org, Oct 30 2017