
Issue 779678

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



telemetry_perf_unittests does not print stack trace or upload minidump on failure

Project Member Reported by sunn...@chromium.org, Oct 30 2017

Issue description

One of my CLs[1] was failing telemetry_perf_unittests on win7_chromium_rel_ng[2]. The swarming logs[3] show a connection-closed error in DevTools. This indicates that Chrome crashed, but there is no corresponding stack trace. I couldn't reproduce this crash locally, so I created another CL[4] to always dump the stack trace on failure (in _SystemHealthSharedState.DumpStateUponFailure).

It appears that we only collect the minidump and log the stack trace in AppCrashException. InspectorBackend._ConvertExceptionFromInspectorWebsocket converts DevTools exceptions into Telemetry exceptions (such as AppCrashException), and if this conversion does not happen, we don't see a stack trace. kbr@ suggested that we should always collect the minidump and log the stack trace on failure.
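The gap can be illustrated with a simplified sketch. The exception classes below are stand-ins named after Telemetry's, and `FakeApp`, `convert_inspector_error` and `handle_failure` are hypothetical helpers, not Telemetry's actual code or signatures: a socket error becomes an app-crash exception only when it passes through the conversion step, and only that exception type triggers minidump collection.

```python
class AppCrashException(Exception):
    """Stand-in for Telemetry's AppCrashException (simplified, hypothetical)."""


class DevtoolsTargetCrashException(Exception):
    """Stand-in for a DevTools-level error (simplified, hypothetical)."""


class FakeApp:
    """Hypothetical browser handle; `alive` mimics an IsAlive()-style check."""

    def __init__(self, alive):
        self.alive = alive
        self.minidumps_collected = 0

    def is_alive(self):
        return self.alive

    def collect_minidump(self):
        # A real implementation would locate and symbolize the dump.
        self.minidumps_collected += 1


def convert_inspector_error(app, error):
    """Map a low-level socket error to a higher-level exception type."""
    if not app.is_alive():
        return AppCrashException('Browser crashed: %s' % error)
    return DevtoolsTargetCrashException(str(error))


def handle_failure(app, exc):
    """Current behaviour: only an AppCrashException triggers crash reporting."""
    if isinstance(exc, AppCrashException):
        app.collect_minidump()
        return True
    return False  # any other exception type: no minidump, no stack trace


if __name__ == '__main__':
    app = FakeApp(alive=False)
    err = ConnectionResetError(10054, 'connection forcibly closed')
    print(handle_failure(app, convert_inspector_error(app, err)))  # True
    # The same socket error arriving on a path that skips the conversion:
    print(handle_failure(app, RuntimeError(str(err))))  # False
```

Because crash reporting is keyed on the exception type, any error that reaches the failure handler through another path is silently skipped.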

[1] https://chromium-review.googlesource.com/c/chromium/src/+/714654
[2] https://build.chromium.org/p/tryserver.chromium.win/builders/win7_chromium_rel_ng/builds/28594
[3] https://chromium-swarm.appspot.com/task?id=3973d8092d12fb10&refresh=10&show_raw=1
    https://chromium-swarm.appspot.com/task?id=3973d80b1eb00e10&refresh=10&show_raw=1
    https://chromium-swarm.appspot.com/task?id=3973d80ba2fd1710&refresh=10&show_raw=1
[4] https://chromium-review.googlesource.com/c/chromium/src/+/742654

Telemetry stack trace:

  Traceback (most recent call last):
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 104, in _RunStoryAndProcessErrorIfNeeded
      state.RunStory(results)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 324, in RunStory
      self._current_page.Run(self)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\page\__init__.py", line 118, in Run
      self.RunPageInteractions(action_runner)
    File "e:\b\swarm_slave\w\ir\tools\perf\page_sets\system_health\system_health_story.py", line 108, in RunPageInteractions
      self._Measure(action_runner)
    File "e:\b\swarm_slave\w\ir\tools\perf\page_sets\system_health\system_health_story.py", line 91, in _Measure
      action_runner.MeasureMemory(deterministic_mode=True)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\actions\action_runner.py", line 161, in MeasureMemory
      dump_id = self.tab.browser.DumpMemory()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\browser\browser.py", line 350, in DumpMemory
      return self._browser_backend.DumpMemory(timeout=timeout)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome\chrome_browser_backend.py", line 322, in DumpMemory
      return self.devtools_client.DumpMemory(timeout=timeout)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\devtools_client_backend.py", line 414, in DumpMemory
      return self._tracing_backend.DumpMemory(timeout=timeout)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend.py", line 214, in DumpMemory
      'request:\n' + traceback.format_exc())
  TracingUnrecoverableException: Exception raised while sending a Tracing.requestMemoryDump request:
  Traceback (most recent call last):
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend.py", line 205, in DumpMemory
      response = self._inspector_websocket.SyncRequest(request, timeout)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 116, in SyncRequest
      res = self._Receive(timeout)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 155, in _Receive
      data = self._socket.recv()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 293, in recv
      opcode, data = self.recv_data()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 310, in recv_data
      opcode, frame = self.recv_data_frame(control_frame)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 323, in recv_data_frame
      frame = self.recv_frame()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 357, in recv_frame
      return self.frame_buffer.recv_frame()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 336, in recv_frame
      self.recv_header()
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 286, in recv_header
      header = self.recv_strict(2)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 371, in recv_strict
      bytes_ = self.recv(min(16384, shortage))
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 427, in _recv
      return recv(self.sock, bufsize)
    File "e:\b\swarm_slave\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_socket.py", line 80, in recv
      bytes_ = sock.recv(bufsize)
  error: [Errno 10054] An existing connection was forcibly closed by the remote host
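The trace shows this path concretely: the websocket `recv` fails with `[Errno 10054]`, and `tracing_backend.DumpMemory` catches the error and re-raises it as `TracingUnrecoverableException`, embedding the original traceback as text. As far as the trace shows, the call goes from `devtools_client_backend` to the websocket without passing through `InspectorBackend`, so the error is never converted into an `AppCrashException`. A minimal sketch of that wrapping pattern, assuming a hypothetical `sync_request` that simulates the dropped connection (the classes are simplified stand-ins, not Telemetry's code):

```python
import traceback


class TracingUnrecoverableException(Exception):
    """Stand-in for Telemetry's exception of the same name (simplified)."""


def sync_request(request):
    """Hypothetical websocket call that simulates the browser dropping the socket."""
    raise ConnectionResetError(
        10054, 'An existing connection was forcibly closed by the remote host')


def dump_memory(request):
    """Wrap any transport error, embedding the original traceback as text."""
    try:
        return sync_request(request)
    except Exception:
        # Python 3 also records the original error as __context__ here.
        raise TracingUnrecoverableException(
            'Exception raised while sending a Tracing.requestMemoryDump '
            'request:\n' + traceback.format_exc())


if __name__ == '__main__':
    try:
        dump_memory({'method': 'Tracing.requestMemoryDump'})
    except TracingUnrecoverableException as e:
        # The type says "tracing", not "crash": handling keyed on
        # AppCrashException would not run for this exception.
        print(type(e).__name__)  # TracingUnrecoverableException
```

The message stays readable in the logs, but the information that the browser went away is carried only as text, not in the exception type.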

Status: WontFix (was: Untriaged)
Oops, forgot that Telemetry issues are filed on GitHub. Closing.
Status: Available (was: WontFix)
Actually, let's use crbug (since we migrated to Gerrit and use googlesource to host our project).
We are working on standardizing the way Telemetry stores these artifacts (see issue 772208), which should address this.

If this bug is P1, do you have a workaround, or are you just reproducing the bug locally?
Echoing kbr's comment on the GitHub issue:
"In talking with @sunnyps offline I suggested adding code to DumpStateUponFailure, in telemetry/telemetry/internal/browser/browser.py, to symbolize all unsymbolized minidumps at that point. This should be a good catch-all location for all Telemetry based benchmarks to ensure that the stacks are printed if the correct sort of Error or Exception isn't propagated out of the various call stacks. What do you think?"
This sgtm. We should add a test to ensure that this works the way we expect.
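kbr's suggestion amounts to a catch-all in the failure hook that symbolizes every minidump not yet processed, regardless of which exception type propagated. A minimal sketch, assuming minidumps land as `.dmp` files in a single directory and that `symbolize` is a caller-supplied callable (for example, a wrapper around a minidump stack walker); `BrowserStateDumper` and `find_unsymbolized_minidumps` are hypothetical names, not Telemetry's API:

```python
import os
import tempfile


def find_unsymbolized_minidumps(dump_dir, already_symbolized):
    """Return .dmp files in dump_dir that have not been symbolized yet."""
    paths = []
    for name in sorted(os.listdir(dump_dir)):
        path = os.path.join(dump_dir, name)
        if name.endswith('.dmp') and path not in already_symbolized:
            paths.append(path)
    return paths


class BrowserStateDumper:
    """Hypothetical stand-in for the browser's DumpStateUponFailure hook."""

    def __init__(self, dump_dir, symbolize):
        self._dump_dir = dump_dir
        self._symbolize = symbolize  # callable(path) -> stack trace string
        self._symbolized = set()

    def dump_state_upon_failure(self, log):
        """Catch-all: log a stack for every new minidump, whatever the error was."""
        for path in find_unsymbolized_minidumps(self._dump_dir, self._symbolized):
            try:
                log(self._symbolize(path))
            except Exception as e:  # don't let symbolization mask the test failure
                log('Failed to symbolize %s: %s' % (path, e))
            self._symbolized.add(path)  # never process the same dump twice


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as dump_dir:
        open(os.path.join(dump_dir, 'crash.dmp'), 'w').close()
        dumper = BrowserStateDumper(
            dump_dir, lambda p: 'stack for ' + os.path.basename(p))
        dumper.dump_state_upon_failure(print)  # prints: stack for crash.dmp
        dumper.dump_state_upon_failure(print)  # prints nothing: already handled
```

Recording each processed path keeps repeated calls from printing the same stack twice, and catching symbolization errors keeps a broken symbolizer from hiding the original failure. The test asked for above could take the same shape: drop a fake dump into a temporary directory and assert that its stack appears in the log.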
Cc: -nednguyen@chromium.org nedngu...@google.com
Project Member

Comment 7 by sheriffbot@chromium.org, Oct 31

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Components: Test>Telemetry
Components: -Tests>Telemetry