Issue 891869

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 891898



Graceful handling of "TEST RESULTS WERE INVALID" for telemetry_perf_unittests

Project Member Reported by erikc...@chromium.org, Oct 3

Issue description

This CQ try job:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng/200447

fails with "TEST RESULTS WERE INVALID" for telemetry_perf_unittests in both the 'with patch' and 'retry with patch' steps. I'm trying to figure out what causes "TEST RESULTS WERE INVALID", but the logs contain so many error messages that it's hard to tell.

See:
https://chromium-swarm.appspot.com/task?id=4051f87e00cec610&refresh=10&show_raw=1

and:
https://chromium-swarm.appspot.com/task?id=4051f87e00cec610&refresh=10&show_raw=1

Both of the swarming tasks have status "TIMED_OUT" even though they completed in about 16 minutes. 

Request 1: Let's not emit logs if the test passes. That will make the log file easier to read.
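
One possible shape for Request 1, as a minimal sketch rather than a description of how Telemetry's logging is actually wired up: buffer log records in memory while a test runs, and write them out only if the test fails. The names BufferUntilFailure and run_with_quiet_logs are hypothetical and exist only for this illustration.

```python
import io
import logging


class BufferUntilFailure(logging.Handler):
    """Hypothetical handler: holds records in memory until the test outcome is known."""

    def __init__(self, target):
        super().__init__()
        self.target = target  # the real handler that would write the log
        self.records = []

    def emit(self, record):
        # Nothing is written yet; the record is only remembered.
        self.records.append(record)

    def finish(self, passed):
        # Forward the buffered records only when the test failed.
        if not passed:
            for record in self.records:
                self.target.handle(record)
        self.records = []


def run_with_quiet_logs(test_fn, stream):
    """Runs test_fn(logger); writes its logs to stream only on failure."""
    logger = logging.getLogger("quiet_logs_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = BufferUntilFailure(logging.StreamHandler(stream))
    logger.addHandler(handler)
    passed = True
    try:
        test_fn(logger)
    except Exception:
        logger.exception("test failed")
        passed = False
    finally:
        handler.finish(passed)
        logger.removeHandler(handler)
    return passed


if __name__ == "__main__":
    out = io.StringIO()
    run_with_quiet_logs(lambda log: log.info("noise from a passing test"), out)
    print(repr(out.getvalue()))  # nothing was written for the passing test
```

With this approach a passing test leaves the log empty, while a failing test still gets its full context, including the messages logged before the failure.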

Request 2: Let's go through and figure out what's causing all these Python exceptions. Which ones are fatal? For the ones that aren't fatal, let's not emit logs, since the noise makes it very hard to tell what's going on.

Examples of exceptions in these two logs:
"""
  Traceback (most recent call last):
    File "/b/s/w/ir/.swarming_module/lib/python2.7/logging/__init__.py", line 884, in emit
      stream.write(fs % msg.encode("UTF-8"))
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 5955: ordinal not in range(128)
  Logged from file browser.py, line 357
...
DevtoolsTargetCrashException: Devtools target crashed
...
  Exception when trying to capture screenshot: TimeoutException('',)
  Deleting unused artifact 'minidump' of 'multitab:misc:typical24:2018'
  Failure recorded: Exception raised running multitab:misc:typical24:2018
...
Exception raised when cleaning story run: 
  
  Traceback (most recent call last):
    _RunStoryAndProcessErrorIfNeeded at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:153
      None
    DidRunStory at /b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:296
      None
    FakeStopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/page_test_test_case.py:97
      None
    StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:53
      None
    StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:157
      None
  TracingException: Exceptions raised when trying to stop tracing:
  Traceback (most recent call last):
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 135, in StopTracing
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 202, in StopAgentTracing
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 303, in _RemoveTraceConfigFile
    File "/b/s/w/ir/.swarming_module/lib/python2.7/shutil.py", line 253, in rmtree
    File "/b/s/w/ir/.swarming_module/lib/python2.7/shutil.py", line 251, in rmtree
  OSError: [Errno 2] No such file or directory: '/b/s/w/itnZtyBw/tmp2Cohvb'
...
 Benchmark execution interrupted by a fatal exception: <type 'exceptions.AssertionError'>()
  Try printing formatted exception: None None None
  
  Traceback (most recent call last):
    RunBenchmark at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:375
      None
    Run at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:279
      None
    PopulateHistogramSet at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/results/page_test_results.py:285
      None
    ConvertChartJson at /b/s/w/ir/third_party/catapult/tracing/tracing/value/convert_chart_json.py:24
      None
    RunFile at /b/s/w/ir/third_party/catapult/third_party/vinn/vinn/_vinn.py:168
      None
  AssertionError
...
(ERROR) 2018-10-03 03:07:37,479 pid=11685  atexit_with_log._WrappedFn:16  Exception running <bound method ReplayServer.StopServer of <telemetry.internal.util.webpagereplay_go_server.ReplayServer object at 0x7fe57695dd90>>
Traceback (most recent call last):
  File "/b/s/w/ir/third_party/catapult/common/py_utils/py_utils/atexit_with_log.py", line 13, in _WrappedFn
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/util/webpagereplay_go_server.py", line 255, in StopServer
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/util/webpagereplay_go_server.py", line 312, in _CleanUpTempLogFilePath
OSError: [Errno 2] No such file or directory: '/b/s/w/itnZtyBw/tmpacCHhx'
**Non zero exit code**
"""
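
Two of the tracebacks above end in "OSError: [Errno 2] No such file or directory" raised from temp-directory cleanup. If those turn out to be non-fatal, one way to quiet them is a cleanup helper that treats an already-missing directory as success and re-raises everything else. This is only a sketch: remove_tree_if_present is a hypothetical name, and whether ignoring the error is actually safe in chrome_tracing_agent.py or webpagereplay_go_server.py is an assumption that still needs checking, since the missing directory may point at an earlier, real failure.

```python
import errno
import os
import shutil
import tempfile


def remove_tree_if_present(path):
    """Hypothetical helper: removes path, tolerating the case where it is already gone.

    Returns True if the tree was removed, False if it did not exist.
    Any other OSError (for example a permissions problem) is re-raised.
    """
    try:
        shutil.rmtree(path)
        return True
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        return False


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    print(remove_tree_if_present(tmp))  # directory existed and was removed
    print(remove_tree_if_present(tmp))  # second call finds nothing to remove
    print(os.path.exists(tmp))
```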

Request 3: Let's figure out what's causing the swarming job to time out and fix it.
 
Oops, I copy-pasted the same link twice above. The second one should be:

https://chromium-swarm.appspot.com/task?id=40522850c9a47a10&refresh=10&show_raw=1


Cc: perezju@chromium.org eyaich@chromium.org
Components: Tests>Telemetry
Not sure which team owns the Telemetry harness at this point. CC'ing a few people.

Blockedon: 891898
Request (1) has been broken out into https://bugs.chromium.org/p/chromium/issues/detail?id=891898.
Cc: crouleau@google.com
kbr@: The Chrome core automation team owns the Telemetry harness now. The official members right now are me and crouleau@.


Labels: Infra-Platform-Test

Comment 6 by benhenry@google.com, Jan 16

Components: Test>Telemetry

Comment 7 by benhenry@google.com, Jan 16

Components: -Tests>Telemetry
