New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 828917 link

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: ----

Blocked on:
issue 865725



Sign in to add a comment

system_health.common_desktop/multitab:misc:typical24 failing flakily on Mac

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Apr 4 2018

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com

system_health.common_desktop/multitab:misc:typical24 in system_health.common_desktop failing on multiple builders

Builders failed on: 
- Mac 10.12 Perf: 
  https://build.chromium.org/p/chromium.perf/builders/Mac%2010.12%20Perf
- Mac Air 10.11 Perf: 
  https://build.chromium.org/p/chromium.perf/builders/Mac%20Air%2010.11%20Perf


 
Example log: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FMac_10.12_Perf%2F1875%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_desktop_on_Intel_GPU_on_Mac_on_Mac-10.12%2F0%2Fstdout

Traceback (most recent call last):
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 123, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 311, in RunStory
    self._current_page.Run(self)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/page/__init__.py", line 91, in Run
    shared_state.page_test.RunNavigateSteps(self, current_tab)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/page/legacy_page_test.py", line 152, in RunNavigateSteps
    page.RunNavigateSteps(action_runner)
  File "/b/s/w/ir/tools/perf/page_sets/system_health/multi_tab_stories.py", line 30, in RunNavigateSteps
    new_tab.action_runner.Navigate(url)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 186, in Navigate
    timeout_in_seconds=timeout_in_seconds))
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 62, in _RunAction
    action.RunAction(self._tab)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/navigate.py", line 27, in RunAction
    tab.WaitForDocumentReadyStateToBeInteractiveOrBetter(time_left_in_seconds)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 103, in WaitForDocumentReadyStateToBeInteractiveOrBetter
    'document.readyState == "complete"', timeout=timeout)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
    return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 302, in WaitForJavaScriptCondition
    e.message + '\n' + debug_message)
TimeoutException: Timed out while waiting 59s for IsJavaScriptExpressionTrue.
Console output:
(INFO) 2018-04-03 04:49:33,464 chrome_tracing_agent.RecordClockSyncMarker:183  Chrome version: 3388
(INFO) 2018-04-03 04:49:46,426 trace_data.Serialize:191  Trace sizes in bytes: {'cpuSnapshots': 7738357, 'traceEvents': 175110862, 'telemetry': 245693, 'tabIds': 900}
(INFO) 2018-04-03 04:49:59,912 browser.Close:214  Closing browser (pid=13607) ...
(INFO) 2018-04-03 04:50:02,239 browser.Close:228  Browser is closed.

Screenshot looks like a timeout, not a crash: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/b5a57911-3736-11e8-b140-985aebcda12e
Cc: charliea@chromium.org nednguyen@chromium.org
cc-ing benchmark owners: will disable this story since flake is over 50% currently.
Project Member

Comment 3 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/edbaf0953118be443a3d03690779b15dfd0e47d8

commit edbaf0953118be443a3d03690779b15dfd0e47d8
Author: Annie Sullivan <sullivan@chromium.org>
Date: Thu Apr 05 14:49:28 2018

Disable very flaky multitab:misc:typical24 on Mac

NOTRY=true

Bug: 828917
Change-Id: I5c36321a1e1ac1c5076e8e3025f9007e47142f73
Reviewed-on: https://chromium-review.googlesource.com/996183
Commit-Queue: Annie Sullivan <sullivan@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#548420}
[modify] https://crrev.com/edbaf0953118be443a3d03690779b15dfd0e47d8/tools/perf/expectations.config

BTW, multitab:misc:typical24 had a few failures on ChromeOS try bots with similar symptoms:

https://ci.chromium.org/buildbot/tryserver.chromium.chromiumos/linux-chromeos-rel/107194
https://ci.chromium.org/buildbot/tryserver.chromium.chromiumos/linux-chromeos-rel/107195
https://ci.chromium.org/buildbot/tryserver.chromium.chromiumos/linux-chromeos-rel/107196

but then it stopped. chromium-try-flakes will file a new bug if it detects this flake again.
Components: Speed>Benchmarks>Waterfall
This runs locally. Kicking off perf try jobs to see if it passes in the lab, will re-enable if it does.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/11ba1e1ea40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/16e96f49a40000
The low end Mac (10.12) passeed 20/20 runs, but the high end Mac (10.13) failed 10/20 runs. 

Logs:
https://chrome-swarming.appspot.com/task?id=3ecc08ad71498110&refresh=10&show_raw=1
https://chrome-swarming.appspot.com/task?id=3ecc08b3324a2a10&refresh=10&show_raw=1
https://chrome-swarming.appspot.com/task?id=3ecc08b6b673e310&refresh=10&show_raw=1
https://chrome-swarming.appspot.com/task?id=3ecc08b81317b110&refresh=10&show_raw=1

I see different timeouts in the logs, and there are two different screenshots that are produced (attached). Both seem to be due to issues loading fda.gov. We should probably re-record that page.
fda-cant-be-reached.png
421 KB View Download
waiting-for-fda.png
398 KB View Download
I re-recorded the pageset. The pinpoint jobs above test it on windows, mac, and linux.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/1188ad96a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/17cc8235a40000
Blockedon: 865725
The first two jobs all failed, but there are no logs explaining why. I filed  bug 865725  on this issue and am blocking on that.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/11e25c3ea40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/10826e71a40000
In  bug 865725 , Ned found that a stray character getting into the trace from the re-recording caused trace processing to die with no errors. He's fixing the logging part now. When that's done, the log will say, from https://bugs.chromium.org/p/chromium/issues/detail?id=865725#c5:

>With https://chromium-review.googlesource.com/c/catapult/+/1145042, I can now see the error is due to trace processing:
>(ERROR) 2018-07-20 07:48:49,201 page_test_results.Fail:540  Failure recorded: TraceImportError: Unexpected token Ö in JSON at position 0
>    at JSON.parse (<anonymous>)
>    at TraceEventImporter.createAsyncSlices_ (/tracing/extras/importer/trace_event_importer.html:1645:31)
>    at TraceEventImporter.importEvents (/tracing/extras/importer/trace_event_importer.html:1150:12)
>    at importer (/tracing/importer/import.html:198:65)
>    at task.subTask (/tracing/importer/import.html:145:32)
>    at Task.run (/tracing/base/task.html:80:50)
>    at Function.Task.RunSynchronously (/tracing/base/task.html:125:25)
>    at Import.importTraces (/tracing/importer/import.html:74:17)
>    at createModelFromTraceData (/Users/nednguyen/projects/chromium/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:38:9)
>    at eval >>(/Users/nednguyen/projects/chromium/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:57:25)
>[  FAILED  ] system_health.common_desktop/multitab:misc:typical24@{'case': 'multitab', 'group': 'misc'} (139881 ms)
>[  PASSED  ] 0 tests.
>[  FAILED  ] 1 test, listed below:
>[  FAILED  ]  system_health.common_desktop/multitab:misc:typical24@{'case': 'multitab', 'group': 'misc'}
>
>1 FAILED TEST
>
>So it seems like we collected some weird character in the multitab traces.

I tried re-recording twics on different machines, but encountered the same issue each time. I don't have the cycles to fix, but I think any of the following approaches would work:

* Just try again later when the pages have changed
* Choose new pages
* Dig into the error and fix it.
To reproduce this issue, we can grab the trace uploaded to cloud storage & run the tracing metrics.

Not sure how that token got into the trace. I am guessing some trace producer in Chrome mistakenly put this in? If so, the fix could be adding some sort of character sanitization in tracing parser code 
Here's a link to a trace from cloud storage in case anyone has time to dig more into this!

https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/multitab_misc_typical24_2018-07-20_11-36-51_2534.html
Cc: shrike@chromium.org vovoy@chromium.org erikc...@chromium.org nedngu...@google.com perezju@chromium.org
 Issue 742475  has been merged into this issue.
Project Member

Comment 26 by bugdroid1@chromium.org, Oct 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7bf838412002c77636db1e14202cace0f581a696

commit 7bf838412002c77636db1e14202cace0f581a696
Author: Emily Hanley <eyaich@google.com>
Date: Mon Oct 22 14:17:49 2018

Re-enabling multitab:misc:typical24.

Regex was fixed in crrev.com/c/1281482

TBR=nednguyen@google.com
NOTRY=true

Bug: 828917,874803
Change-Id: Ib2ce0867b8096c209bb15e65c8d3623c13f6247c
Reviewed-on: https://chromium-review.googlesource.com/c/1292805
Reviewed-by: Emily Hanley <eyaich@chromium.org>
Commit-Queue: Emily Hanley <eyaich@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601563}
[modify] https://crrev.com/7bf838412002c77636db1e14202cace0f581a696/tools/perf/expectations.config

Sign in to add a comment