
Issue 920471


Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




telemetry_unittests very flaky on chromeos-amd64-generic-rel

Project Member Reported by jmad...@chromium.org, Jan 10

Issue description

The flake seems to be affecting a large number of the last 200 builds, especially the non-trivial (> 10 min) builds:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel?limit=200

Example flakes:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164817
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164793
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164739

Some of the flaky tests:

telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing
telemetry.core.tracing_controller_unittest.StartupTracingTest.testStopTracingWhileBrowserIsRunning
telemetry.internal.backends.chrome_inspector.devtools_client_backend_unittest.DevToolsClientBackendTest.testTracing

I'm unsure of the correct labels to use, so I'm probably being overly general.
 
Cc: erikc...@chromium.org
Erik, might this be related to issue 919113?
Cc: bpastene@chromium.org achuith@chromium.org
Cc: dpranke@chromium.org
From looking at one of the logs:
https://chromium-swarm.appspot.com/task?id=424de5940a385110&refresh=10&show_raw=1

the renderer process is crashing while running the test. And, *very* unfortunately, symbolization of stack traces doesn't seem to be implemented in Telemetry on ChromeOS.

Not sure who on the ChromeOS team would know how to wire this up to debug it.

Cc: alemate@chromium.org
Owner: goog...@chromium.org
Status: Assigned (was: Available)
Leo, can you ptal?

http://go/cros-vm has details on how to run telemetry unit tests.

We can probably disable the tests to start with. They all seem to be tracing related. It also sounds like an actual chrome bug rather than catapult.
Basically, you need to repro the crash locally, and then you could bisect Chrome to find the bad CL.
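A minimal sketch of the bisection step, assuming a hypothetical `is_bad(position)` callback that builds Chromium at a given commit position, deploys it to the VM, and reports whether the crash appears. Because this crash is flaky, such a callback would need to run the test several times before calling a position good. `git bisect` performs the same binary search over commits.

```python
from typing import Callable


def bisect_first_bad(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first bad commit position in (good, bad].

    Assumes `good` is known to pass, `bad` is known to fail, and the
    regression is monotonic: every position after the culprit is bad.
    `is_bad` is a hypothetical build-deploy-test callback.
    """
    if good >= bad:
        raise ValueError("good must be an earlier position than bad")
    # Invariant: `good` passes and `bad` fails; halve the range each probe.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad
```

For example, `bisect_first_bad(100, 200, lambda p: p >= 137)` returns 137 after seven probes. With a flaky crash, a single passing probe can wrongly mark a position good, which is why repeated runs per probe matter.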

You can also deploy an unstripped chrome to get the renderer crash stack using:
https://chromium.googlesource.com/chromiumos/docs/+/master/simple_chrome_workflow.md#deploying-chrome-to-the-user-partition

The same suite appears to be fine on the main waterfall version of that trybot:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/chromeos-amd64-generic-rel

So it's likely hitting a DCHECK (since that's the only difference between the two). I'll disable DCHECKs on that bot (probably should've done that a while ago) until bug 913750 is fixed.
I am on it now
I have tried to reproduce the test crash in the CrOS VM using the Simple Chrome workflow.

I built Chromium at head, deployed it to the VM, and ran the Telemetry tests.

However, the two flaky tests from the bug description that I tried both passed without crashing.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped passed 44.6083s
1 test passed in 45.2s, 0 skipped, 0 failures.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222  telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing passed 454.1667s
1 test passed in 454.7s, 0 skipped, 0 failures.


Are there any special build instructions or test flags I missed? Otherwise, I'm not able to reproduce the bug locally.
Issue 920454 has been merged into this issue.
Status: Started (was: Assigned)
Project Member

Comment 12 by Findit, Jan 10

Labels: Type-Bug Test-Flaky Test-Findit-Detected Sheriff-Chromium

Findit has detected 13+ new flake occurrences of tests in this bug
within the past 24 hours.

List of all flake occurrences can be found at:
https://findit-for-me.appspot.com/ranked-flakes?bug_id=920471.

Since these tests are still flaky, this issue has been moved back onto the Sheriff Bug Queue if it hasn't already.

If the result above is wrong, please file a bug using this link:
https://bugs.chromium.org/p/chromium/issues/entry?status=Unconfirmed&labels=Pri-1,Test-Findit-Wrong&components=Tools%3ETest%3EFindit%3EFlakiness&summary=%5BFindit%5D%20Flake%20Detection%20-%20Wrong%20result%3A%20920471&comment=Link%20to%20flake%20details%3A%20https://findit-for-me.appspot.com/ranked-flakes?bug_id=920471

Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
Project Member

Comment 13 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4

commit f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4
Author: Ben Pastene <bpastene@chromium.org>
Date: Fri Jan 11 01:35:27 2019

Disable DCHECKs on the cros VM CQ test bot.

The PFQ currently runs chrome without DCHECKs. This can cause problems when
the PFQ promotes a new version of chrome/an SDK that causes DCHECK crashes,
which shows up only on chromium's bots.

This turns DCHECKs back off until the PFQ also tests them.

Bug: 920471, 913750
Change-Id: I9daa588ac6838fde357fae1c2ed93f48ffa99966
Reviewed-on: https://chromium-review.googlesource.com/c/1404428
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Cr-Commit-Position: refs/heads/master@{#621851}
[modify] https://crrev.com/f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4/tools/mb/mb_config.pyl

Cc: -alemate@chromium.org goog...@chromium.org
Owner: alemate@chromium.org
I ran another build with --gn-extra-args='dcheck_always_on=true':

cros chrome-sdk --board=amd64-generic --internal --log-level=info --gn-extra-args='dcheck_always_on=true' --download-vm


I rebuilt Chrome and deployed it to the VM with:

deploy_chrome --build-dir=out_$SDK_BOARD/Release/ --to=localhost --port=9222
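Before rerunning the tests, it may be worth confirming that the flag actually reached the build directory. This is a minimal sketch that assumes the build directory contains the `args.gn` file that `gn gen` writes. It also assumes a Release (`is_debug = false`) build, where DCHECKs are compiled in only when `dcheck_always_on = true`. The parser is deliberately naive: it handles one `name = value` per line and does not understand `#` inside string values.

```python
import re
from pathlib import Path
from typing import Dict

# Matches simple `name = value` assignments; the value is kept as raw GN text.
_ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def parse_gn_args(text: str) -> Dict[str, str]:
    """Parse `name = value` lines from args.gn text (naive sketch)."""
    args: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _ASSIGN.match(line)
        if match:  # blank lines and import("...") lines do not match
            args[match.group(1)] = match.group(2)
    return args


def dchecks_enabled(args_gn_path: str) -> bool:
    """True if args.gn explicitly sets dcheck_always_on = true."""
    args = parse_gn_args(Path(args_gn_path).read_text())
    return args.get("dcheck_always_on") == "true"
```

Pass the path `out_$SDK_BOARD/Release/args.gn` with the variable expanded. Running `gn args <build dir> --list` is the authoritative way to see the resolved values, including defaults.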

Then I reran the two tests below, and both passed.
~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped passed 41.4396s
1 test passed in 42.0s, 0 skipped, 0 failures.                                                
~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222  telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing passed 358.8346s
1 test passed in 359.4s, 0 skipped, 0 failures.


Assigning it to the primary gardener for further investigation.
Did you just run the test once? I think there's a flag to run the tests multiple times. The crash seems to be flaky, so if you could run it 20 times, you might see it.
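Telemetry's runner may already have a repeat flag, as suggested above. This wrapper does not rely on one. It is a minimal sketch that reruns the same `run_tests` command in a loop and counts failing runs. It assumes `run_tests` exits non-zero when a test fails. The function name and default of 20 runs are illustrative.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(cmd: Sequence[str], runs: int = 20) -> int:
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(1, runs + 1):
        # Output is not captured, so each run's log stays visible in the terminal.
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            failures += 1
            print(f"run {i}/{runs} failed with exit code {result.returncode}",
                  file=sys.stderr)
    return failures
```

Pass the same argument list as the commands above: `third_party/catapult/telemetry/bin/run_tests`, `--browser=cros-chrome`, `--remote=localhost`, `--remote-ssh-port=9222`, and the test name. If a single run crashes with probability p, the chance of seeing at least one crash in n runs is 1 - (1 - p)^n. Assuming p = 0.1, 20 runs give about 0.88.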
Labels: -Sheriff-Chromium

Comment 17 by benhenry@google.com, Jan 16

Components: Test>Telemetry

Comment 18 by benhenry@google.com, Jan 16

Components: -Tests>Telemetry

Comment 19 by benhenry@google.com, Jan 16

Components: -Speed>Telemetry
