telemetry_unittests very flaky on chromeos-amd64-generic-rel
Issue description

A flake seems to be affecting a large number of the last 200 builds, especially the non-trivial (> 10 min) ones:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel?limit=200

Example flakes:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164817
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164793
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/164739

Some of the flaky tests:
telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing
telemetry.core.tracing_controller_unittest.StartupTracingTest.testStopTracingWhileBrowserIsRunning
telemetry.internal.backends.chrome_inspector.devtools_client_backend_unittest.DevToolsClientBackendTest.testTracing

I'm unsure which labels are correct, so the ones I've applied are probably overly general.
,
Jan 10
,
Jan 10
From looking at one of the logs: https://chromium-swarm.appspot.com/task?id=424de5940a385110&refresh=10&show_raw=1 the renderer process is crashing while running the test. And, *very* unfortunately, symbolization of stack traces doesn't seem to be implemented in Telemetry on ChromeOS. Not sure who on the ChromeOS team would know how to wire this up to debug it.
,
Jan 10
Leo, can you ptal? http://go/cros-vm has details on how to run telemetry unit tests. We can probably disable the tests to start with. They all seem to be tracing related. It also sounds like an actual chrome bug rather than catapult.
,
Jan 10
Basically you need to repro the crash locally, and then you could bisect chrome to find the bad CL. You can also deploy an unstripped chrome to get the renderer crash stack using: https://chromium.googlesource.com/chromiumos/docs/+/master/simple_chrome_workflow.md#deploying-chrome-to-the-user-partition
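To illustrate the bisect step, here is a minimal sketch of the search logic. It is not a Chromium tool: `first_bad` and `is_bad` are hypothetical names, and in practice something like `git bisect` would drive this. It assumes the revisions are ordered oldest to newest, the newest one is bad, and every revision after the first bad one is also bad.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad(revision) is True.

    Assumptions (hypothetical helper, not part of any Chromium tooling):
    - revisions is ordered oldest to newest,
    - the last revision is known to be bad,
    - once a revision is bad, all later ones are bad too.
    """
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            # The culprit is at mid or earlier.
            hi = mid
        else:
            # The culprit is strictly after mid.
            lo = mid + 1
    return revisions[lo]
```

Because the crash here is flaky, `is_bad` would need to build and run the test several times per revision before declaring a revision good; a single passing run proves little.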
,
Jan 10
The same suite appears to be fine on the main waterfall version of that trybot: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/chromeos-amd64-generic-rel So it's likely hitting a DCHECK (since having DCHECKs enabled is the only difference between the two). I'll disable DCHECKs on that bot (prob shoulda done that a while ago) until bug 913750 is fixed.
,
Jan 10
I am on it now
,
Jan 10
I tried to repro the test crashes in a CrOS VM using the Simple Chrome flow. I built Chromium at head, deployed it into the VM, and ran the telemetry tests. But the 2 tests I tried (both from the flaky tests in the bug description) passed without crashing.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped passed 44.6083s
1 test passed in 45.2s, 0 skipped, 0 failures.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing passed 454.1667s
1 test passed in 454.7s, 0 skipped, 0 failures.

Did I miss any special build instructions or test flags? Otherwise, I can't reproduce the bug locally.
,
Jan 10
Did you try chrome with dchecks on? https://chromium.googlesource.com/chromiumos/docs/+/master/simple_chrome_workflow.md#cros-chrome_sdk-options
,
Jan 10
Issue 920454 has been merged into this issue.
,
Jan 10
,
Jan 10
Findit has detected 13+ new flake occurrences of tests in this bug within the past 24 hours. List of all flake occurrences can be found at: https://findit-for-me.appspot.com/ranked-flakes?bug_id=920471. Since these tests are still flaky, this issue has been moved back onto the Sheriff Bug Queue if it hasn't already. If the result above is wrong, please file a bug using this link: https://bugs.chromium.org/p/chromium/issues/entry?status=Unconfirmed&labels=Pri-1,Test-Findit-Wrong&components=Tools%3ETest%3EFindit%3EFlakiness&summary=%5BFindit%5D%20Flake%20Detection%20-%20Wrong%20result%3A%20920471&comment=Link%20to%20flake%20details%3A%20https://findit-for-me.appspot.com/ranked-flakes?bug_id=920471 Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
,
Jan 11
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4

commit f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4
Author: Ben Pastene <bpastene@chromium.org>
Date: Fri Jan 11 01:35:27 2019

Disable DCHECKs on the cros VM CQ test bot.

The PFQ currently runs chrome without DCHECKs. This can cause problems when the PFQ promotes a new version of chrome/an SDK that causes DCHECK crashes, which shows up only on chromium's bots. This turns DCHECKs back off until the PFQ also tests them.

Bug: 920471, 913750
Change-Id: I9daa588ac6838fde357fae1c2ed93f48ffa99966
Reviewed-on: https://chromium-review.googlesource.com/c/1404428
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Cr-Commit-Position: refs/heads/master@{#621851}

[modify] https://crrev.com/f8c4c1b4034ecd72e38b1e54a4c3d363c4859df4/tools/mb/mb_config.pyl
,
Jan 11
I ran another build with --gn-extra-args='dcheck_always_on=true':

cros chrome-sdk --board=amd64-generic --internal --log-level=info --gn-extra-args='dcheck_always_on=true' --download-vm

Rebuilt Chrome and deployed it to the VM with:

deploy_chrome --build-dir=out_$SDK_BOARD/Release/ --to=localhost --port=9222

Then I reran the 2 tests below; both of them passed.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped passed 41.4396s
1 test passed in 42.0s, 0 skipped, 0 failures.

~/work/chrome/src$ third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome --remote=localhost --remote-ssh-port=9222 telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracin
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
WARNING:root:Unable to import numpy due to: Incorrect numpy version found, expected 1.8.0 <= version < 1.12.0, found version 1.12.1
[1/1] telemetry.core.tracing_controller_unittest.StartupTracingTest.testRestartBrowserWhileTracing passed 358.8346s
1 test passed in 359.4s, 0 skipped, 0 failures.

Assigned it to the primary gardener for more investigation.
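When comparing many runs like the ones above, it can help to pull the numbers out of the summary line instead of reading each log by eye. The sketch below parses only the summary format that appears in this thread ("1 test passed in 42.0s, 0 skipped, 0 failures."). The function name `parse_summary` is hypothetical, and the plural handling ("tests", "failure") is an assumption, since the thread shows only the one form.

```python
import re

# Matches the summary line format seen in this thread. Accepting both
# "test"/"tests" and "failure"/"failures" is an assumption; only
# "1 test passed in ...s, 0 skipped, 0 failures." appears in the logs here.
_SUMMARY_RE = re.compile(
    r"(\d+) tests? passed in ([\d.]+)s, (\d+) skipped, (\d+) failures?\."
)


def parse_summary(output):
    """Return a dict of counts from a run_tests summary line, or None if absent."""
    match = _SUMMARY_RE.search(output)
    if match is None:
        return None
    return {
        "passed": int(match.group(1)),
        "seconds": float(match.group(2)),
        "skipped": int(match.group(3)),
        "failures": int(match.group(4)),
    }
```

If the runner's output format differs from what is quoted in this thread, the function returns None rather than guessing.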
,
Jan 11
Did you just run the test once? I think there's a flag to run the tests multiple times. The crash seems to be flaky, so if you could run it 20 times, you might see it.
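One way to do the repeated runs without relying on a runner flag is to loop over the command externally and record which runs fail. This is a minimal sketch: `count_failures` is a hypothetical helper, and it assumes the test command exits with a non-zero status when a test fails or crashes.

```python
import subprocess
import sys


def count_failures(cmd, runs=20):
    """Run cmd `runs` times; return the indices of runs that exited non-zero.

    Assumption: the command signals a failed or crashed test through a
    non-zero exit status.
    """
    failed = []
    for i in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            print(f"run {i}: exit code {result.returncode}", file=sys.stderr)
            failed.append(i)
    return failed


# Example invocation, using the command from earlier in this thread:
#   count_failures([
#       "third_party/catapult/telemetry/bin/run_tests",
#       "--browser=cros-chrome",
#       "--remote=localhost",
#       "--remote-ssh-port=9222",
#       "telemetry.core.tracing_controller_unittest."
#       "StartupTracingTest.testRestartBrowserWhileTracing",
#   ], runs=20)
```

Given that a single run of testRestartBrowserWhileTracing took roughly six to eight minutes in the logs above, 20 runs of that test would take on the order of two hours or more.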
,
Jan 11
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
Comment 1 by jmad...@chromium.org
, Jan 10