telemetry_perf_unittests flaky on trybots. Request for audit from speed-ops team.
Issue description

I just had an unrelated CL (https://chromium-review.googlesource.com/c/566595/) flake on telemetry_perf_unittests, requiring me to re-run all the Android tests.

When I navigate to http://chromium-try-flakes.appspot.com/, the 3rd and 4th results are telemetry_perf_unittests and benchmarks.system_health_smoke_test.SystemHealthBenchmarkSmokeTest.system_health.memory_mobile.browse:news:cricbuzz respectively. Looking for more occurrences shows ~5-10 failures a day: http://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyMAsSBUZsYWtlIiV0ZWxlbWV0cnlfcGVyZl91bml0dGVzdHMgKHdpdGggcGF0Y2gpDA

benchmarks.system_health_smoke_test.SystemHealthBenchmarkSmokeTest.system_health.memory_mobile.browse:news:cricbuzz just started flaking a few days ago: http://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyfgsSBUZsYWtlInNiZW5jaG1hcmtzLnN5c3RlbV9oZWFsdGhfc21va2VfdGVzdC5TeXN0ZW1IZWFsdGhCZW5jaG1hcmtTbW9rZVRlc3Quc3lzdGVtX2hlYWx0aC5tZW1vcnlfbW9iaWxlLmJyb3dzZTpuZXdzOmNyaWNidXp6DA

Looking at recent changes to system_health_smoke_test.py shows that this flakiness is not a new phenomenon:

commit 478997bc9c776b0d27290dd4ab511af8de609b30
Author: Juan A. Navarro Perez <perezju@chromium.org>
Date:   Wed Jul 5 14:48:33 2017 +0000

    Re-enable system_health.memory_desktop.browse:media:youtube smoke test

commit 837b9e2e60fce4b0b5043aa9f9c0af766ace8570
Author: benwells <benwells@chromium.org>
Date:   Wed Jun 28 22:31:38 2017 -0700

    Disabled flaky system health smoke test.

Could someone from the speed-ops team do a quick audit for flakiness of tests that are supposed to be *unit tests*? I feel like I am regularly hit by flakiness of telemetry_perf_unittests.
Comment 1 by erikc...@chromium.org, Jul 13 2017

+dpranke. While I'm staring at telemetry_perf_unittests, what is its intended purpose? It's running [after a few layers of indirection] "tools/perf/run_tests". According to the file: "This script runs unit tests of the code in the perf directory. This script DOES NOT run benchmarks. run_benchmark does that."

But looking at the failure:
"""
Traceback (most recent call last):
  File "/b/swarm_slave/w/ir/tools/perf/benchmarks/system_health_smoke_test.py", line 107, in RunTest
    msg='Failed: %s' % benchmark_class)
AssertionError: Failed: <class 'benchmarks.system_health.MobileMemorySystemHealth'>
"""
it's pretty clear that a benchmark is being run. It seems like:
1) We shouldn't be running telemetry_perf_unittests for all Chromium CLs.
2) This test suite shouldn't run benchmarks, and should be fast. It took 19 minutes to fail for my unrelated CL: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/219246

Comment 2, Jul 13 2017
The purpose of telemetry_perf_unittests is to actually get test coverage of the //tools/perf and telemetry code. That does include running some of the benchmarking code, but we're not running it *as* a benchmark, i.e., we don't care about the performance numbers that are returned. We very much *do* want to run these tests on all CLs on as many configurations as possible, because in addition to the test coverage of the python code, they're good integration tests of chromium. We should be trying to make sure that it's not flaky, but that's been a big problem that we've been working on for the past couple quarters.
Comment 3, Jul 13 2017
tools/perf/benchmarks/system_health_smoke_test.py has 24 disabled tests, many of which are from still-open bugs that are marked as "flaky". I guess we could add this one to the list. The main problem is that this test suite depends on WPR, which is inherently flaky and non-deterministic. Given that we want:
1) an integration test for Chromium, and
2) to test the code in tools/perf, without caring about actual performance,
I recommend that we switch from the system_health story set to trivial_sites, or something else that has 100% determinism.
Comment 5, Jul 13 2017
Fixing email address for nednguyen@. I don't think WPR is supposed to be inherently flaky, so I'm a bit puzzled by that statement. As to what the right test suites to use are, I'm staying out of that; the others can figure it out.
Comment 7, Jul 13 2017
I tried to use/fix WPR a couple of years ago, back when I was naive and thought that the whole concept was viable. [I was actually made an OWNER of the repo at some point; it's possible I've been removed since I'm not active.]

The fundamental problem is that web pages *want* sources of entropy, and it's impossible to fully avoid them [e.g. spawning multiple workers, looking at callback ordering]. This entropy is used to drive URL requests, and WPR does mostly-strict URL matching, so small changes to the URL cause resources not to match. If you open devtools + WPR with a real page [e.g. cnn], you'll notice that around 30% of resources don't load, and the set of non-loading resources is non-deterministic. Sometimes the resources that fail to load trigger JS logic that tries to refetch, or something else, resulting in a never-quiescent network, which causes the telemetry test to fail.
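To make the matching problem concrete, here's a toy illustration (not WPR's actual matching code; the archive shape and the cache-buster parameter are made up):

import random

# A replay archive keyed by the exact URL that was recorded.
archive = {
    'https://example.com/ads.js?cb=4271': b'...recorded response...',
}

# On replay, the page regenerates its cache-buster from a fresh source of
# entropy, so the requested URL no longer matches what was recorded.
request_url = 'https://example.com/ads.js?cb=%d' % random.randint(0, 9999)
response = archive.get(request_url)  # almost always None -> resource fails to load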
Comment 8, Jul 13 2017
Whether or not WPR is flaky depends on the use case. Using it as a general-purpose record/replay framework that works on any webpage is inherently flaky. But for our system health benchmarks in particular, we choose pages that will replay reliably and use those.

The reason we have integration tests on the CQ is that we experience a high rate of Chrome crashes and bugs at ToT that aren't covered by correctness tests, and we'd like to stop those from getting submitted. Here is a short list collected over a small timespan last year: https://docs.google.com/document/d/1ZZABME5aaiS34PwTCK-hRwk23CbDgnhvjiFRMtBnDF4/edit#heading=h.5bgf5j1sydvh

It's important for us to run the pages in the system_health benchmark suite on the CQ, so that we can keep Chrome crashes that would break the perf waterfall from landing.
Comment 9, Jul 13 2017
telemetry_perf_unittests just failed again, with a different reason this time.

Run #1 passed [but another test flaked].

Run #2 failure: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/219246

This was actually a Chrome crash:
"""
[FATAL:layer_tree_impl.cc(169)] Check failed: property_trees()->needs_rebuild.
"""
but that was not clear from the error messaging; I didn't realize this until now.
"""
Unexpected Failures:
* benchmarks.system_health_smoke_test.SystemHealthBenchmarkSmokeTest.system_health.memory_mobile.browse:news:cricbuzz
"""

Run #3 failure: https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/219367
(shard #0 timed out, took too much time to complete)

This one looks like a telemetry bug.
"""
Exception raised when cleaning story run:
Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:127
  DidRunStory at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:309
  StopTracing at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:47
  StopTracing at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:140
TracingException: Exceptions raised when trying to stop tracing:
Traceback (most recent call last):
  File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 118, in StopTracing
  File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 203, in StopAgentTracing
  File "/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 296, in _RemoveTraceConfigFile
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 159, in Run
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 186, in JoinAll
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 158, in _JoinAll
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 81, in run
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 152, in <lambda>
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 1034, in RunShellCommand
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 1003, in handle_large_output
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 985, in handle_large_command
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 976, in handle_check_return
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 972, in run
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 489, in Shell
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 286, in _RunDeviceAdbCmd
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 51, in timeout_retry_wrapper
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
  File "/b/swarm_slave/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 253, in _RunAdbCmd
NoAdbError: [Errno 2] No such file or directory
"""
Comment 11, Jul 13 2017
#9: Run #3 looks more like an infra bug. The log shows that a bunch of files are missing on the host, which I suspect is due to a malfunction of swarming.

"""
IOError: [Errno 2] No such file or directory: '/b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/gesture_common.js'
Traceback (most recent call last):
  RunBenchmark at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:397
  Run at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:295
  PopulateHistogramSet at /b/swarm_slave/w/ir/third_party/catapult/telemetry/telemetry/internal/results/page_test_results.py:190
  mkstemp at /usr/lib/python2.7/tempfile.py:308
    return _mkstemp_inner(dir, prefix, suffix, flags)
  _mkstemp_inner at /usr/lib/python2.7/tempfile.py:239
    fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: '/b/swarm_slave/w/ityCoCmk/tmprAc1qx'
"""
(https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/219367)

+maruel: I am not sure what's going on here. Is it possible that the swarming host's hard drive filled up, so it automatically removed a bunch of files in the middle of the test?
Comment 12, Jul 13 2017
A task won't run if not all files can be mapped in. The fact that the temp directory was deleted during the task(?) is very odd, because the path printed is the path to the temp dir. Does the script anywhere try to delete the tempdir by accident? That would fail on /tmp but would succeed on Swarming.
Comment 13, Jul 13 2017
Line 190 from page_test_results.py (from stack in #11) is this:
file_descriptor, chart_json_path = tempfile.mkstemp()
(https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/results/page_test_results.py#L190)
So for some reason, python's tempfile.mkstemp call is failing, which is very odd. I highly suspect that this has nothing to do with Telemetry code, but is a problem with the swarming infra or the hosting machine itself.
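For what it's worth, mkstemp raises exactly this error if the directory tempfile resolves to has been deleted out from under it. A minimal sketch (plain Python, not the swarming environment):

import os
import tempfile

# Simulate the temp dir vanishing mid-run.
scratch = tempfile.mkdtemp()
tempfile.tempdir = scratch  # force mkstemp to use this directory
os.rmdir(scratch)

try:
    tempfile.mkstemp()
except OSError as e:
    print(e)  # [Errno 2] No such file or directory: '.../tmpXXXXXX'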
Comment 14, Jul 13 2017
Can't look until next week, but nothing changed in this code path for months AFAIK; do you call entree in the code flow before this call? Especially in an exceptional case.
Comment 15, Jul 13 2017
#14: what do you mean by "call entree in the code flow"?
Comment 16, Jul 13 2017
I meant "code path in the process that would call rmtree()". The poin I want to stress out is that the fact that the TEMPDIR disappears is not normal and I've never seen this happen anywhere else. You may not reproduce it elsewhere as /tmp is not user-deleteable. You can reproduce locally by creating a local directory, have TEMPDIR point to it, then reproduce the task.
Comment 17, Jul 13 2017
Ah, got it. It's not easy to know all the places that call rmtree. Lemme see if I can monkey-patch rmtree to log all the callsites.
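Something along these lines should work (a minimal sketch; install it at process startup before any test code runs):

import logging
import shutil
import traceback

_real_rmtree = shutil.rmtree

def _logging_rmtree(path, *args, **kwargs):
    # Log the full call stack of every rmtree() so we can see who (if
    # anyone) deletes the temp directory.
    logging.warning('shutil.rmtree(%r) called from:\n%s',
                    path, ''.join(traceback.format_stack()))
    return _real_rmtree(path, *args, **kwargs)

shutil.rmtree = _logging_rmtree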
Comment 19, Jul 13 2017
Hi, we are finding that the TEMP directory is being deleted on some of our clients running Chrome 59; this has been logged here: https://bugs.chromium.org/p/chromium/issues/detail?id=741980. Could this be related to your TEMP issue?