AtomicFlagTest.ReadFromDifferentThread fails on Fuchsia (also TestTimeouts sometimes exceeded) |
|||||
Issue descriptionHappened here https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F5089%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout but is also reproducible locally for me normally with out/fuch/bin/run_base_unittests --gtest_filter=AtomicFlagTest.ReadFromDifferentThread --gtest_repeat=1000 == https://gist.github.com/sgraham/6c0b93c1a84cb981f2e240e15824dcd4 Somewhat suspiciously it always seems to fail after about 45s of running (several hundred iterations of the test) so sems to indicate a leak of some sort.
,
Jun 16 2017
ah, if I make the sleeps longer, it still seems to start failing after 45s, so I guess it must be some test harness thing panicing and causing an abort.
,
Jun 16 2017
mhmm https://cs.chromium.org/chromium/src/base/test/test_timeouts.cc?rcl=52dd5c4fd4a98a209ed3bfb8eadf4caf9e40fed6&l=86 i guess the test harness might be missing some signalling on fuchsia?
,
Jun 16 2017
OK, looks like this could explain a bunch of the flakiness on the bot run for Fuchsia. If a subprocess times out (== takes longer than 45s) the rest of the tests get killed and reported as UNKNOWN status. Because the bot runs are non-KVM QEMU, they're really slow. So we could - set a flag to avoid timing out (maybe problematic when things really timeout?) - raise the timeout (kinda lame) - eventually, run on hardware or faster emulation (longer term, harder).
,
Jun 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a41e0a0e12300dc1577a09ca7461898b734c437b commit a41e0a0e12300dc1577a09ca7461898b734c437b Author: Scott Graham <scottmg@chromium.org> Date: Fri Jun 16 19:42:38 2017 fuchsia: longer test timeout when running on bots QEMU without KVM is quite slow, which is what's used on bots. Increase the timeout to avoid timeouts and "UNKNOWN" test status. Bug: 734130 , 706592 Change-Id: I9696805d727a93e04fd18731f3eb8955180403d6 Reviewed-on: https://chromium-review.googlesource.com/538953 Reviewed-by: Nico Weber <thakis@chromium.org> Commit-Queue: Scott Graham <scottmg@chromium.org> Cr-Commit-Position: refs/heads/master@{#480126} [modify] https://crrev.com/a41e0a0e12300dc1577a09ca7461898b734c437b/build/fuchsia/test_runner.py
,
Jun 16 2017
Harumph, so sometimes that test actually legit fails also. https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F5109%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout [00107.471] 02342.02877> [ RUN ] AtomicFlagTest.ReadFromDifferentThread [00107.471] 02342.02877> ../../base/synchronization/atomic_flag_unittest.cc:92: Failure [00107.471] 02342.02877> Value of: reset_flag.IsSet() [00107.471] 02342.02877> Actual: false [00107.471] 02342.02877> Expected: true [00107.471] 02342.02877> [ FAILED ] AtomicFlagTest.ReadFromDifferentThread (146 ms) This test seems a bit crappy. I believe what it's doing is: - start thread - post a BusyWaitUntilFlagSet to that thread which does: - busy loop until for tested_flag is set (calling sched_yield()) - EXPECTs expected_after_flag to be set by then - sets done_flag - meanwhile on the main thread: - sleep 20ms - set expected_after_flag - set tested_flag - sleep 20ms - assert that done_flag has been set So, it's relying on the background thread winning the race between being released from the busy loop and getting to set done_flag vs. the main loop getting from setting the tested_flag, sleeping for 20ms and then the assert. 20ms is a relatively long time, but it also seems quite reasonable that the background thread wouldn't be scheduled during that 20ms to get done_flag set before the ASSERT_TRUE. Possibly it works elsewhere because other platforms more aggressively yield during sleeps.
,
Jun 16 2017
+gab from looking at git history. Am I interpreting what this test is doing correctly (i.e. that the background thread needs to win the race while the main thread sleeps for 20ms?)
,
Jun 16 2017
And, would it reduce what's being tested if https://cs.chromium.org/chromium/src/base/synchronization/atomic_flag_unittest.cc?rcl=c97486f93ac2360cb9d7b8a29aa5a624da3fb2df&l=92 was instead a busy wait for reset_flag to be set?
,
Jun 16 2017
,
Jun 19 2017
Bugdroid is napping but https://chromium.googlesource.com/chromium/src/+/a2c092c7fd8fedb790f47996f71b3755e04eb270 . |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by scottmg@chromium.org
, Jun 16 2017