New issue
Advanced search Search tips

Issue 734130 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug

Blocking:
issue 706592



Sign in to add a comment

AtomicFlagTest.ReadFromDifferentThread fails on Fuchsia (also TestTimeouts sometimes exceeded)

Project Member Reported by scottmg@chromium.org, Jun 16 2017

Issue description

Happened here

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F5089%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout

but is also reproducible locally for me normally with 

out/fuch/bin/run_base_unittests --gtest_filter=AtomicFlagTest.ReadFromDifferentThread --gtest_repeat=1000

== https://gist.github.com/sgraham/6c0b93c1a84cb981f2e240e15824dcd4

Somewhat suspiciously it always seems to fail after about 45s of running (several hundred iterations of the test) so sems to indicate a leak of some sort.
 
Labels: -OS-Linux
ah, if I make the sleeps longer, it still seems to start failing after 45s, so I guess it must be some test harness thing panicing and causing an abort.
mhmm https://cs.chromium.org/chromium/src/base/test/test_timeouts.cc?rcl=52dd5c4fd4a98a209ed3bfb8eadf4caf9e40fed6&l=86 i guess the test harness might be missing some signalling on fuchsia?
Owner: scottmg@chromium.org
Status: Started (was: Unconfirmed)
Summary: TestTimeouts sometimes exceeded on Fuchsia (was: AtomicFlagTest.ReadFromDifferentThread flaky on Fuchsia)
OK, looks like this could explain a bunch of the flakiness on the bot run for Fuchsia. If a subprocess times out (== takes longer than 45s) the rest of the tests get killed and reported as UNKNOWN status.

Because the bot runs are non-KVM QEMU, they're really slow.

So we could
- set a flag to avoid timing out (maybe problematic when things really timeout?) 
- raise the timeout (kinda lame)
- eventually, run on hardware or faster emulation (longer term, harder).
Project Member

Comment 5 by bugdroid1@chromium.org, Jun 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a41e0a0e12300dc1577a09ca7461898b734c437b

commit a41e0a0e12300dc1577a09ca7461898b734c437b
Author: Scott Graham <scottmg@chromium.org>
Date: Fri Jun 16 19:42:38 2017

fuchsia: longer test timeout when running on bots

QEMU without KVM is quite slow, which is what's used on bots.
Increase the timeout to avoid timeouts and "UNKNOWN" test status.

Bug:  734130 , 706592
Change-Id: I9696805d727a93e04fd18731f3eb8955180403d6
Reviewed-on: https://chromium-review.googlesource.com/538953
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#480126}
[modify] https://crrev.com/a41e0a0e12300dc1577a09ca7461898b734c437b/build/fuchsia/test_runner.py

Summary: AtomicFlagTest.ReadFromDifferentThread fails on Fuchsia (also TestTimeouts sometimes exceeded) (was: TestTimeouts sometimes exceeded on Fuchsia)
Harumph, so sometimes that test actually legit fails also.

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F5109%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout

[00107.471] 02342.02877> [ RUN      ] AtomicFlagTest.ReadFromDifferentThread
[00107.471] 02342.02877> ../../base/synchronization/atomic_flag_unittest.cc:92: Failure
[00107.471] 02342.02877> Value of: reset_flag.IsSet()
[00107.471] 02342.02877>   Actual: false
[00107.471] 02342.02877> Expected: true
[00107.471] 02342.02877> [  FAILED  ] AtomicFlagTest.ReadFromDifferentThread (146 ms)


This test seems a bit crappy. I believe what it's doing is:

- start thread

- post a BusyWaitUntilFlagSet to that thread which does:
  - busy loop until for tested_flag is set (calling sched_yield())
  - EXPECTs expected_after_flag to be set by then
  - sets done_flag

- meanwhile on the main thread:
  - sleep 20ms
  - set expected_after_flag
  - set tested_flag
  - sleep 20ms
  - assert that done_flag has been set

So, it's relying on the background thread winning the race between being released from the busy loop and getting to set done_flag vs. the main loop getting from setting the tested_flag, sleeping for 20ms and then the assert.

20ms is a relatively long time, but it also seems quite reasonable that the background thread wouldn't be scheduled during that 20ms to get done_flag set before the ASSERT_TRUE.

Possibly it works elsewhere because other platforms more aggressively yield during sleeps.
Cc: gab@chromium.org
+gab from looking at git history. Am I interpreting what this test is doing correctly (i.e. that the background thread needs to win the race while the main thread sleeps for 20ms?)
And, would it reduce what's being tested if https://cs.chromium.org/chromium/src/base/synchronization/atomic_flag_unittest.cc?rcl=c97486f93ac2360cb9d7b8a29aa5a624da3fb2df&l=92 was instead a busy wait for reset_flag to be set?
Status: Fixed (was: Started)
Bugdroid is napping but https://chromium.googlesource.com/chromium/src/+/a2c092c7fd8fedb790f47996f71b3755e04eb270 .

Sign in to add a comment