New issue
Advanced search Search tips

Issue 751894 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 11
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug

Blocked on:
issue 10607

Blocking:
issue 706592
issue 738275



Sign in to add a comment

ConditionVariableTest.MultiThreadConsumerTest flakily timed out on Fuchsia

Project Member Reported by scottmg@chromium.org, Aug 3 2017

Issue description

I got a local timeout, it's a bit strange:

[00003.640] 02586.02614> Using 1 parallel jobs.
[00004.334] 03119.03146> Note: Google Test filter = ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads
[00004.335] 03119.03146> [==========] Running 2 tests from 2 test cases.
[00004.335] 03119.03146> [----------] Global test environment set-up.
[00004.335] 03119.03146> [----------] 1 test from ConditionVariableTest
[00004.337] 03119.03146> [ RUN      ] ConditionVariableTest.MultiThreadConsumerTest
[00004.341] 03119.03146> [3:-1435772112:0803/005755.235764:4341298:ERROR:condition_variable_unittest.cc(258)] a0
[00004.343] 03119.03146> [3:-1435772112:0803/005755.238140:4343192:ERROR:condition_variable_unittest.cc(260)] a1
[00018.803] 02586.02614> Still waiting for the following processes to finish:
[00018.803] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00018.803] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00034.136] 02586.02614> Still waiting for the following processes to finish:
[00034.137] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00034.137] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00049.186] 02586.02614> Still waiting for the following processes to finish:
[00049.187] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00049.187] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00064.226] 02586.02614> Still waiting for the following processes to finish:
[00064.226] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00064.226] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00079.226] 02586.02614> Still waiting for the following processes to finish:
[00079.226] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00079.226] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00094.521] 02586.02614> Still waiting for the following processes to finish:
[00094.521] 02586.02614> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.McfiKA/.org.chromium.Chromium.hJMMKm --single-process-tests -
[00094.521] 02586.02614> -test-launcher-output=/tmp/.org.chromium.Chromium.HpEDhD/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00100.789] 03119.03146> [3:-1435772112:0803/005931.684042:100789060:ERROR:condition_variable_unittest.cc(262)] a2
[00100.790] 03119.03146> [3:-1435772112:0803/005931.685192:100790191:ERROR:condition_variable_unittest.cc(265)] a3
[00100.791] 03119.03146> [3:-1435772112:0803/005931.686080:100791079:ERROR:condition_variable_unittest.cc(268)] a4
[00100.792] 03119.03146> [3:-1435772112:0803/005931.687008:100792007:ERROR:condition_variable_unittest.cc(271)] a5
[00100.792] 03119.03146> [3:-1435772112:0803/005931.687916:100792915:ERROR:condition_variable_unittest.cc(273)] a6
[00100.803] 03119.03146> [3:-1435772112:0803/005931.698208:100803207:ERROR:condition_variable_unittest.cc(276)] a7
[00100.804] 03119.03146> [3:-1435772112:0803/005931.699167:100804167:ERROR:condition_variable_unittest.cc(282)] a8
[00100.805] 03119.03146> [3:-1435772112:0803/005931.700595:100805594:ERROR:condition_variable_unittest.cc(293)] a9
[00100.806] 03119.03146> [3:-1435772112:0803/005931.701491:100806490:ERROR:condition_variable_unittest.cc(298)] a10
[00100.807] 03119.03146> [3:-1435772112:0803/005931.702419:100807419:ERROR:condition_variable_unittest.cc(300)] a11
[00100.808] 03119.03146> [3:-1435772112:0803/005931.703341:100808341:ERROR:condition_variable_unittest.cc(302)] a12
[00100.809] 03119.03146> [3:-1435772112:0803/005931.704318:100809318:ERROR:condition_variable_unittest.cc(304)] a13
[00100.810] 03119.03146> [3:-1435772112:0803/005931.705292:100810292:ERROR:condition_variable_unittest.cc(307)] a14
[00100.811] 03119.03146> [3:-1435772112:0803/005931.706229:100811229:ERROR:condition_variable_unittest.cc(310)] a15
[00100.812] 03119.03146> [3:-1435772112:0803/005931.707196:100812196:ERROR:condition_variable_unittest.cc(312)] a16
[00100.844] 03119.03146> [3:-1435772112:0803/005931.738703:100844131:ERROR:condition_variable_unittest.cc(315)] a17
[00100.875] 03119.03146> [3:-1435772112:0803/005931.770623:100875654:ERROR:condition_variable_unittest.cc(318)] a18
[00100.877] 03119.03146> [3:-1435772112:0803/005931.772589:100877689:ERROR:condition_variable_unittest.cc(321)] a19
[00100.880] 03119.03146> [3:-1435772112:0803/005931.775454:100880588:ERROR:condition_variable_unittest.cc(324)] a20
[00100.883] 03119.03146> [3:-1435772112:0803/005931.777664:100882764:ERROR:condition_variable_unittest.cc(327)] a21
[00100.886] 03119.03146> [3:-1435772112:0803/005931.780290:100885345:ERROR:condition_variable_unittest.cc(336)] a22
[00100.889] 03119.03146> [3:-1435772112:0803/005931.784298:100889375:ERROR:condition_variable_unittest.cc(338)] a22
[00100.891] 03119.03146> [3:-1435772112:0803/005931.786621:100891699:ERROR:condition_variable_unittest.cc(340)] a23
[00100.894] 03119.03146> [3:-1435772112:0803/005931.788966:100894045:ERROR:condition_variable_unittest.cc(342)] a24
[00100.896] 03119.03146> [3:-1435772112:0803/005931.791396:100896486:ERROR:condition_variable_unittest.cc(344)] a25
[00100.901] 03119.03146> [3:-1435772112:0803/005931.793920:100900981:ERROR:condition_variable_unittest.cc(347)] a26
[00100.903] 03119.03146> [3:-1435772112:0803/005931.798116:100903185:ERROR:condition_variable_unittest.cc(350)] a27
[00100.935] 03119.03146> [3:-1435772112:0803/005931.830636:100935715:ERROR:condition_variable_unittest.cc(353)] a28
[00100.938] 03119.03146> [3:-1435772112:0803/005931.833319:100938419:ERROR:condition_variable_unittest.cc(357)] a29
[00100.941] 03119.03146> [3:-1435772112:0803/005931.836395:100941492:ERROR:condition_variable_unittest.cc(359)] a30
[00100.944] 03119.03146> [3:-1435772112:0803/005931.839559:100944658:ERROR:condition_variable_unittest.cc(361)] a31
[00100.948] 03119.03146> [3:-1435772112:0803/005931.842986:100948086:ERROR:condition_variable_unittest.cc(366)] a32
[00100.950] 03119.03146> [3:-1435772112:0803/005931.845047:100950135:ERROR:condition_variable_unittest.cc(369)] a33
[00100.951] 03119.03146> [3:-1435772112:0803/005931.846748:100951786:ERROR:condition_variable_unittest.cc(371)] a34
[00100.953] 03119.03146> [3:-1435772112:0803/005931.848304:100953393:ERROR:condition_variable_unittest.cc(373)] a35
[00100.958] 03119.03146> [3:-1435772112:0803/005931.850274:100955360:ERROR:condition_variable_unittest.cc(375)] a36
[00100.960] 03119.03146> [3:-1435772112:0803/005931.855517:100960631:ERROR:condition_variable_unittest.cc(377)] a37
[00100.963] 03119.03146> [3:-1435772112:0803/005931.857754:100962823:ERROR:condition_variable_unittest.cc(379)] a38
[00100.966] 03119.03146> [3:-1435772112:0803/005931.860868:100965935:ERROR:condition_variable_unittest.cc(381)] a39
[00100.968] 03119.03146> [3:-1435772112:0803/005931.863056:100968124:ERROR:condition_variable_unittest.cc(385)] a40
[00101.001] 03119.03146> [3:-1435772112:0803/005931.895705:101000812:ERROR:condition_variable_unittest.cc(387)] a41
[00101.005] 03119.03146> [3:-1435772112:0803/005931.900088:101005187:ERROR:condition_variable_unittest.cc(398)] a42
[00101.008] 03119.03146> [3:-1435772112:0803/005931.903043:101008139:ERROR:condition_variable_unittest.cc(401)] a43
[00101.011] 03119.03146> [3:-1435772112:0803/005931.905893:101010993:ERROR:condition_variable_unittest.cc(403)] a44
[00101.013] 03119.03146> [3:-1435772112:0803/005931.908653:101013741:ERROR:condition_variable_unittest.cc(405)] a45
[00101.016] 03119.03146> [3:-1435772112:0803/005931.911188:101016275:ERROR:condition_variable_unittest.cc(407)] a46
[00101.019] 03119.03146> [3:-1435772112:0803/005931.913740:101018827:ERROR:condition_variable_unittest.cc(409)] a47
[00101.022] 03119.03146> [3:-1435772112:0803/005931.917473:101022559:ERROR:condition_variable_unittest.cc(411)] a48
[00101.025] 03119.03146> [3:-1435772112:0803/005931.920014:101025091:ERROR:condition_variable_unittest.cc(414)] a49
[00101.089] 03119.03146> [3:-1435772112:0803/005931.983658:101088767:ERROR:condition_variable_unittest.cc(426)] a50
[00101.091] 03119.03146> [3:-1435772112:0803/005931.986539:101091635:ERROR:condition_variable_unittest.cc(428)] a51
[00101.094] 03119.03146> [3:-1435772112:0803/005931.989397:101094495:ERROR:condition_variable_unittest.cc(430)] a52
[00101.097] 03119.03146> [3:-1435772112:0803/005931.992150:101097250:ERROR:condition_variable_unittest.cc(432)] a53
[00101.100] 03119.03146> [3:-1435772112:0803/005931.995149:101100248:ERROR:condition_variable_unittest.cc(434)] a54
[00101.103] 03119.03146> [3:-1435772112:0803/005931.997787:101102874:ERROR:condition_variable_unittest.cc(436)] a55
[00101.106] 03119.03146> [3:-1435772112:0803/005932.001403:101106490:ERROR:condition_variable_unittest.cc(438)] a56
[00101.109] 03119.03146> [3:-1435772112:0803/005932.004283:101109372:ERROR:condition_variable_unittest.cc(441)] a57
[00101.172] 03119.03146> [3:-1435772112:0803/005932.067431:101172567:ERROR:condition_variable_unittest.cc(446)] a58
[00101.176] 03119.03146> [3:-1435772112:0803/005932.071086:101176186:ERROR:condition_variable_unittest.cc(453)] a59
[00101.179] 03119.03146> [3:-1435772112:0803/005932.073896:101178996:ERROR:condition_variable_unittest.cc(456)] a60
[00101.190] 03119.03146> [3:-1435772112:0803/005932.085270:101190362:ERROR:condition_variable_unittest.cc(459)] a61
[00101.206] 03119.03146> [3:-1435772112:0803/005932.100745:101205828:ERROR:condition_variable_unittest.cc(462)] a62
[00101.216] 03119.03146> [       OK ] ConditionVariableTest.MultiThreadConsumerTest (96873 ms)
[00101.219] 03119.03146> [----------] 1 test from ConditionVariableTest (96880 ms total)
[00101.220] 03119.03146>
[00101.220] 03119.03146> [----------] 1 test from ReadWriteLockTest
[00101.222] 03119.03146> [ RUN      ] ReadWriteLockTest.MutexTwoThreads
[00101.224] 03119.03146> [3:-1435772112:0803/005932.119664:101224725:ERROR:read_write_lock_unittest.cc(185)] b0
[00101.226] 03119.03146> [3:-1435772112:0803/005932.121646:101226710:ERROR:read_write_lock_unittest.cc(187)] b1
[00101.228] 03119.03146> [3:-1435772112:0803/005932.123607:101228672:ERROR:read_write_lock_unittest.cc(189)] b2
[00101.230] 03119.03146> [3:-1435772112:0803/005932.125570:101230635:ERROR:read_write_lock_unittest.cc(192)] b3
[00101.232] 03119.03146> [3:-1435772112:0803/005932.127724:101232783:ERROR:read_write_lock_unittest.cc(194)] b4
[00101.239] 03119.03146> [3:-1435772112:0803/005932.134472:101239528:ERROR:read_write_lock_unittest.cc(197)] b5
[00101.668] 03119.03146> [3:-1435772112:0803/005932.563248:101668353:ERROR:read_write_lock_unittest.cc(200)] b6
[00101.671] 03119.03146> [3:-1435772112:0803/005932.566292:101671402:ERROR:read_write_lock_unittest.cc(203)] b7
[00101.674] 03119.03146> [3:-1435772112:0803/005932.568924:101674016:ERROR:read_write_lock_unittest.cc(206)] b8
[00101.677] 03119.03146> [       OK ] ReadWriteLockTest.MutexTwoThreads (452 ms)
[00101.679] 03119.03146> [----------] 1 test from ReadWriteLockTest (455 ms total)
[00101.679] 03119.03146>
[00101.679] 03119.03146> [----------] Global test environment tear-down
[00101.681] 03119.03146> [==========] 2 tests from 2 test cases ran. (97347 ms total)
[00101.682] 03119.03146> [  PASSED  ] 2 tests.
[00101.803] 02586.02614> [1/2] ConditionVariableTest.MultiThreadConsumerTest (TIMED OUT)
[00101.803] 02586.02614> [2/2] ReadWriteLockTest.MutexTwoThreads (452 ms)
[00101.805] 02586.02614> 1 test timed out:
[00101.805] 02586.02614>     ConditionVariableTest.MultiThreadConsumerTest (../../base/synchronization/condition_variable_unittest.cc:257)
[00101.819] 02586.02614> Tests took 98 seconds.


where the LOGs are in https://chromium-review.googlesource.com/c/599358.

This makes it look like it hung creating the WorkQueue https://cs.chromium.org/chromium/src/base/synchronization/condition_variable_unittest.cc?rcl=36cbc24e0e99afca6e45d8f791d0653eb566f0fb&l=259 from that log. But I'm not sure if I trust the timestamps of the log statements. ... more digging required.

(Ignoring the fact that this test is dumb, which it is...) the only real thing being done in WorkQueue::WorkQueue() is making an array of 10 threads. That in turn is doing PlatformThread::Create() which goes here https://cs.chromium.org/chromium/src/base/threading/platform_thread_posix.cc?l=81 and doesn't do a whole lot other than pthread_create(). So that's a bit worrying.

I added a bit more logging around that thread creation loop, but haven't been able to repro again yet with that.
Different failure in that test :(

[00003.475] 03111.03138> Note: Google Test filter = ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads
[00003.475] 03111.03138> [==========] Running 2 tests from 2 test cases.
[00003.475] 03111.03138> [----------] Global test environment set-up.
[00003.477] 03111.03138> [----------] 1 test from ConditionVariableTest
[00003.478] 03111.03138> [ RUN      ] ConditionVariableTest.MultiThreadConsumerTest
[00003.481] 03111.03138> [3:-1435772112:0803/181620.767473:3481293:ERROR:condition_variable_unittest.cc(258)] a0
[00003.482] 03111.03138> [3:-1435772112:0803/181620.769404:3482815:ERROR:condition_variable_unittest.cc(260)] a1
[00003.484] 03111.03138> [3:-1435772112:0803/181620.770732:3484142:ERROR:condition_variable_unittest.cc(580)] c0
[00004.095] 03111.03138> [3:-1435772112:0803/181621.382220:4095635:ERROR:condition_variable_unittest.cc(586)] c3
[00004.096] 03111.03138> [3:-1435772112:0803/181621.383275:4096687:ERROR:condition_variable_unittest.cc(262)] a2
[00004.097] 03111.03138> [3:-1435772112:0803/181621.384221:4097632:ERROR:condition_variable_unittest.cc(265)] a3
[00004.098] 03111.03138> [3:-1435772112:0803/181621.385145:4098556:ERROR:condition_variable_unittest.cc(268)] a4
[00004.099] 03111.03138> [3:-1435772112:0803/181621.386074:4099497:ERROR:condition_variable_unittest.cc(271)] a5
[00004.100] 03111.03138> [3:-1435772112:0803/181621.386999:4100410:ERROR:condition_variable_unittest.cc(273)] a6
[00183.065] 02760.03095> [3:654035760:0803/181920.350575:183064449:ERROR:process_fuchsia.cc(173)] mx_object_wait_one timed out, and no signals
[00183.067] 02760.02850> Still waiting for the following processes to finish:
[00183.067] 02760.02850> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.kAJihJ/.org.chromium.Chromium.NbeLng --single-process-tests -

Status: Started (was: Unconfirmed)
Got the failure from #c3 again:

[00003.544] 03307.03334> Note: Google Test filter = ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads
[00003.544] 03307.03334> [==========] Running 2 tests from 2 test cases.
[00003.544] 03307.03334> [----------] Global test environment set-up.
[00003.545] 03307.03334> [----------] 1 test from ConditionVariableTest
[00003.547] 03307.03334> [ RUN      ] ConditionVariableTest.MultiThreadConsumerTest
[00003.550] 03307.03334> [3:-1270506704:0804/183343.163891:3549796:ERROR:condition_variable_unittest.cc(258)] a0
[00003.551] 03307.03334> [3:-1270506704:0804/183343.165813:3551304:ERROR:condition_variable_unittest.cc(260)] a1
[00003.552] 03307.03334> [3:-1270506704:0804/183343.167154:3552642:ERROR:condition_variable_unittest.cc(580)] c0
[00003.757] 03307.03334> [3:-1270506704:0804/183343.371468:3756960:ERROR:condition_variable_unittest.cc(586)] c3
[00003.757] 03307.03334> [3:-1270506704:0804/183343.372413:3757903:ERROR:condition_variable_unittest.cc(262)] a2
[00003.758] 03307.03334> [3:-1270506704:0804/183343.373302:3758792:ERROR:condition_variable_unittest.cc(265)] a3
[00003.759] 03307.03334> [3:-1270506704:0804/183343.374235:3759725:ERROR:condition_variable_unittest.cc(268)] a4
[00003.760] 03307.03334> [3:-1270506704:0804/183343.375152:3760641:ERROR:condition_variable_unittest.cc(271)] a5
[00003.761] 03307.03334> [3:-1270506704:0804/183343.376055:3761545:ERROR:condition_variable_unittest.cc(273)] a6
[00018.361] 02490.02536> Still waiting for the following processes to finish:
[00018.361] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00018.361] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00033.810] 02490.02536> Still waiting for the following processes to finish:
[00033.810] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00033.810] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00049.027] 02490.02536> Still waiting for the following processes to finish:
[00049.027] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00049.027] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00064.027] 02490.02536> Still waiting for the following processes to finish:
[00064.027] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00064.027] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00079.028] 02490.02536> Still waiting for the following processes to finish:
[00079.028] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00079.028] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00094.223] 02490.02536> Still waiting for the following processes to finish:
[00094.223] 02490.02536> /system/base_unittests --gtest_filter=ConditionVariableTest.MultiThreadConsumerTest:ReadWriteLockTest.MutexTwoThreads --gtest_flagfile=/tmp/.org.chromium.Chromium.AMjiLM/.org.chromium.Chromium.nkmFNm --single-process-tests -
[00094.223] 02490.02536> -test-launcher-output=/tmp/.org.chromium.Chromium.FGAfDo/test_results.xml --test-launcher-retry-limit=0 --test-launcher-timeout=90000
[00103.580] 03307.03334> [3:-1270506704:0804/183523.195169:103580670:ERROR:condition_variable_unittest.cc(276)] a7
[00103.581] 03307.03334> [3:-1270506704:0804/183523.196263:103581753:ERROR:condition_variable_unittest.cc(282)] a8
[00103.583] 03307.03334> [3:-1270506704:0804/183523.197675:103583165:ERROR:condition_variable_unittest.cc(293)] a9
[00103.584] 03307.03334> [3:-1270506704:0804/183523.198663:103584153:ERROR:condition_variable_unittest.cc(298)] a10
[00103.585] 03307.03334> [3:-1270506704:0804/183523.199591:103585081:ERROR:condition_variable_unittest.cc(300)] a11
...

I don't quite know why it manages to sort of continue after hanging for 90s just as it's about to be killed. Perhaps something in process termination releases a thing and lets it run the rest of the test.

The loop that it's gotten stuck on twice is this though:

  LOG(ERROR) << "a6";
    while (!queue.EveryIdWasAllocated())
      queue.all_threads_have_ids()->Wait();
  LOG(ERROR) << "a7";

EveryIdWasAllocated() does https://cs.chromium.org/chromium/src/base/synchronization/condition_variable_unittest.cc?rcl=c8ffa1beb0ef45a81878553717983152b735b87b&l=542 a check that all the threads have been assigned ids by counting whether thread_count_ == thread_started_counter_.

queue.all_threads_have_ids()->Wait() is waiting on a ConditionVariable that is Signal()d by the main thread under those same conditions (when EveryIdWasAllocated()) is true. 

Because our ConditionVariable class no longer actually does much itself, but rather forwards to either pthreads or win32 condition variables, I'm going to assume that the Signal/Wait are not the bug for now.

My current best guess then would be that WorkQueue::GetThreadId() is racing. https://cs.chromium.org/chromium/src/base/synchronization/condition_variable_unittest.cc?rcl=1d7b37d1258fd3a24d07ef745aa37747a3732650&l=536 There is some locking-type thing there, but I'm not (yet) convinced it's doing the right thing.

This doesn't explain the original hang in #c1 though, which looked more like it was in the thread creation loop.
Ah, yes, the DFAKE_SCOPED_RECURSIVE_LOCK is keyed on !NDEBUG, rather than whether dchecks are enabled like you would think (this is very old code, so probably predates that distinction).
Hmm, but the one call to GetThreadId() is guarded by lock_, and that's the only place that thread_started_counter_ is mutated. So that actually seems OK.

One explanation that would work for the 3 observed hangs is that pthread_create() can hang, or the started thread can hang before execution. However, that way lies madness.
Owner: scottmg@chromium.org
Wait, I don't understand this test at all I think. The main body locks the queue.lock(), and then busy loops until all threads are initialized, while the ThreadMain try to get to initialize, but they also take that lock before doing so?

...

OK, they must be locking different things it seems. But I have no idea which things they're locking then.

...

Uh, nope. Same lock_ pointer:

[00003.563] 03310.03337> [==========] Running 1 test from 1 test case.
[00003.563] 03310.03337> [----------] Global test environment set-up.
[00003.563] 03310.03337> [----------] 1 test from ConditionVariableTest
[00003.565] 03310.03337> [ RUN      ] ConditionVariableTest.MultiThreadConsumerTest
[00003.573] 03310.03390> [3:-515642576:0804/204944.217960:3572677:ERROR:condition_variable_unittest.cc(723)] ThreadMain 0 has lock 0x1e5019408718
[00003.574] 03310.03390> [3:-515642576:0804/204944.220504:3574815:ERROR:condition_variable_unittest.cc(727)] ThreadMain 0 released lock 0x1e5019408718
[00003.577] 03310.03385> [3:-759051472:0804/204944.223313:3577623:ERROR:condition_variable_unittest.cc(723)] ThreadMain 1 has lock 0x1e5019408718
[00003.579] 03310.03385> [3:-759051472:0804/204944.224606:3578932:ERROR:condition_variable_unittest.cc(727)] ThreadMain 1 released lock 0x1e5019408718
[00003.582] 03310.03401> [3:-105788624:0804/204944.227378:3581687:ERROR:condition_variable_unittest.cc(723)] ThreadMain 2 has lock 0x1e5019408718
[00003.583] 03310.03401> [3:-105788624:0804/204944.228668:3582980:ERROR:condition_variable_unittest.cc(727)] ThreadMain 2 released lock 0x1e5019408718
[00003.585] 03310.03408> [3:1462008624:0804/204944.231255:3585563:ERROR:condition_variable_unittest.cc(723)] ThreadMain 3 has lock 0x1e5019408718
[00003.586] 03310.03408> [3:1462008624:0804/204944.232546:3586857:ERROR:condition_variable_unittest.cc(727)] ThreadMain 3 released lock 0x1e5019408718
[00003.592] 03310.03415> [3:-748053712:0804/204944.237807:3592130:ERROR:condition_variable_unittest.cc(723)] ThreadMain 4 has lock 0x1e5019408718
[00003.593] 03310.03415> [3:-748053712:0804/204944.239189:3593500:ERROR:condition_variable_unittest.cc(727)] ThreadMain 4 released lock 0x1e5019408718
[00003.596] 03310.03438> [3:-1028764880:0804/204944.241863:3596173:ERROR:condition_variable_unittest.cc(723)] ThreadMain 5 has lock 0x1e5019408718
[00003.597] 03310.03438> [3:-1028764880:0804/204944.243127:3597439:ERROR:condition_variable_unittest.cc(727)] ThreadMain 5 released lock 0x1e5019408718
[00003.600] 03310.03445> [3:-703407312:0804/204944.245827:3600150:ERROR:condition_variable_unittest.cc(723)] ThreadMain 6 has lock 0x1e5019408718
[00003.602] 03310.03445> [3:-703407312:0804/204944.247931:3602245:ERROR:condition_variable_unittest.cc(727)] ThreadMain 6 released lock 0x1e5019408718
[00003.604] 03310.03337> [3:-1435772112:0804/204944.249844:3604159:ERROR:condition_variable_unittest.cc(267)] test body has lock 0x1e5019408718
[00003.605] 03310.03421> [3:-576439504:0804/204944.251149:3605461:ERROR:condition_variable_unittest.cc(723)] ThreadMain 7 has lock 0x1e5019408718
[00003.606] 03310.03421> [3:-576439504:0804/204944.252502:3606816:ERROR:condition_variable_unittest.cc(727)] ThreadMain 7 released lock 0x1e5019408718
[00003.610] 03310.03426> [3:842332976:0804/204944.255325:3609637:ERROR:condition_variable_unittest.cc(723)] ThreadMain 8 has lock 0x1e5019408718
[00003.612] 03310.03433> [3:-1359353040:0804/204944.257816:3612129:ERROR:condition_variable_unittest.cc(723)] ThreadMain 9 has lock 0x1e5019408718
[00003.613] 03310.03433> [3:-1359353040:0804/204944.259451:3613766:ERROR:condition_variable_unittest.cc(727)] ThreadMain 9 released lock 0x1e5019408718
[00003.615] 03310.03337> [3:-1435772112:0804/204944.261121:3615445:ERROR:condition_variable_unittest.cc(271)] test body released lock 0x1e5019408718
[00003.616] 03310.03426> [3:842332976:0804/204944.262495:3616810:ERROR:condition_variable_unittest.cc(727)] ThreadMain 8 released lock 0x1e5019408718
[00004.057] 03310.03337> [       OK ] ConditionVariableTest.MultiThreadConsumerTest (486 ms)
[00004.060] 03310.03337> [----------] 1 test from ConditionVariableTest (492 ms total)

And both the main loop and thread 7 have it acquired at the same time. Urm?

[00003.604] 03310.03337> [3:-1435772112:0804/204944.249844:3604159:ERROR:condition_variable_unittest.cc(267)] test body has lock 0x1e5019408718
[00003.605] 03310.03421> [3:-576439504:0804/204944.251149:3605461:ERROR:condition_variable_unittest.cc(723)] ThreadMain 7 has lock 0x1e5019408718

The changes for the logs above are https://chromium-review.googlesource.com/c/599358/2/base/synchronization/condition_variable_unittest.cc

I'm going crazy I think?
(As pointed out to me on IRC, the lock code could be racing the print, and the ConditionVariable unlocks its lock, so all is well there. I still have no idea why it's hanging rarely though.)
A more convenient (in that it's not just a no-output hang) flake locally in the same suite, but might be a different problem:

[00097.032] 02543.02599> [ RUN      ] ConditionVariableTest.LargeFastTaskTest
[00097.032] 02543.02599> [3:-223110352:0811/190217.778455:96900642:FATAL:thread_id_name_manager.cc(114)] Check failed: (id_to_handle_iter!= thread_id_to_handle_.end()).
----- start symbolized stack
#00: base::debug::StackTrace::StackTrace(unsigned long) at /work/cr/src/out/fuch/../../base/debug/stack_trace_fuchsia.cc:169
#01: endl<char, std::__1::char_traits<char> > at /work/cr/src/out/fuch/../../buildtools/third_party/libc++/trunk/include/ostream:1001
      (inlined by) operator<< at /work/cr/src/out/fuch/../../buildtools/third_party/libc++/trunk/include/ostream:195
      (inlined by) ~LogMessage at /work/cr/src/out/fuch/../../base/logging.cc:554
#02: base::ThreadIdNameManager::RemoveName(unsigned long, int) at /work/cr/src/out/fuch/../../base/threading/thread_id_name_manager.cc:114
#03: base::(anonymous namespace)::ThreadFunc(void*) at /work/cr/src/out/fuch/../../base/threading/platform_thread_posix.cc:77
#04: base_unittests+0x19896
#05: base_unittests+0x910fa
----- end symbolized stack
[00097.032] 02543.02599>
[00097.033] 02543.02599> [1386/2462] ConditionVariableTest.LargeFastTaskTest (CRASHED)

This seems different, but platform_thread_posix.cc seems pretty straightforward (RegisterThread(), run thread, RemoveName()) https://cs.chromium.org/chromium/src/base/threading/platform_thread_posix.cc?rcl=24cdbc10fbc7474216e7648a0bcb6fa300e0e578&l=67 so it seems like something funky in thread teardown, or the fixture.


Cc: w...@chromium.org
wez found #c10, and I repro'd with

diff --git a/base/threading/thread_id_name_manager.cc b/base/threading/thread_id_name_manager.cc
index 107e0dc..3227579 100644
--- a/base/threading/thread_id_name_manager.cc
+++ b/base/threading/thread_id_name_manager.cc
@@ -44,6 +44,11 @@ const char* ThreadIdNameManager::GetDefaultInternedString() {
 void ThreadIdNameManager::RegisterThread(PlatformThreadHandle::Handle handle,
                                          PlatformThreadId id) {
   AutoLock locked(lock_);
+  if (thread_id_to_handle_.find(id) != thread_id_to_handle_.end()) {
+    LOG(ERROR) << "Adding id=" << id << " as handle=" << handle
+               << ", but already there as handle="
+               << thread_id_to_handle_.find(id)->second;
+  }
   thread_id_to_handle_[id] = handle;
   thread_handle_to_interned_name_[handle] =
       name_to_interned_name_[kDefaultName];

Eventually, we hit this output:

[00098.866] 27213.28061> [3:2099071792:0811/222136.724944:98866476:ERROR:thread_id_name_manager.cc(48)] Adding id=2099071792 as handle=97210093882160, but already there as handle=4778102704944


hex(97210093882160) == 0x58697d1d4b30
hex(4778102704944)  ==  0x4587d1d4b30

which are the same in the bottom 32 bits!

And PlatformThreadId is pid_t, which is int, but pthread_t (which is what pthread_self() returns) is unsigned long or __pthread* ... in either case, 64 bits. Yikes!
Project Member

Comment 12 by bugdroid1@chromium.org, Aug 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec

commit c4822834e91f44835e3fcdd49fcf9ccafe6a84ec
Author: Wez <wez@chromium.org>
Date: Sat Aug 12 01:59:30 2017

Fix PlatformThread::CurrentId() under Fuchsia.

Under Fuchsia CurrentId() was returning the pthread_t result of calling
pthread_self() as the PlatformThreadId, which under POSIX platforms is
defined to pid_t. Under Fuchsia these are an unsigned long derived from
the pthread's internal point (i.e. 64-bit), and an int (32-bit)
respectively. This is changed to use the platform thread handle, which
is a 32-bit value as well.

Bug:  751894 ,  735701 , 738275
Change-Id: Ie5ab21c3c01d609b9b2d5b6a43d04ae57990eaf3
Reviewed-on: https://chromium-review.googlesource.com/611853
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493935}
[modify] https://crrev.com/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec/base/threading/platform_thread.h
[modify] https://crrev.com/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec/base/threading/platform_thread_posix.cc

Project Member

Comment 13 by bugdroid1@chromium.org, Aug 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0

commit 0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0
Author: Wez <wez@chromium.org>
Date: Sat Aug 12 16:45:16 2017

Fix inappropriate use of pthread_self() in PlatformThread::SetName().

We updated PlatformThread::CurrentID() to work correctly for Fuchsia
in https://chromium-review.googlesource.com/c/611853, but ironically
the Fuchsia version of PlatformThread::SetName() was still mis-using
pthread_self() as the PlatformThreadId as well, causing swathes of
tests to fail due to ThreadIdManager::Set/Get being called with
different Ids (pthread_t vs mx_handle_t) and therefore mis-matching.

TBR: scottmg@chromium.org
Bug:  751894 ,  735701 , 738275
Change-Id: I520560050c18efe624f701b4b9c5925b92bbec9a
Reviewed-on: https://chromium-review.googlesource.com/612027
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493980}
[modify] https://crrev.com/0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0/base/threading/platform_thread_fuchsia.cc

Project Member

Comment 14 by bugdroid1@chromium.org, Aug 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5c0b8e30fbf94783c389c90f4ea78bec23e51bf0

commit 5c0b8e30fbf94783c389c90f4ea78bec23e51bf0
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Thu Aug 31 21:51:02 2017

Actually disable ConditionVariableTest.LargeFastTaskTest

The test-name didn't have - in the front, so the test wasn't actually
disabled.

TBR=scottmg@chromium.org
NOTRY=true

Bug:  751894 
Change-Id: If0292834f085fee86a7a82c88ad88cf4ceeb6570
Reviewed-on: https://chromium-review.googlesource.com/646492
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#499022}
[modify] https://crrev.com/5c0b8e30fbf94783c389c90f4ea78bec23e51bf0/testing/buildbot/filters/fuchsia.base_unittests.filter

Status: Fixed (was: Started)
Fixed by c10-c13, I believe.

Sign in to add a comment