New issue
Advanced search Search tips

Issue 793412 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug

Blocking:
issue 738275



Sign in to add a comment

ProcessUtilTest.LaunchWithHandleTransfer flaked in base_unittests on Fuchsia Debug

Project Member Reported by scottmg@chromium.org, Dec 8 2017

Issue description

https://chromium-swarm.appspot.com/task?id=3a4f470bdbc66910&refresh=10&show_raw=1


[00130.505] 03758.03784> [1178/2659] ProcessUtilTest.FDRemapping (678 ms)
[00130.505] 03758.03784> [1179/2659] ProcessUtilTest.FDRemappingIncludesStdio (310 ms)
[00130.505] 03758.03784> [ RUN      ] ProcessUtilTest.LaunchWithHandleTransfer
[00130.506] 03758.03784> ../../base/process/process_util_unittest.cc:778: Failure
[00130.506] 03758.03784> Expected equality of these values:
[00130.506] 03758.03784>   (0)
[00130.506] 03758.03784>     Which is: 0
[00130.506] 03758.03784>   result
[00130.507] 03758.03784>     Which is: -21
[00130.507] 03758.03784> ../../base/process/process_util_unittest.cc:779: Failure
[00130.507] 03758.03784> Value of: signals & ZX_SOCKET_READABLE
[00130.507] 03758.03784>   Actual: false
[00130.508] 03758.03784> Expected: true
[00130.508] 03758.03784> ../../base/process/process_util_unittest.cc:784: Failure
[00130.508] 03758.03784> Expected equality of these values:
[00130.509] 03758.03784>   (0)
[00130.509] 03758.03784>     Which is: 0
[00130.509] 03758.03784>   result
[00130.510] 03758.03784>     Which is: -22
[00130.510] 03758.03784> ../../base/process/process_util_unittest.cc:785: Failure
[00130.510] 03758.03784> Expected equality of these values:
[00130.510] 03758.03784>   1u
[00130.511] 03758.03784>     Which is: 1
[00130.511] 03758.03784>   bytes_read
[00130.511] 03758.03784>     Which is: 0
[00130.511] 03758.03784> ../../base/process/process_util_unittest.cc:786: Failure
[00130.511] 03758.03784> Expected equality of these values:
[00130.512] 03758.03784>   kPipeValue
[00130.512] 03758.03784>     Which is: '\xCC' (204)
[00130.513] 03758.03784>   buf[0]
[00130.513] 03758.03784>     Which is: '\0'
[00130.513] 03758.03784> [3:1468420079:1208/141137.119602:130358757:WARNING:test_suite.cc(237)] Test launcher output path /tmp/.org.chromium.Chromium.gFOHGh/test_results.xml exists. Not adding test launcher result printer.
[00130.514] 03758.03784> [3:1468420079:1208/141137.123381:130362537:FATAL:process_util_unittest.cc(752)] Check failed: (0) == result (0 vs. -24)
#00: pc 0xfe12d554f60 (libbase.so,0x4f4f60)
#01: pc 0xfe12d1dfe3c (libbase.so,0x17fe3c)
#02: pc 0xfe12d25602a (libbase.so,0x1f602a)
#03: pc 0x7dfedf9ad869 (app:/system/base_unittests,0xb80869)
#04: pc 0x7dfee04193a5 (app:/system/base_unittests,0x15ec3a5)
#05: pc 0x7dfee039b74e (app:/system/base_unittests,0x156e74e)
#06: pc 0x7dfee03718fd (app:/system/base_unittests,0x15448fd)
#07: pc 0x7dfee0371844 (app:/system/base_unittests,0x1544844)
#08: pc 0x7dfee03717f5 (app:/system/base_unittests,0x15447f5)
#09: pc 0x7dfee037173c (app:/system/base_unittests,0x154473c)
#10: pc 0x7dfedf2b39ed (app:/system/base_unittests,0x4869ed)
#11: pc 0x7dfee03cacc7 (app:/system/base_unittests,0x159dcc7)
#12: pc 0x7dfee03cab35 (app:/system/base_unittests,0x159db35)
#13: pc 0x7dfee037159d (app:/system/base_unittests,0x154459d)
#14: pc 0x6fc91b3b012e (libc.so,0x1b12e)
[00130.518] 03758.03784>
[00130.518] 03758.03784> ../../base/process/process_util_unittest.cc:793: Failure
[00130.518] 03758.03784> Expected equality of these values:
[00130.518] 03758.03784>   0
[00130.518] 03758.03784>   exit_code
[00130.519] 03758.03784>     Which is: 1
[00130.519] 03758.03784> [  FAILED  ] ProcessUtilTest.LaunchWithHandleTransfer (5346 ms)
 

Comment 2 by w...@chromium.org, Dec 12 2017

Labels: M-65
Owner: w...@chromium.org
Status: Assigned (was: Available)

Comment 3 by w...@chromium.org, Dec 13 2017

Cc: scottmg@chromium.org
Status: Started (was: Assigned)
Summary: ProcessUtilTest.LaunchWithHandleTransfer flaked in base_unittests on Fuchsia Debug (was: ProcessUtilTest.LaunchWithHandleTransfer flaked in base_unittests on Fuchsia)
Both of the flakes are in the Debug/component build; my initial assumption was that this was a regression from my removing the binary-strip step from the build, but that didn't land until December 11th. Re-running the test locally, it's noticable that it completes considerably faster than on the bot; all the ProcessUtilTest tests seem to take 5-10s on the bots, whereas they complete in ~60ms if you run them on a workstation.

Running locally:
[00179.847] 01831.01857> [1/10] ProcessUtilTest.SpawnChild (59 ms)
[00179.847] 01831.01857> [2/10] ProcessUtilTest.KillSlowChild (69 ms)
[00179.847] 01831.01857> [3/10] ProcessUtilTest.CurrentDirectory (59 ms)
[00179.847] 01831.01857> [4/10] ProcessUtilTest.FDRemapping (115 ms)
[00179.847] 01831.01857> [5/10] ProcessUtilTest.FDRemappingIncludesStdio (58 ms)
[00179.847] 01831.01857> [6/10] ProcessUtilTest.LaunchWithHandleTransfer (57 ms)

On the bots a good run looks like:
[00116.623] 03760.03786> [1285/2659] ProcessUtilTest.SpawnChild (8834 ms)
[00116.623] 03760.03786> [1286/2659] ProcessUtilTest.KillSlowChild (6981 ms)
[00116.623] 03760.03786> [1287/2659] ProcessUtilTest.CurrentDirectory (5600 ms)
[00116.624] 03760.03786> [1288/2659] ProcessUtilTest.FDRemapping (930 ms)
[00116.624] 03760.03786> [1289/2659] ProcessUtilTest.FDRemappingIncludesStdio (499 ms)
[00116.626] 03760.03786> [1290/2659] ProcessUtilTest.LaunchWithHandleTransfer (2312 ms)

while a bad run looks like:
[00134.938] 03762.03788> [1285/2659] ProcessUtilTest.SpawnChild (13060 ms)
[00134.939] 03762.03788> [1286/2659] ProcessUtilTest.KillSlowChild (5739 ms)
[00134.939] 03762.03788> [1287/2659] ProcessUtilTest.CurrentDirectory (5440 ms)
[00134.939] 03762.03788> [1288/2659] ProcessUtilTest.FDRemapping (800 ms)
[00134.940] 03762.03788> [1289/2659] ProcessUtilTest.FDRemappingIncludesStdio (6756 ms)
[00134.940] 03762.03788> [ RUN      ] ProcessUtilTest.LaunchWithHandleTransfer
...
[00134.949] 03762.03788> [  FAILED  ] ProcessUtilTest.LaunchWithHandleTransfer (5130 ms)
[00134.950] 03762.03788> [1290/2659] ProcessUtilTest.LaunchWithHandleTransfer (5130 ms)

Even the timings of the "good" run look suspicious; the CurrentDirectory test simply spawns a child process with a |current_directory| configured to point to TMP, and verifies in the child that it's current-working-directory is there, yet in the "good" run above, it takes longer than the FDRemapping[IncludesStdio] tests, which actually perform a pipe IO exchange w/ the child before waiting for it. The FDRemapping test only waits for the child process for up to one second, by default, and yet seems not to be flaking?

The other things to note about these tests are:
- SpawnChild and KillSlowChild wait for up to action_max_timeout() for the process to exit. That defaults to ~30s (i.e. longer than the individual test timeout).
- CurrentDirectory waits indefinitely for the process to exit.

I'll replace the current timeouts with action_timeout(), since that seems more appropriate than hard-coding 5s in, but it might be worth also reviewing the spread of execution times for these tests, if possible, in case there is some obvious pattern that we're missing.

Comment 4 by w...@chromium.org, Dec 15 2017

Missed that in all these failures the sub-process (don't let the pid/tid numbers fool you; we're redirecting the output via the TestLauncher!):

[00130.514] 03758.03784> [3:1468420079:1208/141137.123381:130362537:FATAL:process_util_unittest.cc(752)] Check failed: (0) == result (0 vs. -24)

which is ZX_ERR_PEER_CLOSED, i.e. the sub-process isn't even reaching the zx_socket_write() call until the test itself has already timed-out.

One difference between this and the FDRemappingIncludesStdio test, which rarely flakes, is that FDRemappingIncludesStdio uses read(), which performs an "infinite timeout" wait for I/O from the child.
Project Member

Comment 5 by bugdroid1@chromium.org, Dec 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2490b28ceb23d899791b0a23d6d8c59b70570382

commit 2490b28ceb23d899791b0a23d6d8c59b70570382
Author: Wez <wez@chromium.org>
Date: Mon Dec 18 02:19:22 2017

Switch LaunchWithHandleTransfer test to use action_timeout().

This test times-out on Fuchsia bots, seemingly due to an extremely long
delay in getting around to scheduling the sub-process to execute, some
of the time. Other similar ProcessUtilTests succeed because they wait
indefinitely for the sub-process to do its work; raising the timeout
to action_timeout() provides enough head-room that this test passes.

This also has the advantage that the test's timeouts are now
configurable.

Bug:  793412 
Change-Id: I8b43e3cadfe302cf8afd9e835ee4ed1fbcda3c9e
Reviewed-on: https://chromium-review.googlesource.com/828383
Reviewed-by: Scott Graham <scottmg@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#524639}
[modify] https://crrev.com/2490b28ceb23d899791b0a23d6d8c59b70570382/base/process/process_util_unittest.cc

Comment 6 by w...@chromium.org, Dec 18 2017

Blocking: -754861
Cc: w...@chromium.org
Labels: -Pri-1 Pri-2
Owner: ----
Status: Available (was: Started)
Dropping priority since this is no longer flaking, but leaving open for investigation of this and similar tests sometimes taking an extra 5s to run.

Comment 7 by w...@chromium.org, Dec 21 2017

Cc: sergeyu@chromium.org
This flaked again even with the longer timeout:
https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20%28dbg%29/15187
thanks to a whopping 13 second(!) run-time.

sergeyu@ thinks that slow disk writes on the bots may be causing output summary writes to the minfs mount to be excessively slow, which may also explain these strange delays.

Comment 8 by w...@chromium.org, Dec 21 2017

See https://fuchsia.atlassian.net/browse/ZX-1499 for the Fuchsia-side investigation bug.

Comment 9 by w...@chromium.org, Dec 21 2017

Owner: w...@chromium.org
Status: Assigned (was: Available)

Comment 10 by w...@chromium.org, Jan 6 2018

Note to self: We've observed potential priority-inversions with normal-priority threads stuck in spin-locks in LazyInstance, waiting for a background-priority thread to finish initializing the instance; we might try making the thread priority setter for Fuchsia a no-op to explore whether that improves things at all.
Project Member

Comment 11 by bugdroid1@chromium.org, Jan 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/660ff99a094be171290b0f1fe4a6a3e450ca55ef

commit 660ff99a094be171290b0f1fe4a6a3e450ca55ef
Author: Kevin Marshall <kmarshall@chromium.org>
Date: Wed Jan 10 01:59:59 2018

Roll Fuchsia SDK to 6b4cb32d100d2ecfaaa9642adfb0de451c5b9a69.

- Fixes argv[0] to report package-relative path for "main" binary.
- Fixes 'threads' utility not to hang if threads exit mid-dump.
- Adds tracing command & service to the SDK, for easier debugging.
- Fixes SSH to return valid exit codes from remote commands.
- Fixes "run" to correctly route program output via stdout/stderr.
- Fixes NET-354 (SSH leaving processes hanging after client disconnects.)

Bug: 707030,  799268 ,  793412 ,  798851 , 778467
Change-Id: Ie3ab3fed54df1884089b57e1638883684de6836f
Reviewed-on: https://chromium-review.googlesource.com/857809
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528206}
[modify] https://crrev.com/660ff99a094be171290b0f1fe4a6a3e450ca55ef/DEPS

Project Member

Comment 12 by bugdroid1@chromium.org, Jan 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/26a004d0c98da273581c9d6953a35ea97874a204

commit 26a004d0c98da273581c9d6953a35ea97874a204
Author: Wez <wez@chromium.org>
Date: Wed Jan 10 08:27:08 2018

Dump list of all threads in timed-out/hung sub-processes.

Bug:  799268 ,  755282 ,  793412 
Change-Id: I6737bbac53253205c6d31d32ce1a34a27e9ceee1
Reviewed-on: https://chromium-review.googlesource.com/853079
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528261}
[modify] https://crrev.com/26a004d0c98da273581c9d6953a35ea97874a204/base/test/launcher/test_launcher.cc

Comment 13 by w...@chromium.org, Jan 19 2018

This just flaked again (https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20(dbg)/15679), but sadly due to an assertion-failure crash:

[00127.819] 03871.03901> [ RUN      ] ProcessUtilTest.LaunchWithHandleTransfer
[00127.819] 03871.03901> ../../base/process/process_util_unittest.cc:778: Failure
[00127.819] 03871.03901> Expected equality of these values:
[00127.820] 03871.03901>   (0)
[00127.820] 03871.03901>     Which is: 0
[00127.820] 03871.03901>   result
[00127.820] 03871.03901>     Which is: -21
[00127.820] 03871.03901> ../../base/process/process_util_unittest.cc:779: Failure
[00127.820] 03871.03901> Value of: signals & ZX_SOCKET_READABLE
[00127.821] 03871.03901>   Actual: false
[00127.821] 03871.03901> Expected: true
[00127.821] 03871.03901> ../../base/process/process_util_unittest.cc:784: Failure
[00127.822] 03871.03901> Expected equality of these values:
[00127.822] 03871.03901>   (0)
[00127.822] 03871.03901>     Which is: 0
[00127.822] 03871.03901>   result
[00127.823] 03871.03901>     Which is: -22
[00127.823] 03871.03901> ../../base/process/process_util_unittest.cc:785: Failure
[00127.823] 03871.03901> Expected equality of these values:
[00127.824] 03871.03901>   1u
[00127.824] 03871.03901>     Which is: 1
[00127.824] 03871.03901>   bytes_read
[00127.824] 03871.03901>     Which is: 0
[00127.824] 03871.03901> ../../base/process/process_util_unittest.cc:786: Failure
[00127.825] 03871.03901> Expected equality of these values:
[00127.825] 03871.03901>   kPipeValue
[00127.825] 03871.03901>     Which is: '\xCC' (204)
[00127.825] 03871.03901>   buf[0]
[00127.825] 03871.03901>     Which is: '\0'
[00127.826] 03871.03901> [3:975496893:0119/190526.827871:122647397:ERROR:process_fuchsia.cc(168)] going to wait for process 3a14e6b5 (deadline=132629649000, now=122657301000)
[00127.826] 03871.03901> [3:535410411:0119/190529.322930:125141554:WARNING:test_suite.cc(237)] Test launcher output path /tmp/.org.chromium.Chromium.GGkgnL/test_results.xml exists. Not adding test launcher result printer.
[00127.826] 03871.03901> [3:535410411:0119/190529.355058:125170874:FATAL:process_util_unittest.cc(751)] Check failed: (0) == result (0 vs. -24)
#00: base::debug::StackTrace::StackTrace(unsigned long) at ??:?
#01: base::debug::StackTrace::StackTrace() at ??:?
#02: logging::LogMessage::~LogMessage() at ??:?
#03: base::ProcessUtilsVerifyHandle() at ??:?
#04: multi_process_function_list::InvokeChildProcessTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) at ??:?
#05: base::TestSuite::Run() at ??:?
#06: int base::internal::FunctorTraits<int (base::TestSuite::*)(), void>::Invoke<base::TestSuite*>(int (base::TestSuite::*)(), base::TestSuite*&&) at ??:?
#07: int base::internal::InvokeHelper<false, int>::MakeItSo<int (base::TestSuite::* const&)(), base::TestSuite*>(int (base::TestSuite::* const&)(), base::TestSuite*&&) at ??:?
#08: int base::internal::Invoker<base::internal::BindState<int (base::TestSuite::*)(), base::internal::UnretainedWrapper<base::TestSuite> >, int ()>::RunImpl<int (base::TestSuite::* const&)(), std::__1::tuple<base::internal::UnretainedWrapper<base::TestSuite> > const&, 0ul>(int (base::TestSuite::* const&)(), std::__1::tuple<base::internal::UnretainedWrapper<base::TestSuite> > const&, std::__1::integer_sequence<unsigned long, 0ul>) at ??:?
#09: base::internal::Invoker<base::internal::BindState<int (base::TestSuite::*)(), base::internal::UnretainedWrapper<base::TestSuite> >, int ()>::Run(base::internal::BindStateBase*) at ??:?
#10: base::RepeatingCallback<int ()>::Run() const & at ??:?
#11: base::(anonymous namespace)::LaunchUnitTestsInternal(base::RepeatingCallback<int ()> const&, unsigned long, int, bool, base::RepeatingCallback<void ()> const&) at ??:?
#12: base::LaunchUnitTests(int, char**, base::RepeatingCallback<int ()> const&) at ??:?
#13: main at ??:?
#14: pc 0x6480815cf14e (libc.so,0x1b14e)
[00127.831] 03871.03901>
[00127.832] 03871.03901> ../../base/process/process_util_unittest.cc:793: Failure
[00127.832] 03871.03901> Expected equality of these values:
[00127.832] 03871.03901>   0
[00127.832] 03871.03901>   exit_code
[00127.833] 03871.03901>     Which is: 1
[00127.833] 03871.03901> [  FAILED  ] ProcessUtilTest.LaunchWithHandleTransfer (13220 ms)

Comment 14 by w...@chromium.org, Jan 26 2018

Labels: -M-65 M-66

Comment 15 by w...@chromium.org, Feb 7 2018

Blockedon: 810171

Comment 16 by w...@chromium.org, Feb 7 2018

Labels: -M-66 M-67

Comment 17 by w...@chromium.org, Apr 10 2018

Status: Started (was: Assigned)
Test flaked again in https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20(dbg)/18059 with the same assertion failure in the child process, indicating that it isn't getting a chance to run until after the parent has closed the channel.

In successful runs LaunchWithHandleTransfer takes ~7.8s(!), whereas the LaunchProcess test, which launches _six_ child processes in succession, and reads a string back from each, consistently takes ~800ms.  Difference between the two is that LaunchProcess launches the shell and runs 'echo', whereas LaunchWithHandleTransfer launches a copy of the unit-test binary, which is presumably larger, and has ~20 dynamic libraries as dependencies.

Will disable this test in Debug builds until we have these infra performance issues addressed.
Project Member

Comment 18 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5

commit 9ea3e3cf0148a0685ebf8f7af4979e31735d94e5
Author: Wez <wez@chromium.org>
Date: Tue Apr 10 22:29:58 2018

Disable or filter some more tests which often flake under Fuchsia.

- Disable ProcessUtilTest.LaunchWithHandleTransfer under Debug builds,
  since launching the child process is currently too time consuming.
- Modifies that test to distinguish the child process failing to send
  data from the timeout case.
- Disable SecurityTest.NewOverflow, which fails under Fuchsia/Debug
  builds due to the SDK exporting the wrong new[].
- Disable RTCRtpSenderTest.GetStats, which is part of a suite of tests
  which appear to be inherently flaky, but fail most often under Fuchsia.
- Filter DataPipeTest.SendConsumerAndCloseProducer

Bug:  793412 ,  828229 ,  827450 , 831024
Change-Id: Icb655ca91881c3ff7059f333f0e22d9bf2d92468
Reviewed-on: https://chromium-review.googlesource.com/1003996
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Yuri Wiitala <miu@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549663}
[modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/base/process/process_util_unittest.cc
[modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/base/security_unittest.cc
[modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/content/renderer/media/webrtc/rtc_rtp_sender_unittest.cc
[modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/testing/buildbot/filters/fuchsia.mojo_unittests.filter

Comment 19 by w...@chromium.org, Apr 11 2018

Labels: -Pri-2 -M-67 Pri-3
Owner: ----
Status: Available (was: Started)
Project Member

Comment 20 by bugdroid1@chromium.org, May 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bce3e02f453dfbc7fcdf101e282582c794f52ccd

commit bce3e02f453dfbc7fcdf101e282582c794f52ccd
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Mon May 07 18:13:06 2018

Enable ProcessUtilTest.LaunchWithHandleTransfer on Fuchsia

VM performance was improved significantly with TSC clock, so the test
is not expected to timeout anymore.

Bug:  793412 
Change-Id: I8a4cd1ddd35a364e490252d02ccd59946a7600fa
Reviewed-on: https://chromium-review.googlesource.com/1045612
Reviewed-by: Scott Graham <scottmg@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#556499}
[modify] https://crrev.com/bce3e02f453dfbc7fcdf101e282582c794f52ccd/base/process/process_util_unittest.cc

Owner: sergeyu@chromium.org
Status: Fixed (was: Available)
Closing out since this seems to have stuck.
Blockedon: -810171

Sign in to add a comment