New issue
Advanced search Search tips

Issue 850941 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Chrome , Mac
Pri: 1
Type: Bug



Sign in to add a comment

ProcessTest.WaitForExitWithTimeout sometimes fail on Linux x86

Project Member Reported by rog...@vewd.com, Jun 8 2018

Issue description

Chrome Version: 69.0.3454.0 (locally built base_unittests)
OS: Linux (Ubuntu 18.04)

What steps will reproduce the problem?
(1) Build base_unittests
(2) ./base_unittests --gtest_filter=ProcessMetricsTest*:ProcessTest.*
(3)

What is the expected result?
All tests pass

What happens instead?
[  FAILED  ] ProcessTest.WaitForExitWithTimeout
[  FAILED  ] ProcessTest.ChildProcessIsRunning


Please use labels and text to provide additional information.
I bisected this and found that the issue was introduced by https://chromium-review.googlesource.com/920799 (https://crbug.com/806451).

The tests does not fail if I run the tests alone or with either of "ProcessMetricsTest*" and "ProcessTest.*", only if I run both of them. If I run all of the tests in base_unittests these tests fail but the test suite passes due to  automatic retries.
 

Comment 1 by w...@chromium.org, Jun 8 2018

Cc: -w...@chromium.org
Owner: w...@chromium.org
Status: Assigned (was: Untriaged)
Thanks for tracking this down; tests that fail first-time and only pass due to retries are terrible. :-/

Comment 2 by w...@chromium.org, Jun 8 2018

Labels: -Pri-3 FoundIn-67 M-69 FoundIn-68 OS-Android OS-Chrome OS-Mac Pri-1
Status: Started (was: Assigned)
Running base_unittests with:
--gtest_filter=ProcessTest.Terminate:ProcessTest.WaitForExitWithTimeout
or
--gtest_filter=ProcessTest.Terminate:ProcessTest.ChildProcessIsRunning
causes the second test to fail, in both cases.

Based on this I was able to trace the issue to being that the next WaitForExit*() after the ProcessTest.Terminate test will always report that the child exited, regardless of whether it did or not.  The bug is at https://cs.chromium.org/chromium/src/base/process/process_posix.cc?l=207:

  if (!WaitpidWithTimeout(handle, &status, timeout)) {
    // If multiple threads wait on the same |handle| then one wait will succeed
    // and the other will fail with errno set to ECHILD.
>   return exited || (errno == ECHILD);
  }

where I added a check against |errno|.  |errno| will only be (re)set by the waitpid() call in WaitpidWithTimeout() if it returns failure (-1), but not if it returns no-children-exited (0). Since WaitpidWithTimeout() treats both cases as failure, my change will cause a stale ECHILD from a previous call to be used.  Ooops.


Project Member

Comment 3 by bugdroid1@chromium.org, Jun 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2c2caae3608757dee09e4f802ac002545d71bfad

commit 2c2caae3608757dee09e4f802ac002545d71bfad
Author: Wez <wez@chromium.org>
Date: Tue Jun 12 04:19:53 2018

Remove ECHILD error handling from WaitForExit*().

In https://chromium-review.googlesource.com/c/chromium/src/+/920799 the
WaitForExit*() calls were modified to handle the ECHILD error, which
will be returned by waitpid() if the process being waited on exits and
is reaped by a different thread.

The implementation was incorrect, and would return true (process exited)
in the case where no processes had exited, but |errno| happened to have
been left set to ECHILD by a previous system call.

The broken implementation passed the try-bots and waterfall thanks to
TestLauncher retries, since the relevant tests pass unless run after a
test that leave |errno| set to ECHILD.

Remove the ECHILD handling from WaitForExit*() and instead modify the
//base unit-test that requires it to do its own special handling.

Bug:  850941 
Change-Id: I129c5b60428c9bc3eba0cb9e82846f0c45917f85
Reviewed-on: https://chromium-review.googlesource.com/1093974
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566301}
[modify] https://crrev.com/2c2caae3608757dee09e4f802ac002545d71bfad/base/process/process_posix.cc
[modify] https://crrev.com/2c2caae3608757dee09e4f802ac002545d71bfad/base/process/process_util_unittest.cc

Comment 4 by w...@chromium.org, Jun 12 2018

Status: Fixed (was: Started)

Comment 5 by rog...@vewd.com, Jun 12 2018

That was quick, thanks!
I can also confirm that the CL fixed the issue.

Sign in to add a comment