New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 879555 link

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression



Sign in to add a comment

UtilityProcessHostBrowserTest.LaunchProcessAndCrash is flaky on Windows

Project Member Reported by rbasuvula@chromium.org, Aug 31

Issue description

content_browsertests failed failure on Win Asan and win trunk build on Official & continuous builder

Build Path:
----------
https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1083
https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win%20trunk/builds/91679

Log Path:
---------
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop%2Fwin-asan%2F1083%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop.continuous%2Fwin_trunk%2F91679%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout

Error log:
----------
[  FAILED  ] UtilityProcessHostBrowserTest.LaunchProcessAndCrash, where TypeParam =  and GetParam() =  (2583 ms)
[2931/2931] UtilityProcessHostBrowserTest.LaunchProcessAndCrash (2717 ms)
1 test failed:
    UtilityProcessHostBrowserTest.LaunchProcessAndCrash (../../content/browser/utility_process_host_browsertest.cc:163)
<Thread(Thread-1, started 10748)> ProcessRead: proc.stdout finished.
<Thread(Thread-1, started 10748)> ProcessRead: cleaning up.
<Thread(Thread-2, started daemon 3200)> TimedFlush: Finished
<Thread(Thread-1, started 10748)> ProcessRead: finished.
python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmpak3tee gs://chrome-gtest-results/raw/2018/8/27/31/f956e1eba333737350dfbcd0674b11ed315a5ed7.json.gz
Copying file://c:\users\chrome~1\appdata\local\temp\tmpak3tee [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/  1.2 MiB]                                                
-
- [0 files][  1.0 MiB/  1.2 MiB]                                                
- [1 files][  1.2 MiB/  1.2 MiB]                                                
\
Operation completed over 1 objects/1.2 MiB.                                      
python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmpak3tee "gs://chrome-gtest-results/buildbot/official.desktop.continuous/win trunk/91679/content_browsertests.json.gz"
Copying file://c:\users\chrome~1\appdata\local\temp\tmpak3tee [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/  1.2 MiB]                                                
-
- [0 files][  1.0 MiB/  1.2 MiB]                                                
- [1 files][  1.2 MiB/  1.2 MiB]                                                
\
Operation completed over 1 objects/1.2 MiB.                                      
python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmp1ecuwj gs://chrome-gtest-results/bigquery/2018/8/27/31/f956e1eba333737350dfbcd0674b11ed315a5ed7.json.gz
Copying file://c:\users\chrome~1\appdata\local\temp\tmp1ecuwj [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/ 70.5 KiB]                                                
/ [1 files][ 70.5 KiB/ 70.5 KiB]                                                
Operation completed over 1 objects/70.5 KiB.                                     
exit code (as seen by runtest.py): 1
Confused: 93 files were deleted from c:\users\chrome~1\appdata\local\temp during the test run
step returned non-zero exit code: 1

Unable to find the possible suspect.Untriaging it so that it gets addressed.

Note: its working fine in next build in continuous builder for tracking purpose raising this issue.

Thank You! 
 
Cc: ligim...@chromium.org pnangunoori@chromium.org
Just to update:

"UtilityProcessHostBrowserTest.LaunchProcessAndCrash"-content browser tests failure seen on win-asan build #71.0.3542.1 on official desktop builder.

Link to the builder:
====================
https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan

Link to the log:
================
https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1087

Thanks!
Cc: wfh@chromium.org
when was the last successful run?

I look at win-asan and it looks like aug 20th.

https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan

I look at "win trunk" and it looks mostly green to me, except a recent generate_build_files failure.

https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win%20trunk

Yeah that is correct these test failures are pretty recent on "Win_trunk" official desktop builder and suspecting that this a flaky test. 

https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20trunk/builds/39937
it does looks like the asan builder doesn't like crash tests. I see other failures in other crash tests e.g. https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1068 fails on PDFExtensionTest.PdfAccessibilityTextRunCrash

perhaps some configuration changed on the asan builder.

UtilityProcessHostBrowserTest.LaunchProcessAndCrash has existed since https://chromium-review.googlesource.com/c/chromium/src/+/1086225

this was 6 June 2018. I see successful runs on win-asan since then e.g. https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1067 was 15 Aug 2018.
Definitely flaky. I locally did a few runs UtilityProcessHostBrowserTest.LaunchProcessAndCrash built with crrev.com/590984 and LLVM 340925. It passed multiple times and failed multiple times.
Cc: h...@chromium.org thakis@chromium.org
I sent https://chromium-review.googlesource.com/c/chromium/src/+/1227303 to disable the flaky test.
Why do we have a win/asan bot on the official waterfall? What config does it build in? How is it different from https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/win-asan on the public memory waterfall?

(Off corp, can't check myself right now)
https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/ doesn't do ASan builds.

It's configured like this:

dcheck_is_configurable = true
is_chrome_branded = true
is_debug = false
is_official_build = true
target_cpu = "x86"

I'm guessing it's a leftover from when we shipped Syzy-ASan builds to a fraction of users, and now it's used to ship builds with dchecks enabled maybe?
Labels: -Restrict-View-Google Test-Flaky
Owner: wfh@chromium.org
Status: Assigned (was: Untriaged)
Summary: UtilityProcessHostBrowserTest.LaunchProcessAndCrash is flaky on Windows (was: "UtilityProcessHostBrowserTest.LaunchProcessAndCrash" content_browsertests failed failure on Win Asan and win trunk build on Official & continuous builder)
Project Member

Comment 10 by bugdroid1@chromium.org, Sep 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b7c3f6a2c0f51ca0f253af17919450be53736bbd

commit b7c3f6a2c0f51ca0f253af17919450be53736bbd
Author: inglorion <inglorion@chromium.org>
Date: Tue Sep 18 21:53:36 2018

Disable flaky UtilityProcessHostBrowserTest.LaunchProcessAndCrash on Windows

content_browsertests UtilityProcessHostBrowserTest.LaunchProcessAndCrash is
flaky on Windows. Disabling the test so that it doesn't cause build bots to
be red on non-breaking changes.

Bug: 879555
Change-Id: If2aff8e69d240adf09173b82472db6c86eb6aa91
Reviewed-on: https://chromium-review.googlesource.com/1227303
Reviewed-by: Will Harris <wfh@chromium.org>
Reviewed-by: Jochen Eisinger <jochen@chromium.org>
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#592216}
[modify] https://crrev.com/b7c3f6a2c0f51ca0f253af17919450be53736bbd/content/browser/utility_process_host_browsertest.cc

Just to update:

'UtilityProcessHostBrowserTest.LaunchProcessAndCrash'--content_browsertests failed browser_tests failed Failure reason seen on Win64 stable on official continuous builder.

Link to the builder:
----------------
https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20stable/builds/1224

Link to error log:
-----------------
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop.continuous%2Fwin64_stable%2F1224%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout

wfh@, Please take a look into it.
Thanks..!



Status: Started (was: Assigned)
UtilityProcessHostBrowserTest.LaunchProcessAndCrash is alreayd disabled on trunk, so I'm not sure why this is failing on this bot. 

I am now actively investigating this failure. So far I am able to repro if I add a crash in UtilityMain startup, but unable to induce the failure any other way e.g. by adding delays or sleeps in parts of the code.

I might have to add some additional diagnostic details for this failure and re-enable the test to see what's actually going on here.
are the binaries from https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20stable/builds/1224 available anywhere - are they uploaded to isolate server perhaps? I still am unable to repro this flakiness locally unless I artificially induce a crash in the wrong place.
Cc: roc...@chromium.org gab@chromium.org
I'm having small success reproducing this with a build with the same flags as official. Adding a debug statement in run task to try and see what is different here between the failing run and the passing run.

My current understanding is that the failure occurs when BrowserChildProcessCrashed is called before OnChannelConnected. Compare passing.txt with failing.txt:

Passing.txt:

[6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68
[6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChannelConnected@../../content/browser/browser_child_process_host_impl.cc:388  <-- CONNECTED
[6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChannelConnected@../../content/browser/browser_child_process_host_impl.cc:395
[6316:20408:1018/133432.412:INFO:utility_process_host_browsertest.cc(158)] test_process has launched.
[6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68
[6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChildSupportsTracing@../../content/browser/tracing/trace_message_filter.cc:57
[6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68
[6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from Notify@../../mojo/public/cpp/system/simple_watcher.cc:108
[6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Notify@../../mojo/public/cpp/system/simple_watcher.cc:108
[6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChildDisconnected@../../content/browser/browser_child_process_host_impl.cc:455
[6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from BrowserChildProcessCrashed@../../content/browser/utility_process_host_browsertest.cc:150  <-- CRASHES

but in failing.txt, there is no call happening to OnChannelConnected before the call to BrowserChildProcessCrashed. I do not know why this is happening.

I think I'm going to need someone more familiar with the internals of the service manager and/or task runner to work out what is happening here...
failing.txt
63.4 KB View Download
passing.txt
56.0 KB View Download
mojo is happening on the IO thread, and calls via ipc\ipc_mojo_bootstrap.cc mojo::MessageReceiver:Accept into client->HandleIncomingMessage(message) which ends up calling via some generated code (ChannelStubDispatch::Accept) into impl->SetPeerPid(std::move(p_pid)) which ends up in ChildProcessHostImpl::OnChannelConnected.

This ends up in BrowserChildProcessHostImpl::OnChannelConnected where a task is posted to UI thread for both the NotifyProcessHostConnected and NotifyProcessLaunchedAndConnected.

The issue seems to happen when the disconnect comes before that connected task is able to be run on the UI thread, resulting in the discrepancy.

It seems this is 'expected' - see comment in https://cs.chromium.org/chromium/src/content/browser/browser_child_process_host_impl.cc?l=437 so I think perhaps the solution is just to expect an out-of-order disconnect.
Project Member

Comment 16 by bugdroid1@chromium.org, Oct 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c67f8bc96da85822756d512c8925e97e2bb1a52c

commit c67f8bc96da85822756d512c8925e97e2bb1a52c
Author: Will Harris <wfh@chromium.org>
Date: Fri Oct 19 21:04:37 2018

Re-enable UtilityProcessHostBrowserTest.LaunchProcess without tracking launches.

As stated in BrowserChildProcessHostImpl::OnChildDisconnected:

"OnChildDisconnected may be called without OnChannelConnected, so stop the
early exit watcher so GetTerminationStatus can close the process handle"

This test previously did not take this into account and always expected a
connected notification before a disconnected notification, so the test can
be fixed by just removing this assumption.

BUG=879555

Change-Id: I68ce0d27765a6b5387d57bf7af10b84c8fb240ac
Reviewed-on: https://chromium-review.googlesource.com/c/1286910
Reviewed-by: Will Harris <wfh@chromium.org>
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Commit-Queue: Will Harris <wfh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601294}
[modify] https://crrev.com/c67f8bc96da85822756d512c8925e97e2bb1a52c/content/browser/utility_process_host_browsertest.cc

Sign in to add a comment