Issue metadata
Sign in to add a comment
|
UtilityProcessHostBrowserTest.LaunchProcessAndCrash is flaky on Windows |
||||||||||||||||||||
Issue descriptioncontent_browsertests failed failure on Win Asan and win trunk build on Official & continuous builder Build Path: ---------- https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1083 https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win%20trunk/builds/91679 Log Path: --------- https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop%2Fwin-asan%2F1083%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop.continuous%2Fwin_trunk%2F91679%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout Error log: ---------- [ FAILED ] UtilityProcessHostBrowserTest.LaunchProcessAndCrash, where TypeParam = and GetParam() = (2583 ms) [2931/2931] UtilityProcessHostBrowserTest.LaunchProcessAndCrash (2717 ms) 1 test failed: UtilityProcessHostBrowserTest.LaunchProcessAndCrash (../../content/browser/utility_process_host_browsertest.cc:163) <Thread(Thread-1, started 10748)> ProcessRead: proc.stdout finished. <Thread(Thread-1, started 10748)> ProcessRead: cleaning up. <Thread(Thread-2, started daemon 3200)> TimedFlush: Finished <Thread(Thread-1, started 10748)> ProcessRead: finished. python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmpak3tee gs://chrome-gtest-results/raw/2018/8/27/31/f956e1eba333737350dfbcd0674b11ed315a5ed7.json.gz Copying file://c:\users\chrome~1\appdata\local\temp\tmpak3tee [Content-Type=application/octet-stream]... / [0 files][ 0.0 B/ 1.2 MiB] - - [0 files][ 1.0 MiB/ 1.2 MiB] - [1 files][ 1.2 MiB/ 1.2 MiB] \ Operation completed over 1 objects/1.2 MiB. python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmpak3tee "gs://chrome-gtest-results/buildbot/official.desktop.continuous/win trunk/91679/content_browsertests.json.gz" Copying file://c:\users\chrome~1\appdata\local\temp\tmpak3tee [Content-Type=application/octet-stream]... / [0 files][ 0.0 B/ 1.2 MiB] - - [0 files][ 1.0 MiB/ 1.2 MiB] - [1 files][ 1.2 MiB/ 1.2 MiB] \ Operation completed over 1 objects/1.2 MiB. python C:\b\rr\tmp3pgipx\rw\checkout\recipes\.recipe_deps\depot_tools\gsutil.py -- cp file://c:\users\chrome~1\appdata\local\temp\tmp1ecuwj gs://chrome-gtest-results/bigquery/2018/8/27/31/f956e1eba333737350dfbcd0674b11ed315a5ed7.json.gz Copying file://c:\users\chrome~1\appdata\local\temp\tmp1ecuwj [Content-Type=application/octet-stream]... / [0 files][ 0.0 B/ 70.5 KiB] / [1 files][ 70.5 KiB/ 70.5 KiB] Operation completed over 1 objects/70.5 KiB. exit code (as seen by runtest.py): 1 Confused: 93 files were deleted from c:\users\chrome~1\appdata\local\temp during the test run step returned non-zero exit code: 1 Unable to find the possible suspect.Untriaging it so that it gets addressed. Note: its working fine in next build in continuous builder for tracking purpose raising this issue. Thank You!
,
Sep 6
when was the last successful run? I look at win-asan and it looks like aug 20th. https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan I look at "win trunk" and it looks mostly green to me, except a recent generate_build_files failure. https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win%20trunk
,
Sep 6
Yeah that is correct these test failures are pretty recent on "Win_trunk" official desktop builder and suspecting that this a flaky test. https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20trunk/builds/39937
,
Sep 6
it does looks like the asan builder doesn't like crash tests. I see other failures in other crash tests e.g. https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1068 fails on PDFExtensionTest.PdfAccessibilityTextRunCrash perhaps some configuration changed on the asan builder. UtilityProcessHostBrowserTest.LaunchProcessAndCrash has existed since https://chromium-review.googlesource.com/c/chromium/src/+/1086225 this was 6 June 2018. I see successful runs on win-asan since then e.g. https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/builds/1067 was 15 Aug 2018.
,
Sep 14
Definitely flaky. I locally did a few runs UtilityProcessHostBrowserTest.LaunchProcessAndCrash built with crrev.com/590984 and LLVM 340925. It passed multiple times and failed multiple times.
,
Sep 15
I sent https://chromium-review.googlesource.com/c/chromium/src/+/1227303 to disable the flaky test.
,
Sep 15
Why do we have a win/asan bot on the official waterfall? What config does it build in? How is it different from https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/win-asan on the public memory waterfall? (Off corp, can't check myself right now)
,
Sep 17
https://uberchromegw.corp.google.com/i/official.desktop/builders/win-asan/ doesn't do ASan builds. It's configured like this: dcheck_is_configurable = true is_chrome_branded = true is_debug = false is_official_build = true target_cpu = "x86" I'm guessing it's a leftover from when we shipped Syzy-ASan builds to a fraction of users, and now it's used to ship builds with dchecks enabled maybe?
,
Sep 17
,
Sep 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b7c3f6a2c0f51ca0f253af17919450be53736bbd commit b7c3f6a2c0f51ca0f253af17919450be53736bbd Author: inglorion <inglorion@chromium.org> Date: Tue Sep 18 21:53:36 2018 Disable flaky UtilityProcessHostBrowserTest.LaunchProcessAndCrash on Windows content_browsertests UtilityProcessHostBrowserTest.LaunchProcessAndCrash is flaky on Windows. Disabling the test so that it doesn't cause build bots to be red on non-breaking changes. Bug: 879555 Change-Id: If2aff8e69d240adf09173b82472db6c86eb6aa91 Reviewed-on: https://chromium-review.googlesource.com/1227303 Reviewed-by: Will Harris <wfh@chromium.org> Reviewed-by: Jochen Eisinger <jochen@chromium.org> Reviewed-by: John Abd-El-Malek <jam@chromium.org> Commit-Queue: Reid Kleckner <rnk@chromium.org> Cr-Commit-Position: refs/heads/master@{#592216} [modify] https://crrev.com/b7c3f6a2c0f51ca0f253af17919450be53736bbd/content/browser/utility_process_host_browsertest.cc
,
Oct 17
Just to update: 'UtilityProcessHostBrowserTest.LaunchProcessAndCrash'--content_browsertests failed browser_tests failed Failure reason seen on Win64 stable on official continuous builder. Link to the builder: ---------------- https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20stable/builds/1224 Link to error log: ----------------- https://logs.chromium.org/v/?s=chrome%2Fbb%2Fofficial.desktop.continuous%2Fwin64_stable%2F1224%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout wfh@, Please take a look into it. Thanks..!
,
Oct 17
UtilityProcessHostBrowserTest.LaunchProcessAndCrash is alreayd disabled on trunk, so I'm not sure why this is failing on this bot. I am now actively investigating this failure. So far I am able to repro if I add a crash in UtilityMain startup, but unable to induce the failure any other way e.g. by adding delays or sleeps in parts of the code. I might have to add some additional diagnostic details for this failure and re-enable the test to see what's actually going on here.
,
Oct 17
are the binaries from https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/win64%20stable/builds/1224 available anywhere - are they uploaded to isolate server perhaps? I still am unable to repro this flakiness locally unless I artificially induce a crash in the wrong place.
,
Oct 18
I'm having small success reproducing this with a build with the same flags as official. Adding a debug statement in run task to try and see what is different here between the failing run and the passing run. My current understanding is that the failure occurs when BrowserChildProcessCrashed is called before OnChannelConnected. Compare passing.txt with failing.txt: Passing.txt: [6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68 [6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChannelConnected@../../content/browser/browser_child_process_host_impl.cc:388 <-- CONNECTED [6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChannelConnected@../../content/browser/browser_child_process_host_impl.cc:395 [6316:20408:1018/133432.412:INFO:utility_process_host_browsertest.cc(158)] test_process has launched. [6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68 [6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChildSupportsTracing@../../content/browser/tracing/trace_message_filter.cc:57 [6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Send@../../gpu/ipc/client/gpu_channel_host.cc:68 [6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from Notify@../../mojo/public/cpp/system/simple_watcher.cc:108 [6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from Notify@../../mojo/public/cpp/system/simple_watcher.cc:108 [6316:20408:1018/133432.412:INFO:task_annotator.cc(99)] running task from OnChildDisconnected@../../content/browser/browser_child_process_host_impl.cc:455 [6316:23624:1018/133432.412:INFO:task_annotator.cc(99)] running task from BrowserChildProcessCrashed@../../content/browser/utility_process_host_browsertest.cc:150 <-- CRASHES but in failing.txt, there is no call happening to OnChannelConnected before the call to BrowserChildProcessCrashed. I do not know why this is happening. I think I'm going to need someone more familiar with the internals of the service manager and/or task runner to work out what is happening here...
,
Oct 19
mojo is happening on the IO thread, and calls via ipc\ipc_mojo_bootstrap.cc mojo::MessageReceiver:Accept into client->HandleIncomingMessage(message) which ends up calling via some generated code (ChannelStubDispatch::Accept) into impl->SetPeerPid(std::move(p_pid)) which ends up in ChildProcessHostImpl::OnChannelConnected. This ends up in BrowserChildProcessHostImpl::OnChannelConnected where a task is posted to UI thread for both the NotifyProcessHostConnected and NotifyProcessLaunchedAndConnected. The issue seems to happen when the disconnect comes before that connected task is able to be run on the UI thread, resulting in the discrepancy. It seems this is 'expected' - see comment in https://cs.chromium.org/chromium/src/content/browser/browser_child_process_host_impl.cc?l=437 so I think perhaps the solution is just to expect an out-of-order disconnect.
,
Oct 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c67f8bc96da85822756d512c8925e97e2bb1a52c commit c67f8bc96da85822756d512c8925e97e2bb1a52c Author: Will Harris <wfh@chromium.org> Date: Fri Oct 19 21:04:37 2018 Re-enable UtilityProcessHostBrowserTest.LaunchProcess without tracking launches. As stated in BrowserChildProcessHostImpl::OnChildDisconnected: "OnChildDisconnected may be called without OnChannelConnected, so stop the early exit watcher so GetTerminationStatus can close the process handle" This test previously did not take this into account and always expected a connected notification before a disconnected notification, so the test can be fixed by just removing this assumption. BUG=879555 Change-Id: I68ce0d27765a6b5387d57bf7af10b84c8fb240ac Reviewed-on: https://chromium-review.googlesource.com/c/1286910 Reviewed-by: Will Harris <wfh@chromium.org> Reviewed-by: John Abd-El-Malek <jam@chromium.org> Commit-Queue: Will Harris <wfh@chromium.org> Cr-Commit-Position: refs/heads/master@{#601294} [modify] https://crrev.com/c67f8bc96da85822756d512c8925e97e2bb1a52c/content/browser/utility_process_host_browsertest.cc |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by pnangunoori@chromium.org
, Sep 4