
Issue 700522


Issue metadata

Status: Assigned
Owner: kbr@chromium.org (OOO until 2019-01-24)
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




"gpu_tests.gpu_process_integration_test.GpuProcessIntegrationTest.GpuProcess_only_one_workaround" is flaky

Reported by chromium...@appspot.gserviceaccount.com (Project Member), Mar 10 2017

Issue description

"gpu_tests.gpu_process_integration_test.GpuProcessIntegrationTest.GpuProcess_only_one_workaround" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyagsSBUZsYWtlIl9ncHVfdGVzdHMuZ3B1X3Byb2Nlc3NfaW50ZWdyYXRpb25fdGVzdC5HcHVQcm9jZXNzSW50ZWdyYXRpb25UZXN0LkdwdVByb2Nlc3Nfb25seV9vbmVfd29ya2Fyb3VuZAw.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
 
Comment 1

Labels: -Sheriff-Chromium
Owner: kbr@chromium.org
Assigning to kbr@ to have a look.

Comment 2 by kbr@chromium.org, Mar 20 2017

Cc: kbr@chromium.org
Components: Internals>GPU>Testing Internals>Mojo
Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)
The failure appears to be caused by this crash. It looks like there is a race condition between child process bringup and the sending of Mojo messages to it. Not sure why this seems to be flaky only on Windows. It seems to have become flaky on March 6; on the other hand, that may have coincided with when these tests were switched to typ, which may have improved the precision of the error reporting.

[3412:3540:0317/124244.441:ERROR:process_win.cc(140)] Unable to terminate process: Access is denied. (0x5)
[3412:2892:0317/124244.753:FATAL:browser_child_process_host_impl.cc(284)] Check failed: child_process_->GetProcess().IsValid(). Requesting a child process handle before launch has completed OK.
Backtrace:
	base::debug::StackTrace::StackTrace [0x664EC277+55]
	base::debug::StackTrace::StackTrace [0x664F12EA+10]
	content::BrowserChildProcessHostImpl::GetProcess [0x65D4B04A+314]
	content::GpuProcessHost::GetProcessHandles [0x65E2011F+159]
	content::RenderMessageFilter::OnHasGpuProcess [0x65F12435+69]
	??$DispatchDelayReply@VRenderMessageFilter@content@@XP812@AEXPAVMessage@IPC@@@Z@?$MessageT@UChildProcessHostMsg_HasGpuProcess_Meta@@V?$tuple@$$V@std@@V?$tuple@_N@3@@IPC@@SA_NPBVMessage@1@PAVRenderMessageFilter@content@@PAXP834@AEXPAV21@@Z@Z [0x65F1138F+136]
	content::RenderMessageFilter::OnMessageReceived [0x65F12679+345]
	content::BrowserMessageFilter::Internal::DispatchMessageW [0x65CC5D91+33]
	content::BrowserMessageFilter::Internal::OnMessageReceived [0x65CC6017+183]
	IPC::MessageFilterRouter::TryFilters [0x66837C8F+103]
	IPC::MessageFilterRouter::TryFilters [0x66837C65+61]
	IPC::ChannelProxy::Context::TryFilters [0x66830D80+91]
	IPC::ChannelProxy::Context::OnMessageReceived [0x6683097E+14]
	IPC::ChannelMojo::OnMessageReceived [0x6682E598+152]
	IPC::internal::MessagePipeReader::Receive [0x66833940+336]
	IPC::mojom::ChannelStubDispatch::Accept [0x65ACD6EC+698]
	IPC::mojom::ChannelStub<mojo::RawPtrImplRefTraits<IPC::mojom::Channel> >::Accept [0x66833578+24]
	mojo::InterfaceEndpointClient::HandleValidatedMessage [0x6651DEE9+608]
	mojo::FilterChain::Accept [0x665209E6+118]
	mojo::InterfaceEndpointClient::HandleIncomingMessage [0x6651DC81+100]
	IPC::`anonymous namespace'::MojoBootstrapImpl::`scalar deleting destructor' [0x66834CF3+451]
	mojo::FilterChain::Accept [0x665209E6+118]
	mojo::Connector::ReadSingleMessage [0x6651F49C+153]
	mojo::Connector::ReadAllAvailableMessages [0x6651F2C0+57]
	mojo::Connector::OnHandleReadyInternal [0x6651F0BC+119]
	base::internal::Invoker<base::internal::BindState<void (__thiscall content::RedirectToFileResourceHandler::Writer::*)(int),base::internal::UnretainedWrapper<content::RedirectToFileResourceHandler::Writer> >,void __cdecl(int)>::Run [0x65E63BD1+17]
	base::internal::RunMixin<base::Callback<bool __cdecl(enum previews::PreviewsType),1,1> >::Run [0x66265D7C+32]
	mojo::SimpleWatcher::OnHandleReady [0x66523EC8+184]
	base::internal::InvokeHelper<1,void>::MakeItSo<void (__thiscall VersionHandler::*const &)(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > *,std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > *),base [0x675130C6+43]
	base::internal::Invoker<base::internal::BindState<void (__thiscall mojo::SimpleWatcher::*)(int,unsigned int),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int>,void __cdecl(void)>::RunImpl<void (__thiscall mojo::SimpleWatcher::*const &)(int,unsigned int [0x670CD423+23]
	base::internal::Invoker<base::internal::BindState<void (__thiscall mojo::SimpleWatcher::*)(int,unsigned int),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int>,void __cdecl(void)>::Run [0x670CF296+22]
	base::debug::TaskAnnotator::RunTask [0x66505971+417]
	base::MessageLoop::RunTask [0x664BC351+1233]
	base::MessageLoop::DoWork [0x664BB4A5+741]
	base::MessagePumpForIO::DoRunLoop [0x665069AC+188]
	base::MessagePumpWin::Run [0x6650726A+74]
	base::MessageLoop::RunHandler [0x664BBE77+247]
	base::RunLoop::Run [0x664D90C4+132]
	base::Thread::Run [0x664B8CCD+173]
	content::BrowserThreadImpl::IOThreadRun [0x65D5887B+30]
	content::BrowserThreadImpl::Run [0x65D594D6+246]
	base::Thread::ThreadMain [0x664B976E+622]
	base::PlatformThread::Sleep [0x66488C62+290]
	BaseThreadInitThunk [0x751C337A+18]
	RtlInitializeExceptionChain [0x777392B2+99]
	RtlInitializeExceptionChain [0x77739285+54]



This may be reproducible on Windows by building in Release mode with the GN arg:
dcheck_always_on=true

and repeatedly running:
python content/test/gpu/run_gpu_integration_test.py gpu_process --browser=release
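
For example (an illustrative loop, not part of the original repro instructions), the test can be rerun repeatedly from a Windows cmd prompt with:

for /l %i in (1,1,100) do python content/test/gpu/run_gpu_integration_test.py gpu_process --browser=release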

rockot@: could you or someone else on the Mojo team take a look at the above stack trace and see whether it might be caused by a bug or race condition in Mojo? This test does bring up and shut down the browser quickly and repeatedly. Could that be causing problems?

Comment 3 by kbr@chromium.org, Mar 20 2017

Note: the error at the top:
[3412:3540:0317/124244.441:ERROR:process_win.cc(140)] Unable to terminate process: Access is denied. (0x5)

might be the cause of the fatal error that follows it.

Comment 4 by bugdroid1@chromium.org (Project Member), Mar 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/57842131a06bc01168d581f14fb496bb63be216a

commit 57842131a06bc01168d581f14fb496bb63be216a
Author: kbr <kbr@chromium.org>
Date: Tue Mar 21 01:51:36 2017

Mark GpuProcess_only_one_workaround flaky on Windows.

BUG=700522
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
TBR=zmo@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2762013002
Cr-Commit-Position: refs/heads/master@{#458273}

[modify] https://crrev.com/57842131a06bc01168d581f14fb496bb63be216a/content/test/gpu/gpu_tests/gpu_process_expectations.py

Comment 5 by roc...@chromium.org, Mar 21 2017

It doesn't look like a race in Mojo per se; rather, it looks like a race between the expectations of RenderMessageFilter and the behavior of GpuProcessHost.

I'll see if I can repro locally and sort this out.

Comment 6 by roc...@chromium.org, Mar 21 2017

I can't repro this, BTW, but looking at the code, the IPC is clearly racy.

RenderMessageFilter asks GpuProcessHost for a list of active GPU process handles, but it does no validation to ensure that the GpuProcessHosts it queries even have a valid process handle. Hence the DCHECK, which can be hit if any extant GpuProcessHost is currently launching a process.

I would submit a trivial fix for this, but it's not clear to me what the correct behavior should be: should GpuProcessHost::GetProcessHandles defer until no hosts are in a launching state? Or should it just make a best effort and return only the process hosts that currently satisfy (process_launched_ && !process_->IsStarting()), which would safely validate the conditions required to obtain a process handle?
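
A minimal sketch of the best-effort option, assuming hypothetical names for the host iteration and callback type (only the process_launched_ && !process_->IsStarting() condition comes from the discussion above; the rest is illustrative, not the actual Chromium code):

// Sketch only: AllGpuProcessHosts() and GetGpuProcessHandlesCallback are
// placeholders, not real Chromium declarations.
void GpuProcessHost::GetProcessHandles(
    const GetGpuProcessHandlesCallback& callback) {
  std::list<base::ProcessHandle> handles;
  for (GpuProcessHost* host : AllGpuProcessHosts()) {
    // Skip hosts whose child process hasn't finished launching; requesting a
    // handle before launch completes is what trips the check in
    // BrowserChildProcessHostImpl::GetProcess().
    if (host->process_launched_ && !host->process_->IsStarting())
      handles.push_back(host->process_->GetProcess().Handle());
  }
  callback.Run(handles);
}

Deferring instead (the first option) would never return a partial list, but at the cost of queuing the reply until all pending launches complete.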

Comment 7 by roc...@chromium.org, Mar 21 2017

BTW, it looks like this IPC is from https://codereview.chromium.org/1547793004 (moved to RenderMessageFilter in https://codereview.chromium.org/1799713002), and it has always been incorrect. So maybe some subtle timing or test environment changes have simply uncovered this raciness.

Comment 8 by kbr@chromium.org, Mar 21 2017

Cc: piman@chromium.org jbau...@chromium.org roc...@chromium.org
Owner: kbr@chromium.org
Oh dear. Sorry, rockot@, for making you investigate this.

Let me take this from you. I'll do what you suggested and only send messages to the GPU processes which aren't currently launching.

Comment 9 Deleted

Comment 10

Is this still Pri-1?

Comment 11 by kbr@chromium.org, Jul 6 2017

Labels: -Pri-1 Pri-2
Not P1, but still needs to be fixed.

Cc: -roc...@chromium.org rockot@google.com
