"gpu_tests.gpu_process_integration_test.GpuProcessIntegrationTest.GpuProcess_only_one_workaround" is flaky |
Issue description"gpu_tests.gpu_process_integration_test.GpuProcessIntegrationTest.GpuProcess_only_one_workaround" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyagsSBUZsYWtlIl9ncHVfdGVzdHMuZ3B1X3Byb2Nlc3NfaW50ZWdyYXRpb25fdGVzdC5HcHVQcm9jZXNzSW50ZWdyYXRpb25UZXN0LkdwdVByb2Nlc3Nfb25seV9vbmVfd29ya2Fyb3VuZAw. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
Mar 20 2017
The failure appears to be caused by this crash. It looks like there is a race condition between child process bringup and the sending of Mojo messages to it. It's not clear why this seems to be flaky only on Windows. It seems to have become flaky on March 6; on the other hand, that may have coincided with when these tests were switched to typ, which may have improved the precision of the error reporting.

[3412:3540:0317/124244.441:ERROR:process_win.cc(140)] Unable to terminate process: Access is denied. (0x5)
[3412:2892:0317/124244.753:FATAL:browser_child_process_host_impl.cc(284)] Check failed: child_process_->GetProcess().IsValid(). Requesting a child process handle before launch has completed OK.

Backtrace:
base::debug::StackTrace::StackTrace [0x664EC277+55]
base::debug::StackTrace::StackTrace [0x664F12EA+10]
content::BrowserChildProcessHostImpl::GetProcess [0x65D4B04A+314]
content::GpuProcessHost::GetProcessHandles [0x65E2011F+159]
content::RenderMessageFilter::OnHasGpuProcess [0x65F12435+69]
??$DispatchDelayReply@VRenderMessageFilter@content@@XP812@AEXPAVMessage@IPC@@@Z@?$MessageT@UChildProcessHostMsg_HasGpuProcess_Meta@@V?$tuple@$$V@std@@V?$tuple@_N@3@@IPC@@SA_NPBVMessage@1@PAVRenderMessageFilter@content@@PAXP834@AEXPAV21@@Z@Z [0x65F1138F+136]
content::RenderMessageFilter::OnMessageReceived [0x65F12679+345]
content::BrowserMessageFilter::Internal::DispatchMessageW [0x65CC5D91+33]
content::BrowserMessageFilter::Internal::OnMessageReceived [0x65CC6017+183]
IPC::MessageFilterRouter::TryFilters [0x66837C8F+103]
IPC::MessageFilterRouter::TryFilters [0x66837C65+61]
IPC::ChannelProxy::Context::TryFilters [0x66830D80+91]
IPC::ChannelProxy::Context::OnMessageReceived [0x6683097E+14]
IPC::ChannelMojo::OnMessageReceived [0x6682E598+152]
IPC::internal::MessagePipeReader::Receive [0x66833940+336]
IPC::mojom::ChannelStubDispatch::Accept [0x65ACD6EC+698]
IPC::mojom::ChannelStub<mojo::RawPtrImplRefTraits<IPC::mojom::Channel> >::Accept [0x66833578+24]
mojo::InterfaceEndpointClient::HandleValidatedMessage [0x6651DEE9+608]
mojo::FilterChain::Accept [0x665209E6+118]
mojo::InterfaceEndpointClient::HandleIncomingMessage [0x6651DC81+100]
IPC::`anonymous namespace'::MojoBootstrapImpl::`scalar deleting destructor' [0x66834CF3+451]
mojo::FilterChain::Accept [0x665209E6+118]
mojo::Connector::ReadSingleMessage [0x6651F49C+153]
mojo::Connector::ReadAllAvailableMessages [0x6651F2C0+57]
mojo::Connector::OnHandleReadyInternal [0x6651F0BC+119]
base::internal::Invoker<base::internal::BindState<void (__thiscall content::RedirectToFileResourceHandler::Writer::*)(int),base::internal::UnretainedWrapper<content::RedirectToFileResourceHandler::Writer> >,void __cdecl(int)>::Run [0x65E63BD1+17]
base::internal::RunMixin<base::Callback<bool __cdecl(enum previews::PreviewsType),1,1> >::Run [0x66265D7C+32]
mojo::SimpleWatcher::OnHandleReady [0x66523EC8+184]
base::internal::InvokeHelper<1,void>::MakeItSo<void (__thiscall VersionHandler::*const &)(std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > *,std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > *),base [0x675130C6+43]
base::internal::Invoker<base::internal::BindState<void (__thiscall mojo::SimpleWatcher::*)(int,unsigned int),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int>,void __cdecl(void)>::RunImpl<void (__thiscall mojo::SimpleWatcher::*const &)(int,unsigned int [0x670CD423+23]
base::internal::Invoker<base::internal::BindState<void (__thiscall mojo::SimpleWatcher::*)(int,unsigned int),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int>,void __cdecl(void)>::Run [0x670CF296+22]
base::debug::TaskAnnotator::RunTask [0x66505971+417]
base::MessageLoop::RunTask [0x664BC351+1233]
base::MessageLoop::DoWork [0x664BB4A5+741]
base::MessagePumpForIO::DoRunLoop [0x665069AC+188]
base::MessagePumpWin::Run [0x6650726A+74]
base::MessageLoop::RunHandler [0x664BBE77+247]
base::RunLoop::Run [0x664D90C4+132]
base::Thread::Run [0x664B8CCD+173]
content::BrowserThreadImpl::IOThreadRun [0x65D5887B+30]
content::BrowserThreadImpl::Run [0x65D594D6+246]
base::Thread::ThreadMain [0x664B976E+622]
base::PlatformThread::Sleep [0x66488C62+290]
BaseThreadInitThunk [0x751C337A+18]
RtlInitializeExceptionChain [0x777392B2+99]
RtlInitializeExceptionChain [0x77739285+54]

This may be reproducible on Windows by building Release mode with the GN arg dcheck_always_on=true and repeatedly running:

python content/test/gpu/run_gpu_integration_test.py gpu_process --browser=release

rockot@: could you or someone else on the Mojo team take a look at the above stack trace and see whether it might be caused by a bug or race condition in Mojo? This test does bring up and shut down the browser quickly and repeatedly. Could that be causing problems?
Mar 20 2017
Note: the first error in the log, "[3412:3540:0317/124244.441:ERROR:process_win.cc(140)] Unable to terminate process: Access is denied. (0x5)", might be the cause of the FATAL error that follows it.
Mar 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/57842131a06bc01168d581f14fb496bb63be216a

commit 57842131a06bc01168d581f14fb496bb63be216a
Author: kbr <kbr@chromium.org>
Date: Tue Mar 21 01:51:36 2017

Mark GpuProcess_only_one_workaround flaky on Windows.

BUG=700522
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
TBR=zmo@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2762013002
Cr-Commit-Position: refs/heads/master@{#458273}

[modify] https://crrev.com/57842131a06bc01168d581f14fb496bb63be216a/content/test/gpu/gpu_tests/gpu_process_expectations.py
Mar 21 2017
It doesn't look like a race in Mojo per se; it looks like a race between the expectations of RenderMessageFilter and the behavior of GpuProcessHost. I'll see if I can repro locally and sort this out.
Mar 21 2017
I can't repro this btw, but looking at the code, the IPC is clearly racy. RenderMessageFilter asks GpuProcessHost for a list of active GPU process handles, but it does no validation to ensure that the GpuProcessHosts it queries even have a valid process handle. Hence the DCHECK, which can be hit if any extant GpuProcessHost is currently launching a process. I would submit a trivial fix for this, but it's not clear to me what the correct behavior should be: should GpuProcessHost::GetProcessHandles defer until no hosts are in a launching state? Or should it just make a best effort and return only the hosts for which (process_launched_ && !process_->IsStarting()) holds, which would safely validate the conditions required to obtain a process handle?
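For illustration, a minimal, self-contained sketch of the best-effort option. The names here (FakeGpuProcessHost, GetLaunchedHandles) are made up for the example and are not the real content/ classes; an actual fix would live in GpuProcessHost::GetProcessHandles and use the process_launched_ / IsStarting() conditions mentioned above.

#include <cassert>
#include <iostream>
#include <vector>

// Stand-in for a GPU process host; not the real Chromium class.
struct FakeGpuProcessHost {
  bool process_launched = false;  // true once the child process has started
  int process_handle = 0;         // only meaningful after launch completes

  int GetHandle() const {
    // Models the failing check in BrowserChildProcessHostImpl::GetProcess():
    // asking for a handle before launch has completed is a fatal error.
    assert(process_launched &&
           "Requesting a child process handle before launch has completed OK.");
    return process_handle;
  }
};

// Best-effort variant of GetProcessHandles(): skip hosts that are still
// launching instead of tripping the assertion on them.
std::vector<int> GetLaunchedHandles(const std::vector<FakeGpuProcessHost>& hosts) {
  std::vector<int> handles;
  for (const auto& host : hosts) {
    if (host.process_launched)  // only query hosts whose launch has completed
      handles.push_back(host.GetHandle());
  }
  return handles;
}

int main() {
  std::vector<FakeGpuProcessHost> hosts(2);
  hosts[0].process_launched = true;
  hosts[0].process_handle = 1234;
  // hosts[1] is still launching; naively querying every host would hit the
  // assertion here, just as the DCHECK fires in the flaky test.
  for (int handle : GetLaunchedHandles(hosts))
    std::cout << "GPU process handle: " << handle << "\n";
}

Deferring the reply until every host has finished launching would also avoid the DCHECK, but at the cost of delaying the renderer's query.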
Mar 21 2017
BTW looks like this IPC is from https://codereview.chromium.org/1547793004 (moved to RenderMessageFilter in https://codereview.chromium.org/1799713002) and it has always been incorrect. So maybe some subtle timing changes or test environment changes have just uncovered this raciness.
Mar 21 2017
Oh dear. Sorry rockot@ for making you investigate this. Let me take this from you. I'll do what you suggested and only send messages to the GPU processes which aren't currently launching.
Jun 26 2017
Is this still Pri-1?
Jul 6 2017
Not P1, but still needs to be fixed.
Oct 17
Comment 1 by vasi...@chromium.org, Mar 17 2017
Owner: kbr@chromium.org