context_lost flaky on initialization
Reported by dyen@chromium.org, May 19 2016
Issue description
We have been getting crashes on initialization recently (over the last two days or so). Affected builds:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/224347
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/223414
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/223446
Here is the backtrace (although it looks wrong):
(No symbol) [0x0F000000]
(No symbol) [0x603D9717]
gpu::gles2::FeatureInfo::InitializeFeatures [0x6401DEE0+5360]
RtlInitUnicodeString [0x77BBE38C+356]
RtlAllocateHeap [0x77BBE0F2+172]
free [0x61526174+20]
base::TimeTicks::Now [0x614DBF5F+255]
piman@, it looks like you are the only one who has changed anything related to initialization recently. Could you take a look?
,
May 19 2016
Thanks for the stack trace, I'll take a look.
,
May 19 2016
It would mean it crashes here: https://code.google.com/p/chromium/codesearch#chromium/src/gpu/command_buffer/service/feature_info.cc&q=feature_info.cc:736&sq=package:chromium&l=736 ??
   glGenTextures(1, &tex_id);
   glGenFramebuffersEXT(1, &fb_id);
   glBindTexture(GL_TEXTURE_2D, tex_id);
   // Nearest filter needed for framebuffer completeness on some drivers.
-> glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
   glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, width, 0, GL_RGBA, GL_FLOAT, NULL);
,
May 19 2016
Sounds like it is related to this then: https://bugs.chromium.org/p/chromium/issues/detail?id=612866
,
May 19 2016
Issue 612866 has been merged into this issue.
,
May 19 2016
,
May 24 2016
Antoine, will you be able to investigate this? It's still showing up flaky on the commit queue per https://bugs.chromium.org/p/chromium/issues/detail?id=608923#c26 .
,
May 24 2016
I have made very little progress so far. I still cannot explain the bug; it is most likely memory corruption, affecting either our driver function table or (less likely) the GLApi vtable. There are some likely related crashes on crash/. They show up only on Chrome OS, Android and Windows, not on Linux or Mac, which could indicate that this is specific to our EGL use. It shows up across different drivers and GPUs, so it is most likely an issue in our code. I found and fixed several potential memory corruption issues over the last couple of days, but none of them seems to have had an impact on this yet.
,
May 26 2016
Another symptom (not sure whether this was captured elsewhere) seems to be a crash in DoBindTexture:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/228520/steps/context_lost_tests%20on%20NVIDIA%20GPU%20on%20Windows%20%28with%20patch%29%20on%20Windows-2008ServerR2-SP1/logs/stdio
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/228327/steps/context_lost_tests%20on%20ATI%20GPU%20on%20Windows%20%28with%20patch%29%20on%20Windows-2008ServerR2-SP1/logs/stdio
[3392:3764:0525/193653:ERROR:command_buffer_proxy_impl.cc(236)] Failed to send GpuChannelMsg_CreateCommandBuffer.
[3392:3764:0525/193653:ERROR:context_provider_command_buffer.cc(159)] GpuChannelHost failed to create command buffer.
[3228:4032:0525/193653:INFO:CONSOLE(0)] "WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost", source: http://127.0.0.1:53128/gpu_process_crash.html (0)
[3228:4032:0525/193653:ERROR:gpu_process_transport_factory.cc(754)] Lost UI shared context.
Backtrace:
(No symbol) [0x54415453]
(No symbol) [0x630533CB]
gpu::gles2::GLES2DecoderImpl::DoBindTexture [0x64B25FFF+339]
RtlFreeHeap [0x775EE023+126]
HeapFree [0x759614AD+20]
free [0x64172772+20]
gpu::gles2::GLES2DecoderImpl::GenTexturesHelper [0x64B2F166+152]
gpu::gles2::GLES2DecoderImpl::HandleBindTexture [0x64B31F91+89]
gpu::gles2::GLES2DecoderImpl::DoCommandsImpl<0> [0x64B1E594+197]
gpu::CommandParser::ProcessCommands [0x64B0D282+49]
gpu::CommandExecutor::PutChanged [0x64B0DEFE+484]
gpu::CommandBufferService::Flush [0x64B0D63A+30]
gpu::GpuCommandBufferStub::OnAsyncFlush [0x64C15C9F+345]
IPC::MessageT<GpuCommandBufferMsg_AsyncFlush_Meta,std::tuple<int,unsigned int,std::vector<ui::LatencyInfo,std::allocator<ui::LatencyInfo> > >,void>::Dispatch<gpu::GpuCommandBufferStub,gpu::GpuCommandBufferStub,void,void (__thiscall gpu::GpuCommandBufferSt [0x64C14007+142]
gpu::GpuCommandBufferStub::OnMessageReceived [0x64C164DE+485]
gpu::GpuChannel::HandleMessageHelper [0x64C0FFF2+44]
gpu::GpuChannel::HandleMessage [0x64C0FF33+346]
base::internal::InvokeHelper<1,void,base::internal::RunnableAdapter<void (__thiscall content::WebMediaPlayerMS::*)(scoped_refptr<media::VideoFrame> const &)> >::MakeItSo<base::WeakPtr<content::WebMediaPlayerMS>,scoped_refptr<media::VideoFrame> const &> [0x65806A5C+48]
base::internal::Invoker<base::IndexSequence<0,1>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall content::WebBluetoothServiceImpl::*)(mojo::Callback<void __cdecl(enum blink::mojom::WebBluetoothError)> const &)>,void __cdecl(cont [0x64C11D1C+45]
base::debug::TaskAnnotator::RunTask [0x66DEB0F7+247]
base::MessageLoop::RunTask [0x66D77CFB+1211]
base::MessageLoop::DoWork [0x66D76DA5+549]
base::MessagePumpForUI::DoRunLoop [0x66DBDFCA+90]
base::MessagePumpWin::Run [0x66DBE8AA+74]
base::MessageLoop::RunHandler [0x66D77837+103]
base::RunLoop::Run [0x66DE3159+41]
base::MessageLoop::Run [0x66D777C2+98]
content::GpuMain [0x66A12131+1759]
content::RunNamedProcessTypeMain [0x66D5019D+176]
content::ContentMainRunnerImpl::Run [0x66D500BC+274]
content::ContentMain [0x66D4F4D2+35]
ChromeMain [0x6416A71C+108]
MainDllLoader::Launch [0x011BEA73+488]
wWinMain [0x011BD94D+450]
__scrt_common_main_seh [0x01A3BF3B+253] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:255)
BaseThreadInitThunk [0x7596336A+18]
RtlInitializeExceptionChain [0x775F92B2+99]
RtlInitializeExceptionChain [0x775F9285+54]
(full stdout attached)
Looks like this may be happening immediately after the GPU process relaunches after it was terminated via about:gpucrash.
,
May 26 2016
Yes, we noticed that in bug 612866. That's why I suspect corruption of our driver function table, but I haven't been able to repro this yet when adding instrumentation to catch the condition.
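For readers unfamiliar with this kind of instrumentation, here is a minimal sketch of one way to catch a corrupted function-pointer table: take a copy of the table right after it is populated, then compare the live table against that copy before using it. This is only an illustration under stated assumptions. `DriverTable`, `TableGuard`, `FakeBindTexture` and `FakeGenTextures` are hypothetical names invented for the example; they are not Chromium's actual types, and the thread does not say what instrumentation was really tried.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a table of driver entry points.
// Not Chromium's real driver table; invented for this sketch.
struct DriverTable {
  void (*bind_texture)(unsigned target, unsigned texture);
  void (*gen_textures)(int n, unsigned* textures);
};

// Hypothetical no-op entry points used to populate the table.
void FakeBindTexture(unsigned, unsigned) {}
void FakeGenTextures(int, unsigned*) {}

// Keeps a byte-for-byte copy of the table taken at construction time and
// reports whether the live table has changed since then.
class TableGuard {
 public:
  explicit TableGuard(const DriverTable* table) : table_(table) {
    assert(table != nullptr);
    std::memcpy(&snapshot_, table_, sizeof(DriverTable));
  }

  // Returns true if the live table still matches the snapshot.
  bool IsIntact() const {
    return std::memcmp(&snapshot_, table_, sizeof(DriverTable)) == 0;
  }

 private:
  const DriverTable* table_;
  DriverTable snapshot_;
};
```

In a setup like this, one would call `IsIntact()` just before dispatching through the table and deliberately crash when it returns false, so that the crash report points at the first use after the corruption rather than at a jump to a garbage address.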
,
Jun 17 2016
Ping. Should this be Pri-1? If so please assign a milestone.
,
Jun 21 2016
I'm not seeing this flake any more on https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=200 and would suggest closing it as WontFix (not reproducible).
,
Jun 22 2016
Thanks Ken. I think some of the unrelated fixes may have magically removed this. ¯\_(ツ)_/¯
Comment 1 by dyen@chromium.org, May 19 2016