MetricsServiceBrowserTest.OOMRenderers failing on clang win 32-bit dbg testers
Issue description
Both on pinned and on trunk, so not a roll blocker. Started in one of these two builds:
https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28dbg%29%20tester/builds/9113
https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28dbg%29%20tester/builds/9114
Started here on tot:
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin%28dbg%29%20tester/builds/3837
Stack looks like things mostly work like they should; not sure why the test fails:
MetricsServiceBrowserTest.OOMRenderers (run #1):
[ RUN ] MetricsServiceBrowserTest.OOMRenderers
[3816:4672:0214/165637.825:INFO:media_foundation_video_encode_accelerator_win.cc(329)] Windows versions earlier than 8 are not supported.
[3816:4672:0214/165637.826:INFO:media_foundation_video_encode_accelerator_win.cc(329)] Windows versions earlier than 8 are not supported.
[6752:4376:0214/165643.461:ERROR:render_frame_impl.cc(546)] Intentionally exhausting renderer memory because user navigated to chrome://memory-exhaust/
Backtrace:
RaiseException [0x74DFC42D+88]
base::TerminateBecauseOutOfMemory [0x10260D56+118]
callnewh [0x6C778156+54]
calloc_base [0x6C774E88+3064]
malloc_dbg [0x6C77763A+26]
malloc [0x6C777F94+20]
std::_Vector_val<std::_Simple_types<blink::WebFileChooserCompletion::SelectedFileInfo> >::~_Vector_val<std::_Simple_types<blink::WebFileChooserCompletion::SelectedFileInfo> > [0x1804C3AD+189]
std::unique_ptr<content::MediaStreamRendererFactory,std::default_delete<content::MediaStreamRendererFactory> >::unique_ptr<content::MediaStreamRendererFactory,std::default_delete<content::MediaStreamRendererFactory> > [0x1803A1BB+3275]
content::RenderFrameImpl::PrepareRenderViewForNavigation [0x18033246+342]
content::RenderFrameImpl::NavigateInternal [0x18009F13+243]
content::RenderFrameImpl::OnNavigate [0x17FF6675+757]
base::DispatchToMethodImpl<content::RenderFrameImpl *,void (__thiscall content::RenderFrameImpl::*)(content::CommonNavigationParams const &,content::StartNavigationParams const &,content::RequestNavigationParams const &),std::tuple<content::CommonNavigati [0x180678DC+140]
base::DispatchToMethod<content::RenderFrameImpl *,void (__thiscall content::RenderFrameImpl::*)(content::CommonNavigationParams const &,content::StartNavigationParams const &,content::RequestNavigationParams const &),std::tuple<content::CommonNavigationPa [0x18067818+120]
IPC::DispatchToMethod<content::RenderFrameImpl,void (__thiscall content::RenderFrameImpl::*)(content::CommonNavigationParams const &,content::StartNavigationParams const &,content::RequestNavigationParams const &),void,std::tuple<content::CommonNavigation [0x1806776B+107]
IPC::MessageT<FrameMsg_Navigate_Meta,std::tuple<content::CommonNavigationParams,content::StartNavigationParams,content::RequestNavigationParams>,void>::Dispatch<content::RenderFrameImpl,content::RenderFrameImpl,void,void (__thiscall content::RenderFrameIm [0x17FF627F+527]
content::RenderFrameImpl::OnMessageReceived [0x17FF07A6+3238]
IPC::MessageRouter::RouteMessage [0x0F0ACEAB+91]
content::ChildThreadImpl::ChildThreadMessageRouter::RouteMessage [0x146E4AC9+41]
IPC::MessageRouter::OnMessageReceived [0x0F0ACE15+101]
content::ChildThreadImpl::OnMessageReceived [0x146ECDB3+2675]
IPC::ChannelProxy::Context::OnDispatchMessage [0x0F025D84+148]
base::internal::FunctorTraits<void (__thiscall IPC::ChannelProxy::Context::*)(IPC::Message const &),void>::Invoke<scoped_refptr<IPC::ChannelProxy::Context> const &,IPC::Message const &> [0x0F02E1D3+83]
base::internal::InvokeHelper<0,void>::MakeItSo<void (__thiscall IPC::ChannelProxy::Context::*const &)(IPC::Message const &),scoped_refptr<IPC::ChannelProxy::Context> const &,IPC::Message const &> [0x0F02E040+128]
base::internal::Invoker<base::internal::BindState<void (__thiscall IPC::ChannelProxy::Context::*)(IPC::Message const &),scoped_refptr<IPC::ChannelProxy::Context>,IPC::Message>,void __cdecl(void)>::RunImpl<void (__thiscall IPC::ChannelProxy::Context::*cons [0x0F02DF8F+111]
base::internal::Invoker<base::internal::BindState<void (__thiscall IPC::ChannelProxy::Context::*)(IPC::Message const &),scoped_refptr<IPC::ChannelProxy::Context>,IPC::Message>,void __cdecl(void)>::Run [0x0F02DE2D+61]
base::internal::RunMixin<base::Callback<void __cdecl(void),0,0> >::Run [0x100C30E2+82]
base::debug::TaskAnnotator::RunTask [0x100C28D5+885]
blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x1D90AC6A+2506]
blink::scheduler::TaskQueueManager::DoWork [0x1D905532+2018]
base::internal::FunctorTraits<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),void>::Invoke<base::WeakPtr<blink::scheduler::TaskQueueManager> const &,bool const &> [0x1D91D8EC+92]
base::internal::InvokeHelper<1,void>::MakeItSo<void (__thiscall blink::scheduler::TaskQueueManager::*const &)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager> const &,bool const &> [0x1D91D754+148]
base::internal::Invoker<base::internal::BindState<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager>,bool>,void __cdecl(void)>::RunImpl<void (__thiscall blink::scheduler::TaskQueueManager::*cons [0x1D91D68F+111]
base::internal::Invoker<base::internal::BindState<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager>,bool>,void __cdecl(void)>::Run [0x1D91D52D+61]
base::internal::RunMixin<base::Callback<void __cdecl(void),0,0> >::Run [0x100C30E2+82]
base::debug::TaskAnnotator::RunTask [0x100C28D5+885]
base::MessageLoop::RunTask [0x1016E3B8+1192]
base::MessageLoop::DeferOrRunPendingTask [0x1016ED00+64]
base::MessageLoop::DoWork [0x1016F47A+362]
base::MessagePumpDefault::Run [0x1018361F+255]
base::MessageLoop::RunHandler [0x1016DCFE+622]
base::RunLoop::Run [0x102798BE+270]
content::RendererMain [0x18184E71+1505]
content::RunNamedProcessTypeMain [0x18882038+216]
content::ContentMainRunnerImpl::Run [0x18883B0C+860]
content::ContentMain [0x1888199F+127]
[5436:2268:0214/165706.368:ERROR:scoped_com_initializer.h(58)] Multiple CoInitialize() calls for thread 2268
(only on 32-bit dbg too)
On the tot bot, for some reason the tester doesn't list a bunch of revisions, but on it too the first failing run is the first that contains https://codereview.chromium.org/2648423006/
Weird that it only fails in 32-bit dbg and passes everywhere else!
,
Feb 15 2017
I attempted to repro this, but I did a release build. :( Trying again without optimizations.
,
Feb 15 2017
keyword browser_tests
,
Feb 15 2017
(sorry meant to untick "send email" -- that was just so that this bug shows up when I search crbug for "browser_tests")
,
Feb 15 2017
This passes for me locally. Also, that stack trace looks correct for a renderer. The renderer is supposed to OOM in this test. Maybe I need to enable components... This is what I get for not copy pasting the gn args.
,
Feb 15 2017
Debug defaults to components, so unless you turn that off you have that already. 32-bit needs an explicit arg though, maybe you're missing that?
,
Mar 13 2017
I synced and confirmed it still doesn't reproduce locally. Maybe it's a Windows 7 only problem? My workstation is win10.
,
Mar 14 2017
I attempted to follow the instructions at https://www.chromium.org/developers/testing/isolated-testing/for-swes for running my locally built binary on some Win7 machine in the glorious cloud, but it did not work:
C:\src\chromium\src>python tools\mb\mb.py isolate //out/clang browser_tests
python tools\swarming_client\isolate.py check -i out\clang\browser_tests.isolate -s out\clang\browser_tests.isolated
Failed to find an input file: Input file C:\src\chromium\src\out\clang\ ^--- doesn't exist
-> returned 1
Is there a good way to trigger try-jobs in this specific configuration?
,
Mar 14 2017
Did you run step 2 at https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming ? That's what creates the .isolated
,
Mar 14 2017
I failed at step 1. However, I think the problem is simpler than that. My Chromium checkout was dirty, so I was synced to some CL from Feb 15, which is probably before Will's change. =/ I kept doing 'git pull && gclient sync' and going off to do something else, and the spew from gclient eclipsed the output of git. Let's try again...
,
Mar 14 2017
I synced, rebuilt, and OOMRenderers still passes locally. Is there a trybot I can use to attempt to repro with more logging? Just knowing the exit code of the child renderer when the test fails on the bot would help a lot.
,
Mar 14 2017
I don't think we have clang trybots that run tests. Maybe try the swarming thing again? If that doesn't work, my work box is still on Win 7 I think, so I can give it a try tomorrow. (Snowed in at home today.)
,
Mar 14 2017
I did try the swarming thing again, and it's working, probably because I synced.
,
Mar 14 2017
I got the test running on swarming, where it appears to time out:
[4204:4944:0314/105013.962:ERROR:render_frame_impl.cc(549)] Intentionally exhausting renderer memory because user navigated to chrome://memory-exhaust/
Backtrace:
RaiseException [0x76CDC54F+88]
base::TerminateBecauseOutOfMemory [0x10262DD8+120]
callnewh [0x742D8156+54]
calloc_base [0x742D4E88+3064]
malloc_dbg [0x742D763A+26]
malloc [0x742D7F94+20]
std::_Vector_val<std::_Simple_types<blink::WebFileChooserCompletion::SelectedFileInfo> >::~_Vector_val<std::_Simple_types<blink::WebFileChooserCompletion::SelectedFileInfo> > [0x186D7CDD+189]
std::unique_ptr<content::MediaStreamRendererFactory,std::default_delete<content::MediaStreamRendererFactory> >::unique_ptr<content::MediaStreamRendererFactory,std::default_delete<content::MediaStreamRendererFactory> > [0x186C544B+3323]
content::RenderFrameImpl::PrepareRenderViewForNavigation [0x186BF3D4+340]
...
content::RunNamedProcessTypeMain [0x18F1A502+210]
content::ContentMainRunnerImpl::Run [0x18F1B96C+860]
content::ContentMain [0x18F19EBF+127]
[3728:3960:0314/105039.827:ERROR:scoped_com_initializer.h(58)] Multiple CoInitialize() calls for thread 3960
[4/4] MetricsServiceBrowserTest.OOMRenderers (TIMED OUT)
1 test timed out:
MetricsServiceBrowserTest.OOMRenderers (../../chrome/browser/metrics/metrics_service_browsertest.cc:197)
,
May 8 2017
What's the status here?
,
May 8 2017
Looks like the test is passing now, but I didn't do anything, and nothing has changed in the test since February.
,
May 8 2017
It's still failing on https://build.chromium.org/p/chromium.fyi/builders/CrWinClang(dbg)%20tester as far as I can tell.
,
May 9 2017
It repros locally (times out) on my Windows 7 box.
,
May 9 2017
Re: #18 clang only, or MSVC as well?
,
May 9 2017
It seems we never return from ui_test_utils::NavigateToURL(browser(), GURL(crashy_url)); TestNavigationObserver::OnDidStopLoading() never gets called for the crashing frame (OnDidStartLoading does fire).
,
May 10 2017
Tracing through and learning a little about how this is supposed to work, one of the things that should fire when a renderer process dies is MessagePipeReader::OnPipeError, and that doesn't fire for this test in the debug build. So it seems the renderer actually doesn't die, despite printing that stack. procexp also shows it as still around. (Seems kind of obvious in hindsight actually.)
,
May 10 2017
If I comment out the ::RaiseException part so we just hit _exit, the test passes:
NOINLINE int OnNoMemory(size_t size) {
  // Kill the process. This is important for security since most of code
  // does not check the result of memory allocation.
  // https://msdn.microsoft.com/en-us/library/het71c37.aspx
  // Pass the size of the failed request in an exception argument.
#if 0
  ULONG_PTR exception_args[] = {size};
  ::RaiseException(win::kOomExceptionCode, EXCEPTION_NONCONTINUABLE,
                   arraysize(exception_args), exception_args);
#endif
  // Safety check, make sure process exits here.
  _exit(win::kOomExceptionCode);
  return 0;
}
What could we be doing to make RaiseException fail?
,
May 10 2017
Does this repro in a small program where you a) install this OnNoMemory handler b) call RaiseException in main() to trigger this handler ?
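A rough, portable sketch of what such a small program could look like. This is not the Chromium code: it assumes std::set_new_handler as a stand-in for the Windows-specific handler registration, it exits directly instead of calling ::RaiseException, and the names kSketchOomExitCode, g_oom_handler_calls, g_exit_on_oom, SketchOnNoMemory and TriggerOom are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical exit code; stands in for win::kOomExceptionCode.
constexpr int kSketchOomExitCode = 77;

// Number of times the handler ran, so a caller can confirm it fired.
int g_oom_handler_calls = 0;

// When true the handler kills the process, as OnNoMemory does.
// When false it throws instead, which lets a harness observe the call.
bool g_exit_on_oom = true;

void SketchOnNoMemory() {
  ++g_oom_handler_calls;
  if (g_exit_on_oom)
    std::_Exit(kSketchOomExitCode);
  throw std::bad_alloc();
}

// Installs the handler, then requests an allocation that cannot succeed.
// Returns true if the failed allocation surfaced as std::bad_alloc.
bool TriggerOom() {
  std::set_new_handler(SketchOnNoMemory);
  // volatile keeps the compiler from reasoning about the constant size.
  volatile std::size_t huge = static_cast<std::size_t>(-1) / 2;
  try {
    void* p = ::operator new(huge);
    ::operator delete(p);
    return false;
  } catch (const std::bad_alloc&) {
    return true;
  }
}
```

With g_exit_on_oom left at true, calling TriggerOom() from main() should end the process from inside the handler, which is the behavior the question is about. Setting it to false first only makes the handler call observable without killing the caller.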
,
May 10 2017
Well I suppose raising it works, because we see the backtrace. That's printed by this one:
// Prints the exception call stack.
// This is the unit tests exception filter.
long WINAPI StackDumpExceptionFilter(EXCEPTION_POINTERS* info) {
  debug::StackTrace(info).Print();
  if (g_previous_filter)
    return g_previous_filter(info);
  return EXCEPTION_CONTINUE_SEARCH;
}
But we never return from the "debug::StackTrace(info).Print();" call.
,
May 10 2017
"debug::StackTrace(info).Print();" takes us to SymbolContext::OutputTraceToStream() which then hangs when calling SymFromAddr on the frame before content::ContentMain.
,
May 10 2017
base_unittest's StackTraceTest.OutputToStream passes (if I enable it), so this isn't completely broken. It also doesn't seem related to e.g. printing many stack frames; even if I skip the top 30 frames in OutputTraceToStream() it still hangs in the same place for the same frame. It looks like it's stuck here:
dbghelp!EnumSC<SC2>::get+0x19
dbghelp!SymCachePdb::getContrib+0x3e
dbghelp!CModSymsByAddrTrav::FInit+0xd1
dbghelp!CBlockByAddrTrav::next+0x12
dbghelp!CLabelByAddrTrav::find+0x115
dbghelp!CAllSymsByAddrTrav::getEnclosingSymbol+0x1db
dbghelp!CAllSymsByAddrTrav::findNextAddress+0x204
dbghelp!CAllSymsByAddrTrav::init+0xfb
dbghelp!CAllSymsByAddrTrav::FInit+0x43
dbghelp!CDiaSession::findSymbolByAddr+0x1c2
dbghelp!CDiaSession::findSymbolByRVA+0x59
dbghelp!CDiaSession::findSymbolByRVAEx+0x17
dbghelp!diaGetSymbol+0x7d
dbghelp!diaGetSymFromAddr+0x8b
dbghelp!GetSymFromAddr+0x42
dbghelp!SympGetSymFromAddr+0x5b
dbghelp!SymFromInlineContext+0x2a
dbghelp!SymFromAddr+0x1e
base!base::debug::StackTrace::OutputToStream+0x3cc
base!base::debug::StackTrace::OutputToStream+0x157
,
May 11 2017
I've learned the test is flakier than I thought. If I build with enable_nacl=false, the test passes. As we've seen, it passes in all bot configs except 32-bit debug. Reid couldn't repro on his Windows 10. I think dbghelp is sensitive to the environment. I think what's happening is that the OOM code (ExhaustMemory in render_frame_impl.cc), somehow prevents dbghelp's SymFromAddr from doing its business, perhaps by using up too much heap or address space or something. If I change the line in ExhaustMemory from malloc'ing 0x10000000 bytes to 0x1000000, the test passes. Actually, the stack still doesn't look right -- now SymFromAddr returns an error for some addresses, but at least it doesn't hang. I also tried using larger allocations instead, but then SymFromAddr still hangs ¯\_(ツ)_/¯ I'll try a CL with the smaller malloc amount and see what the bots think.
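To make the chunk-size guess a bit more concrete, here is a toy model only, not a measurement of the real renderer. It assumes the address space is a handful of free regions and that the exhaust loop keeps carving fixed-size chunks out of each region until none fits. The region sizes and the name LeftoverAfterExhaust are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of an exhaust loop: each free region is filled with as many
// `chunk`-byte blocks as fit, and whatever is too small for one more block
// stays unallocated. Returns the total number of bytes left over.
std::uint64_t LeftoverAfterExhaust(
    const std::vector<std::uint64_t>& free_regions, std::uint64_t chunk) {
  std::uint64_t leftover = 0;
  for (std::uint64_t region : free_regions) {
    // A zero chunk size allocates nothing, so the whole region remains.
    leftover += (chunk == 0) ? region : region % chunk;
  }
  return leftover;
}
```

For made-up regions of 0x30800000, 0x18000000 and 0x05400000 bytes, this model leaves 0x0DC00000 bytes (220 MiB) after exhausting with 0x10000000-byte chunks, but only 0x00C00000 bytes (12 MiB) with 0x1000000-byte chunks. That would at least fit with SymFromAddr behaving differently (erroring out rather than hanging) once the chunk size changes, but it does not explain the hang itself.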
,
May 11 2017
,
May 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/7cf2f122818ad39afafea382ecf2a6cee9d350fd
commit 7cf2f122818ad39afafea382ecf2a6cee9d350fd
Author: hans <hans@chromium.org>
Date: Fri May 12 00:18:56 2017
Allocate memory in smaller chunks for chrome://memory-exhaust/
The memory-exhaustion loop could cause DbgHelp's SymFromAddr() to hang in certain build configs (32-bit Windows Clang debug builds), causing MetricsServiceBrowserTest.OOMRenderers to hang while generating the backtrace after the OOM exception. Reducing the allocation size seems to help.
BUG= 692564
TBR=jochen
Review-Url: https://codereview.chromium.org/2882513003
Cr-Commit-Position: refs/heads/master@{#471138}
[modify] https://crrev.com/7cf2f122818ad39afafea382ecf2a6cee9d350fd/content/renderer/render_frame_impl.cc
,
May 12 2017
The test now passes on the 32-bit debug bot: https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28dbg%29%20tester/builds/10388 But started failing on the 32-bit shared build bot: https://build.chromium.org/p/chromium.fyi/builders/CrWinClang%28shared%29%20tester/builds/10250 :-( I'll see if we can disable the stack trace during this test since DbgHelp is being unreliable in low-memory situations.
,
May 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c3c52671bcb5680b267c80ffa951a61363aad5ca
commit c3c52671bcb5680b267c80ffa951a61363aad5ca
Author: hans <hans@chromium.org>
Date: Fri May 12 16:41:13 2017
Revert of Allocate memory in smaller chunks for chrome://memory-exhaust/ (patchset #1 id:1 of https://codereview.chromium.org/2882513003/ )
Reason for revert: This fixed the test in one build configuration but made it hang in another one instead (see bug). I'll come up with a less magic and more effective fix.
Original issue's description:
> Allocate memory in smaller chunks for chrome://memory-exhaust/
>
> The memory-exhaustion loop could cause DbgHelp's SymFromAddr() to hang
> in certain build configs (32-bit Windows Clang debug builds), causing
> MetricsServiceBrowserTest.OOMRenderers to hang while generating the
> backtrace after the OOM exception. Reducing the allocation size seems to
> help.
>
> BUG= 692564
> TBR=jochen
>
> Review-Url: https://codereview.chromium.org/2882513003
> Cr-Commit-Position: refs/heads/master@{#471138}
> Committed: https://chromium.googlesource.com/chromium/src/+/7cf2f122818ad39afafea382ecf2a6cee9d350fd
TBR=thakis@chromium.org,jochen@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 692564
Review-Url: https://codereview.chromium.org/2881793002
Cr-Commit-Position: refs/heads/master@{#471331}
[modify] https://crrev.com/c3c52671bcb5680b267c80ffa951a61363aad5ca/content/renderer/render_frame_impl.cc
,
May 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/59665042f159cb7d4deb3b26e34340461d9d1e65
commit 59665042f159cb7d4deb3b26e34340461d9d1e65
Author: hans <hans@chromium.org>
Date: Fri May 12 18:56:10 2017
Disable stack trace during MetricsServiceBrowserTest.OOMRenderers
DbgHelp is unreliable in low-memory situations. After intentionally exhausting the heap, the test would hang in DbgHelp's SymFromAddr().
BUG= 692564
Review-Url: https://codereview.chromium.org/2879793003
Cr-Commit-Position: refs/heads/master@{#471392}
[modify] https://crrev.com/59665042f159cb7d4deb3b26e34340461d9d1e65/chrome/browser/metrics/metrics_service_browsertest.cc
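The commit text above does not say how the stack trace gets disabled, so the following is only a generic sketch of the idea, not the actual change. It assumes a crash filter that takes a flag and a callback standing in for debug::StackTrace(info).Print(). The names RunSketchFilter and kSketchContinueSearch are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Hypothetical return value; plays the role EXCEPTION_CONTINUE_SEARCH plays
// in StackDumpExceptionFilter.
constexpr int kSketchContinueSearch = 0;

// Model of a crash filter whose symbolization step can be switched off.
// `print_stack` stands in for the call that hung inside SymFromAddr.
int RunSketchFilter(bool stack_traces_enabled,
                    const std::function<void()>& print_stack) {
  if (stack_traces_enabled)
    print_stack();
  // Either way, hand the exception on so the process can still die.
  return kSketchContinueSearch;
}
```

The point of the shape is that with the flag off, nothing after the OOM depends on DbgHelp, so the renderer can actually terminate and the browser side can observe the crash.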
,
May 15 2017
Comment 1 by thakis@chromium.org
, Feb 15 2017