LocalNTPJavascriptTest.SimpleJavascriptTests is flaky |
||||||||
Issue descriptionSee flakiness dashboard: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=LocalNTPJavascriptTest It's been going on for a few weeks at least, unfortunately the dashboard doesn't go further back than that. Logs look like this: [ RUN ] LocalNTPJavascriptTest.SimpleJavascriptTests [6148:3772:1116/210646.316:WARNING:chrome_browser_main_win.cc(613)] Command line too long for RegisterApplicationRestart: --brave-new-test-launcher --cfi-diag=0 --gtest_also_run_disabled_tests --gtest_filter=LocalNTPJavascriptTest.SimpleJavascriptTests --single_process --test-launcher-bot-mode --test-launcher-output="C:\Users\CHROME~2\AppData\Local\Temp\scoped_dir5432_27524\results5432_9566\test_results.xml" --test-launcher-summary-output="e:\b\swarm_slave\w\iogxwc3c\output.json" --user-data-dir="C:\Users\CHROME~2\AppData\Local\Temp\scoped_dir5432_27524\d5432_14443" --disable-offline-auto-reload --no-first-run --no-default-browser-check --enable-logging=stderr --safebrowsing-disable-auto-update --disable-default-apps --wm-window-animations-disabled --disable-component-update --test-type=browser --force-color-profile=srgb --disable-zero-browsers-open-for-tests --ipc-connection-timeout=45 --allow-file-access-from-files --dom-automation --log-gpu-control-list-decisions --disable-backgrounding-occluded-windows --disable-gl-drawing-for-tests --override-use-software-gl-for-tests --force-color-profile=srgb --enable-features=TestFeatureForBrowserTest1,UseGoogleLocalNtp --disable-features=NetworkPrediction,TestFeatureForBrowserTest2 --flag-switches-begin --flag-switches-end --restore-last-session about:blank [4740:7056:1116/210647.576:INFO:media_foundation_video_encode_accelerator_win.cc(370)] Windows versions earlier than 8 are not supported. [6148:1096:1116/210651.589:WARNING:embedded_test_server.cc(228)] Request not handled. Returning 404: /favicon.ico [6148:3772:1116/210652.047:WARNING:render_frame_host_impl.cc(2682)] OnDidStopLoading was called twice. [6148:3772:1116/210652.095:FATAL:web_contents_observer_sanity_checker.cc(297)] Check failed: !is_loading_. Backtrace: base::debug::StackTrace::StackTrace [0x100BCED6+102] base::debug::StackTrace::StackTrace [0x100BBA73+35] logging::LogMessage::~LogMessage [0x10132665+149] content::WebContentsObserverSanityChecker::DidStartLoading [0x05BF4E1F+191] content::WebContentsImpl::LoadingStateChanged [0x1DDD022C+1340] content::WebContentsImpl::DidStartLoading [0x1DDDBC37+103] content::FrameTreeNode::DidStartLoading [0x1CD8814A+138] content::RenderFrameHostImpl::DidCommitProvisionalLoad [0x1CE3FA62+3474] content::mojom::FrameHostStubDispatch::Accept [0x1B23362D+1373] content::mojom::FrameHostStub<mojo::RawPtrImplRefTraits<content::mojom::FrameHost> >::Accept [0x1CE77219+89] mojo::InterfaceEndpointClient::HandleValidatedMessage [0x0F3C181D+1917] mojo::InterfaceEndpointClient::HandleIncomingMessageThunk::Accept [0x0F3C1080+32] mojo::FilterChain::Accept [0x0F3BF029+489] mojo::InterfaceEndpointClient::HandleIncomingMessage [0x0F3C4A94+260] std::_Compressed_pair<std::default_delete<mojo::Connector>,mojo::Connector *,1>::_Get_first [0x161F11B2+2690] std::_Default_allocator_traits<std::allocator<mojo::Message> >::destroy<mojo::Message> [0x161E0905+1349] std::_Default_allocator_traits<std::allocator<mojo::Message> >::destroy<mojo::Message> [0x161E07A1+993] std::_Default_allocator_traits<std::allocator<mojo::Message> >::destroy<mojo::Message> [0x161E06DC+796] std::_Default_allocator_traits<std::allocator<mojo::Message> >::destroy<mojo::Message> [0x161E056E+430] base::OnceCallback<void __cdecl(void)>::Run [0x10043C7E+78] base::debug::TaskAnnotator::RunTask [0x100C1925+1013] base::internal::IncomingTaskQueue::RunTask [0x1016DB4D+253] base::MessageLoop::RunTask [0x10178CAD+1005] base::MessageLoop::DeferOrRunPendingTask [0x10179791+81] base::MessageLoop::DoWork [0x10179A66+454] base::MessagePumpForUI::DoRunLoop [0x101863CE+62] base::MessagePumpWin::Run [0x10185581+193] base::MessageLoop::Run [0x1017837F+559] base::RunLoop::Run [0x1026E379+537] content::RunThisRunLoop [0x05A988B5+53] content::MessageLoopRunner::Run [0x05A99FF1+257] content::DOMMessageQueue::WaitForMessage [0x05AABDC1+353] content::ExecuteScriptAndExtractDouble [0x05AA8E9C+924] content::ExecuteScriptAndExtractBool [0x05AA9660+288] instant_test_utils::GetBoolFromJS [0x01A2DBA5+85] LocalNTPJavascriptTest_SimpleJavascriptTests_Test::RunTestOnMainThread [0x01A3C2DB+651] content::BrowserTestBase::ProxyRunTestOnMainThreadLoop [0x05AE1AFB+539] ??$Invoke@PAVBrowserTestBase@content@@$$V@?$FunctorTraits@P8BrowserTestBase@content@@AEXXZX@internal@base@@SAXP8BrowserTestBase@content@@AEXXZ$$QAPAV34@@Z [0x05AE4C9C+28] base::internal::InvokeHelper<0,void>::MakeItSo<void (__thiscall content::BrowserTestBase::*const &)(void),content::BrowserTestBase *> [0x05AE4BCD+77] base::internal::Invoker<base::internal::BindState<void (__thiscall content::BrowserTestBase::*)(void),base::internal::UnretainedWrapper<content::BrowserTestBase> >,void __cdecl(void)>::RunImpl<void (__thiscall content::BrowserTestBase::*const &)(void),std [0x05AE4B53+83] base::internal::Invoker<base::internal::BindState<void (__thiscall content::BrowserTestBase::*)(void),base::internal::UnretainedWrapper<content::BrowserTestBase> >,void __cdecl(void)>::Run [0x05AE4A2E+62] base::RepeatingCallback<void __cdecl(void)>::Run [0x00A20F92+50] ChromeBrowserMainParts::PreMainMessageLoopRunImpl [0x06267620+7664] ChromeBrowserMainParts::PreMainMessageLoopRun [0x06265723+419] content::BrowserMainLoop::PreMainMessageLoopRun [0x1C763848+360] ??$Invoke@PAVBrowserMainLoop@content@@$$V@?$FunctorTraits@P8BrowserMainLoop@content@@AEHXZX@internal@base@@SAHP8BrowserMainLoop@content@@AEHXZ$$QAPAV34@@Z [0x1C774A2C+28] base::internal::InvokeHelper<0,int>::MakeItSo<int (__thiscall content::BrowserMainLoop::*const &)(void),content::BrowserMainLoop *> [0x1C77494D+77] base::internal::Invoker<base::internal::BindState<int (__thiscall content::BrowserMainLoop::*)(void),base::internal::UnretainedWrapper<content::BrowserMainLoop> >,int __cdecl(void)>::RunImpl<int (__thiscall content::BrowserMainLoop::*const &)(void),std::t [0x1C7748D3+83] base::internal::Invoker<base::internal::BindState<int (__thiscall content::BrowserMainLoop::*)(void),base::internal::UnretainedWrapper<content::BrowserMainLoop> >,int __cdecl(void)>::Run [0x1C77474E+62] base::RepeatingCallback<int __cdecl(void)>::Run [0x1DCE1CE2+50] content::StartupTaskRunner::RunAllTasksNow [0x1DCE1A28+136] content::BrowserMainLoop::CreateStartupTasks [0x1C75FB6D+1165] content::BrowserMainRunnerImpl::Initialize [0x1C77FC48+1560] content::BrowserMain [0x1C758586+182] content::RunNamedProcessTypeMain [0x1FAB0E71+209] content::ContentMainRunnerImpl::Run [0x1FAB1EB2+658] content::ContentServiceManagerMainDelegate::RunEmbedderProcess [0x1FAAD701+33] service_manager::Main [0x22E572FE+1534] content::ContentMain [0x1FAB0CEA+74] content::BrowserTestBase::SetUp [0x05AE120A+2970] InProcessBrowserTest::SetUp [0x0566CB15+1557] LocalNTPTest::SetUp [0x01A55BF4+52] [102/325] LocalNTPJavascriptTest.SimpleJavascriptTests (CRASHED) So apparently RenderFrameHostImpl::OnDidStopLoading somehow gets called twice, which makes WebContentsObserverSanityChecker unhappy. I don't know how that would be caused by something in that test...
,
Nov 22 2017
Zine triage: if this is really a P1, it needs an owner! Assigning to treib, please re-assign or re-prioritize if needed.
,
Nov 23 2017
This appears to be a problem in the Blink loading code, see bug 466089. I haven't been able to figure out what exactly in this test triggers it though, given that the very similar LocalNTPVoiceJavascriptTest.* tests don't hit it.
,
Nov 23 2017
,
Nov 23 2017
So I've started investigating. This happens when committing a navigation to about:blank. The browser is only notified of load start when it commits, so we update the loading state at that time. However, it seems we missed updating WCO of a previous load stop. Still looking in why we dropped it.
,
Nov 23 2017
Huh, interesting. This particular test never navigates to about:blank though, so I guess there's another trigger? Or can a navigation to about:blank happen automatically somehow, while opening a tab or something?
,
Nov 23 2017
On a side note, this test appears to be doing very weird things. For example, I get between 0 and 2 requests to commit chrome-search://most-visited/single.html?removeTooltip=Don%27t%20show%20on%20this%20page and the test succeeds nonetheless. This is quite unexpected...
,
Nov 24 2017
What that test does is basically: - Dynamically (in JS) create an iframe with that URL. - Test something unrelated. - Tear it all down again. Presumably that may or may not create a navigation, depending on the exact timing? Anyway, this does qualify as weird, but I think it's all valid. I guess it's a good stress test for WebContentsObserverSanityChecker ;)
,
Nov 24 2017
Ok so I think I have the issue figured out. This test is doing a lot of racy operations. In particular, there's a race between creating navigating an iframe to chrome-search://most-visited/single.html?removeTooltip=Don%27t%20show%20on%20this%20page and deleting this iframe. The navigation will never commit and the load should stop. Depending on the ordering of response ready, load stop and iframe deletion we get the following: 1) The iframe is deleted before the response is ready -> ok (RFH wasn't loading) 2) The response is ready, a RFH is asked to commit it, the commit fails and the load stops. Then the iframe is deleted -> ok (RFH isn't loading when deleted). 3) The response is ready, and a RFH is asked to commit it. Before the load stops, the iframe is destroyed. -> This is the problematic case as the RFH is still marked as loading. It appears that we do not update the loading state of the FrameTree when a FrameTreeNode is removed from it. In our case, this is the only loading FTN, so we do not update WebContents to tell it we stopped loading. I thought we did: I wonder if it wasn't removed because it was causing issues in some WebContents destruction sequences. I'm going to try to reproduce the issue in a test in a more reliable way to ensure we fix it.
,
Nov 24 2017
Awesome, thanks for the thorough investigation!
,
Nov 28 2017
I have a more reliable test for the issue at https://chromium-review.googlesource.com/c/chromium/src/+/793730. I'm still looking at a fix.
,
Dec 14 2017
Triage ping: The last 5 days look pretty green. Has that improved and have the remaining crashes (no failures) a different reason?
,
Dec 14 2017
Er.. they're not green at all, I see at least 10 flakes from today alone. It's recorded as a crash rather than failure because WebContentsObserverSanityChecker CHECKs (rather than using EXPECT or ASSERT). clamy, any updates on https://crrev.com/c/793730?
,
Dec 14 2017
I haven't had time to make progress on this precise issue. Given that there are several related bugs in the area, we're looking at a bigger refactoring of the is_loading area. However, right now we are dealing with several release-blockers for M63 which have higher priority.
,
May 15 2018
+kmilka: The new flakiness [1] looks very similar to this. [1] https://findit-for-me.appspot.com/waterfall/flake/flake-culprit?key=ag9zfmZpbmRpdC1mb3ItbWVyQwsSDEZsYWtlQ3VscHJpdCIxY2hyb21pdW0vMjgwMzI4ODRkN2M2Mjk2OTA4YmU1OGE1YjRlOWQxZjZhYTk0YzdkZAw
,
May 15 2018
,
Aug 20
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by treib@chromium.org
, Nov 17 2017