New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 757896 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner: ----
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Sign in to add a comment

Regression in TwoClientSearchEnginesSyncTest.SyncDefault_E2ETest

Project Member Reported by wylieb@chromium.org, Aug 22 2017

Issue description

Some of the output:
[12898:12898:0821/204327.873711:INFO:CONSOLE(7)] "SW registered", source: https://www.google.com/_/chrome/newtab?espv=2&ie=UTF-8 (7)
[12898:12898:0821/204331.096780:ERROR:navigation_entry_screenshot_manager.cc(135)] Invalid entry with unique id: 13
[12898:12898:0821/204331.100076:INFO:CONSOLE(1)] "Not allowed to load local resource: chrome-search://local-ntp/theme.css", source: https://www.google.com/_/chrome/newtab?espv=2&ie=UTF-8 (1)
[12898:12898:0821/204331.280491:INFO:CONSOLE(7)] "SW registered", source: https://www.google.com/_/chrome/newtab?espv=2&ie=UTF-8 (7)
[12898:12898:0821/204331.767518:INFO:CONSOLE(513)] "Not allowed to load local resource: chrome-search://most-visited/single.html?removeTooltip=Don%27t%20show%20on%20this%20page", source: https://www.google.com/xjs/_/js/k=xjs.ntp.en_US.mU0Cx6s4pQ8.O/m=sx,jsa,ntp,d,csi/am=CIgB/rt=j/d=1/t=zcms/rs=ACT90oGzLM7hivT8DlXyfyuL5L7BQFeLAw (513)
[12898:12898:0821/204332.001173:ERROR:navigation_entry_screenshot_manager.cc(135)] Invalid entry with unique id: 25
[12898:12898:0821/204332.005160:INFO:CONSOLE(1)] "Not allowed to load local resource: chrome-search://local-ntp/theme.css", source: https://www.google.com/_/chrome/newtab?espv=2&ie=UTF-8 (1)
[12898:12898:0821/204332.152482:INFO:CONSOLE(7)] "SW registered", source: https://www.google.com/_/chrome/newtab?espv=2&ie=UTF-8 (7)
[12898:12898:0821/204332.412751:ERROR:navigation_entry_screenshot_manager.cc(135)] Invalid entry with unique id: 27
[12898:12898:0821/204332.721198:INFO:CONSOLE(513)] "Not allowed to load local resource: chrome-search://most-visited/single.html?removeTooltip=Don%27t%20show%20on%20this%20page", source: https://www.google.com/xjs/_/js/k=xjs.ntp.en_US.mU0Cx6s4pQ8.O/m=sx,jsa,ntp,d,csi/am=CIgB/rt=j/d=1/t=zcms/rs=ACT90oGzLM7hivT8DlXyfyuL5L7BQFeLAw (513)
BrowserTestBase received signal: Terminated. Backtrace:
#0 0x7ff6c486317d base::debug::StackTrace::StackTrace()
#1 0x7ff6c486154c base::debug::StackTrace::StackTrace()
#2 0x0000066d0e1f content::(anonymous namespace)::DumpStackTraceSignalHandler()
#3 0x7ff6ac0fdd40 <unknown>
#4 0x7ff6af2fb76a base::UnguessableToken::operator<()
#5 0x7ff6af2fb639 viz::LocalSurfaceId::operator<()
#6 0x7ff6af2fad47 viz::SurfaceId::operator<()
#7 0x7ff6af2fa3f1 base::internal::flat_tree<>::equal_range<>()
#8 0x7ff6af2f4cc0 base::internal::flat_tree<>::count<>()
#9 0x7ff6af3a8f29 viz::SurfaceAggregator::ProcessAddedAndRemovedSurfaces()
#10 0x7ff6af3b266e viz::SurfaceAggregator::Aggregate()
#11 0x7ff6af2efaac viz::Display::DrawAndSwap()
#12 0x7ff6af2fe8fe viz::DisplayScheduler::DrawAndSwap()
#13 0x7ff6af2fd816 viz::DisplayScheduler::AttemptDrawAndSwap()
#14 0x7ff6af2fd10f viz::DisplayScheduler::OnBeginFrameDeadline()
#15 0x7ff6af30362f _ZN4base8internal13FunctorTraitsIMN3viz16DisplaySchedulerEFvvEvE6InvokeIRKNS_7WeakPtrIS3_EEJEEEvS5_OT_DpOT0_
#16 0x7ff6af30357a _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN3viz16DisplaySchedulerEFvvERKNS_7WeakPtrIS5_EEJEEEvOT_OT0_DpOT1_
#17 0x7ff6af303510 _ZN4base8internal7InvokerINS0_9BindStateIMN3viz16DisplaySchedulerEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#18 0x7ff6af30345c _ZN4base8internal7InvokerINS0_9BindStateIMN3viz16DisplaySchedulerEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#19 0x7ff6af30778d _ZNKR4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#20 0x7ff6af307259 base::CancelableCallback<>::Forward()
#21 0x7ff6af30362f _ZN4base8internal13FunctorTraitsIMN3viz16DisplaySchedulerEFvvEvE6InvokeIRKNS_7WeakPtrIS3_EEJEEEvS5_OT_DpOT0_
#22 0x7ff6af30743a _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMNS_18CancelableCallbackIFvvEEEKFvvERKNS_7WeakPtrIS6_EEJEEEvOT_OT0_DpOT1_
#23 0x7ff6af3073d0 _ZN4base8internal7InvokerINS0_9BindStateIMNS_18CancelableCallbackIFvvEEEKFvvEJNS_7WeakPtrIS5_EEEEES4_E7RunImplIRKS7_RKNSt3__15tupleIJS9_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#24 0x7ff6af30731c _ZN4base8internal7InvokerINS0_9BindStateIMNS_18CancelableCallbackIFvvEEEKFvvEJNS_7WeakPtrIS5_EEEEES4_E3RunEPNS0_13BindStateBaseE
#25 0x7ff6c480e611 _ZNO4base8CallbackIFvvELNS_8internal8CopyModeE0ELNS2_10RepeatModeE0EE3RunEv
#26 0x7ff6c4867dc7 base::debug::TaskAnnotator::RunTask()
#27 0x7ff6c491b653 base::MessageLoop::RunTask()
#28 0x7ff6c491b8d7 base::MessageLoop::DeferOrRunPendingTask()
#29 0x7ff6c491c5c4 base::MessageLoop::DoWork()
#30 0x7ff6c4923a7c base::MessagePumpGlib::HandleDispatch()
#31 0x7ff6c4924221 base::(anonymous namespace)::WorkSourceDispatch()
#32 0x7ff6ae753e04 g_main_context_dispatch
#33 0x7ff6ae754048 <unknown>
#34 0x7ff6ae7540ec g_main_context_iteration
#35 0x7ff6c4923b7f base::MessagePumpGlib::Run()
#36 0x7ff6c491ae14 base::MessageLoop::Run()
#37 0x7ff6c49d15bd base::RunLoop::Run()
#38 0x000006705087 content::RunThisRunLoop()
#39 0x000006706887 content::MessageLoopRunner::Run()
#40 0x000006706ed0 content::WindowedNotificationObserver::Wait()
#41 0x000002045fa4 InProcessBrowserTest::CloseBrowserSynchronously()
#42 0x00000202ed28 SyncTest::TearDownOnMainThread()
#43 0x0000066d09e1 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()
#44 0x00000083dd9d _ZN4base8internal13FunctorTraitsIMN18OAuth2TokenService7FetcherEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_
#45 0x00000083dce4 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN18OAuth2TokenService7FetcherEFvvEJPS5_EEEvOT_DpOT0_
#46 0x0000066d3055 _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserTestBaseEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#47 0x0000066d2f9c _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserTestBaseEFvvEJNS0_17UnretainedWrapperIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#48 0x00000087013d _ZNKR4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#49 0x000002c99ddf ChromeBrowserMainParts::PreMainMessageLoopRunImpl()
#50 0x000002c97fdc ChromeBrowserMainParts::PreMainMessageLoopRun()
#51 0x7ff6be1669c4 content::BrowserMainLoop::PreMainMessageLoopRun()
#52 0x7ff6bd1015bd _ZN4base8internal13FunctorTraitsIMN7content14NetworkContextEFvvEvE6InvokeIPS3_JEEEvS5_OT_DpOT0_
#53 0x7ff6bd101504 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIMN7content14NetworkContextEFvvEJPS5_EEEvOT_DpOT0_
#54 0x7ff6be16d5d5 _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserMainLoopEFivEJNS0_17UnretainedWrapperIS4_EEEEEFivEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEiOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#55 0x7ff6be16d51c _ZN4base8internal7InvokerINS0_9BindStateIMN7content15BrowserMainLoopEFivEJNS0_17UnretainedWrapperIS4_EEEEEFivEE3RunEPNS0_13BindStateBaseE
#56 0x7ff6bd0cd37d _ZNKR4base8CallbackIFvvELNS_8internal8CopyModeE1ELNS2_10RepeatModeE1EE3RunEv
#57 0x7ff6bef5b7bd content::StartupTaskRunner::RunAllTasksNow()
#58 0x7ff6be1629ff content::BrowserMainLoop::CreateStartupTasks()
#59 0x7ff6be16fa74 content::BrowserMainRunnerImpl::Initialize()
#60 0x7ff6be15b034 content::BrowserMain()
#61 0x7ff6bfddf694 content::RunNamedProcessTypeMain()

View the failure here: https://uberchromegw.corp.google.com/i/internal.client.kitchensync/builders/Linux%20Canary/builds/40222

 

Comment 1 by s...@chromium.org, Aug 30 2017

Owner: s...@chromium.org
Status: Assigned (was: Untriaged)

Comment 2 by s...@chromium.org, Sep 6 2017

At first I thought these were crashes, but they're not, they're timeouts. If you look at the timestamps, when we retry each case, it looks like 90-110 seconds have passed. So why are we hanging here? Here being https://cs.chromium.org/chromium/src/chrome/test/base/in_process_browser_test.cc?type=cs&q=CloseBrowserSynchronously&sq=package:chromium&l=389

And why is this the only test that's failing? SyncDefault_E2ETest doesn't seem to be doing anything that the other tests are not doing.

Comment 3 by s...@chromium.org, Sep 6 2017

I actually just reproduced this running

> xvfb-run --server-args="-screen 0 1920x1080x24" out/Default/sync_integration_tests --cfi-diag=0 --test-launcher-summary-output=/tmp/chrome42893428934/foo.json --single_process --brave-new-test-launcher --test-launcher-bot-mode --gtest_also_run_disabled_tests --gtest_filter=TwoClientSearchEnginesSyncTest.SyncDefault_E2ETest --ignore-urlfetcher-cert-requests=true --test-launcher-timeout=120000 --ui-test-action-timeout=120000 --test-launcher-jobs=1 --vmodule=*sync_test*=1,*login_ui_test_utils*=1 --sync-url=https://clients4.google.com/chrome-sync/dev --sync-user-for-test=cs.ks.canary@gmail.com --sync-password-for-test=googletesting
ninja: Entering directory `out/Default'

One odd thing though was that this hung test wasn't terminated, even after ~10 minutes.

Trying to track what the code is doing, there's a lot of browser related code. It would seem that we setup an observer for content::NOTIFICATION_BROWSER_CLOSED events, then close the browser in an async fashion. This event is created in BrowserList::RemoveBrowser(). So if the Wait() never returns, we're not being told about the event. Which means either the browser is still open, or the browser was "closed", whatever that means, before we set up the observer. But there's too much code to really understand what's going on, and with such a rare repro, it's very difficult to make progress here.

WindowedNotificationObserver seems to already handle the case where the event is seen after construction but before the Wait call is made, but presumably it's all running on the same thread and the event is going to come from another task, so that's not even a concern.

Given that this stack trace has viz stuff, I think the better guess is that the browser did not shut down successfully. The async jump happens at

#2 0x7f6b2cad61f2 views::DesktopWindowTreeHostX11::Close()
#3 0x7f6b2caaf865 views::DesktopNativeWidgetAura::Close()
#4 0x7f6b2ca6f77c views::Widget::Close()
#5 0x000005607cc2 BrowserView::Close()
#6 0x00000207ad55 InProcessBrowserTest::CloseBrowserAsynchronously()
#7 0x00000207abb4 InProcessBrowserTest::CloseBrowserSynchronously()
#8 0x000002063b08 SyncTest::TearDownOnMainThread()
#9 0x0000065cf031 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()

Where be post to the current task runner DesktopWindowTreeHostX11::CloseNow, which ultimately will call into BrowserList

                  BrowserList::RemoveBrowser()
#2 0x000005438595 Browser::~Browser()
#3 0x00000543a849 Browser::~Browser()
#4 0x000005606642 BrowserView::~BrowserView()
#5 0x0000056068d9 BrowserView::~BrowserView()
#6 0x7f6b2ca47157 views::View::~View()
#7 0x7f6b2ca84e6c views::NonClientView::~NonClientView()
#8 0x7f6b2ca84f39 views::NonClientView::~NonClientView()
#9 0x7f6b2ca4a257 views::View::DoRemoveChildView()
#10 0x7f6b2ca4b20d views::View::RemoveAllChildViews()
#11 0x7f6b2ca6758a views::internal::RootView::~RootView()
#12 0x000005935045 BrowserRootView::~BrowserRootView()
#13 0x000005935069 BrowserRootView::~BrowserRootView()
#14 0x7f6b2ca6d001 views::Widget::DestroyRootView()
#15 0x7f6b2ca6ca5a views::Widget::~Widget()
#16 0x00000593037d BrowserFrame::~BrowserFrame()
#17 0x000005930429 BrowserFrame::~BrowserFrame()
#18 0x7f6b2caa9a6b views::DesktopNativeWidgetAura::~DesktopNativeWidgetAura()
#19 0x000005a9d883 DesktopBrowserFrameAura::~DesktopBrowserFrameAura()
#20 0x000005a9a884 DesktopBrowserFrameAuraX11::~DesktopBrowserFrameAuraX11()
#21 0x000005a9a989 DesktopBrowserFrameAuraX11::~DesktopBrowserFrameAuraX11()
#22 0x7f6b2caabb32 views::DesktopNativeWidgetAura::OnHostClosed()
#23 0x000005a9dac0 DesktopBrowserFrameAura::OnHostClosed()
#24 0x7f6b2cad6f08 views::DesktopWindowTreeHostX11::CloseNow()
#25 0x000005a9e7a7 BrowserDesktopWindowTreeHostX11::CloseNow()
#26 0x7f6b2ca3e5af _ZN4base8internal13FunctorTraitsIMN5views12MouseWatcher8ObserverEFvvEvE6InvokeIRKNS_7WeakPtrIS4_EEJEEEvS6_OT_DpOT0_
#27 0x7f6b2cae4d5a _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN5views24DesktopWindowTreeHostX11EFvvERKNS_7WeakPtrIS5_EEJEEEvOT_OT0_DpOT1_
#28 0x7f6b2cae4cf0 _ZN4base8internal7InvokerINS0_9BindStateIMN5views24DesktopWindowTreeHostX11EFvvEJNS_7WeakPtrIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE
#29 0x7f6b2cae4c3c _ZN4base8internal7InvokerINS0_9BindStateIMN5views24DesktopWindowTreeHostX11EFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE

It seems to me that both the Linux KitchenSync bots and my local Linux machine use the exact same implementations. But then why does the flakiness dashboard report hundreds of runs for these tests, and have never seen a failure, but the Kitchen Sync bots see like >20% failure on this test.

Comment 4 by s...@chromium.org, Sep 7 2017

I was wrong in #3, when we actually post the call the DesktopWindowTreeHostX11::CloseNow(), it actually looks more like

#2 0x7f4de3204a9f views::DesktopWindowTreeHostX11::Close()
#3 0x7f4de31de0ba views::DesktopNativeWidgetAura::Close()
#4 0x7f4de319dc68 views::Widget::Close()
#5 0x000005607db2 BrowserView::Close()
#6 0x000005440f1b Browser::CloseFrame()
#7 0x0000008c650f _ZN4base8internal13FunctorTraitsIMN11google_apis19UrlFetchRequestBaseEFvvEvE6InvokeIRKNS_7WeakPtrINS2_5drive30SingleBatchableDelegateRequestEEEJEEEvS5_OT_DpOT0_
#8 0x000005448cda _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIM7BrowserFvvENS_7WeakPtrIS4_EEJEEEvOT_OT0_DpOT1_
#9 0x000005448c70 _ZN4base8internal7InvokerINS0_9BindStateIM7BrowserFvvEJNS_7WeakPtrIS3_EEEEEFvvEE7RunImplIS5_NSt3__15tupleIJS7_EEEJLm0EEEEvOT_OT0_NSC_16integer_sequenceImJXspT1_EEEE
#10 0x000005448bc9 _ZN4base8internal7InvokerINS0_9BindStateIM7BrowserFvvEJNS_7WeakPtrIS3_EEEEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#11 0x7f4de2a48731 _ZNO4base12OnceCallbackIFvvEE3RunEv
#12 0x7f4de2aa2135 base::debug::TaskAnnotator::RunTask()
#13 0x7f4de2b5125f base::internal::IncomingTaskQueue::RunTask()
#14 0x7f4de2b565d4 base::MessageLoop::RunTask()
#15 0x7f4de2b56857 base::MessageLoop::DeferOrRunPendingTask()
#16 0x7f4de2b57544 base::MessageLoop::DoWork()
#17 0x7f4de2b5eb66 base::MessagePumpGlib::Run()
#18 0x7f4de2b55d94 base::MessageLoop::Run()
#19 0x7f4de2c0cccd base::RunLoop::Run()
#20 0x000006603a77 content::RunThisRunLoop()
#21 0x000006605277 content::MessageLoopRunner::Run()
#22 0x0000066058c0 content::WindowedNotificationObserver::Wait()
#23 0x00000207acb6 InProcessBrowserTest::CloseBrowserSynchronously()
#24 0x000002063b6d SyncTest::TearDownOnMainThread()
#25 0x0000065cf241 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()

When simply running the test normally, we see something like #3, and it never flakes. However, when running the following command the pattern is slightly different.

> xvfb-run --server-args="-screen 0 1920x1080x24" out/Default/sync_integration_tests --cfi-diag=0 --test-launcher-summary-output=/tmp/chrome42893428934/foo.json --single_process --brave-new-test-launcher --test-launcher-bot-mode --gtest_also_run_disabled_tests --gtest_filter=TwoClientSearchEnginesSyncTest.SyncDefault_E2ETest --ignore-urlfetcher-cert-requests=true --test-launcher-timeout=120000 --ui-test-action-timeout=120000 --test-launcher-jobs=1 --vmodule=*sync_test*=1,*login_ui_test_utils*=1 --sync-url=https://clients4.google.com/chrome-sync/dev --sync-user-for-test=cs.ks.canary@gmail.com --sync-password-for-test=googletesting

It seems when we first go into Widget::Close(), we return early because Widget::non_client_view_ returns false for the CanClose() call, and the side effects of this call end up posting Browser::CloseFrame(), although I'm not sure how exactly that works. From my log statements, it seems that View::DoRemoveChildView() happens somewhere in the middle there, probably on a different intermediate task.

I just hit another flake, where that first BrowserView::Close() call does the same early exit, but the second one never happens. This is reassuring because if this jump through Browser::non_client_view_ during shutdown is what's breaking, and this jump only happens for KitchenSync tests, we now know why the regular trybots never flake.

Comment 5 by s...@chromium.org, Sep 7 2017

It would seem that the Widget::non_client_view_ indirection happens as a result of the UsingExternalServers() signin flow. The following command uses indirection:

> out/Default/sync_integration_tests --single_process --gtest_filter=TwoClientSearchEnginesSyncTest.SyncDefault_E2ETest --ignore-urlfetcher-cert-requests=true --vmodule=*sync_test*=1,*login_ui_test_utils*=1 --sync-url=https://clients4.google.com/chrome-sync/dev --sync-user-for-test=cs.ks.canary@gmail.com --sync-password-for-test=googletesting

While this one does not:

> xvfb-run --server-args="-screen 0 1920x1080x24" out/Default/sync_integration_tests --cfi-diag=0 --test-launcher-summary-output=/tmp/chrome42893428934/foo.json --single_process --brave-new-test-launcher --test-launcher-bot-mode --gtest_also_run_disabled_tests --gtest_filter=TwoClientSearchEnginesSyncTest.SyncDefault_E2ETest --ignore-urlfetcher-cert-requests=true --test-launcher-timeout=120000 --ui-test-action-timeout=120000 --test-launcher-jobs=1 --vmodule=*sync_test*=1,*login_ui_test_utils*=1

Looking at sync_test.cc, it looks like our profile creation process is different, as is of course how we sign in.

Trying to trace the difference in shutdown, it happens in UnloadController::ShouldCloseWindow(), fake signin will return true on the 3rd return, while external signin will return false on the 4th return. It is unclear if this divergence is necessarily bad.

Comment 6 by s...@chromium.org, Sep 12 2017

Owner: ----
Status: Available (was: Assigned)
Giving up on this bug for now. After adding a lot of print statements, and running this tests overnight two nights in a row, both times after ~150 attempts it hung in a differ a different spot than it used to, before even getting to CloseBrowserSynchronously().

Working theory for the original hang is that somewhere in the trip to render and back for BeforeUnloadACK and OnClosePageACK we get lost and never end up calling some aprt of TabStripEmpty()/CloseFrame()/DesktopWindowTreeHostX11::Close().

Comment 7 by s...@chromium.org, Sep 20 2017

Status: Fixed (was: Available)
We actually haven't seen this since the 8th. It may have stopped reproing for me locally because the underling issue was actually fixed. Closing, we can re-open if it happens agian.

Sign in to add a comment