New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 723657 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

SitePerProcessBrowserTest.PopupMenuTest flaking on Linux ChromiumOS Tests

Project Member Reported by glevin@chromium.org, May 17 2017

Issue description

Chrome Version: 60.0.3101.0

SitePerProcessBrowserTest.PopupMenuTest has been flaking on the Linux ChromiumOS Tests builder:
https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29?numbuilds=200

Every 5th run or so of this test seems to fail (with the same output shown below), but usually succeeds on retry.  It did cause the builder to fail on one recent run when all retries failed:
https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29/builds/38642


kenrb@, it looks like you wrote this particular test.  Could you take a look at this, or triage appropriately?


From content_browsertests stdout log:

[ RUN      ] SitePerProcessBrowserTest.PopupMenuTest
Xlib:  extension "RANDR" missing on display ":99".
[19915:19915:0516/101313.508739:44325466243:ERROR:proxy_service.cc(1534)] ProxyConfigService for ChromeOS should be created in profile_io_data.cc::CreateProxyConfigService and this should be used only for examples.
[19915:19915:0516/101313.611099:44325568507:WARNING:render_frame_host_impl.cc(2432)] OnDidStopLoading was called twice.
BrowserTestBase received signal: Terminated. Backtrace:
#0 0x000001ecaebc base::debug::StackTrace::StackTrace()
#1 0x000001b7c1a2 content::(anonymous namespace)::DumpStackTraceSignalHandler()
#2 0x7f3e24f4dcb0 <unknown>
#3 0x7f3e2500784d __poll
#4 0x7f3e28cb8fe4 <unknown>
#5 0x7f3e28cb90ec g_main_context_iteration
#6 0x000001ee0da5 base::MessagePumpGlib::Run()
#7 0x000001efa673 base::RunLoop::Run()
#8 0x000001b9133e content::MessageLoopRunner::Run()
#9 0x0000008fb6e5 content::SitePerProcessBrowserTest_PopupMenuTest_Test::RunTestOnMainThread()
#10 0x000001b7bf02 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()
#11 0x000001bd4796 content::ShellBrowserMainParts::PreMainMessageLoopRun()
#12 0x00000168cd81 content::BrowserMainLoop::PreMainMessageLoopRun()
#13 0x000001967fe7 content::StartupTaskRunner::RunAllTasksNow()
#14 0x00000168b245 content::BrowserMainLoop::CreateStartupTasks()
#15 0x00000168f62a content::BrowserMainRunnerImpl::Initialize()
#16 0x000001bbd659 ShellBrowserMain()
#17 0x000001bb714c content::ShellMainDelegate::RunProcess()
#18 0x0000015e3a1a content::RunNamedProcessTypeMain()
#19 0x0000015e4257 content::ContentMainRunnerImpl::Run()
#20 0x000002db20f1 service_manager::Main()
#21 0x000000d6a0d4 content::ContentMain()
#22 0x000001b7b8bd content::BrowserTestBase::SetUp()
#23 0x000001b7369f content::ContentBrowserTest::SetUp()
#24 0x000001ae89b1 testing::Test::Run()
#25 0x000001ae94c0 testing::TestInfo::Run()
#26 0x000001ae99f7 testing::TestCase::Run()
#27 0x000001af0a87 testing::internal::UnitTestImpl::RunAllTests()
#28 0x000001af070b testing::UnitTest::Run()
#29 0x000001b9b6f6 base::TestSuite::Run()
#30 0x000001b76d3a content::ContentTestLauncherDelegate::RunTestSuite()
#31 0x000001b8dca4 content::LaunchTests()
#32 0x000001b76cea main
#33 0x7f3e24f38f45 __libc_start_main
#34 0x0000004c71de <unknown>
[1482/1901] SitePerProcessBrowserTest.PopupMenuTest (TIMED OUT)

Less frequently, I have seen a slightly different version:

[ RUN      ] SitePerProcessBrowserTest.PopupMenuTest
Xlib:  extension "RANDR" missing on display ":99".
[26413:26413:0515/225901.595694:151338999:ERROR:proxy_service.cc(1534)] ProxyConfigService for ChromeOS should be created in profile_io_data.cc::CreateProxyConfigService and this should be used only for examples.
[26413:26413:0515/225901.685979:151429194:WARNING:render_frame_host_impl.cc(2432)] OnDidStopLoading was called twice.
BrowserTestBase received signal: Segmentation fault. Backtrace:
#0 0x000001ec2bbc base::debug::StackTrace::StackTrace()
#1 0x000001b73eb2 content::(anonymous namespace)::DumpStackTraceSignalHandler()
#2 0x7f5673c93cb0 <unknown>
#3 0x0000019f56ff content::RenderWidgetHostViewEventHandler::ForwardMouseEventToParent()
#4 0x0000019f5054 content::RenderWidgetHostViewEventHandler::OnMouseEvent()
#5 0x0000008f7477 content::SitePerProcessBrowserTest_PopupMenuTest_Test::RunTestOnMainThread()
#6 0x000001b73c12 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop()
#7 0x000001bcc4a6 content::ShellBrowserMainParts::PreMainMessageLoopRun()
#8 0x000001685571 content::BrowserMainLoop::PreMainMessageLoopRun()
#9 0x0000019607d7 content::StartupTaskRunner::RunAllTasksNow()
#10 0x000001683a35 content::BrowserMainLoop::CreateStartupTasks()
#11 0x000001687e1a content::BrowserMainRunnerImpl::Initialize()
#12 0x000001bb5369 ShellBrowserMain()
#13 0x000001baee5c content::ShellMainDelegate::RunProcess()
#14 0x0000015dc20a content::RunNamedProcessTypeMain()
#15 0x0000015dca47 content::ContentMainRunnerImpl::Run()
#16 0x000002da9531 service_manager::Main()
#17 0x000000d65044 content::ContentMain()
#18 0x000001b735cd content::BrowserTestBase::SetUp()
#19 0x000001b6b3af content::ContentBrowserTest::SetUp()
#20 0x000001ae08d1 testing::Test::Run()
#21 0x000001ae13e0 testing::TestInfo::Run()
#22 0x000001ae1917 testing::TestCase::Run()
#23 0x000001ae89a7 testing::internal::UnitTestImpl::RunAllTests()
#24 0x000001ae862b testing::UnitTest::Run()
#25 0x000001b93406 base::TestSuite::Run()
#26 0x000001b6ea4a content::ContentTestLauncherDelegate::RunTestSuite()
#27 0x000001b859b4 content::LaunchTests()
#28 0x000001b6e9fa main
#29 0x7f5673c7ef45 __libc_start_main
#30 0x0000004c6cbe <unknown>
[1147/1897] SitePerProcessBrowserTest.PopupMenuTest (389 ms)

I looked through older logs, and found instances of this going back at least two weeks.
I was not able to reproduce the problem on my local machine.
 
Cc: creis@chromium.org nasko@chromium.org
Another flake incident can be found here https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Ozone%20Tests%20%281%29/builds/47278.


Project Member

Comment 2 by bugdroid1@chromium.org, May 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/66169e7ebec215a59409823ab08f5210ce1fde24

commit 66169e7ebec215a59409823ab08f5210ce1fde24
Author: kenrb <kenrb@chromium.org>
Date: Sat May 20 00:37:43 2017

Speculative fix for PopupMenuTest flakiness

SitePerProcessBrowserTest.PopupMenuTest is occasionally flaky on
ChromeOS bots, possibly because a Linux-only section was added in
r459476 that relies on browser process hit testing, but the test does
not call WaitForChildFrameSurfaceReady() to prevent races between hit
tests and compositor frames.

This adds a call to attempt to resolve the flakiness, and does some
unrelated cleanup.

BUG= 723657 

Review-Url: https://codereview.chromium.org/2895653003
Cr-Commit-Position: refs/heads/master@{#473396}

[modify] https://crrev.com/66169e7ebec215a59409823ab08f5210ce1fde24/content/browser/site_per_process_browsertest.cc

Comment 3 by kenrb@chromium.org, May 23 2017

I don't see any more flakes in a quick scan of recent logs. Has anyone seen any recent instances, or can I close this bug?

Comment 4 by glevin@chromium.org, May 23 2017

Test flaked on https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29/builds/38852, right before your change landed.  I looked at the next 20, and some more after that, and didn't see the flake shown in the first call stack.

However, I did see one instance of the second call stack crash in those 30 runs:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.chromiumos%2FLinux_ChromiumOS_Tests__1_%2F39036%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout

Empirically, it looks like these were two different problems.  This first (1-in-5) seems to be fixed, the second (1-in-30+?) still seems to happen occasionally.
A local build of content_browsertests (Linux, non-Chrome OS, release build, DCHECK on) is consistently hitting this failure on every run. This is on ToT with no local changes applied.

Comment 6 by kenrb@chromium.org, Jun 2 2017

rockot@: I still can't repro locally, using the configuration you describe. Is there anything else notable about it?

I am wondering if there is something wrong with the MouseEvent being passed to RenderWidgetHostViewEventHandler::ForwardMouseEventToParent(), but it would be a lot easier to diagnose if I could reproduce it.

Comment 7 by kenrb@chromium.org, Jun 2 2017

Cc: roc...@chromium.org
Adding rockot@ to cc -- Please see comment above.
I can try to repro in a debugger some time today to help track this down.

Comment 9 by kenrb@chromium.org, Jun 2 2017

Thanks.

One possibility that just occurred to me is that I might need to move this test to ui_interactive_tests. It might be possible that other tests are causing the popup to be dismissed during one of the times when this test yields thread execution, and the RenderWidgetHostView is sometimes getting destroyed prematurely, resulting in a crash when it tries to route a mouse event to the RWHV.

Any help confirming or refuting that hypothesis would be greatly appreciated.
Thought I posted this already btw but I don't see it... Here's some other output which appears within the same test shortly before the crash:

../../content/browser/site_per_process_browsertest.cc:6457: Failure
      Expected: popup_rect.y() - rwhv_root->GetViewBounds().y()
      Which is: 96
To be equal to: 94

From a debugger it seems pretty clear that RenderWidgetHost::FromID on line 6479 of site_per_process_browsertest.cc is returning null.

I wonder if this is just because the MessageLoop is allowed to run for too long after the test message is received, and ShowWidgetMessageFilter should either switch to just using a base::RunLoop to Wait, or at least construct its content::MessageLoopRunner with content::MessageLoopRunner::QuitMode::IMMEDIATE?
I just tried constructing the MessageLoopRunner with content::MessageLoopRunner::QuitMode::IMMEDIATE, with interesting results. Whereas I was previously not able to repro this bug, after that change I see this crash stack on every run.

That suggests the problem is the other way around -- we aren't waiting long enough. RenderWidgetHost::FromID is returning null because WebContentsImpl hasn't yet processed the ViewHostMsg_ShowWidget message, and the popup's RWH doesn't yet exist.

A simple solution would just be to check the FromID return for null, and if it isn't there yet, yield and try again.
Somehow that sounds likely to have even more potential for flakiness. If
the ViewHostMsg_ShowWidget IPC does not correspond to the expected RWH
being created, then isn't waiting for ViewHostMsg_ShowWidget simply the
wrong thing to do here?
You're right, I was thinking that the widget gets created by the WebContentsImpl upon receipt of ViewHostMsg_ShowWidget, and since the IO thread is posting both those events to the UI thread, we might have a race.

But the RenderWidgetHost already exists, and incidentally in my test runs FromID always returns a RenderWidgetHost, even though I see a crash. I still think this is a race between ShowWidget handlers, but possibly what is happening is that we are calling RWHVA::OnMouseEvent() before WebContentsImpl calls RWHVA::InitAsPopup. This means that the RenderWidgetHostView doesn't have a window_ object yet, which would result in the crash stack we are seeing.

Creating a loop that waits for popup_view->window() to be non-null fixes the crash for me locally.
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/37003a1722d89acc689cd3911efceb583bad43e0

commit 37003a1722d89acc689cd3911efceb583bad43e0
Author: kenrb <kenrb@chromium.org>
Date: Sat Jun 03 00:17:03 2017

Make PopupMenuTest wait for RenderWidgetHostView initialization

SitePerProcessBrowserTest.PopupMenuTest has been observed to have
occasional flakes, and the cause seems to be a race between handlers
of ViewMsg_ShowWidget messages. The test code uses a message filter
which notifies upon receipt of ShowWidget, and WebContentsImpl is the
ultimate recipient of that message which causes it to initialize the
popup widget.

Occasionally the test code will run before WebContentsImpl's handler
has fired, in which case the RenderWidgetHostView has not been
initialized, resulting in a crash. This CL adds a loop to wait for
initialization to be completed before proceeding with the test.

BUG= 723657 

Review-Url: https://codereview.chromium.org/2917173002
Cr-Commit-Position: refs/heads/master@{#476839}

[modify] https://crrev.com/37003a1722d89acc689cd3911efceb583bad43e0/content/browser/renderer_host/render_widget_host_view_aura.h
[modify] https://crrev.com/37003a1722d89acc689cd3911efceb583bad43e0/content/browser/site_per_process_browsertest.cc

I haven't seen any incidences of this flake since my CL landed.
Status: Fixed (was: Assigned)
I checked the 40 builds after your CL landed and didn't see any flakes.  Thanks for your diligence in getting this fixed!

Comment 19 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment