SitePerProcessBrowserTest.PopupMenuTest flaking on Linux ChromiumOS Tests |
||||
Issue descriptionChrome Version: 60.0.3101.0 SitePerProcessBrowserTest.PopupMenuTest has been flaking on the Linux ChromiumOS Tests builder: https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29?numbuilds=200 Every 5th run or so of this test seems to fail (with the same output shown below), but usually succeeds on retry. It did cause the builder to fail on one recent run when all retries failed: https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29/builds/38642 kenrb@, it looks like you wrote this particular test. Could you take a look at this, or triage appropriately? From content_browsertests stdout log: [ RUN ] SitePerProcessBrowserTest.PopupMenuTest Xlib: extension "RANDR" missing on display ":99". [19915:19915:0516/101313.508739:44325466243:ERROR:proxy_service.cc(1534)] ProxyConfigService for ChromeOS should be created in profile_io_data.cc::CreateProxyConfigService and this should be used only for examples. [19915:19915:0516/101313.611099:44325568507:WARNING:render_frame_host_impl.cc(2432)] OnDidStopLoading was called twice. BrowserTestBase received signal: Terminated. Backtrace: #0 0x000001ecaebc base::debug::StackTrace::StackTrace() #1 0x000001b7c1a2 content::(anonymous namespace)::DumpStackTraceSignalHandler() #2 0x7f3e24f4dcb0 <unknown> #3 0x7f3e2500784d __poll #4 0x7f3e28cb8fe4 <unknown> #5 0x7f3e28cb90ec g_main_context_iteration #6 0x000001ee0da5 base::MessagePumpGlib::Run() #7 0x000001efa673 base::RunLoop::Run() #8 0x000001b9133e content::MessageLoopRunner::Run() #9 0x0000008fb6e5 content::SitePerProcessBrowserTest_PopupMenuTest_Test::RunTestOnMainThread() #10 0x000001b7bf02 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() #11 0x000001bd4796 content::ShellBrowserMainParts::PreMainMessageLoopRun() #12 0x00000168cd81 content::BrowserMainLoop::PreMainMessageLoopRun() #13 0x000001967fe7 content::StartupTaskRunner::RunAllTasksNow() #14 0x00000168b245 content::BrowserMainLoop::CreateStartupTasks() #15 0x00000168f62a content::BrowserMainRunnerImpl::Initialize() #16 0x000001bbd659 ShellBrowserMain() #17 0x000001bb714c content::ShellMainDelegate::RunProcess() #18 0x0000015e3a1a content::RunNamedProcessTypeMain() #19 0x0000015e4257 content::ContentMainRunnerImpl::Run() #20 0x000002db20f1 service_manager::Main() #21 0x000000d6a0d4 content::ContentMain() #22 0x000001b7b8bd content::BrowserTestBase::SetUp() #23 0x000001b7369f content::ContentBrowserTest::SetUp() #24 0x000001ae89b1 testing::Test::Run() #25 0x000001ae94c0 testing::TestInfo::Run() #26 0x000001ae99f7 testing::TestCase::Run() #27 0x000001af0a87 testing::internal::UnitTestImpl::RunAllTests() #28 0x000001af070b testing::UnitTest::Run() #29 0x000001b9b6f6 base::TestSuite::Run() #30 0x000001b76d3a content::ContentTestLauncherDelegate::RunTestSuite() #31 0x000001b8dca4 content::LaunchTests() #32 0x000001b76cea main #33 0x7f3e24f38f45 __libc_start_main #34 0x0000004c71de <unknown> [1482/1901] SitePerProcessBrowserTest.PopupMenuTest (TIMED OUT) Less frequently, I have seen a slightly different version: [ RUN ] SitePerProcessBrowserTest.PopupMenuTest Xlib: extension "RANDR" missing on display ":99". [26413:26413:0515/225901.595694:151338999:ERROR:proxy_service.cc(1534)] ProxyConfigService for ChromeOS should be created in profile_io_data.cc::CreateProxyConfigService and this should be used only for examples. [26413:26413:0515/225901.685979:151429194:WARNING:render_frame_host_impl.cc(2432)] OnDidStopLoading was called twice. BrowserTestBase received signal: Segmentation fault. Backtrace: #0 0x000001ec2bbc base::debug::StackTrace::StackTrace() #1 0x000001b73eb2 content::(anonymous namespace)::DumpStackTraceSignalHandler() #2 0x7f5673c93cb0 <unknown> #3 0x0000019f56ff content::RenderWidgetHostViewEventHandler::ForwardMouseEventToParent() #4 0x0000019f5054 content::RenderWidgetHostViewEventHandler::OnMouseEvent() #5 0x0000008f7477 content::SitePerProcessBrowserTest_PopupMenuTest_Test::RunTestOnMainThread() #6 0x000001b73c12 content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() #7 0x000001bcc4a6 content::ShellBrowserMainParts::PreMainMessageLoopRun() #8 0x000001685571 content::BrowserMainLoop::PreMainMessageLoopRun() #9 0x0000019607d7 content::StartupTaskRunner::RunAllTasksNow() #10 0x000001683a35 content::BrowserMainLoop::CreateStartupTasks() #11 0x000001687e1a content::BrowserMainRunnerImpl::Initialize() #12 0x000001bb5369 ShellBrowserMain() #13 0x000001baee5c content::ShellMainDelegate::RunProcess() #14 0x0000015dc20a content::RunNamedProcessTypeMain() #15 0x0000015dca47 content::ContentMainRunnerImpl::Run() #16 0x000002da9531 service_manager::Main() #17 0x000000d65044 content::ContentMain() #18 0x000001b735cd content::BrowserTestBase::SetUp() #19 0x000001b6b3af content::ContentBrowserTest::SetUp() #20 0x000001ae08d1 testing::Test::Run() #21 0x000001ae13e0 testing::TestInfo::Run() #22 0x000001ae1917 testing::TestCase::Run() #23 0x000001ae89a7 testing::internal::UnitTestImpl::RunAllTests() #24 0x000001ae862b testing::UnitTest::Run() #25 0x000001b93406 base::TestSuite::Run() #26 0x000001b6ea4a content::ContentTestLauncherDelegate::RunTestSuite() #27 0x000001b859b4 content::LaunchTests() #28 0x000001b6e9fa main #29 0x7f5673c7ef45 __libc_start_main #30 0x0000004c6cbe <unknown> [1147/1897] SitePerProcessBrowserTest.PopupMenuTest (389 ms) I looked through older logs, and found instances of this going back at least two weeks. I was not able to reproduce the problem on my local machine.
,
May 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/66169e7ebec215a59409823ab08f5210ce1fde24 commit 66169e7ebec215a59409823ab08f5210ce1fde24 Author: kenrb <kenrb@chromium.org> Date: Sat May 20 00:37:43 2017 Speculative fix for PopupMenuTest flakiness SitePerProcessBrowserTest.PopupMenuTest is occasionally flaky on ChromeOS bots, possibly because a Linux-only section was added in r459476 that relies on browser process hit testing, but the test does not call WaitForChildFrameSurfaceReady() to prevent races between hit tests and compositor frames. This adds a call to attempt to resolve the flakiness, and does some unrelated cleanup. BUG= 723657 Review-Url: https://codereview.chromium.org/2895653003 Cr-Commit-Position: refs/heads/master@{#473396} [modify] https://crrev.com/66169e7ebec215a59409823ab08f5210ce1fde24/content/browser/site_per_process_browsertest.cc
,
May 23 2017
I don't see any more flakes in a quick scan of recent logs. Has anyone seen any recent instances, or can I close this bug?
,
May 23 2017
Test flaked on https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29/builds/38852, right before your change landed. I looked at the next 20, and some more after that, and didn't see the flake shown in the first call stack. However, I did see one instance of the second call stack crash in those 30 runs: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.chromiumos%2FLinux_ChromiumOS_Tests__1_%2F39036%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests%2F0%2Fstdout Empirically, it looks like these were two different problems. This first (1-in-5) seems to be fixed, the second (1-in-30+?) still seems to happen occasionally.
,
Jun 1 2017
A local build of content_browsertests (Linux, non-Chrome OS, release build, DCHECK on) is consistently hitting this failure on every run. This is on ToT with no local changes applied.
,
Jun 2 2017
rockot@: I still can't repro locally, using the configuration you describe. Is there anything else notable about it? I am wondering if there is something wrong with the MouseEvent being passed to RenderWidgetHostViewEventHandler::ForwardMouseEventToParent(), but it would be a lot easier to diagnose if I could reproduce it.
,
Jun 2 2017
Adding rockot@ to cc -- Please see comment above.
,
Jun 2 2017
I can try to repro in a debugger some time today to help track this down.
,
Jun 2 2017
Thanks. One possibility that just occurred to me is that I might need to move this test to ui_interactive_tests. It might be possible that other tests are causing the popup to be dismissed during one of the times when this test yields thread execution, and the RenderWidgetHostView is sometimes getting destroyed prematurely, resulting in a crash when it tries to route a mouse event to the RWHV. Any help confirming or refuting that hypothesis would be greatly appreciated.
,
Jun 2 2017
Thought I posted this already btw but I don't see it... Here's some other output which appears within the same test shortly before the crash:
../../content/browser/site_per_process_browsertest.cc:6457: Failure
Expected: popup_rect.y() - rwhv_root->GetViewBounds().y()
Which is: 96
To be equal to: 94
,
Jun 2 2017
From a debugger it seems pretty clear that RenderWidgetHost::FromID on line 6479 of site_per_process_browsertest.cc is returning null. I wonder if this is just because the MessageLoop is allowed to run for too long after the test message is received, and ShowWidgetMessageFilter should either switch to just using a base::RunLoop to Wait, or at least construct its content::MessageLoopRunner with content::MessageLoopRunner::QuitMode::IMMEDIATE?
,
Jun 2 2017
I just tried constructing the MessageLoopRunner with content::MessageLoopRunner::QuitMode::IMMEDIATE, with interesting results. Whereas I was previously not able to repro this bug, after that change I see this crash stack on every run. That suggests the problem is the other way around -- we aren't waiting long enough. RenderWidgetHost::FromID is returning null because WebContentsImpl hasn't yet processed the ViewHostMsg_ShowWidget message, and the popup's RWH doesn't yet exist. A simple solution would just be to check the FromID return for null, and if it isn't there yet, yield and try again.
,
Jun 2 2017
Somehow that sounds likely to have even more potential for flakiness. If the ViewHostMsg_ShowWidget IPC does not correspond to the expected RWH being created, then isn't waiting for ViewHostMsg_ShowWidget simply the wrong thing to do here?
,
Jun 2 2017
You're right, I was thinking that the widget gets created by the WebContentsImpl upon receipt of ViewHostMsg_ShowWidget, and since the IO thread is posting both those events to the UI thread, we might have a race. But the RenderWidgetHost already exists, and incidentally in my test runs FromID always returns a RenderWidgetHost, even though I see a crash. I still think this is a race between ShowWidget handlers, but possibly what is happening is that we are calling RWHVA::OnMouseEvent() before WebContentsImpl calls RWHVA::InitAsPopup. This means that the RenderWidgetHostView doesn't have a window_ object yet, which would result in the crash stack we are seeing. Creating a loop that waits for popup_view->window() to be non-null fixes the crash for me locally.
,
Jun 2 2017
,
Jun 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/37003a1722d89acc689cd3911efceb583bad43e0 commit 37003a1722d89acc689cd3911efceb583bad43e0 Author: kenrb <kenrb@chromium.org> Date: Sat Jun 03 00:17:03 2017 Make PopupMenuTest wait for RenderWidgetHostView initialization SitePerProcessBrowserTest.PopupMenuTest has been observed to have occasional flakes, and the cause seems to be a race between handlers of ViewMsg_ShowWidget messages. The test code uses a message filter which notifies upon receipt of ShowWidget, and WebContentsImpl is the ultimate recipient of that message which causes it to initialize the popup widget. Occasionally the test code will run before WebContentsImpl's handler has fired, in which case the RenderWidgetHostView has not been initialized, resulting in a crash. This CL adds a loop to wait for initialization to be completed before proceeding with the test. BUG= 723657 Review-Url: https://codereview.chromium.org/2917173002 Cr-Commit-Position: refs/heads/master@{#476839} [modify] https://crrev.com/37003a1722d89acc689cd3911efceb583bad43e0/content/browser/renderer_host/render_widget_host_view_aura.h [modify] https://crrev.com/37003a1722d89acc689cd3911efceb583bad43e0/content/browser/site_per_process_browsertest.cc
,
Jun 7 2017
I haven't seen any incidences of this flake since my CL landed.
,
Jun 7 2017
I checked the 40 builds after your CL landed and didn't see any flakes. Thanks for your diligence in getting this fixed!
,
Jan 22 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by afakhry@chromium.org
, May 19 2017