Issue 748217

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: ----



browser_tests flakily times out on Mac10.12 Tests

Reported by ojan@chromium.org (Project Member), Jul 24 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3095
https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3079

Not sure whose job this is, but it's purple on the bots, which I think means it goes to the trooper.
 
Components: -Infra
Labels: -Infra-Troopers Sheriff-Chromium OS-Mac
Looks like it's because the suite gets into a bad state and starts timing out on each individual test, always with a stack trace like:

[ RUN      ] DownloadTest.SavePageNonHTMLViaGet
[2160:4099:0725/003632.378059:WARNING:notification_platform_bridge_mac.mm(514)] AlertNotificationService: XPC connection invalidated.
BrowserTestBase received signal: Terminated: 15. Backtrace:
0   browser_tests                       0x000000010bc92c0c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   browser_tests                       0x000000010c3aab48 content::(anonymous namespace)::DumpStackTraceSignalHandler(int) + 200
2   libsystem_platform.dylib            0x00007fff9eac5bba _sigtramp + 26
3   browser_tests                       0x000000010bd0e9fd base::SequencedWorkerPool::PoolSequencedTaskRunner::PostDelayedTask(tracked_objects::Location const&, base::Callback<void (), (base::internal::CopyMode)0, (base::internal::RepeatMode)0>, base::TimeDelta) + 221
4   CoreFoundation                      0x00007fff892bbe84 __CFRunLoopServiceMachPort + 212
5   CoreFoundation                      0x00007fff892bb301 __CFRunLoopRun + 1361
6   CoreFoundation                      0x00007fff892bab54 CFRunLoopRunSpecific + 420
7   HIToolbox                           0x00007fff88845acc RunCurrentEventLoopInMode + 240
8   HIToolbox                           0x00007fff88845901 ReceiveNextEventCommon + 432
9   HIToolbox                           0x00007fff88845736 _BlockUntilNextEventMatchingListInModeWithFilter + 71
10  AppKit                              0x00007fff86debae4 _DPSNextEvent + 1120
11  AppKit                              0x00007fff8756621f -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 2789
12  browser_tests                       0x000000010bdc6660 __71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke + 64
13  browser_tests                       0x000000010bcadc1a base::mac::CallWithEHFrame(void () block_pointer) + 10
14  browser_tests                       0x000000010bdc65a4 -[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 164
15  AppKit                              0x00007fff86de0465 -[NSApplication run] + 926
16  browser_tests                       0x000000010bcbecde base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*) + 334
17  browser_tests                       0x000000010bcbd6ac base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 92
18  browser_tests                       0x000000010bce1e23 base::RunLoop::Run() + 51
19  browser_tests                       0x000000010c6da851 net::test_server::EmbeddedTestServer::PostTaskToIOThreadAndWait(base::Callback<void (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) + 321
20  browser_tests                       0x000000010c6d953f net::test_server::EmbeddedTestServer::ShutdownAndWaitUntilComplete() + 95
21  browser_tests                       0x0000000109753f4c DownloadTest_SavePageNonHTMLViaGet_Test::RunTestOnMainThread() + 940
22  browser_tests                       0x000000010c3aa88f content::BrowserTestBase::ProxyRunTestOnMainThreadLoop() + 335
23  browser_tests                       0x000000010bdcba31 ChromeBrowserMainParts::PreMainMessageLoopRunImpl() + 3985
24  browser_tests                       0x000000010bdca99e ChromeBrowserMainParts::PreMainMessageLoopRun() + 62
25  browser_tests                       0x000000010abb6623 content::BrowserMainLoop::PreMainMessageLoopRun() + 67
26  browser_tests                       0x000000010af4f757 content::StartupTaskRunner::RunAllTasksNow() + 39
27  browser_tests                       0x000000010abb4d39 content::BrowserMainLoop::CreateStartupTasks() + 601
28  browser_tests                       0x000000010abb8d44 content::BrowserMainRunnerImpl::Initialize(content::MainFunctionParams const&) + 756
29  browser_tests                       0x000000010abb2834 content::BrowserMain(content::MainFunctionParams const&) + 100
30  browser_tests                       0x000000010bc76830 content::ContentMainRunnerImpl::Run() + 368
31  browser_tests                       0x000000010da77d2d service_manager::Main(service_manager::MainParams const&) + 2445
32  browser_tests                       0x000000010bc75d64 content::ContentMain(content::ContentMainParams const&) + 68
33  browser_tests                       0x000000010c3aa580 content::BrowserTestBase::SetUp() + 2000
34  browser_tests                       0x000000010bd5a1e5 InProcessBrowserTest::SetUp() + 389
35  browser_tests                       0x000000010a5c4071 testing::Test::Run() + 97
36  browser_tests                       0x000000010a5c4ba0 testing::TestInfo::Run() + 288
37  browser_tests                       0x000000010a5c5107 testing::TestCase::Run() + 263
38  browser_tests                       0x000000010a5cb387 testing::internal::UnitTestImpl::RunAllTests() + 871
39  browser_tests                       0x000000010a5caff3 testing::UnitTest::Run() + 163
40  browser_tests                       0x000000010bd71623 base::TestSuite::Run() + 163
41  browser_tests                       0x000000010bc846ff ChromeTestSuiteRunner::RunTestSuite(int, char**) + 31
42  browser_tests                       0x000000010c3e6d3f content::LaunchTests(content::TestLauncherDelegate*, unsigned long, int, char**) + 319
43  browser_tests                       0x000000010bc8468c main + 108
44  libdyld.dylib                       0x00007fff9e8b8255 start + 1
45  ???                                 0x0000000000000009 0x0 + 9
[49/568] DownloadTest.SavePageNonHTMLViaGet (TIMED OUT)

See https://chromium-swarm.appspot.com/task?id=37917d7a9dd55510 for example.

This doesn't seem to be an infra issue. Someone familiar with the test will need to debug, so I'm sending it over to the sheriff queue for triage.

Comment 2 by meade@chromium.org, Jul 28 2017

Labels: Infra-Troopers
I had a look over recent purple Mac10.12 runs, and the failures seemed to be different.

For example, this one had several BOT_DIED failures and one regular failure in browser_side_navigation_browser_tests:
https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3218

and this build also had a bunch of BOT_DIED failures:
https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3220


whereas this one looks like the test segfaulted. Perhaps that shouldn't cause a purple bot failure, but rather a red one?
https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3215
Components: Test
Labels: -Infra-Troopers
It does look like the test suite just gave up and decided to time out. It's purple but probably shouldn't be. The test suite should be enforcing its own timeout before swarming kills it.

Assigning over to Test, who I assume maintains the test runner?
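A minimal sketch of what "enforcing its own timeout" could look like at the suite level, using only the C++ standard library. Everything here is hypothetical: `Outcome`, `SuiteReport`, `RunSuite`, and `CountOutcomes` are illustrative names, not the real test launcher's API, and test durations are simulated rather than measured. The idea is that the launcher stops scheduling tests once a worst-case run (the full per-test timeout) would no longer fit in a budget assumed to sit below the outer swarming timeout, so the shard ends with an ordinary (red) failure report instead of being killed from outside (purple).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-test outcomes; names are illustrative only.
enum class Outcome { kSuccess, kFailure, kTimedOut, kNotRun };

struct SuiteReport {
  std::vector<Outcome> outcomes;  // One entry per test, in launch order.
  bool aborted_early = false;     // True if the suite budget ran out.
};

// Runs tests 0..num_tests-1 via `run_test`, which returns the outcome and the
// (simulated) time the test took. Before each test, checks that a worst-case
// run still fits in `suite_budget`; if not, stops and marks every remaining
// test as not run. `suite_budget` is assumed to be set below the outer hard
// timeout that would otherwise kill the whole task.
template <typename RunTest>
SuiteReport RunSuite(std::size_t num_tests, RunTest run_test,
                     std::chrono::seconds suite_budget,
                     std::chrono::seconds per_test_timeout) {
  assert(per_test_timeout.count() > 0);
  SuiteReport report;
  std::chrono::seconds elapsed{0};
  for (std::size_t i = 0; i < num_tests; ++i) {
    if (elapsed + per_test_timeout > suite_budget) {
      report.aborted_early = true;
      report.outcomes.resize(num_tests, Outcome::kNotRun);
      break;
    }
    std::pair<Outcome, std::chrono::seconds> result = run_test(i);
    report.outcomes.push_back(result.first);
    elapsed += result.second;
  }
  return report;
}

// Counts how many tests ended with `outcome`.
std::size_t CountOutcomes(const SuiteReport& report, Outcome outcome) {
  return static_cast<std::size_t>(
      std::count(report.outcomes.begin(), report.outcomes.end(), outcome));
}
```

For example, with the 568 tests shown in the log above, and hypothetically the first 48 tests passing in 1 s each, every later test hanging until a 60 s per-test timeout, and a 3000 s suite budget, the sketch records 48 passes and 49 timeouts, then marks the remaining 471 tests as not run.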
Labels: -Pri-2 Pri-1
This seems to be happening frequently (about once per day or more), so let me bump this up to P1.

I looked at the following failures to investigate why the test gets stuck so often, but I'm still not sure.
The first test on which browser_tests gets stuck differs from run to run, as mentioned in c#2.

- https://chromium-swarm.appspot.com/task?id=37caba4fc918da10&refresh=10&show_raw=1
  EncryptedMediaSupportedTypesExternalClearKeyTest.InvalidKeySystems
- https://chromium-swarm.appspot.com/task?id=37d50fa464b6d610&refresh=10&show_raw=1
  MSE_ExternalClearKey/EncryptedMediaTest.Playback_VideoOnly_MP4_VP9/0
- https://chromium-swarm.appspot.com/task?id=37d617b49a5a7010&refresh=10&show_raw=1
  MediaEngagementAutoplayBrowserTest.DoNotBypassAutoplayFrameLowEngagement
- https://chromium-swarm.appspot.com/task?id=37d886a04bd92610&refresh=10&show_raw=1
  IncognitoProfileMainNetworkContext/NetworkContextConfigurationBrowserTest.Cache/0
- https://chromium-swarm.appspot.com/task?id=378f0da96891a110&refresh=10&show_raw=1
  NoStatePrefetchBrowserTest.PrerenderSafeBrowsingTopLevel
- https://chromium-swarm.appspot.com/task?id=3784bc106334fa10&refresh=10&show_raw=1
  MediaStreamPermissionTest.TestDenyingUserMedia

Comment 5 by gab@chromium.org, Aug 8 2017

Cc: erikc...@chromium.org jam@chromium.org dcheng@chromium.org
Components: UI>Browser>Navigation
Labels: -Restrict-View-Google
Latest one was SubresourceFilterWebSocketBrowserTest.DoNotBlockWebSocketNoActivatedFrame/0 @ https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3630

The one thing that seems common to all of these is ui_test_utils::NavigateToURLWithDispositionBlockUntilNavigationsComplete(), which runs a nested RunLoop that blocks until the navigation completes [1].

So the navigation response isn't arriving. Hard to tell what the underlying issue is. Maybe a full dump would help? +jam@: PlzNavigate?

[1] Or one of them is in content::TestURLLoaderClient::RunUntilResponseReceived() but same thing -- blocked on navigation response.
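To illustrate why a response that never arrives shows up as a hang rather than a targeted failure, here is a minimal C++ sketch of an unbounded versus a bounded wait. It is a deliberate simplification under stated assumptions: `ResponseWaiter` and `DeliverOnOtherThreadAndWait` are hypothetical names, and the sketch blocks a thread on a `std::condition_variable` instead of spinning a nested message loop the way `base::RunLoop` does. `Wait()` never returns if the response never comes; `WaitFor()` gives up after a deadline and lets the caller report which wait failed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical waiter modeling "block until the navigation response arrives".
// Not Chromium's base::RunLoop: this blocks on a condition variable instead of
// running a nested message loop.
class ResponseWaiter {
 public:
  // Called by the producer (standing in for the network/IO side).
  void OnResponseReceived() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      received_ = true;
    }
    cv_.notify_all();
  }

  // Unbounded wait: never returns if OnResponseReceived() is never called.
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return received_; });
  }

  // Bounded wait: returns false once `timeout` elapses without a response.
  bool WaitFor(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return received_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool received_ = false;
};

// Usage sketch: the response is delivered from another thread.
bool DeliverOnOtherThreadAndWait(std::chrono::milliseconds timeout) {
  ResponseWaiter waiter;
  std::thread producer([&waiter] { waiter.OnResponseReceived(); });
  bool got_response = waiter.WaitFor(timeout);
  producer.join();
  return got_response;
}
```

A bounded wait like this would not explain why the response goes missing, but it could turn each stuck test into a failure that names the step it was blocked on, rather than a generic per-test timeout.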

Comment 6 by jam@chromium.org, Aug 8 2017

This is happening for both PlzNavigate and non-PlzNavigate (i.e. the last 2 links in comment 4).
This error is reminiscent of the behavior when the window server crashes, although the symptoms seem fairly different.

https://bugs.chromium.org/p/chromium/issues/detail?id=653353
https://bugs.chromium.org/p/chromium/issues/detail?id=515627

Crashing in 
3   ???                                 0x00007fff5a0bded8 0x0 + 140734704115416
4   CoreFoundation                      0x00007fffb5a63e84 __CFRunLoopServiceMachPort + 212

Seems fairly worrying. I wonder if this is because many of the bots are on 10.12.2, which is more buggy/less stable than 10.12.6?

Comment 8 by treib@chromium.org, Aug 14 2017

Still happening about once per day. Different test each time.

Adding a third option next to ui_test_utils::NavigateToURLWithDispositionBlockUntilNavigationsComplete() and content::TestURLLoaderClient::RunUntilResponseReceived():
content::TitleWatcher::WaitAndGetTitle(). Happened here: https://chromium-swarm.appspot.com/task?id=37f76797e15fc810&refresh=10&show_raw=1
It also runs a nested RunLoop essentially waiting for a navigation, so probably the same thing as well.

Comment 9 by sdy@chromium.org, Aug 14 2017

Status: Available (was: Untriaged)
[MacTriage]
Looking at https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/ I can see that the last failure like this was https://luci-milo.appspot.com/buildbot/chromium.mac/Mac10.12%20Tests/3879 @ 2017-08-15 7:35 AM (CEST)

That's more than 3 days ago. Shall we consider it fixed then?
Labels: -Pri-1 Pri-2
Lowering priority since it's currently not a big issue.

Comment 12 by hbos@chromium.org, Aug 22 2017

In the past 200 runs it has been purple a few times, but the logs look different; I'm not sure whether it's the same bug. Swarming times out.
Labels: -Sheriff-Chromium
Removing the sheriff label as this doesn't seem to be an urgent issue or blocking others. 

Comment 14 by sheriffbot@chromium.org (Project Member), Aug 24

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Available (was: Untriaged)
