
Issue 670844

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
please use my google.com address
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug
Team-Accessibility

Blocking:
issue 668707
issue 650175




>170 content_browsertests tests failing intermittently on win_chromium_rel_ng

Project Member Reported by kbr@chromium.org, Dec 2 2016

Issue description

Intermittently on win_chromium_rel_ng over 170 tests in content_browsertests are spuriously failing. The test list appears to be roughly the same each time. It happens on different VMs, so either it's not a specific machine, or it's a widespread configuration problem in the Swarming fleet.

Here are the recent runs with this symptom that I found on win_chromium_rel_ng:

https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342957
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342937
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342934
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342907
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342900

The symptom is that all of these tests time out. They span systems like accessibility, site isolation, navigation, etc. I'm adding several components, including Mojo, in case someone has an idea what's going on.

There was a similar bug report in Issue 650175, on the official builders, and it's not clear whether that was fully resolved.

The list of failing tests from my failed try run follows.

While I understand that there are only 6 such failures in the past 100 builds, something is badly broken and this needs to be triaged immediately. I am therefore marking this P0 until it's at least triaged. Feel free to downgrade it once it's triaged.

DumpAccessibilityTreeTest.AccessibilityIframePostEnable
BackgroundSyncBrowserTest.SyncRegistrationsDeletedWhenClearingSiteData
DumpAccessibilityTreeTest.AccessibilityImg
SessionHistoryTest.FrameBackForward
BackgroundSyncBrowserTest.SyncRegistrationFromSWDeletedWhenClearingSiteData
RenderFrameHostManagerTest.DontSwapProcessWithOnlyTargetBlank
DumpAccessibilityEventsTest.AccessibilityEventsListboxNext
RenderFrameHostManagerTest.PopupKeepsWindowReferenceCrossProcesAndBack
WorkerTest.SharedWorkerHttpAuth
RenderFrameHostManagerTest.SupportCrossProcessPostMessageWithMessagePort
NavigationControllerBrowserTest.FrameNavigationEntry_SameOriginBackWithRedirect
SitePerProcessBrowserTest.NestedSurfaceHitTestTest
RenderFrameMessageFilterBrowserTest.SameSiteCookies
NavigationControllerBrowserTest.PageStateWithIframeAfterForwardInCompetingFrames
NavigationControllerBrowserTest.RefererAndOriginHeadersAfterRedirects
SitePerProcessHighDPIBrowserTest.SubframeLoadsWithCorrectDeviceScaleFactor
IsolateIcelandFrameTreeBrowserTest.ProcessSwitchForIsolatedBlob
RenderFrameHostManagerTest.SwapProcessWithSameSiteRelNoopener
NavigationControllerBrowserTest.FrameNavigationEntry_BackWithRedirect
NavigationControllerBrowserTest.FrameNavigationEntry_RenameNestedAutoSubframe
BackgroundSyncBrowserTest.HasTagFromServiceWorker
ReloadCacheControlBrowserTest.NormalReload
RenderFrameHostManagerTest.ConsecutiveNavigationsToSite
RequestDataResourceDispatcherHostBrowserTest.CrossOriginAuxiliary
NavigationControllerBrowserTest.FrameNavigationEntry_BackNestedAutoSubframe
SitePerProcessDevToolsBrowserTest.AgentHostForFrames
RenderFrameHostManagerTest.DisownOpener
SessionHistoryTest.CrossFrameFormBackForward
CrossSiteResourceHandlerTest.NoDeliveryToDetachedFrame
RenderFrameHostManagerTest.DontPreemptNavigationWithFrameTreeUpdate
SessionHistoryTest.LocationChangeInSubframe
DumpAccessibilityEventsTest.AccessibilityEventsListboxFocus
NavigationControllerBrowserTest.FrameNavigationEntry_NewSubframe
RenderFrameHostManagerTest.AllowTargetedNavigationsInOpenerAfterSwap
BackgroundSyncBrowserTest.FiringSyncEventDeletedWhenClearingSiteData
ClearSiteDataThrottleBrowserTest.Types
RequestDataResourceDispatcherHostBrowserTest.SameOriginAuxiliary
SitePerProcessDevToolsProtocolTest.TargetNoDiscovery
FrameTreeBrowserTest.FrameTreeShape
NavigationHandleImplBrowserTest.VerifySamePage
IFrameZoomBrowserTest.SubframesDontZoomIndependently
SitePerProcessBrowserTest.ViewBoundsInNestedFrameTest
NavigationControllerBrowserTest.RefererStoredForSubFrame
BackgroundSyncBrowserTest.RegisterFromIFrameWithoutMainFrameHost
RenderFrameHostManagerTest.DontSwapProcessWithOnlyRelNoOpener
NavigationControllerBrowserTest.FrameNavigationEntry_SubframeHistoryFallback
RenderFrameHostManagerTest.InputMsgToSwappedOutRVHIsIgnored
BackgroundSyncBrowserTest.RegisterFromUncontrolledDocument
WorkerTest.WebSocketSharedWorker
RenderFrameMessageFilterBrowserTest.Cookies
AsyncRevalidationManagerBrowserTest.CacheIsUpdated
SessionHistoryTest.BasicBackForward
SessionHistoryTest.GoBackToCrossSitePostWithRedirect
SecurityExploitBrowserTest.AttemptRunFileChoosers
DevToolsProtocolTest.InspectDuringFrameSwap
LoFiResourceDispatcherHostBrowserTest.ShouldEnableLoFiModeReloadDisableLoFi
DumpAccessibilityEventsTest.AccessibilityEventsMenuListFocus
NavigationControllerBrowserTest.PreventSpoofFromSubframeAndReplace
DevToolsProtocolTest.TargetDiscovery
DumpAccessibilityTreeTest.AccessibilityIframe
DumpAccessibilityTreeTest.AccessibilityIframeCrossProcess
DumpAccessibilityTreeTest.AccessibilityIframeTransformCrossProcess
ManifestBrowserTest.CORSManifest
MHTMLGenerationSitePerProcessTest.GenerateMHTML
SecurityExploitBrowserTest.InvalidOriginHeaders
NavigationControllerBrowserTest.EnsureSamePageNavigationUpdatesFrameNavigationEntry
RenderFrameHostManagerTest.ProcessExitWithSwappedOutViews
RenderFrameHostManagerTest.BackForwardNotStale
AsyncRevalidationManagerBrowserTest.StaleWhileRevalidateIsApplied
IFrameZoomBrowserTest.SiblingFramesZoom
RenderFrameHostManagerTest.SupportCrossProcessPostMessage
RenderFrameHostManagerTest.PreserveTopFrameWindowNameOnCrossProcessNavigations
SitePerProcessBrowserTest.SurfaceHitTestTest
FrameTreeBrowserTest.IsRenderFrameLive
IFrameZoomBrowserTest.RedirectToPageWithSubframeZoomsCorrectly
BackgroundSyncBrowserTest.RegisterFromIFrameWithMainFrameHost
CrossProcessFrameTreeBrowserTest.OriginSetOnCrossProcessNavigations
FrameTreeBrowserTest.NavigateGrandchildToBlob
FrameTreeBrowserTest.NavigateChildToAboutBlank
BackgroundSyncBrowserTest.Incognito
DumpAccessibilityTreeTest.AccessibilityIframeCoordinatesCrossProcess
RenderFrameHostManagerTest.RestoreSubframeFileAccessForHistoryNavigation
MediaSourceTest.Playback_Video_MP4_Audio_WEBM
RenderFrameHostManagerTest.UpdateOpener
BackgroundSyncBrowserTest.RegisterFromServiceWorkerWithoutMainFrameHost
NavigationControllerBrowserTest.FrameNavigationEntry_RecreatedSubframeBackForward
NavigationControllerBrowserTest.FrameNavigationEntry_RestoreViaPageState
SitePerProcessBrowserTest.SurfaceHitTestPointerEventsNone
NavigationControllerBrowserTest.FrameNavigationEntry_SubframeAfterInPage
RenderFrameHostManagerTest.SwapProcessWithRelNoreferrerAndTargetBlank
LoFiResourceDispatcherHostBrowserTest.ShouldEnableLoFiModeReload
FrameTreeBrowserTest.SubframeOpenerSetForNewWindow
DumpAccessibilityTreeTest.AccessibilityIframeTransformNested
SitePerProcessBrowserTest.CrossProcessMouseCapture
BackgroundSyncBrowserTest.RegisterFromControlledDocument
NavigationControllerBrowserTest.FrameNavigationEntry_RepeatCreatedFrame
FrameTreeBrowserTest.NavigateWithLeftoverFrames
SitePerProcessDevToolsBrowserTest.CrossSiteIframeAgentHost
ReloadCacheControlBrowserTest.BypassingReload
DumpAccessibilityTreeTest.AccessibilityFramesetPostEnable
RequestDataResourceDispatcherHostBrowserTest.CrossOriginNested
RequestDataResourceDispatcherHostBrowserTest.SameOriginNested
FrameTreeBrowserTest.FrameTreeShape2
ClearSiteDataThrottleBrowserTest.Redirect
EncryptedMediaTest.UnknownKeySystemThrowsException
DownloadContentTest.DownloadAttributeCrossOriginRedirect
MHTMLGenerationTest.ViewedMHTMLDoesNotContainNoStoreContent
NavigationControllerBrowserTest.ForwardRedirectWithNoCommittedEntry
RequestDataResourceDispatcherHostBrowserTest.BasicCrossSite
DumpAccessibilityTreeTest.AccessibilityFrameset
BackgroundSyncBrowserTest.GetRegistrationsFromServiceWorker
NavigationControllerBrowserTest.ConsecutiveReloadMetrics
NavigationControllerBrowserTest.CloneAndGoBackWithNamedWindow
FrameTreeBrowserTest.OriginSetOnNavigation
RenderFrameHostManagerTest.NavigateBackToExistingProcessFromSadTab
SitePerProcessAccessibilityBrowserTest.TwoCrossSiteNavigations
RenderFrameHostManagerTest.CrossProcessPopupInheritsSandboxFlagsWithNoOpener
NavigationControllerBrowserTest.RaceCrossOriginNavigationAndSamePageHistoryNavigation
SitePerProcessBrowserTest.ScrollEventToOOPIF
NavigationControllerBrowserTest.FrameNavigationEntry_AutoSubframe
NavigationControllerBrowserTest.NavigationTypeClassification_ExistingPage
SitePerProcessBrowserTest.TitleAfterCrossSiteIframe
RenderFrameHostManagerTest.SwapProcessWithWindowOpenAndNoopener
MediaRedirectTest.CanPlayHiddenWebm
NavigationHandleImplBrowserTest.VerifyRequestContextTypeForFrameTree
BackgroundSyncBrowserTest.WaitUntilReject
NavigationControllerOopifBrowserTest.RestoreWithoutExtraOopifs
BackgroundSyncBrowserTest.RegistrationDelaysForNetwork
BackgroundSyncBrowserTest.SyncRegistrationDeletedWhenClearingSiteData
RenderFrameHostManagerTest.NoScriptAccessAfterSwapOut
FrameTreeBrowserTest.ChildFrameWithSrcdoc
BackgroundSyncBrowserTest.WaitUntil
RequestDataResourceDispatcherHostBrowserTest.Basic
AsyncResourceHandlerBrowserTest.UploadProgressRedirect
NavigationControllerBrowserTest.FrameNavigationEntry_FrameUniqueName
MediaSourceTest.Playback_Video_WEBM_Audio_MP4
DumpAccessibilityTreeTest.AccessibilityImgEmptyAlt
WorkerTest.WorkerHttpAuth
SitePerProcessBrowserTest.CrossProcessMouseEnterAndLeaveTest
MediaSourceTest.ConfigChangeVideo
RenderFrameHostManagerTest.SameOriginFramesInDifferentProcesses
DumpAccessibilityTreeTest.AccessibilityIframeTransformScrolled
RenderFrameHostManagerTest.RenderViewInitAfterProcessKill
SitePerProcessBrowserTest.CompositorFrameSwapped
NavigationHandleImplBrowserTest.VerifyRendererInitiated
RenderFrameHostManagerTest.SwapProcessWithSameSiteRelNoreferrer
CrossProcessFrameTreeBrowserTest.CreateCrossProcessSubframeProxies
IFrameZoomBrowserTest.SubframesZoomProperly
DumpAccessibilityTreeTest.AccessibilityIframeTransformNestedCrossProcess
NavigationHandleImplBrowserTest.VerifyFrameTree
NavigationControllerBrowserTest.LoadCommittedDetails_IsInPage
IFrameZoomBrowserTest.SubframeRetainsZoomOnNavigation
BackgroundSyncBrowserTest.GetTags
BrowserSideNavigationBrowserTest.BrowserInitiatedNavigations
RenderFrameHostManagerTest.AllowTargetedNavigationsAfterSwap
MHTMLGenerationTest.ViewedMHTMLContainsNoStoreContentIfNoCacheControlPolicy
RenderWidgetHostViewChildFrameTest.Screen
DumpAccessibilityEventsTest.AccessibilityEventsAriaComboBoxCollapse
SitePerProcessDevToolsBrowserTest.AgentHostForPageEqualsOneForMainFrame
SitePerProcessAccessibilityBrowserTest.CrossSiteIframeAccessibility
NavigationControllerBrowserTest.NavigationTypeClassification_NewAndAutoSubframe
SessionHistoryTest.FrameFormBackForward
BackgroundSyncBrowserTest.RegisterFromServiceWorker
IFrameZoomBrowserTest.AllFramesGetDefaultZoom
DumpAccessibilityTreeTest.AccessibilityIframeTransform
NavigationControllerBrowserTest.SubframeForwardRedirect
FrameTreeBrowserTest.SandboxFlagsSetForChildFrames
RenderFrameHostManagerTest.SwapProcessWithRelNoopenerAndTargetBlank
DumpAccessibilityTreeTest.AccessibilityIframeCoordinates
SitePerProcessBrowserTest.CleanupCrossSiteIframe
NavigationHandleImplBrowserTest.VerifyPageTransition
SitePerProcessBrowserTest.CrossSiteIframe
SessionHistoryTest.JavascriptHistory
IsolatedDevToolsProtocolTest.ControlNavigationsChildFrames
TouchAccessibilityBrowserTest.TouchExplorationInCrossSiteIframe
FrameTreeBrowserTest.FrameTreeAfterCrash
NavigationControllerBrowserTest.EnsureFrameNavigationEntriesClearedOnMismatch
DownloadContentTest.DownloadAttributeSameOriginRedirect
NavigationControllerBrowserTest.FrameNavigationEntry_SubframeBackForward
RenderFrameHostManagerTest.DontSwapProcessWithOnlyRelNoreferrer

 
I've been trying to find a reasonable culprit throughout the day so far, with no useful results yet.
FWIW, a similar failure mode is intermittently seen on the Site Isolation Win FYI bot (*), specifically in builds #17171, #17176 and #17178.

(*) https://build.chromium.org/p/chromium.fyi/builders/Site%20Isolation%20Win

Comment 3 by lfg@chromium.org, Dec 2 2016

Perhaps https://codereview.chromium.org/2537893002 ? That's the only thing I could think of that could affect a lot of unrelated tests.

Comment 4 by lfg@chromium.org, Dec 2 2016

Cc: -andyb...@chromium.org ah...@yandex-team.ru
+ahest

Comment 5 by lfg@chromium.org, Dec 2 2016

Cc: andyb...@chromium.org
#3: I don't think so. That CL did not change behavior other than adding a ThreadChecker to RunLoop, and that shouldn't have caused timeouts in any way.

More probable culprit is https://codereview.chromium.org/2523583003

Comment 7 by lfg@chromium.org, Dec 2 2016

#6, should we speculatively revert that change to see if the builders clear up?

See also: http://crbug.com/628787 - I was able to reproduce the timeouts locally by setting --test-launcher-jobs too high. The extra parallelism was actually slowing down the runtime of each individual test by a pretty significant amount. However, reducing parallelism on that bot didn't actually fix the problem.

Now that https://codereview.chromium.org/2548883002/ was reverted an hour ago, I think it's better to see whether failures continue to happen and make a decision based on that.
dmazzoni, can you please revert either one of the CLs mentioned here and try to reproduce again locally?

Comment 10 by kbr@chromium.org, Dec 5 2016

Blocking: 668707
Labels: -Pri-0 Sheriff-Chromium Pri-1
More failures seem to persist. There are two such failures in the last 200 builds on win_chromium_rel_ng:

https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/343451
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/343436

There are also several visible on:

https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%281%29?numbuilds=200

The last such failure was:

https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%281%29/builds/60717

I scanned the jobs on the two Swarming bots which ran the failing win_chromium_rel_ng jobs above:

https://chromium-swarm.appspot.com/bot?id=vm1291-m4&sort_stats=total%3Adesc
https://chromium-swarm.appspot.com/bot?id=vm1280-m4&sort_stats=total%3Adesc

However there doesn't appear to be a pattern indicating that the machine is misconfigured.

The flakiness dashboard shows the pattern pretty clearly on Win7 Tests (1):

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.win%3AWin7%20Tests%20(1)

When it happens, all of the tests time out.

According to the flakiness dashboard, the failures haven't occurred on Win7 Tests (1) since Friday, but it's not clear whether they're still happening on win_chromium_rel_ng. It looks like https://codereview.chromium.org/2548883002/ might have solved the problem. Downgrading to P1.

Ken: this problem has not happened in the last >60 builds, whereas it happened every 4-10 builds before build 60717.  Seems like the revert worked.  You added the Sheriff-Chromium label.  How long do you want sheriffs to keep an eye on this?
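As a rough sanity check on the observation above, the chance of seeing 60 clean builds in a row purely by luck can be estimated. This is a minimal sketch that assumes each build fails independently at a fixed rate between one in 4 and one in 10 (the rates quoted above). The function name is hypothetical and the independence assumption is a simplification.

```python
def clean_streak_probability(failure_rate, builds):
    """Probability that `builds` consecutive builds all pass,
    assuming each build fails independently with `failure_rate`."""
    return (1.0 - failure_rate) ** builds


# Failure rates quoted in the comment: one bad build every 4 to 10 builds.
for every_n in (4, 10):
    p = clean_streak_probability(1.0 / every_n, 60)
    print(f"1 failure per {every_n} builds: P(60 clean in a row) = {p:.2e}")
```

Under those assumptions both probabilities come out well below 1%, which is consistent with something having changed around build 60717. It does not identify which change was responsible.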
Looking at the logs, it seems that renderers are silently dying in these tests. Is it possible that DCHECKs in child processes are not visible in the logs on these bots?

Comment 13 by kbr@chromium.org, Dec 5 2016

Labels: -Sheriff-Chromium
Owner: roc...@chromium.org
Status: Fixed (was: Untriaged)
I wanted to make sure that https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=200 recovered fully. There aren't any instances of content_browsertests failures there today, so yes, it looks like the revert in f8675b380dff12f69ad3b61920f9e8fdcfcadae8 addressed the problem.

ahest@: it looks like win_chromium_rel_ng is configured to build with dcheck_always_on=true; see:
https://cs.chromium.org/chromium/src/tools/mb/mb_config.pyl?q=mb_config.pyl&sq=package:chromium&dr&l=643

and the fact that it uses the 'release_trybot' mixin, which turns on dchecks.

I would think that the DCHECKs would show up in the logs, even on Windows. The bots redirect all output streams and I've definitely observed subprocesses' logs showing up when doing so.

Yes, it seems that you are right that firing DCHECKs were not the cause.
And in the failed builds almost all tests took a lot of time, even those that passed. That's quite strange. Does anybody have an idea of what might have caused it?

As a side note:
I tried to find a clue to what happened, and noticed that all builds visible on the flakiness dashboard for a single builder type are always run on a single bot, e.g. vm801-m1 for Win7 Tests (1). Is that right? It looks strange, and for some time it made me think that it had something to do with the failures.

Comment 15 by lfg@chromium.org, Dec 6 2016

Re #14: Is it possible that ThreadChecker is too expensive and causes several threads to spin, so that tests end up timing out?

I'd say it is unlikely, because ThreadChecker is already used in many places. It also does not explain why only some of the test runs exhibited this slowness, while most of the runs were not affected.
I tried to reproduce these failures locally, on trybots, and on our (Yandex) bots, with no luck.
I caught a number of issues, which turned out to be unrelated.

Then I looked at the flakiness dashboard again:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&testType=content_browsertests&builder=chromium.win%3AWin7%20Tests%20(1)

And here is what I see:
First bad build - 02.12.2016 6:47:12 GMT+3
  https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(1)/builds/60670
  r435822 to r435830
  http://test-results.appspot.com/revision_range?start=435822&end=435830

Last bad build - 03.12.2016 8:12:29 GMT+3
  http://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(1)/builds/60717
  r436157 to r436162
  http://test-results.appspot.com/revision_range?start=436157&end=436162

This means that the cause of the timeouts was present in the range 435830 - 436162, at least.
The CL with ThreadCheckers (https://codereview.chromium.org/2537893002/) landed at commit 435924 and was reverted at 436035.
Doesn't that mean that the culprit was something else?
I looked at what CLs were reverted around that time and there were quite a lot of reverts, so I won't try to guess which one is relevant.
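The revision comparison above can be sketched as a small check. This assumes a linear revision history and treats a build as containing the CL when the last revision in its range is at or after the landing commit and before the revert commit. It uses only the numbers quoted in this comment, and the helper name is hypothetical.

```python
# Revision ranges and commit numbers quoted in the comment above.
FIRST_BAD_BUILD = (435822, 435830)  # Win7 Tests (1) build 60670
LAST_BAD_BUILD = (436157, 436162)   # Win7 Tests (1) build 60717
CL_LANDED = 435924
CL_REVERTED = 436035


def cl_present(build_range, landed, reverted):
    """True if the build's last revision includes the CL, i.e. the CL
    had landed and had not yet been reverted (assumes linear history)."""
    _, end = build_range
    return landed <= end < reverted


print(cl_present(FIRST_BAD_BUILD, CL_LANDED, CL_REVERTED))  # False: before landing
print(cl_present(LAST_BAD_BUILD, CL_LANDED, CL_REVERTED))   # False: after revert
```

Both bad builds fall outside the window in which the ThreadChecker CL was in the tree, which is the basis for the question above.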

It does seem odd that your CL would be able to cause widespread test hangs, but stranger things have happened.

Given the uncertainty, please feel free to reland it as-is, but be mindful of the state of the waterfall so we can catch it early if the flake reappears.
Can you elaborate on what uncertainty you mean?

To reland, I suppose I'll have to get LGTMs once again. I thought that in such cases it is usually done some other way (which I don't have permissions for), isn't it?
I just mean your CL was reverted as a suspect, and after the revert the tree was green. So there's some reason to believe it may have been the culprit, but a review of the CL doesn't produce any good explanation for how.
But there were build failures of exactly the same type both before the CL landed and after it was reverted. Given that, and the nature of the CL, to me there isn't much uncertainty left.

Anyway, I just wanted to confirm how to proceed - should I recreate the CL and go through the usual review process?
No, you don't need to wait for LGTMs again to re-land a change that was reverted for this type of reason.

Create a new changelist to re-land your change and call it something like "Re-land: (original change description)". You can use "git revert" or "git cherry-pick" or something like that to create it.

Upload it with your original reviewers but list them all as TBR= so you don't need to wait for their approval. Land it using the commit queue.

Oh, and ping whoever's sheriffing today and let them know that you're re-landing a change that was suspected of causing test flakiness (but you don't think it's actually the cause).

If you have a merge error and need to update your patch in some nontrivial way, use your best judgement about getting those changes reviewed first.

Created https://codereview.chromium.org/2564943002/, but the CQ rejects it (I'm not a committer).
