>170 content_browsertests tests failing intermittently on win_chromium_rel_ng
Issue description

Intermittently on win_chromium_rel_ng, over 170 tests in content_browsertests are spuriously failing. The test list appears to be roughly the same each time. It happens on different VMs, so either it's not a specific machine, or it's a widespread configuration problem in the Swarming fleet.

Here are the recent runs with this symptom that I found on win_chromium_rel_ng:
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342957
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342937
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342934
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342907
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/342900

The symptom is that all of these tests time out. They span systems like accessibility, site isolation, navigation, etc. I'm adding several components, including Mojo, in case someone has an idea what's going on. There was a similar bug report in Issue 650175, on the official builders, and it's not clear whether that was fully resolved.

While I understand that there are only 6 such failures in the past 100 builds, something is badly broken and this needs to be triaged immediately. I am therefore marking this P0 until it's at least triaged. Feel free to downgrade it once it's triaged.

The list of failing tests from my failed try run follows.
DumpAccessibilityTreeTest.AccessibilityIframePostEnable BackgroundSyncBrowserTest.SyncRegistrationsDeletedWhenClearingSiteData DumpAccessibilityTreeTest.AccessibilityImg SessionHistoryTest.FrameBackForward BackgroundSyncBrowserTest.SyncRegistrationFromSWDeletedWhenClearingSiteData RenderFrameHostManagerTest.DontSwapProcessWithOnlyTargetBlank DumpAccessibilityEventsTest.AccessibilityEventsListboxNext RenderFrameHostManagerTest.PopupKeepsWindowReferenceCrossProcesAndBack WorkerTest.SharedWorkerHttpAuth RenderFrameHostManagerTest.SupportCrossProcessPostMessageWithMessagePort NavigationControllerBrowserTest.FrameNavigationEntry_SameOriginBackWithRedirect SitePerProcessBrowserTest.NestedSurfaceHitTestTest RenderFrameMessageFilterBrowserTest.SameSiteCookies NavigationControllerBrowserTest.PageStateWithIframeAfterForwardInCompetingFrames NavigationControllerBrowserTest.RefererAndOriginHeadersAfterRedirects SitePerProcessHighDPIBrowserTest.SubframeLoadsWithCorrectDeviceScaleFactor IsolateIcelandFrameTreeBrowserTest.ProcessSwitchForIsolatedBlob RenderFrameHostManagerTest.SwapProcessWithSameSiteRelNoopener NavigationControllerBrowserTest.FrameNavigationEntry_BackWithRedirect NavigationControllerBrowserTest.FrameNavigationEntry_RenameNestedAutoSubframe BackgroundSyncBrowserTest.HasTagFromServiceWorker ReloadCacheControlBrowserTest.NormalReload RenderFrameHostManagerTest.ConsecutiveNavigationsToSite RequestDataResourceDispatcherHostBrowserTest.CrossOriginAuxiliary NavigationControllerBrowserTest.FrameNavigationEntry_BackNestedAutoSubframe SitePerProcessDevToolsBrowserTest.AgentHostForFrames RenderFrameHostManagerTest.DisownOpener SessionHistoryTest.CrossFrameFormBackForward CrossSiteResourceHandlerTest.NoDeliveryToDetachedFrame RenderFrameHostManagerTest.DontPreemptNavigationWithFrameTreeUpdate SessionHistoryTest.LocationChangeInSubframe DumpAccessibilityEventsTest.AccessibilityEventsListboxFocus NavigationControllerBrowserTest.FrameNavigationEntry_NewSubframe 
RenderFrameHostManagerTest.AllowTargetedNavigationsInOpenerAfterSwap BackgroundSyncBrowserTest.FiringSyncEventDeletedWhenClearingSiteData ClearSiteDataThrottleBrowserTest.Types RequestDataResourceDispatcherHostBrowserTest.SameOriginAuxiliary SitePerProcessDevToolsProtocolTest.TargetNoDiscovery FrameTreeBrowserTest.FrameTreeShape NavigationHandleImplBrowserTest.VerifySamePage IFrameZoomBrowserTest.SubframesDontZoomIndependently SitePerProcessBrowserTest.ViewBoundsInNestedFrameTest NavigationControllerBrowserTest.RefererStoredForSubFrame BackgroundSyncBrowserTest.RegisterFromIFrameWithoutMainFrameHost RenderFrameHostManagerTest.DontSwapProcessWithOnlyRelNoOpener NavigationControllerBrowserTest.FrameNavigationEntry_SubframeHistoryFallback RenderFrameHostManagerTest.InputMsgToSwappedOutRVHIsIgnored BackgroundSyncBrowserTest.RegisterFromUncontrolledDocument WorkerTest.WebSocketSharedWorker RenderFrameMessageFilterBrowserTest.Cookies AsyncRevalidationManagerBrowserTest.CacheIsUpdated SessionHistoryTest.BasicBackForward SessionHistoryTest.GoBackToCrossSitePostWithRedirect SecurityExploitBrowserTest.AttemptRunFileChoosers DevToolsProtocolTest.InspectDuringFrameSwap LoFiResourceDispatcherHostBrowserTest.ShouldEnableLoFiModeReloadDisableLoFi DumpAccessibilityEventsTest.AccessibilityEventsMenuListFocus NavigationControllerBrowserTest.PreventSpoofFromSubframeAndReplace DevToolsProtocolTest.TargetDiscovery DumpAccessibilityTreeTest.AccessibilityIframe DumpAccessibilityTreeTest.AccessibilityIframeCrossProcess DumpAccessibilityTreeTest.AccessibilityIframeTransformCrossProcess ManifestBrowserTest.CORSManifest MHTMLGenerationSitePerProcessTest.GenerateMHTML SecurityExploitBrowserTest.InvalidOriginHeaders NavigationControllerBrowserTest.EnsureSamePageNavigationUpdatesFrameNavigationEntry RenderFrameHostManagerTest.ProcessExitWithSwappedOutViews RenderFrameHostManagerTest.BackForwardNotStale AsyncRevalidationManagerBrowserTest.StaleWhileRevalidateIsApplied 
IFrameZoomBrowserTest.SiblingFramesZoom RenderFrameHostManagerTest.SupportCrossProcessPostMessage RenderFrameHostManagerTest.PreserveTopFrameWindowNameOnCrossProcessNavigations SitePerProcessBrowserTest.SurfaceHitTestTest FrameTreeBrowserTest.IsRenderFrameLive IFrameZoomBrowserTest.RedirectToPageWithSubframeZoomsCorrectly BackgroundSyncBrowserTest.RegisterFromIFrameWithMainFrameHost CrossProcessFrameTreeBrowserTest.OriginSetOnCrossProcessNavigations FrameTreeBrowserTest.NavigateGrandchildToBlob FrameTreeBrowserTest.NavigateChildToAboutBlank BackgroundSyncBrowserTest.Incognito DumpAccessibilityTreeTest.AccessibilityIframeCoordinatesCrossProcess RenderFrameHostManagerTest.RestoreSubframeFileAccessForHistoryNavigation MediaSourceTest.Playback_Video_MP4_Audio_WEBM RenderFrameHostManagerTest.UpdateOpener BackgroundSyncBrowserTest.RegisterFromServiceWorkerWithoutMainFrameHost NavigationControllerBrowserTest.FrameNavigationEntry_RecreatedSubframeBackForward NavigationControllerBrowserTest.FrameNavigationEntry_RestoreViaPageState SitePerProcessBrowserTest.SurfaceHitTestPointerEventsNone NavigationControllerBrowserTest.FrameNavigationEntry_SubframeAfterInPage RenderFrameHostManagerTest.SwapProcessWithRelNoreferrerAndTargetBlank LoFiResourceDispatcherHostBrowserTest.ShouldEnableLoFiModeReload FrameTreeBrowserTest.SubframeOpenerSetForNewWindow DumpAccessibilityTreeTest.AccessibilityIframeTransformNested SitePerProcessBrowserTest.CrossProcessMouseCapture BackgroundSyncBrowserTest.RegisterFromControlledDocument NavigationControllerBrowserTest.FrameNavigationEntry_RepeatCreatedFrame FrameTreeBrowserTest.NavigateWithLeftoverFrames SitePerProcessDevToolsBrowserTest.CrossSiteIframeAgentHost ReloadCacheControlBrowserTest.BypassingReload DumpAccessibilityTreeTest.AccessibilityFramesetPostEnable RequestDataResourceDispatcherHostBrowserTest.CrossOriginNested RequestDataResourceDispatcherHostBrowserTest.SameOriginNested FrameTreeBrowserTest.FrameTreeShape2 
ClearSiteDataThrottleBrowserTest.Redirect EncryptedMediaTest.UnknownKeySystemThrowsException DownloadContentTest.DownloadAttributeCrossOriginRedirect MHTMLGenerationTest.ViewedMHTMLDoesNotContainNoStoreContent NavigationControllerBrowserTest.ForwardRedirectWithNoCommittedEntry RequestDataResourceDispatcherHostBrowserTest.BasicCrossSite DumpAccessibilityTreeTest.AccessibilityFrameset BackgroundSyncBrowserTest.GetRegistrationsFromServiceWorker NavigationControllerBrowserTest.ConsecutiveReloadMetrics NavigationControllerBrowserTest.CloneAndGoBackWithNamedWindow FrameTreeBrowserTest.OriginSetOnNavigation RenderFrameHostManagerTest.NavigateBackToExistingProcessFromSadTab SitePerProcessAccessibilityBrowserTest.TwoCrossSiteNavigations RenderFrameHostManagerTest.CrossProcessPopupInheritsSandboxFlagsWithNoOpener NavigationControllerBrowserTest.RaceCrossOriginNavigationAndSamePageHistoryNavigation SitePerProcessBrowserTest.ScrollEventToOOPIF NavigationControllerBrowserTest.FrameNavigationEntry_AutoSubframe NavigationControllerBrowserTest.NavigationTypeClassification_ExistingPage SitePerProcessBrowserTest.TitleAfterCrossSiteIframe RenderFrameHostManagerTest.SwapProcessWithWindowOpenAndNoopener MediaRedirectTest.CanPlayHiddenWebm NavigationHandleImplBrowserTest.VerifyRequestContextTypeForFrameTree BackgroundSyncBrowserTest.WaitUntilReject NavigationControllerOopifBrowserTest.RestoreWithoutExtraOopifs BackgroundSyncBrowserTest.RegistrationDelaysForNetwork BackgroundSyncBrowserTest.SyncRegistrationDeletedWhenClearingSiteData RenderFrameHostManagerTest.NoScriptAccessAfterSwapOut FrameTreeBrowserTest.ChildFrameWithSrcdoc BackgroundSyncBrowserTest.WaitUntil RequestDataResourceDispatcherHostBrowserTest.Basic AsyncResourceHandlerBrowserTest.UploadProgressRedirect NavigationControllerBrowserTest.FrameNavigationEntry_FrameUniqueName MediaSourceTest.Playback_Video_WEBM_Audio_MP4 DumpAccessibilityTreeTest.AccessibilityImgEmptyAlt WorkerTest.WorkerHttpAuth 
SitePerProcessBrowserTest.CrossProcessMouseEnterAndLeaveTest MediaSourceTest.ConfigChangeVideo RenderFrameHostManagerTest.SameOriginFramesInDifferentProcesses DumpAccessibilityTreeTest.AccessibilityIframeTransformScrolled RenderFrameHostManagerTest.RenderViewInitAfterProcessKill SitePerProcessBrowserTest.CompositorFrameSwapped NavigationHandleImplBrowserTest.VerifyRendererInitiated RenderFrameHostManagerTest.SwapProcessWithSameSiteRelNoreferrer CrossProcessFrameTreeBrowserTest.CreateCrossProcessSubframeProxies IFrameZoomBrowserTest.SubframesZoomProperly DumpAccessibilityTreeTest.AccessibilityIframeTransformNestedCrossProcess NavigationHandleImplBrowserTest.VerifyFrameTree NavigationControllerBrowserTest.LoadCommittedDetails_IsInPage IFrameZoomBrowserTest.SubframeRetainsZoomOnNavigation BackgroundSyncBrowserTest.GetTags BrowserSideNavigationBrowserTest.BrowserInitiatedNavigations RenderFrameHostManagerTest.AllowTargetedNavigationsAfterSwap MHTMLGenerationTest.ViewedMHTMLContainsNoStoreContentIfNoCacheControlPolicy RenderWidgetHostViewChildFrameTest.Screen DumpAccessibilityEventsTest.AccessibilityEventsAriaComboBoxCollapse SitePerProcessDevToolsBrowserTest.AgentHostForPageEqualsOneForMainFrame SitePerProcessAccessibilityBrowserTest.CrossSiteIframeAccessibility NavigationControllerBrowserTest.NavigationTypeClassification_NewAndAutoSubframe SessionHistoryTest.FrameFormBackForward BackgroundSyncBrowserTest.RegisterFromServiceWorker IFrameZoomBrowserTest.AllFramesGetDefaultZoom DumpAccessibilityTreeTest.AccessibilityIframeTransform NavigationControllerBrowserTest.SubframeForwardRedirect FrameTreeBrowserTest.SandboxFlagsSetForChildFrames RenderFrameHostManagerTest.SwapProcessWithRelNoopenerAndTargetBlank DumpAccessibilityTreeTest.AccessibilityIframeCoordinates SitePerProcessBrowserTest.CleanupCrossSiteIframe NavigationHandleImplBrowserTest.VerifyPageTransition SitePerProcessBrowserTest.CrossSiteIframe SessionHistoryTest.JavascriptHistory 
IsolatedDevToolsProtocolTest.ControlNavigationsChildFrames TouchAccessibilityBrowserTest.TouchExplorationInCrossSiteIframe FrameTreeBrowserTest.FrameTreeAfterCrash NavigationControllerBrowserTest.EnsureFrameNavigationEntriesClearedOnMismatch DownloadContentTest.DownloadAttributeSameOriginRedirect NavigationControllerBrowserTest.FrameNavigationEntry_SubframeBackForward RenderFrameHostManagerTest.DontSwapProcessWithOnlyRelNoreferrer
,
Dec 2 2016
FWIW, a similar failure mode is intermittently seen on Site Isolation Win FYI bot (*) - i.e. in builds #17171, #17176 and #17178. (*) https://build.chromium.org/p/chromium.fyi/builders/Site%20Isolation%20Win
,
Dec 2 2016
Perhaps https://codereview.chromium.org/2537893002 ? That's the only thing I could think of that could affect a lot of unrelated tests.
,
Dec 2 2016
+ahest
,
Dec 2 2016
,
Dec 2 2016
#3: I don't think so. That CL did not change behavior other than adding a ThreadChecker to RunLoop, and that shouldn't have caused timeouts in any way. A more probable culprit is https://codereview.chromium.org/2523583003
,
Dec 2 2016
#6, should we speculatively revert that change to see if the builders clear up?
,
Dec 2 2016
See also: http://crbug.com/628787 - I was able to reproduce the timeouts locally by increasing --test-launcher-jobs too high - parallelism was actually slowing down the runtime of each individual test by a pretty significant amount. However, reducing parallelism on that bot didn't actually fix the problem.
,
Dec 2 2016
Now that https://codereview.chromium.org/2548883002/ was reverted an hour ago, I think it's better to see if failures continue to happen and make a decision based on that. dmazzoni, can you please revert either one of the CLs mentioned here and try to reproduce again locally?
,
Dec 5 2016
More failures seem to persist. There are two such failures in the last 200 builds on win_chromium_rel_ng: https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/343451 https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/343436 There are also several visible on: https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%281%29?numbuilds=200 The last such failure was: https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20%281%29/builds/60717 I scanned the jobs on the two Swarming bots which ran the failing win_chromium_rel_ng jobs above: https://chromium-swarm.appspot.com/bot?id=vm1291-m4&sort_stats=total%3Adesc https://chromium-swarm.appspot.com/bot?id=vm1280-m4&sort_stats=total%3Adesc However there doesn't appear to be a pattern indicating that the machine is misconfigured. The flakiness dashboard shows the pattern pretty clearly on Win7 Tests (1): http://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.win%3AWin7%20Tests%20(1) When it happens, all of the tests time out. According to the flakiness dashboard, the failures haven't occurred on Win7 Tests (1) since Friday, but it's not clear whether they're still happening on win_chromium_rel_ng. It looks like https://codereview.chromium.org/2548883002/ might have solved the problem. Downgrading to P1.
,
Dec 5 2016
Ken: this problem has not happened in the last >60 builds, whereas it happened every 4-10 builds before build 60717. Seems like the revert worked. You added the Sheriff-Chromium label. How long do you want sheriffs to keep an eye on this?
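A rough sanity check of the "seems like the revert worked" reasoning above, as a sketch only. It assumes each build fails independently with a fixed probability of 1 in 4 to 1 in 10 (the rates quoted above); real flakes may be correlated, so treat the numbers as indicative. The function name is made up for illustration.

```python
def prob_clean_streak(failure_every_n_builds, streak_length):
    """Chance of `streak_length` consecutive clean builds if, by assumption,
    each build independently fails with probability 1/failure_every_n_builds."""
    p_fail = 1.0 / failure_every_n_builds
    return (1.0 - p_fail) ** streak_length


# Rates from the comment: one bad build every 4 to 10 builds, then 60 clean builds.
for every_n in (4, 10):
    print(every_n, prob_clean_streak(every_n, 60))
```

Even at the gentler rate of one failure per 10 builds, 60 clean builds in a row by chance would be well under 1% likely under these assumptions, which is consistent with something having changed around build 60717. It does not say which change.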
,
Dec 5 2016
Looking at the logs, it seems that renderers are silently dying in these tests. Is it possible that DCHECKs in child processes are not visible in the logs on these bots?
,
Dec 5 2016
I wanted to make sure that https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=200 recovered fully. There aren't any instances of content_browsertests failures there today, so yes, it looks like the revert in f8675b380dff12f69ad3b61920f9e8fdcfcadae8 addressed the problem. ahest@: it looks like win_chromium_rel_ng is configured to build with dcheck_always_on=true; see: https://cs.chromium.org/chromium/src/tools/mb/mb_config.pyl?q=mb_config.pyl&sq=package:chromium&dr&l=643 and the fact that it uses the 'release_trybot' mixin, which turns on dchecks. I would think that the DCHECKs would show up in the logs, even on Windows. The bots redirect all output streams and I've definitely observed subprocesses' logs showing up when doing so.
,
Dec 6 2016
Yes, it seems you are right that firing DCHECKs were not the cause. And in the failed builds almost all tests took a lot of time, even those that passed. That's quite strange. Does anybody have an idea of what might have caused it? As a side note: I tried to find a clue to what happened, and noticed that all builds visible on the flakiness dashboard for a single builder type are always run on a single bot, e.g. vm801-m1 for Win7 Tests (1). Is that right? It just looks strange, and it made me think for some time that it had something to do with the failures.
,
Dec 6 2016
Re #14: Is it possible that ThreadChecker is too expensive and causes several threads to spin, so that tests end up timing out?
,
Dec 6 2016
I'd say it is unlikely, because ThreadChecker is already used in many places. And it does not explain why only some of the test runs exhibited this slowness, while most of the runs were not affected.
,
Dec 9 2016
I was trying to reproduce these failures locally, on trybots, and on our (yandex) bots, with no luck. Caught a number of issues which turned out to be unrelated. Then I went to look at flakiness dashboard again http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&testType=content_browsertests&builder=chromium.win%3AWin7%20Tests%20(1) And here is what I see: First bad build - 02.12.2016 6:47:12 GMT+3 https://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(1)/builds/60670 r435822 to r435830 http://test-results.appspot.com/revision_range?start=435822&end=435830 Last bad build - 03.12.2016 8:12:29 GMT+3 http://build.chromium.org/p/chromium.win/builders/Win7%20Tests%20(1)/builds/60717 r436157 to r436162 http://test-results.appspot.com/revision_range?start=436157&end=436162 Which means that the cause of timeouts was present in the range 435830 - 436162, at least. CL with ThreadCheckers (https://codereview.chromium.org/2537893002/) landed at commit 435924 and was reverted at 436035. Doesn't it mean that the culprit was something else? I looked at what CLs were reverted around that time and there were quite a lot of reverts, so I won't try to guess which one is relevant.
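The revision argument above can be checked mechanically. This is a minimal sketch, assuming the revision numbers behave as linear commit positions, that the CL was in the tree from its landing revision up to (but not including) its revert, and that a build may contain any revision in its listed range. The helper name is hypothetical; the numbers are the ones quoted in the comment.

```python
def cl_could_be_in_build(build_range, landed, reverted):
    """True if some revision r in build_range satisfies landed <= r < reverted."""
    low, high = build_range
    return max(low, landed) <= min(high, reverted - 1)


first_bad_build = (435822, 435830)  # Win7 Tests (1) build 60670
last_bad_build = (436157, 436162)   # Win7 Tests (1) build 60717
cl_landed, cl_reverted = 435924, 436035

print(cl_could_be_in_build(first_bad_build, cl_landed, cl_reverted))
print(cl_could_be_in_build(last_bad_build, cl_landed, cl_reverted))
```

Under those assumptions neither the first nor the last bad build overlaps the window in which the ThreadChecker CL was present, which is the basis for asking whether the culprit was something else.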
,
Dec 9 2016
It does seem odd that your CL would be able to cause widespread test hangs, but stranger things have happened. Given the uncertainty, please feel free to reland it as-is, but be mindful of the state of the waterfall so we can catch it early if the flake reappears.
,
Dec 9 2016
Can you elaborate on what uncertainty you mean? To reland I'll have to get LGTMs once again, I suppose, and I thought in such cases it is usually done some other way (which I don't have permissions for), isn't it?
,
Dec 9 2016
I just mean your CL was reverted as a suspect, and after the revert the tree was green. So there's some reason to believe it may have been the culprit, but a review of the CL doesn't produce any good explanation for how.
,
Dec 9 2016
But there were build failures of exactly the same type, both before the CL landed and after it was reverted. Given that, and the nature of the CL, to me there is not much uncertainty left. Anyway, I just wanted to confirm how to proceed: should I recreate the CL and go through the usual review process?
,
Dec 9 2016
No, you don't need to wait for LGTMs again to re-land a change that was reverted for this type of reason. Create a new changelist to re-land your change and call it something like "Re-land: (original change description)". You can use "git revert" or "git cherry-pick" or something like that to create it. Upload it with your original reviewers but list them all as TBR= so you don't need to wait for their approval. Land it using the commit queue. Oh, and ping whoever's sheriffing today and let them know that you're re-landing a change that was suspected of causing test flakiness (but you don't think it's actually the cause). If you have a merge error and need to update your patch in some nontrivial way, use your best judgement about getting those changes reviewed first.
,
Dec 9 2016
Created https://codereview.chromium.org/2564943002/, but the CQ rejects it (I'm not a committer).
Comment 1 by roc...@chromium.org
, Dec 2 2016