Sheriff-o-Matic is not refreshed (stale)
Issue description
It is now 1:53 PST and Sheriff-o-Matic's last update was at "1/28/2018, 11:21 pm PST (3 hours ago)". When I refresh the page or press the refresh button, nothing happens. The developer console shows:
=====
som-app.vulcanized.html:38241 ...... too many results, data snipped....,BrowserCloseManagerWithDownloadsBrowserTest/BrowserCloseManagerWithDownloadsBrowserTest.TestWithDownloads/0,BrowserEncodingTest.TestEncodingAutoDetect,BrowserWindowControllerTest.FullscreenResizeFlags,BrowsingDataRemoverBrowserTest.Download,ChromeResourceDispatcherHostDelegateBrowserTest.ThrottlesAddedExactlyOnceToADownloads,ChromeResourceDispatcherHostDelegateBrowserTest.ThrottlesAddedExactlyOnceToLargeSniffedDownloads,ChromeResourceDispatcherHostDelegateBrowserTest.ThrottlesAddedExactlyOnceToTinySniffedDownloads,ConstrainedWindowMacTest.BrowserWindowFullscreen,DownloadExtensionTest.DownloadExtensionTest_Download_AuthBasic,DownloadExtensionTest.DownloadExtensionTest_Download_AuthBasic_Fail,DownloadExtensionTest.DownloadExtensionTest_Download_Basic,DownloadExtensionTest.DownloadExtensionTest_Download_ConflictAction,DownloadExtensionTest.DownloadExtensionTest_Download_DataURL,DownloadExtensionTest.DownloadExtensionTest_Download_File,DownloadExtensionTest.DownloadExtensionTest_Download_Headers,DownloadExtensionTest.DownloadExtensionTest_Download_Headers_Fail,DownloadExtensionTest.DownloadExtensionTest_Download_InterruptAndResume,DownloadExtensionTest.DownloadExtensionTest_Download_Post,DownloadExtensionTest.DownloadExtensionTest_Download_Post_Get,DownloadExtensionTest.DownloadExtensionTest_Download_Redirect,DownloadExtensionTest.DownloadExtensionTest_Download_Subdirectory,DownloadExtensionTest.DownloadExtensionTest_Download_URLFragment,DownloadExtensionTest.DownloadExtensionTest_FileIcon_Active,DownloadExtensionTest.DownloadExtensionTest_FileIcon_History,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_AbsPathInvalid,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_CurDirInvalid,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_EmptyBasenameInvalid,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_IllegalFilenameExtension,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_IncognitoSpanning,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_IncognitoSplit,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_NoChange,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_Override,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_ParentDirInvalid,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_ReferencesParentInvalid,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_ReservedFilename,DownloadExtensionTest.DownloadExtensionTest_OnDeterminingFilename_Twice,DownloadExtensionTest.DownloadExtensionTest_Open,DownloadExtensionTest.DownloadExtensionTest_PauseResumeCancelErase,DownloadExtensionTest.DownloadExtensionTest_SearchDanger,DownloadExtensionTest.DownloadExtensionTest_SearchEmptyQuery is not equal to BrowserCommandControllerInteractiveTest.KeyEventsShouldBeConsumedByWebPageInJsFullscreenExceptForEsc,BrowserCommandControllerInteractiveTest.KeyEventsShouldBeConsumedByWebPageInJsFullscreenExceptForF11,BrowserCommandControllerInteractiveTest.ShortcutsShouldTakeEffectInWindowMode,DevToolsManagerDelegateTest.ExitFullscreenWindow,DevToolsManagerDelegateTest.MaximizedToFullscreenWindow,DevToolsManagerDelegateTest.NormalToFullscreenWindow,ExtensionApiTest.FocusWindowDoesNotExitFullscreen,NotificationsTest.TestShouldDisplayFullscreen,NotificationsTest.TestShouldDisplayPopupNotification,SitePerProcessInteractiveBrowserTest.FullscreenElementInABAAndExitViaEscapeKey,SitePerProcessInteractiveBrowserTest.FullscreenElementInABAAndExitViaJS,SitePerProcessInteractiveBrowserTest.FullscreenElementInSubframe but they were merged together. This should never happen, because merging is done server side by looking at the reason data.
_mergeReason @ som-app.vulcanized.html:38241
_computeAlert @ som-app.vulcanized.html:38218
_computeAlertsSet @ som-app.vulcanized.html:38155
_computeAlerts @ som-app.vulcanized.html:38136
runMethodEffect @ som-app.vulcanized.html:3014
runComputedEffect @ som-app.vulcanized.html:2643
runEffectsForProperty @ som-app.vulcanized.html:2378
runEffects @ som-app.vulcanized.html:2344
runComputedEffects @ som-app.vulcanized.html:2621
_propertiesChanged @ som-app.vulcanized.html:3853
_flushProperties @ som-app.vulcanized.html:1688
_invalidateProperties @ som-app.vulcanized.html:3707
set @ som-app.vulcanized.html:4015
_alertsSetData @ som-app.vulcanized.html:38059
window.fetch.then.then @ som-app.vulcanized.html:38095
Promise resolved (async)
alertStreams.forEach @ som-app.vulcanized.html:38095
_updateAlerts @ som-app.vulcanized.html:38079
refresh @ som-app.vulcanized.html:37934
_refresh @ som-app.vulcanized.html:50264
handler @ som-app.vulcanized.html:1848
_fire @ som-app.vulcanized.html:6490
forward @ som-app.vulcanized.html:6852
click @ som-app.vulcanized.html:6822
_handleNative @ som-app.vulcanized.html:6280
=====
,
Jan 29 2018
martiniss@ touched that code a long time ago. CC'ing him in case he is familiar with recent changes.
,
Jan 29 2018
Assigning to zhangtiff@ as an OWNER.
,
Jan 29 2018
It did refresh 4 minutes ago. Decreasing priority.
,
Jan 29 2018
Now it hasn't been refreshed since 2:30 am PST (5 hours).
,
Jan 29 2018
Ping! The list hasn't updated since 12:58 am PST. The bug queue is okay though.
,
Jan 29 2018
Oh, I was looking at the time the failure occurred. It still hasn't updated in like 6 hours though.
,
Jan 29 2018
,
Jan 29 2018
This is the chromium tree I take it? Looking...
,
Jan 29 2018
Analyzer logs say:
Status 500 msg Post https://luci-milo.appspot.com/prpc/milo.Buildbot/GetCompressedMasterJSON: Call error 11: Deadline exceeded (timeout)
hinoka@: anything on the milo end look odd?
,
Jan 29 2018
+nodir
I'm seeing these in the error logs:
prpc: responding with Unknown error: could not load builds: parsing buildnumber: no build_address%!(EXTRA int64=8967945820045353936) (and 8 other errors)
https://pantheon.corp.google.com/logs/viewer?project=luci-milo&minLogLevel=0&expandAll=false&timestamp=2018-01-29T19:35:38.681846000Z&dateRangeStart=2018-01-29T18:44:18.604Z&dateRangeEnd=2018-01-29T19:44:18.604Z&interval=PT1H&resource=gae_app%2Fmodule_id%2Fdefault&logName=projects%2Fluci-milo%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0Aresource.labels.module_id%3D%22default%22%0AlogName%3D%22projects%2Fluci-milo%2Flogs%2Fappengine.googleapis.com%252Frequest_log%22%0AprotoPayload.resource%3D%22%2Fprpc%2Fmilo.Buildbot%2FGetCompressedMasterJSON%22%0AprotoPayload.status!%3D200%0AprotoPayload.status%3D500
,
Jan 29 2018
,
Jan 29 2018
HTTP 500s in comment #12 are unrelated to SOM. Those requests are coming from luci-migration app.
,
Jan 29 2018
,
Jan 29 2018
Digging a little more, I found this:
Process terminated because the request deadline was exceeded. (Error code 123)
https://pantheon.corp.google.com/logs/viewer?project=luci-milo&minLogLevel=0&expandAll=false&timestamp=2018-01-29T20:12:19.922151000Z&dateRangeStart=2018-01-29T20:04:21.341Z&dateRangeEnd=2018-01-29T21:04:21.341Z&interval=PT1H&resource=gae_app%2Fmodule_id%2Fdefault&logName=projects%2Fluci-milo%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0Aresource.labels.module_id%3D%22default%22%0AlogName%3D%22projects%2Fluci-milo%2Flogs%2Fappengine.googleapis.com%252Frequest_log%22%0AprotoPayload.resource%3D%22%2Fprpc%2Fmilo.Buildbot%2FGetCompressedMasterJSON%22%0AprotoPayload.status%3D500%0AprotoPayload.userAgent!%3D%22pRPC%20Client%201.0%20AppEngine-Google;%20(%2Bhttp:%2F%2Fcode.google.com%2Fappengine;%20appid:%20s~luci-migration)%22
from SoM-staging
https://screenshot.googleplex.com/cvnByZvECPi
https://groups.google.com/a/google.com/forum/#!msg/prometheus-discuss/Mb-Ji9pyWgY/zxRTUNaZWcEJ
This would suggest it's because GAE can't spin up instances fast enough. I'll raise the min instances to see if that helps.
,
Jan 29 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/35b6729b1a56e0c2d4f88bcf7258da71d636f9a9
commit 35b6729b1a56e0c2d4f88bcf7258da71d636f9a9
Author: Nodir Turakulov <nodir@google.com>
Date: Mon Jan 29 21:21:00 2018
[milo] fix error message format
Forgot %d in format string.
Bug: 806700
Change-Id: I0fff27210e0a8b89f0ad3364de0b88cd2d3b10f2
Reviewed-on: https://chromium-review.googlesource.com/891798
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
[modify] https://crrev.com/35b6729b1a56e0c2d4f88bcf7258da71d636f9a9/milo/buildsource/buildbot/buildstore/buildbucket.go
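For readers unfamiliar with the `%!(EXTRA int64=...)` suffix seen in the Milo error log earlier in this thread: it is what Go's `fmt` package emits when an argument is passed without a matching verb in the format string. The following is a minimal sketch of that behavior, not the actual Milo code; the two format strings and the `describe` helper are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Hypothetical format strings, for illustration only. The real strings live
// in milo's buildbucket.go and are not reproduced here.
var (
	missingVerb = "parsing buildnumber: no build_address"
	withVerb    = "parsing buildnumber: no build_address in build %d"
)

// describe formats a message about a build ID using the given format string.
func describe(format string, buildID int64) string {
	return fmt.Sprintf(format, buildID)
}

func main() {
	id := int64(8967945820045353936)
	// No verb for the argument: fmt appends %!(EXTRA int64=<value>).
	fmt.Println(describe(missingVerb, id))
	// With %d the value is formatted in place.
	fmt.Println(describe(withVerb, id))
}
```

The first line printed ends in `%!(EXTRA int64=8967945820045353936)`, matching the shape of the logged error; the second embeds the number cleanly.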
,
Jan 29 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/d3692e44683e3133b0fbb81bbf72e8d476148154
commit d3692e44683e3133b0fbb81bbf72e8d476148154
Author: Ryan Tseng <hinoka@google.com>
Date: Mon Jan 29 22:33:50 2018
[milo] Set min active instances
Add in automatic scaling factors for Milo. Milo's default service generally has about 9-12 instances active, so this just codifies the minimum, and there shouldn't be a difference.
Bug:806700
Change-Id: Ifbd21dc2697b12f9784b0a7bc90ada851b20d777
Reviewed-on: https://chromium-review.googlesource.com/892019
Reviewed-by: Nodir Turakulov <nodir@chromium.org>
Commit-Queue: Ryan Tseng <hinoka@chromium.org>
[modify] https://crrev.com/d3692e44683e3133b0fbb81bbf72e8d476148154/milo/frontend/appengine/app.yaml
,
Jan 29 2018
I'm not sure if the fix is supposed to have taken effect yet, but the sheriff-o-matic hasn't updated thus far.
,
Jan 30 2018
The sheriff-o-matic still hasn't updated so far. How's it going?
,
Jan 30 2018
Ryan, did you try bisecting to find the Milo version in which the problem started to occur? That way we can narrow down the list of CLs that could have caused this.
,
Jan 30 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/be601b65d15544d56302f511dcb02ad971625199
commit be601b65d15544d56302f511dcb02ad971625199
Author: Ryan Tseng <hinoka@google.com>
Date: Tue Jan 30 19:19:22 2018
[milo] Reduce parallel requests from 8 to 4
Bug: 806700
Change-Id: Ie003fc77894d2e4217a913dc6ae2e3b093064d3a
Reviewed-on: https://chromium-review.googlesource.com/891631
Reviewed-by: Nodir Turakulov <nodir@chromium.org>
Commit-Queue: Ryan Tseng <hinoka@chromium.org>
[modify] https://crrev.com/be601b65d15544d56302f511dcb02ad971625199/milo/frontend/appengine/app.yaml
,
Jan 30 2018
,
Jan 30 2018
FWIU SOM is being updated, but sometimes analysis runs time out at 10m:
https://pantheon.corp.google.com/logs/viewer?project=sheriff-o-matic&minLogLevel=0&expandAll=false&timestamp=2018-01-30T21:52:50.688585000Z&dateRangeStart=2018-01-30T15:58:08.905Z&dateRangeEnd=2018-01-30T21:58:08.905Z&interval=PT6H&resource=gae_app%2Fmodule_id%2Fanalyzer&logName=projects%2Fsheriff-o-matic%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0Aresource.labels.module_id%3D%22analyzer%22%0AprotoPayload.resource%3D%22%2F_cron%2Fanalyze%2Fchromium%22%0Aoperation.last%3Dtrue
,
Jan 30 2018
I'll try changing the size of the worker pool to increase concurrency and get the overall time down, and see if it still stays under RAM constraints.
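To make the trade-off concrete, here is a minimal sketch of a fixed-size worker pool in Go. It is not the SOM analyzer's code: `runPool` and the squaring workload are hypothetical stand-ins. The point is only that the `workers` parameter caps how many items are processed at once, so raising it tends to shorten wall-clock time while increasing peak memory use.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool applies work to every item using at most `workers` goroutines
// and returns the results in input order.
func runPool(items []int, workers int, work func(int) int) []int {
	if workers < 1 {
		workers = 1 // always keep at least one worker so sends cannot block forever
	}
	results := make([]int, len(items))
	jobs := make(chan int) // carries indexes into items
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker,
				// so writes to results never overlap.
				results[i] = work(items[i])
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	square := func(n int) int { return n * n }
	fmt.Println(runPool([]int{1, 2, 3, 4, 5}, 4, square)) // prints [1 4 9 16 25]
}
```

Because the job channel is unbuffered, at most `workers` items are in flight at any moment, which is what keeps memory bounded as the pool size is tuned.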
,
Jan 31 2018
Issue 807635 has been merged into this issue.
,
Jan 31 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/3ad142a2cf5bd8a5bae94e615cc66d9d4177f30b
commit 3ad142a2cf5bd8a5bae94e615cc66d9d4177f30b
Author: Sean McCullough <seanmccullough@chromium.org>
Date: Wed Jan 31 17:17:30 2018
[som] Increase client RPC timeouts to 1 minute
TBR=zhangtiff
Bug: 806700
Change-Id: I00262fd3a9630aa3c99a35d0db219dbb55db0825
Reviewed-on: https://chromium-review.googlesource.com/895412
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>
Commit-Queue: Sean McCullough <seanmccullough@chromium.org>
[modify] https://crrev.com/3ad142a2cf5bd8a5bae94e615cc66d9d4177f30b/go/src/infra/appengine/sheriff-o-matic/som/client/client.go
[modify] https://crrev.com/3ad142a2cf5bd8a5bae94e615cc66d9d4177f30b/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go
,
Jan 31 2018
Changing the title so that new sheriffs are more likely to notice.
,
Jan 31 2018
Just pushed this fix to prod. PTAL
,
Jan 31 2018
Looks good, thank you!
Comment 1 by vitaliii@chromium.org
, Jan 29 2018