
Issue 846320


Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug




Sheriff-o-matic is lying (and no longer updating)

Project Member Reported by gab@chromium.org, May 24 2018

Issue description

Since yesterday it's been saying "Mac ASan 64 Tests (1) [3 out of the last 3 builds have failed]", but following the link shows that the last 12 runs were all successes:

2018-05-24 10:15 AM (EDT)	39 mins 22 secs	0e4ef27562f84ac382c269e11c99cdeb399e1fec	Success	#40854	(20 changes)	build successful
2018-05-24 9:12 AM (EDT)	45 mins 44 secs	3583de23ea2e0cf11f626d63822181c6c36b2f64	Success	#40853	(38 changes)	build successful
2018-05-24 7:59 AM (EDT)	43 mins 8 secs	e55b2920a9fc4addd9bb7c45fb1feeb2341a3e48	Success	#40852	(30 changes)	build successful
2018-05-24 6:28 AM (EDT)	40 mins 44 secs	fca1f01a6a5eeeeee1131778c806273544487ade	Success	#40851	(22 changes)	build successful
2018-05-24 5:23 AM (EDT)	42 mins 51 secs	335109fef8be52912633914bae022a80293e4827	Success	#40850	(16 changes)	build successful
2018-05-24 4:36 AM (EDT)	42 mins 24 secs	20e63a0df5a1181b9b6775d3f4480d07761ef10c	Success	#40849	(28 changes)	build successful
2018-05-24 3:30 AM (EDT)	43 mins 42 secs	ed350c5af7814a159f5554cd7e3a12e2377037d6	Success	#40848	(10 changes)	build successful
2018-05-24 2:43 AM (EDT)	41 mins 6 secs	8fa0234c67b8f9bd9e3e84d1edf7f0a671715c1e	Success	#40847	(22 changes)	build successful
2018-05-24 1:45 AM (EDT)	47 mins 8 secs	ee2c93bd00022572b5ae097b1aea80265982989a	Success	#40846	(40 changes)	build successful
2018-05-24 12:20 AM (EDT)	39 mins 20 secs	059308d6872dc0fb7b188d748ad1300773fb5801	Success	#40845	(52 changes)	build successful
2018-05-23 10:38 PM (EDT)	43 mins 13 secs	965042c5f89bb5ff1e776c5a1533b9632e5cd91b	Success	#40844	(62 changes)	build successful
2018-05-23 9:17 PM (EDT)	44 mins 47 secs	38d6629de4a0abd09127066f99eaccb1f08d0b96	Success	#40843	(66 changes)	build successful
 
This isn't the only failure affected: none of the messages have updated for ~6 hours.

(I've reverted to using build.chromium.org in the meantime.)

Comment 2 by gab@chromium.org, May 24 2018

Labels: -Pri-1 Pri-0
Owner: zhangtiff@chromium.org
Status: Assigned (was: Untriaged)
Summary: Sheriff-o-matic is lying (and no longer updating) (was: Sheriff-o-matic is lying)
SheriffOMatic being stuck is problematic for sheriffing.
Cc: zhangtiff@chromium.org seanmccullough@chromium.org
Sorry for not responding earlier. I was at Camp Chrome yesterday and today. 

Looks like the Chromium tree is two days stale. Looking at the request logs, there are several recent errors that look similar across different builders, e.g.:

   0: {
    logMessage:  "Error fetching build https://build.chromium.org/p/chromium.memory/Linux CFI/8052: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux CFI/8052 err unexpected end of JSON input"     
    severity:  "ERROR"     
    time:  "2018-05-24T22:30:24.169925Z"     
   }
   1: {
    logMessage:  "Error fetching build https://build.chromium.org/p/chromium.memory/Linux ChromiumOS MSan Tests/7263: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux ChromiumOS MSan Tests/7263 err unexpected end of JSON input"     
    severity:  "ERROR"     
    time:  "2018-05-24T22:30:24.169937Z"     
   }
   2: {
    logMessage:  "Error fetching build https://build.chromium.org/p/chromium.memory/Linux ChromiumOS MSan Tests/7262: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux ChromiumOS MSan Tests/7262 err unexpected end of JSON input"     
    severity:  "ERROR"     
    time:  "2018-05-24T22:30:24.169984Z"     
   } 
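
The "0 bytes" in these messages suggests an empty cache entry is being handed straight to the JSON decoder. A minimal sketch of one way to handle that, treating an empty or undecodable entry as a cache miss and refetching. This is not the actual sheriff-o-matic code: `load_cached_build`, the dict standing in for memcache, and the `fetch` callback are all hypothetical names for illustration.

```python
import json


def load_cached_build(cache, key, fetch):
    """Return the build for `key`, refetching if the cached bytes are unusable.

    `cache` is a plain dict standing in for memcache (assumption), and
    `fetch` is a hypothetical callable that retrieves the build upstream.
    """
    raw = cache.get(key)
    if raw:  # skips both a missing entry and a 0-byte entry
        try:
            return json.loads(raw)
        except ValueError:
            # Corrupt entry (JSONDecodeError is a ValueError): fall through
            # and refetch instead of surfacing a decode error.
            pass
    build = fetch(key)
    cache[key] = json.dumps(build).encode("utf-8")
    return build
```

With this shape, a 0-byte value under a key like `chromium.memory/Linux CFI/8052` would cost one extra upstream fetch rather than an error on every run.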
After digging a bit more, I found a few other error messages:

{
    logMessage:  "couldn't get test expectations for http/tests/devtools/audits2/audits2-limited-run.js: errors fetching Expectation files: [error reading: Call error 11: Deadline exceeded (timeout)]"     
    severity:  "ERROR"     
    time:  "2018-05-24T22:36:39.885509Z"     
   } 

0: {
    logMessage:  "Process terminated because the request deadline was exceeded. (Error code 123)"     
    severity:  "ERROR"     
    time:  "2018-05-24T22:36:47.274477Z"     
   }
I tried temporarily rolling back the version and re-running the cron, but that didn't seem to help, so it's probably not related to any recent code changes.

Looking through the code a bit, I think both the "Error fetching build" and "couldn't get test expectations" errors are non-blocking, so "Process terminated because the request deadline was exceeded" is probably what's causing the cron to fail.
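
To illustrate the distinction: per-item errors can be logged and skipped, while running past the request deadline ends the whole run. A rough sketch under assumed names (`run_analysis`, `process`, and the `clock` parameter are hypothetical, not taken from the sheriff-o-matic source):

```python
import time


def run_analysis(items, process, deadline_seconds, clock=time.monotonic):
    """Process items until done or until the time budget is used up.

    Returns (results, errors, finished). A failure in `process` is
    recorded and skipped (non-blocking); exceeding the budget stops
    the loop early with finished=False.
    """
    start = clock()
    results, errors = [], []
    for item in items:
        if clock() - start >= deadline_seconds:
            # Out of budget: stop here rather than being killed mid-run.
            return results, errors, False
        try:
            results.append(process(item))
        except Exception as err:
            # Non-blocking: note the failure and move on to the next item.
            errors.append((item, str(err)))
    return results, errors, True
```

In this sketch a failed build fetch only adds an entry to `errors`, whereas a run that is too slow overall comes back with `finished` set to `False`.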
Status: Fixed (was: Assigned)
Looks like the data is up to date again. 

I'm not super comfortable that we've solved the root problem here, though. For now, I filed a follow-up bug to add alerting so that we can catch/respond to this problem faster/better in the future: crbug.com/846511
Of note, since the last release we've been sending a lot more requests to test-results.appspot.com, which is a little surprising: https://screenshot.googleplex.com/icKbWMMutg7

I did land a change that queries for more test-results data, but it should be cached. https://chromium-review.googlesource.com/c/infra/infra/+/1066819 is the CL, and it's trying to add extra test result data (perf artifacts) for each test failure in an alert.

I can think of two things to do: verify that it's caching responses from test-results, and filter these checks to just the chromium.perf tree so that chromium is unaffected.
Upon closer inspection it appears that test-results responses are *not* cached by the analyzer. That's probably why we're making so many more requests now, and also might be why this cron task is timing out recently.
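
A minimal sketch of the two mitigations mentioned above: memoizing test-results responses and limiting the extra lookups to chromium.perf. `CachingFetcher`, `wants_perf_artifacts`, and the in-process dict cache are hypothetical; the real analyzer would presumably use its existing memcache layer.

```python
class CachingFetcher:
    """Wraps a fetch function so each key hits the upstream service once."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical callable: key -> response
        self._cache = {}           # in-process cache (assumption)
        self.upstream_calls = 0    # counter to make the saving visible

    def get(self, key):
        if key not in self._cache:
            self.upstream_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def wants_perf_artifacts(tree):
    """Only the chromium.perf tree needs the extra test-results data."""
    return tree == "chromium.perf"
```

For example, repeated `get` calls for the same test name during one analyzer run would result in a single upstream request, and other trees would skip the lookup entirely.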
