Sheriff-o-matic is lying (and no longer updating)
Issue description
Since yesterday it's been saying that "Mac ASan 64 Tests (1) [3 out of the last 3 builds have failed]" but the last 12 runs are successes when following the link:
2018-05-24 10:15 AM (EDT) 39 mins 22 secs 0e4ef27562f84ac382c269e11c99cdeb399e1fec Success #40854 (20 changes) build successful
2018-05-24 9:12 AM (EDT) 45 mins 44 secs 3583de23ea2e0cf11f626d63822181c6c36b2f64 Success #40853 (38 changes) build successful
2018-05-24 7:59 AM (EDT) 43 mins 8 secs e55b2920a9fc4addd9bb7c45fb1feeb2341a3e48 Success #40852 (30 changes) build successful
2018-05-24 6:28 AM (EDT) 40 mins 44 secs fca1f01a6a5eeeeee1131778c806273544487ade Success #40851 (22 changes) build successful
2018-05-24 5:23 AM (EDT) 42 mins 51 secs 335109fef8be52912633914bae022a80293e4827 Success #40850 (16 changes) build successful
2018-05-24 4:36 AM (EDT) 42 mins 24 secs 20e63a0df5a1181b9b6775d3f4480d07761ef10c Success #40849 (28 changes) build successful
2018-05-24 3:30 AM (EDT) 43 mins 42 secs ed350c5af7814a159f5554cd7e3a12e2377037d6 Success #40848 (10 changes) build successful
2018-05-24 2:43 AM (EDT) 41 mins 6 secs 8fa0234c67b8f9bd9e3e84d1edf7f0a671715c1e Success #40847 (22 changes) build successful
2018-05-24 1:45 AM (EDT) 47 mins 8 secs ee2c93bd00022572b5ae097b1aea80265982989a Success #40846 (40 changes) build successful
2018-05-24 12:20 AM (EDT) 39 mins 20 secs 059308d6872dc0fb7b188d748ad1300773fb5801 Success #40845 (52 changes) build successful
2018-05-23 10:38 PM (EDT) 43 mins 13 secs 965042c5f89bb5ff1e776c5a1533b9632e5cd91b Success #40844 (62 changes) build successful
2018-05-23 9:17 PM (EDT) 44 mins 47 secs 38d6629de4a0abd09127066f99eaccb1f08d0b96 Success #40843 (66 changes) build successful
,
May 24 2018
SheriffOMatic being stuck is problematic for sheriffing.
,
May 24 2018
Sorry for not responding earlier. I was at Camp Chrome yesterday and today.
Looks like the Chromium tree is two days stale. Looking at the request logs, there are several recent errors that look similar across different builders, e.g.:
0: {
logMessage: "Error fetching build https://build.chromium.org/p/chromium.memory/Linux CFI/8052: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux CFI/8052 err unexpected end of JSON input"
severity: "ERROR"
time: "2018-05-24T22:30:24.169925Z"
}
1: {
logMessage: "Error fetching build https://build.chromium.org/p/chromium.memory/Linux ChromiumOS MSan Tests/7263: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux ChromiumOS MSan Tests/7263 err unexpected end of JSON input"
severity: "ERROR"
time: "2018-05-24T22:30:24.169937Z"
}
2: {
logMessage: "Error fetching build https://build.chromium.org/p/chromium.memory/Linux ChromiumOS MSan Tests/7262: failed to decode data in memcache (data probably corrupt: 0 bytes). key chromium.memory/Linux ChromiumOS MSan Tests/7262 err unexpected end of JSON input"
severity: "ERROR"
time: "2018-05-24T22:30:24.169984Z"
}
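The "0 bytes" in these messages suggests an empty cache entry is being handed to the JSON decoder. A minimal sketch of one way to handle that, assuming a plain dict stands in for memcache and a caller-supplied `fetch` function stands in for the real build fetch (both are hypothetical, not the actual Sheriff-o-matic code): treat an empty or undecodable entry as a cache miss and refetch.

```python
import json
from typing import Callable, Dict


def get_build(cache: Dict[str, bytes], key: str,
              fetch: Callable[[str], dict]) -> dict:
    """Return the build for `key`, treating corrupt cache data as a miss."""
    raw = cache.get(key)
    if raw:  # skips both a missing key and a zero-byte entry
        try:
            return json.loads(raw)
        except ValueError:
            # Truncated or otherwise undecodable JSON: fall through and refetch.
            pass
    build = fetch(key)
    # Overwrite the bad entry so the next request gets a clean hit.
    cache[key] = json.dumps(build).encode("utf-8")
    return build
```

With this shape a corrupt entry costs one extra fetch instead of producing an error on every cron run.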
,
May 24 2018
After digging a bit more, I found a few other error messages:
{
logMessage: "couldn't get test expectations for http/tests/devtools/audits2/audits2-limited-run.js: errors fetching Expectation files: [error reading: Call error 11: Deadline exceeded (timeout)]"
severity: "ERROR"
time: "2018-05-24T22:36:39.885509Z"
}
0: {
logMessage: "Process terminated because the request deadline was exceeded. (Error code 123)"
severity: "ERROR"
time: "2018-05-24T22:36:47.274477Z"
}
,
May 24 2018
Tried temporarily rolling back the version and re-running the cron, but this didn't seem to help. So it's probably not related to any recent code changes. Looking through the code a bit, I think both the "Error fetching build" and "couldn't get test expectations" errors are non-blocking, so I think "Process terminated because the request deadline was exceeded" is probably what's causing the cron to fail.
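To illustrate the distinction between non-blocking errors and a hard deadline, here is a rough sketch, not the actual cron handler. It assumes a hypothetical `process` callable per builder and a time budget set below the platform's request deadline. Per-item errors are recorded and skipped over, while the loop stops picking up new work once the budget is spent, so the run can finish cleanly instead of being terminated.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_with_budget(items: Iterable[str],
                    process: Callable[[str], None],
                    budget_seconds: float,
                    clock: Callable[[], float] = time.monotonic,
                    ) -> Tuple[List[str], List[Tuple[str, str]], List[str]]:
    """Process items until the time budget runs out.

    Returns (done, errors, skipped). Errors are non-blocking: they are
    collected and the loop moves on to the next item.
    """
    start = clock()
    done: List[str] = []
    errors: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for item in items:
        if clock() - start >= budget_seconds:
            # Out of budget: leave the rest for the next run.
            skipped.append(item)
            continue
        try:
            process(item)
            done.append(item)
        except Exception as exc:
            errors.append((item, str(exc)))
    return done, errors, skipped
```

The budget only bounds when new items are started; a single slow `process` call can still overrun it, so each call would need its own timeout as well.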
,
May 24 2018
Looks like the data is up to date again. I'm not super comfortable that we've solved the root problem here, though. For now, I filed a followup bug to add alerting so that we can catch/respond to this problem faster/better in the future: crbug.com/846511
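For the alerting follow-up, the core check could be as simple as comparing the time of the last successful update against a threshold. This is only a sketch: the one-hour default and the function name are assumptions, and where the last-updated timestamp comes from is left open.

```python
from datetime import datetime, timedelta


def is_stale(last_updated: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the alerts data is older than `max_age`."""
    return now - last_updated > max_age
```

Both timestamps should be either timezone-aware or naive; mixing the two raises a `TypeError` on subtraction.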
,
May 24 2018
Of note, since the last release we've been sending a lot more requests to test-results.appspot.com, which is a little surprising: https://screenshot.googleplex.com/icKbWMMutg7 I did land a change that queries for more test-results data, but it should be cached. https://chromium-review.googlesource.com/c/infra/infra/+/1066819 is the CL, and it's trying to add extra test result data (perf artifacts) for each test failure in an alert. I can think of two things to do: verify that it's caching responses from test-results, and filter these checks to just the chromium.perf tree so chromium is unaffected.
,
May 24 2018
Upon closer inspection it appears that test-results responses are *not* cached by the analyzer. That's probably why we're making so many more requests now, and also might be why this cron task is timing out recently.
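A minimal sketch of what caching those responses might look like, assuming an in-process cache with a time-to-live in front of a caller-supplied `fetch` function. The class name, the 5-minute default TTL, and the in-memory storage are all assumptions for illustration; the real analyzer would more likely use a shared cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedFetcher:
    """Wraps a fetch function with a simple TTL cache keyed by request."""

    def __init__(self, fetch: Callable[[str], Any],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no request is sent
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

With something like this in place, repeated lookups for the same test within one cron run would send a single request rather than one per alert.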
Comment 1 by fhorschig@chromium.org, May 24 2018