sizes step failing on Google Chrome {Win7, Mac, Linux64} with HTTP {400,500}
Issue description

Sample run: https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Win/35197

Log:

Sending result 2 of 2 to dashboard.
Confused: 12 files were deleted from c:\users\chrome~1\appdata\local\temp during the test run
Error uploading chartjson data: Discarding JSON, error:
Traceback (most recent call last):
  File "C:\b\rr\tmprkihr4\rw\checkout\scripts\slave\results_dashboard.py", line 493, in _SendResultsJson
    urllib2.urlopen(req)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 400: Bad Request
step returned non-zero exit code: 87
,
Aug 9
These are now failing on all official bots. Raising to Pri-0.
,
Aug 9
There are a bunch of issues with the general chromium tree right now. This looks like a perf issue with their server? I'll assign to them.
,
Aug 9
Searching the perf dashboard logs for "status:400" brings up a lot of errors about the revisions being uploaded:

https://pantheon.corp.google.com/logs/viewer?project=chromeperf&minLogLevel=0&expandAll=false&timestamp=2018-08-09T16:23:53.720000000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2018-08-09T15:23:59.245Z&dateRangeEnd=2018-08-09T16:23:59.245Z&interval=PT1H&resource=gae_app%2Fmodule_id%2Fdefault&logName=projects%2Fchromeperf%2Flogs%2Fappengine.googleapis.com%252Frequest_log&scrollTimestamp=2018-08-09T16:23:35.926207000Z&filters=status:400

Invalid ID (revision) 1533831815; compared to previous ID 581887, it was larger or smaller by too much

I think it's related to problems with commit position? Do these builders re-try failed uploads?
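For readers unfamiliar with this kind of check, here is a minimal sketch of a revision sanity test that would reject the ID quoted above. It is not the perf dashboard's actual code: the function name `is_acceptable_revision` and the `max_ratio` threshold of 2.0 are assumptions chosen for illustration. The point is only that a unix timestamp such as 1533831815 is orders of magnitude away from a commit position such as 581887, so any ratio-based bound will flag it.

```python
def is_acceptable_revision(new_id, last_id, max_ratio=2.0):
    """Return True if new_id is a plausible successor to last_id.

    Hypothetical rule (not taken from the dashboard source): reject IDs
    that are more than max_ratio times larger, or more than max_ratio
    times smaller, than the previously stored ID.
    """
    if new_id <= 0:
        return False
    if last_id is None:
        # No history yet, so there is nothing to compare against.
        return True
    if new_id > last_id * max_ratio:
        return False  # Jumped forward by too much.
    if new_id < last_id / max_ratio:
        return False  # Jumped backward by too much.
    return True


# A normal next commit position is accepted.
print(is_acceptable_revision(581888, 581887))
# A unix timestamp used in place of a commit position is rejected.
print(is_acceptable_revision(1533831815, 581887))
```

Under these assumptions the first call prints True and the second prints False, which matches the "larger or smaller by too much" rejection seen in the logs.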
,
Aug 9
> I think it's related to problems with commit position?

See bug 872729 where ppl force pushed commits w/o the usual commit position header. Maybe related?
,
Aug 9
The bug from #8 was deduped into https://bugs.chromium.org/p/chromium/issues/detail?id=872722 and def looks related. There haven't been any new chromeperf deployments recently (the latest was on the 7th), so it's probably not a change on that side.
,
Aug 9
https://docs.google.com/document/d/11UwPvlhjK5DLKSOBOpF5fRsT8zx1urEIDLavuAcTotY/edit

TL;DR: it looks like there was a gerrit plugin failure that allowed CLs to land without going through the CQ.
,
Aug 9
The perf dashboard looks like it's rejecting the bad revisions intentionally. (Hooray!)

When the gerrit plugin is fixed, these benchmarks will go back to providing commit positions instead of unix timestamps, and we don't want the perf dashboard's revisions to go backwards. Any objections to letting the perf dashboard continue to reject the bad revisions?

IIUC, the bots will keep retrying to upload the bad revisions even after the gerrit plugin is fixed, so we need to manually purge the data from the bots. Is that correct? Does anybody know how to do that?
,
Aug 9
The bots *should* only retry on 5XX (transient) errors and not 4XX (permanent) errors. I'm not sure where the recipe for official sizes is to check though!
,
Aug 9
Judging from the logs, it doesn't look like they retry after receiving a 400 response code. One bad request failure and they stop. However, on subsequent builds do the bots try to upload any previously attempted files or data that might still be around?
,
Aug 9
The bots usually have a step that retries previous attempts, but only if the response was a 5XX.
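The retry policy described in the last few comments can be sketched as follows. This is an illustration only, not the bots' recipe code: `should_retry_upload` and `partition_failed_uploads` are hypothetical names, and the assumption is simply that 5XX responses are treated as transient and everything else (including 4XX) as permanent.

```python
def should_retry_upload(status_code):
    """Return True for transient (5XX) failures, False for anything else."""
    return 500 <= status_code <= 599


def partition_failed_uploads(failures):
    """Split (payload, status_code) pairs into payloads to retry and to drop.

    Payloads rejected with a 4XX are dropped, since resending the same
    data would be rejected again.
    """
    retry, discard = [], []
    for payload, status_code in failures:
        if should_retry_upload(status_code):
            retry.append(payload)
        else:
            discard.append(payload)
    return retry, discard


# Hypothetical payload names; the status codes mirror the two errors in this bug.
retry, discard = partition_failed_uploads([
    ("sizes-build-35197", 400),
    ("sizes-build-35240", 500),
])
print(retry)
print(discard)
```

With a policy like this, the uploads rejected with HTTP 400 would not be queued again, while the later HTTP 500 failures would be.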
,
Aug 10
,
Aug 10
These 3 bots started failing constantly from https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Win/35240

Sending result 1 of 2 to dashboard.
Confused: 10 files were deleted from c:\users\chrome~1\appdata\local\temp during the test run
Error while uploading chartjson data:
Traceback (most recent call last):
  File "C:\b\rr\tmpec7c4v\rw\checkout\scripts\slave\results_dashboard.py", line 493, in _SendResultsJson
    urllib2.urlopen(req)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\b\depot_tools\win_tools-2_7_6_bin\python\bin\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
step returned non-zero exit code: 87

https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Linux%20x64/34654
https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Mac/35784
,
Aug 11
Marking as fixed; this seems to be better?
Comment 1 by ellyjo...@chromium.org
, Aug 9