I've been seeing this in jobs recently. Doesn't seem related to the OBBS migration.
https://pinpoint-dot-chromeperf.appspot.com/job/1201028da40000
https://pinpoint-dot-chromeperf.appspot.com/job/17c0140fa40000
Traceback (most recent call last):
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/job.py", line 283, in Run
    work_left = self.state.ScheduleWork()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/job_state.py", line 128, in ScheduleWork
    attempt.ScheduleWork()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/attempt.py", line 77, in ScheduleWork
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/attempt.py", line 81, in _Poll
    self._last_execution.Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/quest/execution.py", line 95, in Poll
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/quest/find_isolate.py", line 100, in _Poll
    self._CheckBuildStatus()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-1e61f609.411439606576101465/dashboard/pinpoint/models/quest/find_isolate.py", line 145, in _CheckBuildStatus
    'Buildbucket says the build completed successfully, '
IsolateNotFoundError: Buildbucket says the build completed successfully, but Pinpoint can't find the isolate hash.
https://pinpoint-dot-chromeperf.appspot.com/job/1277a8c3a40000
https://pinpoint-dot-chromeperf.appspot.com/job/17c0140fa40000
https://pinpoint-dot-chromeperf.appspot.com/job/17b3842da40000
Retrying the failed execution in the dev console works, so it only fails the first time. Maybe that implies some kind of timing issue?
from dashboard.pinpoint.models import job

# Re-poll the last execution of every attempt that has not completed yet.
j = job.JobFromId('17c0140fa40000')
s = j.state
for change in s._changes:
  for attempt in s._attempts[change]:
    last_execution = attempt._last_execution
    if last_execution.completed:
      continue
    last_execution.Poll()
    print last_execution.completed
We see the same error message for bisects on android-pixel2-perf.
This is not the same bug; that error message is because pixel2 is in the process of being migrated to the perf waterfall, but the builders don't build the non-webview isolate yet. Filed issue 869485.
There's nothing obvious from the code or logs as to the cause.
Uploaded crrev.com/c/1217453 to get some more debugging info.
As mentioned in comment 2, this still seems like a timing issue or race condition: the build completes before the hash is in Pinpoint's isolate database. But the calls should all be synchronous/blocking, so I'm not sure where the race is coming from.
When I look at the build step on the waterfall, I see that the "pinpoint isolate upload" step takes 3 ms, but when I look at the `/api/isolate` requests in Pinpoint, they take 50-200 ms. That implies that the upload step is nonblocking, but I'm not sure where the nonblocking call is. I'll upload some more debugging information to confirm this theory.
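If the race-condition theory holds, one way to probe it would be to give the isolate lookup a short grace period before raising. The sketch below is only an illustration of that idea, not Pinpoint's actual code: `find_isolate_with_grace`, the `lookup` callable, and the local `IsolateNotFoundError` class are all hypothetical stand-ins.

```python
import time


class IsolateNotFoundError(Exception):
  """Hypothetical stand-in for the Pinpoint error of the same name."""


def find_isolate_with_grace(lookup, attempts=3, delay_seconds=1.0,
                            sleep=time.sleep):
  """Calls lookup() up to `attempts` times and returns the first hash found.

  `lookup` is an assumed callable that returns the isolate hash, or None
  if the hash is not in the isolate database yet.
  """
  for attempt in range(attempts):
    isolate_hash = lookup()
    if isolate_hash is not None:
      return isolate_hash
    # Wait between lookups, but not after the final one.
    if attempt < attempts - 1:
      sleep(delay_seconds)
  raise IsolateNotFoundError(
      'Build completed, but no isolate hash after %d lookups.' % attempts)
```

If a job that fails today passes with a grace period like this, that would point to the hash arriving late rather than never arriving.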
Ah, the problem seems to be that the build succeeds but fails to upload the isolate hash. Example build created by pinpoint:
https://ci.chromium.org/buildbot/tryserver.chromium.perf/Android%20Compile%20Perf/3708
Fails on "pinpoint isolate upload" with:
{
  "status_code": 500,
  "text": "<html>\n <head>\n <title>500 Internal Server Error</title>\n </head>\n <body>\n <h1>500 Internal Server Error</h1>\n The server has either erred or is incapable of performing the requested operation.<br /><br />\n\n\n\n </body>\n</html>"
}
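Since the build is still reported as successful despite this 500, it may help for the upload step to treat any non-2xx response as a failure. A minimal sketch, assuming the response is available as a dict shaped like the one above; `check_upload_response` is a hypothetical helper, not part of the existing recipe or of Pinpoint:

```python
def check_upload_response(response):
  """Returns the status code, or raises RuntimeError if it is not 2xx.

  `response` is assumed to be a dict with 'status_code' and 'text' keys,
  matching the shape of the step output shown above.
  """
  status = response.get('status_code')
  if status is None or not 200 <= status < 300:
    # Collapse whitespace and truncate so the error stays readable in logs.
    body = ' '.join((response.get('text') or '').split())[:80]
    raise RuntimeError(
        'Isolate upload failed with status %s: %s' % (status, body))
  return status
```

Raising here would make the build fail visibly at the upload step, instead of surfacing later as an IsolateNotFoundError in Pinpoint.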
Comment 1 by dtu@chromium.org, Jul 30