Gatekeeper hung for 40 minutes
Issue description
https://build.chromium.org/p/chromium.gatekeeper/builders/Chromium%20Gatekeeper/builds/677365/steps/gatekeeper%3A%20non-closers/logs/stdio

DEBUG:root:opening https://chrome-build-extract.appspot.com/p/chromium.fyi/builders/Android%20Cloud%20Tests/builds/5390?json=1...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
Process PoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 99, in worker
    put((job, i, result))
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
    return send(obj)
UnpickleableError: Cannot pickle <type 'ssl.SSLContext'> objects

Gatekeeper somehow hung for 40 minutes, which it shouldn't have. Buildbot killed the process after 40 minutes of no output. The fault might be in the Python multiprocessing code. Also, should we be retrying on 404s at all?
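On the 404 question, here is a minimal sketch of a retry loop that backs off on transient HTTP errors but gives up immediately on a 404. This is not the code in build_scan.py: `fetch_with_retry`, its parameters, and the injectable `opener` and `sleep` hooks are hypothetical names made up for illustration, and it is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, attempts=4,
                     delay=4, sleep=time.sleep):
    """Fetch url, doubling the delay between retries; never retry a 404.

    Hypothetical helper, not the real build_scan.py logic.
    """
    for attempt in range(attempts):
        try:
            return opener(url).read()
        except urllib.error.HTTPError as e:
            # A 404 will not fix itself by waiting, so fail fast.
            # Also give up once the retry budget is spent.
            if e.code == 404 or attempt == attempts - 1:
                raise
            sleep(delay)
            delay *= 2
```

With something like this, a missing build JSON would fail on the first request instead of sleeping through the whole backoff schedule. Whether a 404 from chrome-build-extract can actually be transient is a separate question that this sketch does not answer.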
Oct 27 2016
Looking at this. I think something in the URLFetch error isn't picklable (SSLContext). Gonna catch the error and raise a simpler one which can be pickled. Yay Python multiprocessing.
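A rough sketch of that idea, assuming Python 3 (the traceback above is from Python 2.7). `FetchError`, `fetch`, `safe_fetch`, and `is_picklable` are hypothetical names for illustration, not the ones in build_scan.py. multiprocessing pickles whatever a worker returns or raises in order to send it back to the parent, so a plain `pickle` round trip stands in for the pool here.

```python
import pickle
import ssl


class FetchError(Exception):
    """Hypothetical stand-in for a fetch error that carries an SSLContext."""

    def __init__(self, message, context):
        super().__init__(message)
        self.context = context  # SSLContext objects cannot be pickled


def fetch(url):
    """Hypothetical fetch that always fails, mimicking the 404 in the log."""
    raise FetchError("HTTP Error 404: Not Found",
                     ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT))


def safe_fetch(url):
    """Re-raise fetch failures as a ValueError holding only a string."""
    try:
        return fetch(url)
    except FetchError as e:
        raise ValueError("fetch of %s failed: %s" % (url, e))


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, as multiprocessing needs."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

The ValueError carries only the formatted message, so the worker can hand it back to the parent process without dragging the SSLContext along.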
Oct 27 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/6e390490ae0e00dcb1f058f30a61dde88cb55626

commit 6e390490ae0e00dcb1f058f30a61dde88cb55626
Author: martiniss <martiniss@chromium.org>
Date: Thu Oct 27 23:54:05 2016

build_scan: Raise ValueError on URL fetch error

When url fetches error, they raise a URLError, which apparently has a SSLContext object contained in them, which can't be pickled. This seems to break gatekeeper, because it uses multiprocessing, and python doesn't know how to pickle SSLContext objects.

BUG= 653262
Review-Url: https://codereview.chromium.org/2451213004

[modify] https://crrev.com/6e390490ae0e00dcb1f058f30a61dde88cb55626/scripts/slave/build_scan.py
Oct 28 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/b67196c320e6a97c67e0769c0e43a353ce61f304

commit b67196c320e6a97c67e0769c0e43a353ce61f304
Author: martiniss <martiniss@chromium.org>
Date: Fri Oct 28 21:55:31 2016

build_scan: Fix exception handling

BUG= 653262
Review-Url: https://codereview.chromium.org/2465493002

[modify] https://crrev.com/b67196c320e6a97c67e0769c0e43a353ce61f304/scripts/slave/build_scan.py
Nov 4 2016
Ok, I think I finally fixed this. http://shortn/_fVhRrDggh6 shows that when I landed https://codereview.chromium.org/2465493002, it stopped timing out. Yay!
Comment 1 by martiniss@chromium.org, Oct 25 2016
Labels: -Pri-2 Pri-1