
Issue 653262


Issue metadata

Status: Fixed
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug




Gatekeeper hung for 40 minutes

Reported by martiniss@chromium.org, Oct 5 2016

Issue description

https://build.chromium.org/p/chromium.gatekeeper/builders/Chromium%20Gatekeeper/builds/677365/steps/gatekeeper%3A%20non-closers/logs/stdio


DEBUG:root:opening https://chrome-build-extract.appspot.com/p/chromium.fyi/builders/Android%20Cloud%20Tests/builds/5390?json=1...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 4 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 8 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 16 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
INFO:root:url fetch encountered HTTP Error 404: Not Found, sleeping for 32 seconds and retrying...
Process PoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 99, in worker
    put((job, i, result))
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
    return send(obj)
UnpickleableError: Cannot pickle <type 'ssl.SSLContext'> objects

Gatekeeper somehow hung for 40 minutes, which it shouldn't have. Buildbot killed the process after 40 minutes with no output.

Could the fault be in Python's multiprocessing code?
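One plausible reading of the traceback: the pool worker dies inside `put((job, i, result))` while pickling its result, so the parent never gets an answer and waits until buildbot kills it. Below is a minimal Python 3 sketch of the underlying failure. `FetchError` is a hypothetical stand-in that holds an `ssl.SSLContext`, the way the traceback suggests the real URL error does. Python 3 reports the problem with a different exception type than the Python 2.7 `UnpickleableError` above, so the helper catches both likely types.

```python
import pickle
import ssl


class FetchError(Exception):
    """Hypothetical stand-in for the URL error raised by the fetch code.

    It keeps a reference to an ssl.SSLContext, the type named in the
    UnpickleableError above.
    """

    def __init__(self, url, context):
        super().__init__(url, context)
        self.url = url
        self.context = context


def can_pickle(obj):
    """Return True if obj can be pickled, as every Pool result must be."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError):
        return False
    return True


ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
err = FetchError("https://example.com/builds/5390?json=1", ctx)
print(can_pickle(err))  # False: the SSLContext inside err cannot be pickled
```

As far as I know, recent Python 3 versions of `multiprocessing.pool` catch this failure and hand the caller a `MaybeEncodingError` instead of losing the result, so the hang itself is specific to the older Python 2.7 pool used here.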

Should we also stop retrying on 404s?
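On the 404 question: a 404 usually means the resource isn't there, so backing off and retrying mostly adds delay before the same failure. A sketch of a retry loop that only retries transient failures follows. The name `fetch_text`, the attempt limit, and the 4-second base delay (borrowed from the first sleep in the log) are all assumptions, and `opener`/`sleep` are injectable so the logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_text(url, max_attempts=5, base_delay=4.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff on transient errors only.

    Hypothetical helper: 4xx responses such as 404 are treated as final and
    re-raised at once; 5xx responses and connection errors are retried.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError as e:
            # HTTPError subclasses URLError, so it must be caught first.
            if 400 <= e.code < 500 or attempt == max_attempts:
                raise
        except urllib.error.URLError:
            if attempt == max_attempts:
                raise
        # Transient failure: wait, then double the delay for the next try.
        sleep(delay)
        delay *= 2
```

With this policy the 404 in the log would fail on the first attempt instead of sleeping through twelve retries.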
 
Cc: -bpastene@chromium.org
Labels: -Pri-2 Pri-1
This is happening roughly every other day. We need to fix this.
Owner: martiniss@chromium.org
Status: Assigned (was: Available)
Looking at this. I think something in the URLFetch error isn't pickleable (SSLContext). I'm going to catch the error and raise a simpler one that can be pickled. Yay Python multiprocessing.
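A minimal sketch of that approach, in the spirit of the ValueError the commit below raises. `fetch_build` is a hypothetical worker function that takes the fetch callable as a parameter, and `failing_fetch` is a hypothetical stand-in whose URLError carries an SSLContext. Only the error's text is kept, so the new exception can be pickled and sent back to the parent process.

```python
import pickle
import ssl
import urllib.error


def fetch_build(url, fetch):
    """Hypothetical worker body: turn URL errors into a picklable ValueError."""
    try:
        return fetch(url)
    except urllib.error.URLError as e:
        # str(e) keeps only text, dropping any unpicklable objects the
        # original error refers to. The implicit __context__ link to the
        # original error is not included when the new exception is pickled.
        raise ValueError("url fetch of %s failed: %s" % (url, e))


def failing_fetch(url):
    # Hypothetical stand-in for a real fetch whose error holds an SSLContext.
    raise urllib.error.URLError(ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT))


try:
    fetch_build("https://example.com/builds/5390?json=1", failing_fetch)
except ValueError as err:
    # Round-trip through pickle, as multiprocessing would.
    restored = pickle.loads(pickle.dumps(err))
    print(type(restored).__name__, restored)
```

The trade-off is that the parent only sees a message, not the original exception type or its attributes, which is usually enough for logging a failed fetch.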

Comment 3 by bugdroid1@chromium.org, Oct 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/6e390490ae0e00dcb1f058f30a61dde88cb55626

commit 6e390490ae0e00dcb1f058f30a61dde88cb55626
Author: martiniss <martiniss@chromium.org>
Date: Thu Oct 27 23:54:05 2016

build_scan: Raise ValueError on URL fetch error

When url fetches error, they raise a URLError, which apparently has a SSLContext
object contained in them, which can't be pickled. This seems to break gatekeeper,
because it uses multiprocessing, and python doesn't know how to pickle
SSLContext objects.

BUG= 653262 

Review-Url: https://codereview.chromium.org/2451213004

[modify] https://crrev.com/6e390490ae0e00dcb1f058f30a61dde88cb55626/scripts/slave/build_scan.py

Status: Fixed (was: Assigned)
OK, I think I finally fixed this: http://shortn/_fVhRrDggh6 shows that it stopped timing out after I landed https://codereview.chromium.org/2465493002. Yay!
