
Issue 635209

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Aug 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




ts_mon write timeout killed build.

Project Member Reported by dgarr...@chromium.org, Aug 6 2016

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/11935/steps/CommitQueueCompletion/logs/stdio


01:55:24: ERROR: Automatic monitoring flush failed.
Traceback (most recent call last):
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/infra_libs/ts_mon/common/interface.py", line 141, in _flush_and_log_exceptions
    flush()
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/infra_libs/ts_mon/common/interface.py", line 91, in flush
    state.global_monitor.send(proto)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/infra_libs/ts_mon/common/monitors.py", line 150, in send
    body=body).execute(num_retries=5)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/oauth2client/util.py", line 140, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/googleapiclient/http.py", line 722, in execute
    body=self.body, headers=self.headers)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/oauth2client/client.py", line 596, in new_request
    redirections, connection_type)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/infra_libs/httplib2_utils.py", line 198, in request
    uri, method, body, *args, **kwargs)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/b/build/slave/master-paladin-master/build/chromite/third_party/httplib2/__init__.py", line 1291, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 430, in readline
    data = recv(1)
  File "/usr/lib/python2.7/ssl.py", line 241, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 160, in read
    return self._sslobj.read(len)
SSLError: The read operation timed out

 
This builder seemed to have several network issues; this bug doesn't cover them. It covers ts_mon write failures being fatal errors.

If stats collection doesn't work, it seems like that shouldn't fail a CQ run.
Status: Started (was: Untriaged)
I'm not convinced that stacktrace rules out the CommitQueueSync stage having failed for some other reason. That stacktrace came (or should be coming) from a separate process.
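To illustrate the point about a separate process: in Python, an exception raised in a child interpreter never propagates into the parent as an exception. The parent only sees a non-zero exit status and whatever the child wrote to stderr, so it fails only if it chooses to act on those. This is a generic demonstration of process semantics, not chromite's actual metrics setup (which isn't shown here); `CHILD_CODE` and `run_flush_in_child` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical child that dies the same way the flush did in the log above.
CHILD_CODE = "import ssl; raise ssl.SSLError('The read operation timed out')"


def run_flush_in_child():
    """Run a failing 'flush' in a separate Python interpreter.

    The SSLError is never raised in this (parent) process; the parent only
    receives the child's exit status and the traceback text on stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_flush_in_child()
    print("child exit status:", code)  # non-zero, but no exception here
```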

I'll look into this more carefully on Monday. I'm opening a bug to note that we need to catch this error and log a WARNING; that should definitely be done.
If this hid an exception from something else, we need to make sure that other error gets exposed as well.

In other parts of our code, we use failures_lib.CompoundFailure to deal with this, as clumsy as it is.
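A minimal sketch of the behavior these comments ask for, assuming a stage that does its real work and then flushes metrics: a flush failure on its own is only logged as a WARNING, and if both the work and the flush fail, neither error hides the other because both are reported together. `CompoundError` and `run_stage` are hypothetical stand-ins; they do not reproduce the real `failures_lib.CompoundFailure` interface.

```python
import logging

logger = logging.getLogger(__name__)


class CompoundError(Exception):
    """Hypothetical compound failure that carries every error it was given."""

    def __init__(self, errors):
        self.errors = list(errors)
        summary = "; ".join(
            "%s: %s" % (type(e).__name__, e) for e in self.errors)
        super().__init__("%d error(s): %s" % (len(self.errors), summary))


def run_stage(work, flush_metrics):
    """Run work(), then always attempt flush_metrics().

    - Only the flush fails: log a warning and return normally.
    - Only the work fails: re-raise the work's own exception.
    - Both fail: raise CompoundError so the primary error stays visible.
    """
    primary = None
    try:
        work()
    except Exception as e:
        primary = e
    try:
        flush_metrics()
    except Exception as flush_error:
        if primary is None:
            # Stats collection alone must not fail the stage.
            logger.warning("Metrics flush failed: %s", flush_error)
            return
        raise CompoundError([primary, flush_error])
    if primary is not None:
        raise primary
```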

Comment 4 by dshi@chromium.org, Aug 9 2016

Any update on this issue?

Comment 5 by aut...@google.com, Aug 9 2016

Labels: -current-issue

Comment 6 by bugdroid1@chromium.org, Aug 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/41bc70bd8e741801ecb2513390e023f4faf8c3a8

commit 41bc70bd8e741801ecb2513390e023f4faf8c3a8
Author: Paul Hobbs <phobbs@google.com>
Date: Fri Aug 19 20:46:29 2016

metrics: Make SSLErrors when flushing warnings

SSLErrors happen from time to time when the other end of the TCP
connection disappears for whatever reason.  Since we're flushing
regularly, just log the error as a warning.

BUG= chromium:635209 
TEST=unittests.

Change-Id: I29431d338fd596af754c285508da1b838727260c
Reviewed-on: https://chromium-review.googlesource.com/372985
Commit-Ready: Paul Hobbs <phobbs@google.com>
Tested-by: Paul Hobbs <phobbs@google.com>
Reviewed-by: Paul Hobbs <phobbs@google.com>

[modify] https://crrev.com/41bc70bd8e741801ecb2513390e023f4faf8c3a8/lib/ts_mon_config.py
[modify] https://crrev.com/41bc70bd8e741801ecb2513390e023f4faf8c3a8/lib/ts_mon_config_unittest.py
[modify] https://crrev.com/41bc70bd8e741801ecb2513390e023f4faf8c3a8/lib/metrics.py
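The commit's reasoning (flushes happen regularly, so one failed flush can be logged and left for the next one) can be sketched as a loop like the one below. This is an illustration under stated assumptions, not the code in `lib/ts_mon_config.py` or `lib/metrics.py`: `flush_periodically` and its `send` callback are hypothetical. It also catches `socket.timeout`, an added assumption for Python 3, where a read timeout like the one in the traceback typically surfaces as `socket.timeout` rather than `ssl.SSLError`.

```python
import logging
import socket
import ssl
import time

logger = logging.getLogger(__name__)


def flush_periodically(send, iterations, interval_seconds):
    """Call send() `iterations` times, sleeping between calls.

    Network errors from a single flush are logged as warnings because the
    next scheduled flush will try again. Any other exception propagates.
    Returns the number of flushes that succeeded.
    """
    succeeded = 0
    for i in range(iterations):
        try:
            send()
            succeeded += 1
        # ssl.SSLError matches the commit; socket.timeout is an assumption
        # for Python 3, where SSL read timeouts raise it instead.
        except (ssl.SSLError, socket.timeout) as e:
            logger.warning("Metrics flush failed, retrying next interval: %s", e)
        if i + 1 < iterations:
            time.sleep(interval_seconds)
    return succeeded
```

Catching only these network errors, rather than every exception, keeps genuine bugs in the metrics code visible while still making transient connection drops non-fatal.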

Comment 7 by pho...@chromium.org, Aug 26 2016

Status: Fixed (was: Started)

Comment 8 by dchan@chromium.org, Oct 7 2016

Labels: VerifyIn-55

Comment 9 by dchan@chromium.org, Oct 10 2016

Labels: -VerifyIn-55

Comment 10 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 11 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 12 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 13 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 14 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 16 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
