
Issue 776204


Issue metadata

Status: Duplicate
Owner: ----
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




guado_moblab: CQ, AFE RPC error

Project Member Reported by ayatane@chromium.org, Oct 19 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/guado_moblab-paladin/7587

Very weird guado_moblab failure: DatabaseError: (1146, "Table 'chromeos_autotest_db.django_session' doesn't exist")

I don't see any CLs that could have caused this.

10/18 15:48:43.588 INFO |monitor_db_cleanup:0261| Deleting old sessions from django_session
10/18 15:48:43.593 ERROR|        monitor_db:0183| Uncaught exception, terminating monitor_db.
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 172, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 483, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 366, in tick
    self._run_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 269, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 393, in _run_cleanup
    self._periodic_cleanup.run_cleanup_maybe()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 48, in run_cleanup_maybe
    self._cleanup()
  File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 483, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 75, in _cleanup
    self._django_session_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 263, in _django_session_cleanup
    self._db.execute(sql)
  File "/usr/local/autotest/database/database_connection.py", line 312, in execute
    results = self._backend.execute(query, parameters)
  File "/usr/local/autotest/database/database_connection.py", line 132, in execute
    parameters=parameters)
  File "/usr/local/autotest/database/database_connection.py", line 54, in execute
    self._cursor.execute(query, parameters)
  File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 130, in execute
    six.reraise(utils.DatabaseError, utils.DatabaseError(*tuple(e.args)), sys.exc_info()[2])
  File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 120, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
DatabaseError: (1146, "Table 'chromeos_autotest_db.django_session' doesn't exist")
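
For triage, the numeric code in the final traceback line is the useful part: MySQL server error 1146 is ER_NO_SUCH_TABLE. A minimal sketch of pulling the code and message out of such a line, assuming the line has the `DatabaseError: (code, "message")` shape shown above; `parse_db_error` is a hypothetical helper, not part of autotest or Django:

```python
import ast

# MySQL server error 1146 is ER_NO_SUCH_TABLE.
ER_NO_SUCH_TABLE = 1146


def parse_db_error(line):
    """Split 'DatabaseError: (code, "message")' into (code, message)."""
    name, _, payload = line.partition(": ")
    if name != "DatabaseError":
        raise ValueError("not a DatabaseError line: %r" % line)
    # The payload is a Python tuple literal, so literal_eval can read it safely.
    code, message = ast.literal_eval(payload)
    return code, message


line = ("DatabaseError: (1146, \"Table 'chromeos_autotest_db.django_session' "
        "doesn't exist\")")
code, message = parse_db_error(line)
print(code == ER_NO_SUCH_TABLE, message)
```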

 
Summary: guado_moblab: CQ, AFE RPC error (was: guado_moblab: Table 'chromeos_autotest_db.django_session' doesn't exist)
Oops, I think that's a red herring.

The actual error is an AFE RPC 500.

@@@ Apache logs:

127.0.0.1 - - [18/Oct/2017:15:38:21 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:38:21 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:38:43 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:38:43 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:39:24 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:39:24 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:40:44 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:40:44 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:46:15 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:46:15 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:46:35 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:46:35 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:47:15 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:47:15 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:48:35 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
127.0.0.1 - - [18/Oct/2017:15:48:35 -0700] "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
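
The pattern in the access log is easier to see when it is tallied by RPC method and status code. A minimal sketch, assuming the lines follow the common log format shown above; `tally_rpc_statuses` and `LOG_PATTERN` are illustrative names, not part of autotest:

```python
import re
from collections import Counter

# Matches the request and status fields of an access-log line, e.g.
# "POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424
LOG_PATTERN = re.compile(
    r'"(?P<verb>[A-Z]+) \S*[?&]method=(?P<method>\w+)\S* HTTP/[\d.]+" '
    r'(?P<status>\d{3}) '
)


def tally_rpc_statuses(lines):
    """Count (method, status) pairs across access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            counts[(match.group("method"), int(match.group("status")))] += 1
    return counts


sample = [
    '127.0.0.1 - - [18/Oct/2017:15:38:21 -0700] '
    '"POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424',
    '127.0.0.1 - - [18/Oct/2017:15:38:43 -0700] '
    '"POST /afe/server/noauth/rpc/?method=get_hosts HTTP/1.1" 500 2424',
]
print(tally_rpc_statuses(sample))
```

Run against the full excerpt, every entry falls into a single bucket: `get_hosts` with status 500.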

@@@ DEBUG log:

10/18 15:46:05.301 DEBUG|      abstract_ssh:0892| Full tunnel command: /usr/bin/ssh -a -x   -n -N -q -L 46496:localhost:80 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row1-rack8-host1
10/18 15:46:05.347 DEBUG|      abstract_ssh:0900| Started ssh tunnel, local = 46496 remote = 80, pid = 1824
10/18 15:46:05.353 DEBUG|        retry_util:0204| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>)
10/18 15:46:05.353 DEBUG|        retry_util:0066| Retrying in 10.000000 (10.000000 + jitter 0.000000) seconds ...
10/18 15:46:15.493 DEBUG|        retry_util:0204| <class 'urllib2.HTTPError'>(HTTP Error 500: INTERNAL SERVER ERROR)
10/18 15:46:15.494 DEBUG|        retry_util:0066| Retrying in 20.000000 (20.000000 + jitter 0.000000) seconds ...
10/18 15:46:35.687 DEBUG|        retry_util:0204| <class 'urllib2.HTTPError'>(HTTP Error 500: INTERNAL SERVER ERROR)
10/18 15:46:35.687 DEBUG|        retry_util:0066| Retrying in 40.000000 (40.000000 + jitter 0.000000) seconds ...
10/18 15:47:15.879 DEBUG|        retry_util:0204| <class 'urllib2.HTTPError'>(HTTP Error 500: INTERNAL SERVER ERROR)
10/18 15:47:15.879 DEBUG|        retry_util:0066| Retrying in 80.000000 (80.000000 + jitter 0.000000) seconds ...
10/18 15:48:36.073 DEBUG|        retry_util:0204| <class 'urllib2.HTTPError'>(HTTP Error 500: INTERNAL SERVER ERROR)
10/18 15:48:36.074 DEBUG|        retry_util:0066| Retrying in 160.000000 (160.000000 + jitter 0.000000) seconds ...
10/18 15:51:05.351 DEBUG|       moblab_host:0310| AFE is not responding
10/18 15:51:05.352 WARNI|              test:0612| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 573, in _exec
    _cherry_pick_call(self.initialize, *args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 721, in _cherry_pick_call
    return func(*p_args, **p_dargs)
  File "/usr/local/autotest/server/cros/moblab_test.py", line 37, in initialize
    self._host.wait_afe_up()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 176, in wait_afe_up
    self._check_afe()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 308, in _check_afe
    self.afe.get_hosts()
  File "/usr/local/autotest/server/frontend.py", line 527, in get_hosts
    hosts = self.run('get_hosts', **query_args)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 126, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 171, in _Wrapper
    self._retry_delay.Sleep(attempt)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 67, in Sleep
    time.sleep(total)
  File "/usr/local/autotest/site-packages/chromite/lib/timeout_util.py", line 88, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 300.0 seconds.
I have no idea where the RPC logs are, if they exist.
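
The retry delays in the DEBUG log double each time (10, 20, 40, 80, 160 seconds), and the test gives up after 300 seconds. The sketch below reproduces that arithmetic to show which sleep the timeout cuts short. It assumes each attempt itself takes negligible time; `backoff_delays` and `interrupted_sleep` are hypothetical helpers and do not reflect how chromite's `retry_util` is implemented:

```python
def backoff_delays(initial, factor=2.0):
    """Yield an unbounded exponential backoff schedule."""
    delay = initial
    while True:
        yield delay
        delay *= factor


def interrupted_sleep(initial, timeout, factor=2.0):
    """Return (sleep_index, delay, elapsed_before) for the sleep the timeout interrupts.

    Elapsed time is taken to be the sum of the completed sleeps only.
    """
    elapsed = 0.0
    for index, delay in enumerate(backoff_delays(initial, factor), start=1):
        if elapsed + delay > timeout:
            return index, delay, elapsed
        elapsed += delay


# Values read off the DEBUG log: first delay 10 s, overall limit 300 s.
print(interrupted_sleep(initial=10.0, timeout=300.0))
```

This prints `(5, 160.0, 150.0)`: the first four sleeps add up to 150 seconds, so the fifth sleep (160 seconds) would run to 310 seconds and is interrupted at 300. That matches the traceback, where `TimeoutError` is raised from inside `time.sleep`, and the timestamps, where the 160-second retry is scheduled at 15:48:36 and the failure is logged at 15:51:05.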

Comment 3 by nxia@chromium.org, Oct 19 2017

dshi@ thinks it's the moblab init flake, crbug.com/776184
Mergedinto: 776184
Status: Duplicate (was: Untriaged)
