Issue 754860

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



4-minute "sudo" hangs

Project Member Reported by dgarr...@chromium.org, Aug 11 2017

Issue description

Most lab servers are seeing "sudo" commands hang for about 4 minutes on a semi-regular basis.

 
Here are relevant logs from auth.log during one example:

# Preceding/normal sudo
Aug 10 07:40:36 chromeos-server67.hot.corp.google.com sudo: chromeos-test : TTY=unknown ; PWD=/usr/local/autotest/results ; USER=root ; COMMAND=/usr/bin/lxc-info -P /usr/local/autotest/containers -n test_134161472_1502376034_16010 -c lxc.rootfs
Aug 10 07:40:36 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 10 07:40:36 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session closed for user root

# Problematic sudo
Aug 10 07:40:37 chromeos-server67.hot.corp.google.com sudo: chromeos-test : TTY=unknown ; PWD=/usr/local/autotest/results ; USER=root ; COMMAND=/bin/mkdir -p /usr/local/autotest/containers/test_134161472_1502376034_16010/delta0/usr/local
Aug 10 07:40:37 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 10 07:40:37 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session closed for user root
Aug 10 07:43:43 chromeos-server67.hot.corp.google.com /usr/sbin/connectd[16283]: 2017/08/10 07:43:43 ERROR: Failed loading machine certificate: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 

# Following logs
Aug 10 07:44:12 chromeos-server67.hot.corp.google.com /usr/bin/ssh[4464]: local_user chromeos-test, remote_user root, remote_host chromeos4-row7-rack3-host15-servo, remote_addr 100.115.210.30
Aug 10 07:44:12 chromeos-server67.hot.corp.google.com /usr/bin/ssh[4465]: local_user chromeos-test, remote_user root, remote_host chromeos4-row7-rack2-host21-servo, remote_addr 100.115.210.22
Aug 10 07:44:13 chromeos-server67.hot.corp.google.com sudo: chromeos-test : TTY=unknown ; PWD=/usr/local/autotest/results ; USER=root ; COMMAND=/bin/mv /tmp/autotest_server_package.tar.bz2__bJMP1 /usr/local/autotest/containers/test_134161472_1502376034_16010/delta0/usr/local/autotest_server_package.tar.bz2
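
To find other occurrences without reading auth.log by eye, one option is to flag any pair of consecutive entries that are unusually far apart. The following is only a rough sketch: the function name `find_gaps`, the 2-minute threshold, and the assumed year are illustrative choices, not existing lab tooling. It also measures silence in the log rather than the sudo hang itself, so a quiet host would produce false positives.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=2), year=2017):
    """Yield (gap, previous_line, next_line) for consecutive log entries
    whose timestamps are at least `threshold` apart."""
    prev_time = prev_line = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            # Classic syslog prefix, e.g. "Aug 10 07:40:37". It carries no
            # year, so the caller has to supply one (assumption: 2017).
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # Skip lines that do not start with a timestamp.
        if prev_time is not None and stamp - prev_time >= threshold:
            yield stamp - prev_time, prev_line, line
        prev_time, prev_line = stamp, line


# Four lines copied from the excerpt above.
SAMPLE = [
    "Aug 10 07:40:36 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session closed for user root",
    "Aug 10 07:40:37 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session closed for user root",
    "Aug 10 07:43:43 chromeos-server67.hot.corp.google.com /usr/sbin/connectd[16283]: 2017/08/10 07:43:43 ERROR: Failed loading machine certificate: Credential.Load2 failed: failed to load credkit credential: exporting certificate: ",
    "Aug 10 07:44:12 chromeos-server67.hot.corp.google.com /usr/bin/ssh[4464]: local_user chromeos-test, remote_user root, remote_host chromeos4-row7-rack3-host15-servo, remote_addr 100.115.210.30",
]

gaps = list(find_gaps(SAMPLE))
for gap, before, after in gaps:
    print(f"{gap} of silence after: {before[:15]}")
```

To scan a real log, pass an open file instead of the sample list, for example `find_gaps(open("/var/log/auth.log"))`.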

It looks like failures to load the machine cert are happening very regularly.

zgrep "Credential.Load2" /var/log/*

daemon.log.2.gz:Jul 24 21:34:28 chromeos-server67.hot.corp.google.com nsscacheclient[11153]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:34:30 chromeos-server67.hot.corp.google.com nsscacheclient[12849]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:35:28 chromeos-server67.hot.corp.google.com nsscacheclient[31281]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:35:57 chromeos-server67.hot.corp.google.com nsscacheclient[9046]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:38:52 chromeos-server67.hot.corp.google.com nsscacheclient[31391]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:39:13 chromeos-server67.hot.corp.google.com nsscacheclient[7262]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
daemon.log.2.gz:Jul 24 21:39:16 chromeos-server67.hot.corp.google.com nsscacheclient[8027]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: 
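
To see which daemons are hitting this, the matching lines can be tallied by syslog program tag. This is a minimal sketch, assuming the classic `program[pid]:` tag format seen above. `count_failures` and `read_log` are hypothetical helper names, not part of any existing tool.

```python
import gzip
import re
from collections import Counter

# Syslog program tag, e.g. "nsscacheclient[11153]:" or "/usr/sbin/connectd[16283]:".
TAG = re.compile(r"\s(\S+?)\[\d+\]:")


def count_failures(lines, needle="Credential.Load2"):
    """Count lines containing `needle`, grouped by the program that logged them."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TAG.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def read_log(path):
    """Yield lines from a plain or gzip-rotated log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        yield from handle


# Lines copied from the logs quoted in this issue.
SAMPLE = [
    "Jul 24 21:34:28 chromeos-server67.hot.corp.google.com nsscacheclient[11153]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: ",
    "Jul 24 21:34:30 chromeos-server67.hot.corp.google.com nsscacheclient[12849]: Unable to use machine cert, continuing anyway: Credential.Load2 failed: failed to load credkit credential: exporting certificate: ",
    "Aug 10 07:43:43 chromeos-server67.hot.corp.google.com /usr/sbin/connectd[16283]: 2017/08/10 07:43:43 ERROR: Failed loading machine certificate: Credential.Load2 failed: failed to load credkit credential: exporting certificate: ",
    "Aug 10 07:40:37 chromeos-server67.hot.corp.google.com sudo: pam_unix(sudo:session): session closed for user root",
]

counts = count_failures(SAMPLE)
for program, total in counts.most_common():
    print(program, total)
```

For a rotated file, something like `count_failures(read_log("/var/log/daemon.log.2.gz"))` should work, with the path adjusted to the host being checked.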

After spot-checking a few hosts semi-randomly, it seems to be happening on all lab servers other than cautotest.

Happening on:
chromeos-server72.hot     # Shard
cros-autotest-shard2.cbf  # Drone
chromeos-server104.mtv    # Shard
chromeos-server44.cbf     # Shard

Not on:
chromeos-server2          # cautotest
dgarrett.mtv              # My Workstation
Cc: akes...@chromium.org
Owner: shuqianz@chromium.org
We should follow up with Tech Stop to try to understand this as a Goobuntu issue. (Those machines are running Goobuntu, aren't they?)

I strongly suspect it's related to our network firewall rules.
Status: Archived (was: Untriaged)
