New issue
Advanced search Search tips

Issue 609096 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug



Sign in to add a comment

Chromoting service keeps going offline

Project Member Reported by michae...@chromium.org, May 4 2016

Issue description

Version: 50.0.2661.94
OS: Goobuntu

I've been unable to connect to Chrome Remote Desktop with my corp account.

I keep seeing "SendHostOfflineReason" in the logs.

https://paste.googleplex.com/5606575918022656
 
Labels: -Pri-2 M-51 Pri-1
Owner: lambroslambrou@chromium.org
Status: Assigned (was: Untriaged)
Assigning to Lambros, since this looks the same as another report we've had internally. There's a suggestion it might be related to the recently added certificate watcher shutdown path.
Is there a way to turn on VLOG on the host? The certificate watcher will send out a verbose log before it restarts the host.

If it turns out the problem is caused by the certificate watcher then there must be something continuously touching the NSS DB, or the start script somehow touches NSSDB every time the host restarts.
The host binary accepts the same parameters documented at
https://www.chromium.org/for-testers/enable-logging
Specifically, adding --v=1 will set it to level 1 (more verbose than default).
Higher numbers give more log output.
You would have to edit the Python script (or manually run the host binary).

I'm looking into this now.

I've enabled verbose logging. Interestingly the certificate watcher restarts the host once immediately after starting the chrome-remote-desktop service. I'll see if it happen again. The service dies on a regular basis so I should have an update soon.

When not at my desk, I usually restart the service via "sudo service chrome-remote-desktop restart" from an SSH session (which I then close). Is that OK?


I just tried to connect and the host went offline. Grepping for certificate_watcher shows dozens of these lines:

[0508/052226:VERBOSE1:certificate_watcher.cc(66)] Started watching certificate changes.
[0508/052230:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...
[0508/052234:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...
[0508/052237:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...
[0508/052246:VERBOSE1:certificate_watcher.cc(66)] Started watching certificate changes.
[0508/052251:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...
[0508/052255:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...
[0508/052258:VERBOSE1:certificate_watcher.cc(141)] Certificate was updated. Calling restart...

Is it safe to share the full log output in paste.googleplex.com?
The log contains everything that gets logged from your chrome-remote-desktop-session file, so it might have sensitive information. However, I think we have enough to go on here.

I think there are a couple of changes to be made here. Firstly, we don't expect this to happen often, so we should be logging at a more verbose level than this. Secondly, I think we should guard against this sort of failure mode, perhaps by throttling the restart behaviour.

Yuwei, can you suggest any debugging steps for this user (periodic "ls -l" on the cert directory, for example) to determine whether this is a bug in our code or an invalid assumption that cert updates are rare?
The X session launcher seems to somehow touch the certificate folder, so you will see the certificate watcher being triggered immediately when the script just starts, but it shouldn't trigger the watcher after the host restarts since the host will reuse the existing session.

Would you do ls -l ~/.pki/nssdb/ and pay attention to the timestamp? Do you see the timestamp of any file inside the directory keeps changing and which file(s) are they?
5/10 10:22: chromoting is down again. I `ls`, then restart, then `ls` again.

ls results:
-rw------- 1 michaelpg eng  9216 Feb 25 12:28 cert9.db
-rw------- 1 michaelpg eng 11264 Feb 25 12:28 key4.db
-rw------- 1 michaelpg eng   713 May  9 20:52 pkcs11.txt

sudo service chrome-remote-desktop restart

ls results:
-rw------- 1 michaelpg eng  9216 Feb 25 12:28 cert9.db
-rw------- 1 michaelpg eng 11264 Feb 25 12:28 key4.db
-rw------- 1 michaelpg eng   713 May 10 10:22 pkcs11.txt

I connected and disconnected from my laptop, and the timestamps are unchanged.

5/11 19:30: chromoting is down again. The timestamps haven't changed, everything is exactly as above.

I restarted the service again, and checked the timestamps again:

-rw------- 1 michaelpg eng  9216 Feb 25 12:28 cert9.db
-rw------- 1 michaelpg eng 11264 Feb 25 12:28 key4.db
-rw------- 1 michaelpg eng   713 May 11 19:29 pkcs11.txt

Status: Started (was: Assigned)
Project Member

Comment 10 by bugdroid1@chromium.org, May 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/81e5e053c458bd919fff0575aaaaea9a2a4a050d

commit 81e5e053c458bd919fff0575aaaaea9a2a4a050d
Author: lambroslambrou <lambroslambrou@chromium.org>
Date: Mon May 16 18:36:17 2016

[remoting host] Only watch for material changes to NSS DB files

The remoting host process (on Linux) self-restarts if it detects
changes to the NSS DB directory. This modifies the watcher to only
trigger on material changes to the NSS DB files. This should reduce
incidences of false-restarts of the host.

BUG= 609096 

Review-Url: https://codereview.chromium.org/1977963002
Cr-Commit-Position: refs/heads/master@{#393879}

[modify] https://crrev.com/81e5e053c458bd919fff0575aaaaea9a2a4a050d/remoting/host/linux/certificate_watcher.cc
[modify] https://crrev.com/81e5e053c458bd919fff0575aaaaea9a2a4a050d/remoting/host/linux/certificate_watcher.h
[modify] https://crrev.com/81e5e053c458bd919fff0575aaaaea9a2a4a050d/remoting/host/linux/certificate_watcher_unittest.cc

Status: Fixed (was: Started)

Sign in to add a comment