New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 698154 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

instalog: PID file causing problems under Umpire+Docker.

Project Member Reported by hungte@chromium.org, Mar 3 2017

Issue description

We've found that when running inside Docker+Umpire, Instalog will first find PID file and see if that process exists, then try connection, and die if connection failed.

This creates a problem that under Docker environment, we usually see some small number in PID for example "14", that you'll always see some other PID there and instalog will fail to connect.

There are several issues and ways to fix:

1. Is umpire_data a good place for PID? Or should we use some tmpfs like /run

2. When Instalog fail to build connection, should it do something - for example killing the PID or remove the PID file?

3. Find why the PID file was not removed when we terminate the umpire docker session

etc
 
also, when umpire failed to start a service, maybe it should temporary disable it instead of fail and abort (and then restart, keep failing).
Cc: kitching@chromium.org
Owner: chuntsen@chromium.org
Chuntsen, could you please look into this?

I would recommend relocating the PID file as Hung-Te has suggested in (1).  This could be done by changing the config definition in the Umpire Instalog service.
Pihsun, do you think it's worth adding "temporary disable" support to Umpire for services that are acting up, as Hung-Te mentioned?
1. I'll relocate PID file.
2. I think we should remove PID file and run Instalog as usual.
3. PID file was not removed only when Instalog down with accidents. I think we only can recode some error logs, and let users find out the reason from instalog.log.
(2) is already the current behaviour, except we don't kill the PID if it is running:

  def GetPID(self):
    """Returns the current PID of the daemon process, or None if not found."""
    try:
      with open(self.pidfile, 'r') as f:
        return int(f.read().strip())
    except (IOError, ValueError):
      if os.path.exists(self.pidfile):
        os.remove(self.pidfile)
      return None


  def Start(self, foreground=False):
    """Starts the daemon."""
    # Check for a pidfile to see if the daemon is already running.
    if self.GetPID() and self.IsStopped():
      # Not sure if this is the safest thing to do...
      message = 'pidfile %s exists, but pid is down. Removing pidfile\n'
      sys.stderr.write(message % self.pidfile)
      os.remove(self.pidfile)
    elif self.IsRunning():
      message = 'pidfile %s already exists, and pid is running\n'
      sys.stderr.write(message % self.pidfile)
      sys.exit(1)
When moving to /run, please also add --tmpfs to docker command (both dome and cros_docker) so /run will be mapped to tmpfs. For example:

 docker run -d --tmpfs /run:rw,size=65536k my_image

Not sure if adding size is a good idea. You can try what'll happen if we don't specify size.
@Joel:
hungte means that the pid in pidfile is not Instalog when Umpire restart.
We need to find a way to check it.

After discuss with others, can we remove pidfile when no-daemon, and use other way (lock?) to check Instalog is running or not.

Comment 8 by hungte@chromium.org, Apr 11 2017

The idea is to always run when invoked as no-daemon (umpire service) mode, even not checking pid files.

Checking instalog daemon state by other IPC approaches (lock, domain socket, bind port, ... etc) is optional.
So to confirm my understanding, the actionable item here is:

* Update Instalog to not use pidfile in the case of --no-daemon.
Yes, that's the first thing.
Project Member

Comment 11 by bugdroid1@chromium.org, Apr 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/32bf4e25207a72675ed4fb9add73efe399286511

commit 32bf4e25207a72675ed4fb9add73efe399286511
Author: chuntsen <chuntsen@google.com>
Date: Thu Apr 13 11:11:06 2017

Instalog: Fix bugs about Umpire's Instalog service and Instalog pidfile

When umpire restart, the pidfile will remain, and cause Instalog service
and Umpire fail to start. So we move Instalog's pidfile to /run which is
mounted as tmpfs.

BUG= chromium:698154 
TEST=Test on local device

Change-Id: I91d057f864c20deaa48a1df787b0f20a41dbac2b
Reviewed-on: https://chromium-review.googlesource.com/474764
Commit-Ready: Chun-Tsen Kuo <chuntsen@chromium.org>
Tested-by: Chun-Tsen Kuo <chuntsen@chromium.org>
Reviewed-by: Mao Huang <littlecvr@chromium.org>

[modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/py/umpire/service/instalog.py
[modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/py/dome/backend/models.py
[modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/setup/cros_docker.sh

Status: Verified (was: Untriaged)

Sign in to add a comment