instalog: PID file causing problems under Umpire+Docker. |
|||
Issue descriptionWe've found that when running inside Docker+Umpire, Instalog will first find PID file and see if that process exists, then try connection, and die if connection failed. This creates a problem that under Docker environment, we usually see some small number in PID for example "14", that you'll always see some other PID there and instalog will fail to connect. There are several issues and ways to fix: 1. Is umpire_data a good place for PID? Or should we use some tmpfs like /run 2. When Instalog fail to build connection, should it do something - for example killing the PID or remove the PID file? 3. Find why the PID file was not removed when we terminate the umpire docker session etc
,
Apr 4 2017
Chuntsen, could you please look into this? I would recommend relocating the PID file as Hung-Te has suggested in (1). This could be done by changing the config definition in the Umpire Instalog service.
,
Apr 4 2017
Pihsun, do you think it's worth adding "temporary disable" support to Umpire for services that are acting up, as Hung-Te mentioned?
,
Apr 6 2017
1. I'll relocate PID file. 2. I think we should remove PID file and run Instalog as usual. 3. PID file was not removed only when Instalog down with accidents. I think we only can recode some error logs, and let users find out the reason from instalog.log.
,
Apr 6 2017
(2) is already the current behaviour, except we don't kill the PID if it is running:
def GetPID(self):
"""Returns the current PID of the daemon process, or None if not found."""
try:
with open(self.pidfile, 'r') as f:
return int(f.read().strip())
except (IOError, ValueError):
if os.path.exists(self.pidfile):
os.remove(self.pidfile)
return None
def Start(self, foreground=False):
"""Starts the daemon."""
# Check for a pidfile to see if the daemon is already running.
if self.GetPID() and self.IsStopped():
# Not sure if this is the safest thing to do...
message = 'pidfile %s exists, but pid is down. Removing pidfile\n'
sys.stderr.write(message % self.pidfile)
os.remove(self.pidfile)
elif self.IsRunning():
message = 'pidfile %s already exists, and pid is running\n'
sys.stderr.write(message % self.pidfile)
sys.exit(1)
,
Apr 6 2017
When moving to /run, please also add --tmpfs to docker command (both dome and cros_docker) so /run will be mapped to tmpfs. For example: docker run -d --tmpfs /run:rw,size=65536k my_image Not sure if adding size is a good idea. You can try what'll happen if we don't specify size.
,
Apr 11 2017
@Joel: hungte means that the pid in pidfile is not Instalog when Umpire restart. We need to find a way to check it. After discuss with others, can we remove pidfile when no-daemon, and use other way (lock?) to check Instalog is running or not.
,
Apr 11 2017
The idea is to always run when invoked as no-daemon (umpire service) mode, even not checking pid files. Checking instalog daemon state by other IPC approaches (lock, domain socket, bind port, ... etc) is optional.
,
Apr 11 2017
So to confirm my understanding, the actionable item here is: * Update Instalog to not use pidfile in the case of --no-daemon.
,
Apr 12 2017
Yes, that's the first thing.
,
Apr 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/32bf4e25207a72675ed4fb9add73efe399286511 commit 32bf4e25207a72675ed4fb9add73efe399286511 Author: chuntsen <chuntsen@google.com> Date: Thu Apr 13 11:11:06 2017 Instalog: Fix bugs about Umpire's Instalog service and Instalog pidfile When umpire restart, the pidfile will remain, and cause Instalog service and Umpire fail to start. So we move Instalog's pidfile to /run which is mounted as tmpfs. BUG= chromium:698154 TEST=Test on local device Change-Id: I91d057f864c20deaa48a1df787b0f20a41dbac2b Reviewed-on: https://chromium-review.googlesource.com/474764 Commit-Ready: Chun-Tsen Kuo <chuntsen@chromium.org> Tested-by: Chun-Tsen Kuo <chuntsen@chromium.org> Reviewed-by: Mao Huang <littlecvr@chromium.org> [modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/py/umpire/service/instalog.py [modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/py/dome/backend/models.py [modify] https://crrev.com/32bf4e25207a72675ed4fb9add73efe399286511/setup/cros_docker.sh
,
Jul 26 2017
|
|||
►
Sign in to add a comment |
|||
Comment 1 by hungte@chromium.org
, Mar 3 2017