SetUpUser RPC fails with "failed to run useradd: EOF" sometimes
Issue description

In the tast-tests I've seen a few results where there's a failure with the following log lines. This has been observed on kevin, eve, grunt, and wizpig, so it is neither a slow-board problem nor an architecture-specific issue. Specific instances were observed on R71-11065.0.0, 11064, 11061, 11060, and 11056 (and likely earlier).

2018-09-14T08:36:00.832872-07:00 ERR vm_cicerone[26468]: Failed to set up user: failed to run useradd: EOF
2018-09-14T08:36:03.327256-07:00 INFO VM(4)[26435]: lxd[172]: action=start created=2018-09-14T15:35:10+0000 ephemeral=false lvl=info msg="Starting container" name=penguin stateful=false t=2018-09-14T15:35:59+0000 used=1970-01-01T00:00:00+0000#012
2018-09-14T08:36:03.327266-07:00 INFO VM(4)[26435]: lxd[172]: action=start created=2018-09-14T15:35:10+0000 ephemeral=false lvl=info msg="Started container" name=penguin stateful=false t=2018-09-14T15:36:00+0000 used=1970-01-01T00:00:00+0000#012
2018-09-14T08:36:03.327269-07:00 INFO VM(4)[26435]: tremplin[202]: 2018/09/14 15:36:00 Received SetUpUser RPC: penguin (username testuser)#012
2018-09-14T08:36:03.327272-07:00 ERR VM(4)[26435]: lxd[172]: lvl=eror msg="Failed to retrieve PID of executing child process: EOF" t=2018-09-14T15:36:01+0000#012
,
Sep 17
,
Sep 18
The cause is the child process in lxc_attach calling shutdown() on the socketpair that the intermediate process still needs to use. There's a race between the shutdown() in the intermediate process and the shutdown() in the child (target) process.

Basic repro case:

x=0; while lxc exec penguin -- id -n 1000 -u; do x=$(( $x+1 )); echo $x; done

This usually fails in < 300 iterations on my system, and has exceeded 20k iterations with the shutdown() calls in attach_child_main() removed: https://github.com/lxc/lxc/blob/lxc-3.0.2/src/lxc/attach.c#L933

Attaching an strace of the failure.
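To illustrate why a shutdown() in the child can break the intermediate process, here is a minimal Python sketch. It is not the lxc code: the function names are hypothetical and the waitpid() call forces an ordering that is racy in the real bug. It relies only on the general behaviour that a forked child shares the same underlying socket, so shutdown() affects every holder of that socket while close() only drops the caller's own descriptor.

```python
import os
import socket


def child_then_parent_send(child_action):
    """Fork a child that runs child_action on an inherited socket, then
    have the parent try to send over that same socket.

    Returns the bytes the peer end receives (b"" means EOF).
    """
    peer, shared = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: its descriptor refers to the same underlying socket
        # as the parent's `shared`.
        try:
            peer.close()
            child_action(shared)
        finally:
            os._exit(0)
    # Wait for the child so the outcome is deterministic in this demo.
    os.waitpid(pid, 0)
    try:
        shared.sendall(b"1234")
    except BrokenPipeError:
        pass  # the child's shutdown() already ended the write side
    peer.settimeout(2.0)
    data = peer.recv(16)
    peer.close()
    shared.close()
    return data


def only_close(sock):
    # Drops only the child's descriptor; the parent's copy keeps working.
    sock.close()


def shutdown_then_close(sock):
    # Shuts down the shared socket itself, for every process holding it.
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()


if __name__ == "__main__":
    print("child close():   ", child_then_parent_send(only_close))
    print("child shutdown():", child_then_parent_send(shutdown_then_close))
```

With only close() in the child, the peer still receives the parent's message. With shutdown() in the child, the peer reads EOF, which resembles the "Failed to retrieve PID of executing child process: EOF" symptom in the logs.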
,
Sep 19
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/4f6515fbe53897a78e6e0e2d70166215b360d1d7

commit 4f6515fbe53897a78e6e0e2d70166215b360d1d7
Author: Stephen Barber <smbarber@chromium.org>
Date: Wed Sep 19 04:12:07 2018

app-emulation/lxc: add shutdown fix for lxc-attach

Merged upstream in https://github.com/lxc/lxc/pull/2619

BUG=chromium:884244
TEST=run repro case in bug; no failure after 10k+ iterations

Change-Id: Ia93db140e10ba840c50826fbfbfff77969113530
Reviewed-on: https://chromium-review.googlesource.com/1231854
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/4f6515fbe53897a78e6e0e2d70166215b360d1d7/app-emulation/lxc/lxc-3.0.1.ebuild
[rename] https://crrev.com/4f6515fbe53897a78e6e0e2d70166215b360d1d7/app-emulation/lxc/lxc-3.0.1-r2.ebuild
[add] https://crrev.com/4f6515fbe53897a78e6e0e2d70166215b360d1d7/app-emulation/lxc/files/lxc-3.0.1-attach-shutdown.patch
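For a bounded version of the repro loop (for example, to check "no failure after 10k+ iterations"), a small harness along these lines might help. This is only a sketch: the helper name is hypothetical, and the command list simply restates the repro command from this bug.

```python
import subprocess
import sys


def iterations_until_failure(cmd, max_iterations):
    """Run cmd repeatedly.

    Returns the 1-based iteration at which cmd first exited non-zero,
    or None if it succeeded max_iterations times in a row.
    """
    for i in range(1, max_iterations + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return i
    return None


# Repro command from this bug; assumes a container named "penguin".
REPRO_CMD = ["lxc", "exec", "penguin", "--", "id", "-n", "1000", "-u"]

# Example usage, to be run inside the VM where lxc is available:
#   failed_at = iterations_until_failure(REPRO_CMD, 10000)
#   print("no failure" if failed_at is None else f"failed at {failed_at}")
```

Unlike the open-ended shell loop, this stops after a fixed number of successful runs, so a fixed build terminates instead of looping indefinitely.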
,
Sep 19
,
Dec 11
Comment 1 by jkardatzke@chromium.org, Sep 14