LXC launch hangs forever on network error
Issue description

Downloading a container hangs if there is any network issue during the download.

Reproduction steps:
1) crosh
2) crosh> vmc start termina
3) (termina) chronos@localhost ~ $ lxc launch ubuntu:16.04 c1
4) wait for root fs download to start
5) toggle wifi to a different network in the system UI
6) download hangs and never times out
,
Jun 12 2018
Reproduced on Ubuntu 18.04; checking whether it's fixed upstream. Will cook a patch if not.
,
Jun 13 2018
+christianvanbrauner@gmail.com Mind pointing me to the code that actually downloads the image?
,
Jun 13 2018
Should I look at CancelableDownload?
,
Jun 13 2018
,
Jun 14 2018
+jkwang@google.com sure. :) The main function that downloads images when you create a container is here: https://github.com/lxc/lxd/blob/ee1ec7db91f1b1fc50042f52c65bd7d31384df95/lxd/daemon_images.go#L99
,
Jun 14 2018
I'm trying to remember whether this was caused by a go bug or not.
,
Jun 14 2018
Well, the overall go/http client behavior is weird when it comes to network changes. I have this code to test go/http: https://gist.github.com/Catramen/ec505a49dc212fce0437b73002783d89

I ran lots of "lxc launch" tests and go/http tests. The following things happened without a pattern:
1. It could resume happily like nothing happened.
2. It could just block. (I am not even sure it was blocked; I waited for 10 minutes and nothing new happened.)
3. It resumed, but at a very low speed.

My best guess is that there is some backoff/retry algorithm in the go/http implementation, and its behavior is not lovely these days. Maybe I can figure out a way to turn off the "resume" completely; "fail fast" is not as bad as blocking.
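For illustration, a minimal Go sketch of what "fail fast" could look like: a watchdog timer cancels the request whenever no bytes arrive for a configurable period. This is not the gist linked above and not LXD code. The names fetchWithStallTimeout and stalledDownloadError are hypothetical, and the local test server only simulates a transfer that goes silent partway through.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithStallTimeout (hypothetical helper, not LXD code) copies the body
// of url into dst and aborts when no bytes arrive for longer than stall.
func fetchWithStallTimeout(ctx context.Context, url string, dst io.Writer, stall time.Duration) (int64, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Watchdog: cancels the request unless it is reset before it fires.
	// It is armed before the request starts, so it also bounds the time
	// spent connecting and waiting for response headers.
	watchdog := time.AfterFunc(stall, cancel)
	defer watchdog.Stop()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var total int64
	buf := make([]byte, 32*1024)
	for {
		n, rerr := resp.Body.Read(buf)
		if n > 0 {
			watchdog.Reset(stall) // progress was made, push the deadline out
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
			total += int64(n)
		}
		if rerr == io.EOF {
			return total, nil
		}
		if rerr != nil {
			return total, fmt.Errorf("download aborted after %d bytes: %w", total, rerr)
		}
	}
}

// stalledDownloadError starts a local test server that sends a few bytes and
// then goes silent, and returns the error the helper reports for it.
func stalledDownloadError() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("partial"))
		w.(http.Flusher).Flush()
		select {
		case <-r.Context().Done(): // the client gave up
		case <-time.After(5 * time.Second): // safety net so the handler always returns
		}
	}))
	defer srv.Close()

	_, err := fetchWithStallTimeout(context.Background(), srv.URL, io.Discard, 200*time.Millisecond)
	return err
}
```

Calling stalledDownloadError() should return a non-nil error shortly after the 200 ms stall window instead of blocking. A stall timeout is used here rather than http.Client.Timeout because the latter caps the whole transfer, which is awkward for large image downloads of unknown duration.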
,
Jun 14 2018
Yes, I think Stéphane (Graber) either filed a bug against go or was at least discussing it with me a while back. The problem is - iirc - that the timeout handling is suboptimal. We've had problems with this before...
,
Jun 14 2018
We could also do the "nuke sockets" option where we destroy the connections in the kernel when the host switches networks. abhishekbh knows how to do that.
,
Jun 21 2018
,
Jul 11
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/dc84156e1fc42dfa63028999e7a39296496a9ee0

commit dc84156e1fc42dfa63028999e7a39296496a9ee0
Author: Jingkui Wang <jkwang@google.com>
Date: Wed Jul 11 01:47:30 2018

app-emulation/lxd: set tcp keepalive

This patch set tcp session keepalive. It will fix the problem that lxc
download block forever when network is changed.

BUG= chromium:849753
TEST=build termina image and lxc launch, change network.

Change-Id: I4da65fad6a429222e9aa0ec8d69a32c1a4a1eab9
Reviewed-on: https://chromium-review.googlesource.com/1103150
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[add] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/lxd-3.0.0-r1.ebuild
[modify] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/lxd-3.0.0.ebuild
[add] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/files/3.00-tcp-keepalive.patch
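The contents of 3.00-tcp-keepalive.patch are not shown in this thread, so the following is only a rough Go sketch of how TCP keepalive is typically enabled for an HTTP client, not the actual patch. The helper name newKeepAliveTransport and the timeout values are assumptions chosen for illustration.

```go
package main

import (
	"net"
	"net/http"
	"time"
)

// newKeepAliveTransport (hypothetical helper, not the actual LXD patch)
// returns an http.Transport whose TCP connections send keepalive probes
// at the given interval, so a peer that silently disappeared is eventually
// detected and the blocked read fails instead of hanging.
func newKeepAliveTransport(keepAlive time.Duration) *http.Transport {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second, // give up on connection attempts that never complete
		KeepAlive: keepAlive,        // interval between TCP keepalive probes
	}
	return &http.Transport{
		Proxy:                 http.ProxyFromEnvironment,
		DialContext:           dialer.DialContext,
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 30 * time.Second,
	}
}
```

A client would use it as `client := &http.Client{Transport: newKeepAliveTransport(30 * time.Second)}`. How quickly a dead connection is noticed also depends on the operating system's probe count. Recent Go releases enable keepalive probes by default when KeepAlive is left at zero.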
,
Aug 2
,
Aug 2
,
Aug 9
@jkwang is anything more needed on this bug for M69? Please update milestone label if not.
,
Aug 9
Comment 1 by jkwang@google.com
, Jun 5 2018