Issue 849753

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



LXC launch hangs forever on network error

Project Member Reported by dgreid@chromium.org, Jun 5 2018

Issue description

Downloading a container hangs if there is any network issue during the download.

Reproduction steps:

1) crosh
2) crosh> vmc start termina
3) (termina) chronos@localhost ~ $ lxc launch ubuntu:16.04 c1
4) wait for root fs download to start
5) toggle wifi to a different network in the system UI
6) download hangs and never times out.
 

Comment 1 by jkwang@google.com, Jun 5 2018

Owner: jkwang@google.com

Comment 2 by jkwang@google.com, Jun 12 2018

Reproduced on Ubuntu 18.04. Checking whether it is fixed upstream; will cook a patch if not.

Comment 3 by jkwang@google.com, Jun 13 2018

Cc: christia...@gmail.com
+christianvanbrauner@gmail.com
Mind pointing me to the code that actually downloads the image?

Comment 4 by jkwang@google.com, Jun 13 2018

Should I look at CancelableDownload?
+jkwang@google.com sure. :)

The main function that downloads images when you create a container is here:

https://github.com/lxc/lxd/blob/ee1ec7db91f1b1fc50042f52c65bd7d31384df95/lxd/daemon_images.go#L99
I'm trying to remember whether this was caused by a go bug or not.

Comment 8 by jkwang@google.com, Jun 14 2018

Cc: dgreid@chromium.org
Well, the overall behavior of the Go net/http client is weird when it comes to network changes. I have this code to test it: https://gist.github.com/Catramen/ec505a49dc212fce0437b73002783d89

I ran lots of "lxc launch" tests and Go net/http tests. The following outcomes occurred with no discernible pattern:
1. It could resume happily as if nothing had happened.
2. It could just block. (I am not even sure it was really blocked; I waited for 10 minutes and saw nothing new.)
3. It could resume, but at a very low speed.

My best guess is that there is some backoff/retry algorithm in the Go net/http implementation, and the behavior is not lovely these days.

Maybe I can figure out a way to turn off the "resume" completely; "fail fast" is not as bad as blocking.
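
To illustrate the "fail fast" idea, here is a minimal Go sketch, not taken from LXD or from the linked gist. It aborts a download when no bytes arrive for a configurable idle window, instead of blocking indefinitely. The names fetchFailFast, stallingServer and demo are hypothetical, and the local test server that goes silent after a few bytes only approximates a connection lost during a network switch.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchFailFast (hypothetical helper) downloads url into dst and cancels the
// request if no bytes arrive for longer than idle. It returns bytes copied.
func fetchFailFast(url string, dst io.Writer, idle time.Duration) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}

	// Watchdog: cancels the request unless it is reset within the idle window.
	watchdog := time.AfterFunc(idle, cancel)
	defer watchdog.Stop()

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var total int64
	buf := make([]byte, 32*1024)
	for {
		n, rerr := resp.Body.Read(buf)
		if n > 0 {
			watchdog.Reset(idle) // progress was made, so push the deadline out
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
			total += int64(n)
		}
		if rerr == io.EOF {
			return total, nil
		}
		if rerr != nil {
			if ctx.Err() != nil {
				return total, fmt.Errorf("no data for %v: %w", idle, rerr)
			}
			return total, rerr
		}
	}
}

// stallingServer (test fixture, an assumption of this sketch) sends a few
// bytes and then goes silent until the client gives up.
func stallingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "partial")
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		select {
		case <-r.Context().Done():
		case <-time.After(5 * time.Second):
		}
	}))
}

// demo runs the fail-fast download against the stalling server.
func demo() (int64, error) {
	srv := stallingServer()
	defer srv.Close()
	return fetchFailFast(srv.URL, io.Discard, 300*time.Millisecond)
}

func main() {
	n, err := demo()
	fmt.Printf("received %d bytes, err=%v\n", n, err)
}
```

The idle window trades robustness for responsiveness: a short window fails quickly on a dead connection but may also abort a slow yet healthy download.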
Yes, I think Stéphane (Graber) either filed a bug against Go or was at least discussing it with me a while back. The problem is - iirc - that the timeout handling is suboptimal. We've had problems with this before...
We could also do the "nuke sockets" option where we destroy the connections in the kernel when the host switches networks. abhishekbh knows how to do that.
Components: OS>Systems>Containers
Project Member

Comment 12 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/dc84156e1fc42dfa63028999e7a39296496a9ee0

commit dc84156e1fc42dfa63028999e7a39296496a9ee0
Author: Jingkui Wang <jkwang@google.com>
Date: Wed Jul 11 01:47:30 2018

app-emulation/lxd: set tcp keepalive

This patch set tcp session keepalive. It will fix the problem that lxc
download block forever when network is changed.

BUG= chromium:849753 
TEST=build termina image and lxc launch, change network.

Change-Id: I4da65fad6a429222e9aa0ec8d69a32c1a4a1eab9
Reviewed-on: https://chromium-review.googlesource.com/1103150
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[add] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/lxd-3.0.0-r1.ebuild
[modify] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/lxd-3.0.0.ebuild
[add] https://crrev.com/dc84156e1fc42dfa63028999e7a39296496a9ee0/app-emulation/lxd/files/3.00-tcp-keepalive.patch
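
The contents of 3.00-tcp-keepalive.patch are not reproduced in this issue. As a rough illustration of what enabling TCP keepalive can look like in a Go HTTP client, the sketch below sets the KeepAlive field of net.Dialer and plugs that dialer into an http.Transport. The helper name newKeepAliveClient, the constant defaultKeepAlive and all timeout values are assumptions made for this example; they are not taken from the actual patch.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// defaultKeepAlive is an illustrative probe period, not the value from the patch.
const defaultKeepAlive = 15 * time.Second

// newKeepAliveClient (hypothetical helper) returns an HTTP client whose TCP
// connections have keepalive probes enabled with the given period.
func newKeepAliveClient(period time.Duration) *http.Client {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second, // limit on establishing the connection
		KeepAlive: period,           // enables TCP keepalive probes on the socket
	}
	transport := &http.Transport{
		Proxy:                 http.ProxyFromEnvironment,
		DialContext:           dialer.DialContext,
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 30 * time.Second,
	}
	return &http.Client{Transport: transport}
}

func main() {
	client := newKeepAliveClient(defaultKeepAlive)
	fmt.Printf("transport type: %T\n", client.Transport)
}
```

With keepalive enabled, a receiving socket whose peer has become unreachable eventually reports an error once the probes go unanswered, so a blocked read returns instead of waiting forever. How long that takes depends on the probe period and on the kernel's keepalive probe count.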

Status: Assigned (was: Available)
Owner: jkwang@chromium.org
@jkwang is anything more needed on this bug for M69? Please update milestone label if not.
Status: Fixed (was: Assigned)
