Container download fails often

Issue description

When using lxd to fetch the container, there seems to be a high failure rate. While it'll now exit instead of hanging, we should ideally handle this more gracefully. Can we...
- work with lxd to retry some number of times?
- try downloading the container via a dedicated tool, e.g. curl or Chrome's download manager?
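
For illustration, the "retry some number of times" idea could look roughly like the sketch below. It is not lxd's behavior or API: `retry`, `fetch`, `attempts` and `base_delay` are hypothetical names, and it assumes the download step raises an `OSError` subclass (for example `TimeoutError`) on a network failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `attempts` times, doubling the delay after each failure.

    `fetch` is a hypothetical callable that performs the download and raises
    an OSError subclass (TimeoutError, ConnectionError, ...) on network failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A short backoff like this would only help with brief outages such as an AP reassociation; a longer outage would still exhaust the attempts and surface the original error.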
Jul 13
Interesting. What does a failure look like? Is it due to some network error?
Jul 15
It would be good if you could file an issue upstream so that we can track this down. We'd need the daemon log at least and any other information you can provide.
Jul 25
We might post a PR upstream to add retries, but we're not sure we'll need that yet.
I was looking at this on eve and the underlying culprit is wifi. Attaching my /var/log/messages and net.log which show wifi AP reassociations while in the middle of downloads.
+hugo and abhishek - I understand if we change network interfaces or change LANs we'd need to nuke sockets in the VM, but in this case we're just changing APs on the same LAN. The host's IP should remain the same AFAIK.
And FWIW, this affects the host as well (I had a long-running ping command):
64 bytes from 176.32.103.205: icmp_seq=64 ttl=229 time=68.9 ms
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
64 bytes from 176.32.103.205: icmp_seq=67 ttl=229 time=70.9 ms
lxc monitor showed:
metadata:
  class: task
  created_at: "2018-07-25T22:34:26.193003127Z"
  description: Creating container
  err: ""
  id: ef523da5-4e94-44ea-82b5-6f514cdd6f63
  may_cancel: true
  metadata:
    download_progress: 'rootfs: 74% (1.55MB/s)'
  resources:
    containers:
    - /1.0/containers/penguin
  status: Running
  status_code: 103
  updated_at: "2018-07-25T22:37:06.957912122Z"
timestamp: "2018-07-25T22:37:06.958060318Z"
type: operation

metadata:
  context: {}
  level: dbug
  message: 'Failure for task operation: ef523da5-4e94-44ea-82b5-6f514cdd6f63: read
    tcp 100.115.92.6:36508->216.58.192.16:443: read: connection timed out'
timestamp: "2018-07-25T22:37:38.084557982Z"
type: logging

metadata:
  class: task
  created_at: "2018-07-25T22:34:26.193003127Z"
  description: Creating container
  err: 'read tcp 100.115.92.6:36508->216.58.192.16:443: read: connection timed out'
  id: ef523da5-4e94-44ea-82b5-6f514cdd6f63
  may_cancel: false
  metadata:
    download_progress: 'rootfs: 74% (1.55MB/s)'
  resources:
    containers:
    - /1.0/containers/penguin
  status: Failure
  status_code: 400
  updated_at: "2018-07-25T22:37:06.957912122Z"
timestamp: "2018-07-25T22:37:38.084622834Z"
type: operation
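
As a rough sketch of how a caller might decide whether a failed operation like the one above is worth retrying, the function below inspects an already-parsed event. The field names (`type`, `metadata`, `status`, `err`) are taken from the log; the function name and the list of "transient" substrings are assumptions drawn only from the errors quoted in this thread, not an exhaustive or official list.

```python
# Substrings seen in this thread's errors; an assumed, non-exhaustive heuristic.
TRANSIENT_MARKERS = ("connection timed out", "network is unreachable")


def is_transient_failure(event: dict) -> bool:
    """Return True if a parsed monitor event is a failed operation whose
    error text looks like a temporary network problem."""
    if event.get("type") != "operation":
        return False
    meta = event.get("metadata") or {}
    if meta.get("status") != "Failure":
        return False
    err = (meta.get("err") or "").lower()
    return any(marker in err for marker in TRANSIENT_MARKERS)


# Example using values from the failed operation in the log above.
failed = {
    "type": "operation",
    "metadata": {
        "status": "Failure",
        "status_code": 400,
        "err": "read tcp 100.115.92.6:36508->216.58.192.16:443: "
               "read: connection timed out",
    },
}
print(is_transient_failure(failed))
```

Matching on error strings is brittle, so this is only a heuristic for telling "network blipped" apart from other failure causes.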
Jul 25
Downloading a large file in Chrome also hit the same problem - the download failed because network connectivity was temporarily lost.
Sep 5
I think the underlying wifi issues have been fixed, since I don't see this anymore.
Comment 1 by tbuck...@chromium.org, Jul 13