
Issue 863455

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
Closed: Sep 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Container download fails often

Project Member Reported by tbuck...@chromium.org, Jul 13

Issue description

When using lxd to fetch the container, there seems to be a high failure rate. While it'll now exit instead of hanging, we should ideally handle this more gracefully.

Can we...
- work with lxd to retry some number of times?
- try downloading the container via a dedicated tool, e.g. curl or Chrome's download manager?
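
For illustration, the "retry some number of times" idea could look roughly like the sketch below. This is not lxd's API or an existing Chrome OS helper: `retry_with_backoff`, its parameters, and the `fetch` callable are hypothetical names, and the sketch assumes the download surfaces network failures as `OSError` subclasses (which covers Python's `TimeoutError` and `ConnectionError`).

```python
import random
import time


def retry_with_backoff(fetch, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is a hypothetical zero-argument callable that performs the
    download and raises one of the retry_on exceptions on a network error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Exponential backoff capped at max_delay, with jitter so that
            # several clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A retry like this only helps if the outage is shorter than the total backoff window, and it restarts the transfer from scratch unless `fetch` itself knows how to resume a partial download.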
 
Cc: jkwang@chromium.org
Interesting. What does a failure look like? Is it due to some network error?
It would be good if you could file an issue upstream so that we can track this down. We'd need the daemon log at least and any other information you can provide.
Cc: hugobenichi@chromium.org abhishekbh@chromium.org
We might post a PR for retrying upstream, but not sure if we'll need that yet.

I was looking at this on eve and the underlying culprit is wifi. Attaching my /var/log/messages and net.log which show wifi AP reassociations while in the middle of downloads.

+hugo and abhishek - I understand if we change network interfaces or change LANs we'd need to nuke sockets in the VM, but in this case we're just changing APs on the same LAN. The host's IP should remain the same AFAIK.

And FWIW, this affects the host as well (I had a long running ping command):
64 bytes from 176.32.103.205: icmp_seq=64 ttl=229 time=68.9 ms
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
64 bytes from 176.32.103.205: icmp_seq=67 ttl=229 time=70.9 ms
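
As a rough aid for lining up dropouts like this with the AP reassociations in the logs, the missing sequence numbers can be pulled out of saved ping output. The helper below is only a sketch: `missing_icmp_seqs` is a hypothetical name, and it assumes the `icmp_seq=N` format shown above with sequence numbers that increase without wrapping.

```python
import re

# Matches the "icmp_seq=N" field in a ping reply line.
_SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def missing_icmp_seqs(ping_output):
    """Return the sequence numbers that never got a reply.

    Only gaps between consecutive reply lines are reported; probes lost
    before the first reply or after the last one are not visible here.
    """
    seqs = [int(m.group(1)) for m in _SEQ_RE.finditer(ping_output)]
    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        # Everything strictly between two consecutive replies was lost.
        missing.extend(range(prev + 1, cur))
    return missing
```

On the four lines above this reports sequence numbers 65 and 66, matching the two "Network is unreachable" errors.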

lxc monitor showed:

metadata:
  class: task
  created_at: "2018-07-25T22:34:26.193003127Z"
  description: Creating container
  err: ""
  id: ef523da5-4e94-44ea-82b5-6f514cdd6f63
  may_cancel: true
  metadata:
    download_progress: 'rootfs: 74% (1.55MB/s)'
  resources:
    containers:
    - /1.0/containers/penguin
  status: Running
  status_code: 103
  updated_at: "2018-07-25T22:37:06.957912122Z"
timestamp: "2018-07-25T22:37:06.958060318Z"
type: operation


metadata:
  context: {}
  level: dbug
  message: 'Failure for task operation: ef523da5-4e94-44ea-82b5-6f514cdd6f63: read
    tcp 100.115.92.6:36508->216.58.192.16:443: read: connection timed out'
timestamp: "2018-07-25T22:37:38.084557982Z"
type: logging
metadata:
  class: task
  created_at: "2018-07-25T22:34:26.193003127Z"
  description: Creating container
  err: 'read tcp 100.115.92.6:36508->216.58.192.16:443: read: connection timed out'
  id: ef523da5-4e94-44ea-82b5-6f514cdd6f63
  may_cancel: false
  metadata:
    download_progress: 'rootfs: 74% (1.55MB/s)'
  resources:
    containers:
    - /1.0/containers/penguin
  status: Failure
  status_code: 400
  updated_at: "2018-07-25T22:37:06.957912122Z"
timestamp: "2018-07-25T22:37:38.084622834Z"
type: operation
Attachments: net.log (622 KB), messages (509 KB)
Downloading a large file in Chrome also hit the same problem - the download failed because network connectivity was temporarily lost.
Status: WontFix (was: Assigned)
I think the underlying wifi issues have been fixed, since I don't see this anymore.
