
Issue 899273

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




crosvm qcow2 cluster allocations may exceed valid refcount table size

Project Member Reported by dverkamp@chromium.org, Oct 26

Issue description

We need to make sure that unreferenced clusters (e.g. those freed by discard requests) are reused before we extend the length of the file.  Otherwise, the get_new_cluster() code will happily extend the file beyond the pre-allocated number of reference count table entries, which will cause a failure once the refcount for the new clusters is updated.

The current code reuses unreferenced clusters while crosvm is still running, but if the VM is shut down and restarted, the unreferenced clusters are not recovered.
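To make the intended allocation policy concrete, here is a minimal Rust sketch. It is not the crosvm code: `ClusterAllocator`, `AllocError`, and `max_clusters` are hypothetical names, offsets are assumed to be cluster-aligned, and `avail_clusters` only loosely mirrors the field of the same name mentioned later in this thread. Freed clusters are handed out first; the file is extended only when the new cluster can still be indexed by the pre-allocated refcount table.

```rust
use std::collections::BTreeSet;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum AllocError {
    /// The next cluster at the end of the file would have no refcount entry.
    RefcountTableFull,
}

/// Hypothetical, simplified allocator illustrating the reuse-before-extend policy.
pub struct ClusterAllocator {
    cluster_size: u64,
    /// Current length of the image file in bytes (assumed cluster-aligned).
    file_len: u64,
    /// Number of clusters the pre-allocated refcount table can describe.
    max_clusters: u64,
    /// Byte offsets of clusters whose refcount has dropped to zero.
    avail_clusters: BTreeSet<u64>,
}

impl ClusterAllocator {
    pub fn new(cluster_size: u64, file_len: u64, max_clusters: u64) -> Self {
        ClusterAllocator {
            cluster_size,
            file_len,
            max_clusters,
            avail_clusters: BTreeSet::new(),
        }
    }

    /// Records a cluster freed by, e.g., a discard request.
    pub fn free_cluster(&mut self, offset: u64) {
        self.avail_clusters.insert(offset);
    }

    /// Returns a cluster offset, reusing a free cluster before growing the file.
    pub fn get_new_cluster(&mut self) -> Result<u64, AllocError> {
        // Prefer the lowest-addressed free cluster so the image stays compact.
        let first_free = self.avail_clusters.iter().next().copied();
        if let Some(offset) = first_free {
            self.avail_clusters.remove(&offset);
            return Ok(offset);
        }
        // Otherwise extend the file, but only if the new cluster is still
        // covered by the existing refcount table.
        let next_index = self.file_len / self.cluster_size;
        if next_index >= self.max_clusters {
            return Err(AllocError::RefcountTableFull);
        }
        let offset = self.file_len;
        self.file_len += self.cluster_size;
        Ok(offset)
    }
}
```

Returning an error instead of silently growing past the refcount table turns the later, harder-to-diagnose refcount update failure into an immediate allocation failure.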
 
Cc: davidri...@chromium.org
I just ran into this while messing around with Steam.

Excerpt of logs:
2018-11-09T15:22:39.554593-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10136989696), length: 4096, sector: 52947544, guestmemerr: MemoryAccess(GuestAddress(10136989696), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
2018-11-09T15:22:39.554710-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10137071616), length: 4096, sector: 52947552, guestmemerr: MemoryAccess(GuestAddress(10137071616), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
2018-11-09T15:22:39.554837-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10137149440), length: 4096, sector: 52947560, guestmemerr: MemoryAccess(GuestAddress(10137149440), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
2018-11-09T15:22:39.568016-08:00 INFO VM(70)[23707]: [   95.442041] print_req_error: I/O error, dev vdb, sector 52935112
2018-11-09T15:22:39.571273-08:00 INFO VM(70)[23707]: [   95.442041] print_req_error: I/O error, dev vdb, sector 52935584
2018-11-09T15:22:39.576648-08:00 INFO VM(70)[23707]: [   95.442041] print_req_error: I/O error, dev vdb, sector 52937200
2018-11-09T15:22:39.581327-08:00 INFO VM(70)[23707]: [   95.461792] BTRFS error (device vdb): bdev /dev/vdb errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
2018-11-09T15:22:39.583652-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52937632
2018-11-09T15:22:39.585746-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52938016
2018-11-09T15:22:39.587789-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52939040
2018-11-09T15:22:39.589829-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52939680
2018-11-09T15:22:39.591815-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52939816
2018-11-09T15:22:39.593823-08:00 INFO VM(70)[23707]: [   95.470009] print_req_error: I/O error, dev vdb, sector 52940808
2018-11-09T15:22:39.596086-08:00 INFO VM(70)[23707]: [   95.470009] BTRFS error (device vdb): bdev /dev/vdb errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
2018-11-09T15:22:39.597957-08:00 INFO VM(70)[23707]: [   95.485590] print_req_error: I/O error, dev vdb, sector 52941728
2018-11-09T15:22:39.600284-08:00 INFO VM(70)[23707]: [   95.485590] BTRFS error (device vdb): bdev /dev/vdb errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
2018-11-09T15:22:39.600980-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10137145344), length: 4096, sector: 52947568, guestmemerr: MemoryAccess(GuestAddress(10137145344), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
2018-11-09T15:22:39.601088-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10137202688), length: 4096, sector: 52947576, guestmemerr: MemoryAccess(GuestAddress(10137202688), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
2018-11-09T15:22:39.601184-08:00 ERR localhos[23706]: crosvm[37]: [devices/src/virtio/block.rs:502] failed executing disk request: Write { addr: GuestAddress(10140348416), length: 4096, sector: 52947584, guestmemerr: MemoryAccess(GuestAddress(10140348416), ReadFromSource(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })) }
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/f0fd764428418cd520dfb6d60ae65a790c18cecc

commit f0fd764428418cd520dfb6d60ae65a790c18cecc
Author: Daniel Verkamp <dverkamp@chromium.org>
Date: Fri Nov 16 13:02:06 2018

qcow: calculate refcount table size correctly

The refcount table needs to include not only the data clusters and
reftable clusters but also the L1 and L2 tables and main qcow2 header.

Also add sanity checking to prevent allocating a cluster that cannot be
indexed with the current reference count table size.
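The sizing rule described in this commit can be illustrated with a rough Rust sketch. It assumes 16-bit refcounts (refcount_order = 4, the qcow2 default), a header that fits in one cluster, 8-byte L1/L2/refcount-table entries, and no snapshots; `refcount_clusters_needed` is a hypothetical helper, not the function from the patch. Because the refcount blocks and the refcount table occupy clusters themselves, the estimate is iterated until it stops changing.

```rust
/// Number of `unit`-sized pieces needed to hold `n` bytes or entries.
fn div_round_up(n: u64, unit: u64) -> u64 {
    (n + unit - 1) / unit
}

/// Hypothetical helper: returns (refcount_blocks, refcount_table_clusters)
/// needed so that every cluster of a fully allocated image has a refcount.
pub fn refcount_clusters_needed(virtual_size: u64, cluster_size: u64) -> (u64, u64) {
    // Data clusters, plus the L2 and L1 tables that map them.
    let data_clusters = div_round_up(virtual_size, cluster_size);
    let l2_entries_per_cluster = cluster_size / 8;
    let l2_clusters = div_round_up(data_clusters, l2_entries_per_cluster);
    let l1_clusters = div_round_up(l2_clusters * 8, cluster_size);
    let header_clusters = 1; // assumption: the header fits in one cluster
    let fixed = header_clusters + l1_clusters + l2_clusters + data_clusters;

    let refcounts_per_block = cluster_size / 2; // 16-bit refcount entries
    let pointers_per_table_cluster = cluster_size / 8; // 8-byte table entries

    // Refcount blocks and the refcount table need refcounts too, so grow
    // the estimate until it reaches a fixed point.
    let (mut blocks, mut table) = (0u64, 0u64);
    loop {
        let total = fixed + blocks + table;
        let new_blocks = div_round_up(total, refcounts_per_block);
        let new_table = div_round_up(new_blocks, pointers_per_table_cluster);
        if new_blocks == blocks && new_table == table {
            return (blocks, table);
        }
        blocks = new_blocks;
        table = new_table;
    }
}
```

For example, a 1 GiB image with 64 KiB clusters needs one refcount block and one refcount-table cluster under these assumptions, while a 16 GiB image needs nine refcount blocks, because the metadata pushes the total just past 8 × 32768 clusters.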

BUG=chromium:899273
TEST=cargo test -p qcow

Change-Id: I9da4515db3dccbabdeee4f60dc392b5b42d62cb2
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1308833
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/f0fd764428418cd520dfb6d60ae65a790c18cecc/qcow/src/qcow_raw_file.rs
[modify] https://crrev.com/f0fd764428418cd520dfb6d60ae65a790c18cecc/qcow/src/vec_cache.rs
[modify] https://crrev.com/f0fd764428418cd520dfb6d60ae65a790c18cecc/qcow/src/refcount.rs
[modify] https://crrev.com/f0fd764428418cd520dfb6d60ae65a790c18cecc/qcow/src/qcow.rs

Project Member

Comment 3 by bugdroid1@chromium.org, Dec 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/ef37e2fe15c3bf476932bee0a47a096c126a94c7

commit ef37e2fe15c3bf476932bee0a47a096c126a94c7
Author: Daniel Verkamp <dverkamp@chromium.org>
Date: Sat Dec 01 09:08:40 2018

qcow: add support for rebuilding refcounts

This adds the ability to regenerate the reference counts by walking all
of the L1/L2 tables and headers to find all reachable clusters.  This is
necessary for the next patch, which will use the reference count tables
to find unused clusters to reuse.
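A simplified Rust sketch of such a rebuild, assuming the metadata has already been read into memory. `ImageLayout` and `rebuild_refcounts` are hypothetical; table entries are treated as plain byte offsets (real qcow2 L1/L2 entries carry flag bits that must be masked off first), and snapshots and compressed clusters are ignored. Every reachable cluster (header, L1 table, refcount table and blocks, L2 tables, data) gets its count incremented; anything left at zero is unreferenced.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of the qcow2 metadata.
pub struct ImageLayout {
    pub cluster_size: u64,
    pub l1_table_offset: u64,
    /// L2 table offsets; 0 means unallocated.
    pub l1_table: Vec<u64>,
    /// L2 table offset -> data cluster offsets (0 means unallocated).
    pub l2_tables: HashMap<u64, Vec<u64>>,
    pub refcount_table_offset: u64,
    /// Refcount block offsets; 0 means no block.
    pub refcount_table: Vec<u64>,
}

/// Increments the refcount of the cluster containing `offset`.
fn mark(refcounts: &mut Vec<u16>, offset: u64, cluster_size: u64) {
    let index = (offset / cluster_size) as usize;
    if index >= refcounts.len() {
        refcounts.resize(index + 1, 0);
    }
    refcounts[index] += 1;
}

/// Marks every cluster covered by `len` bytes starting at cluster-aligned `offset`.
fn mark_range(refcounts: &mut Vec<u16>, offset: u64, len: u64, cluster_size: u64) {
    let mut cur = offset;
    while cur < offset + len {
        mark(refcounts, cur, cluster_size);
        cur += cluster_size;
    }
}

/// Rebuilds per-cluster refcounts by walking all reachable metadata.
pub fn rebuild_refcounts(img: &ImageLayout) -> Vec<u16> {
    let cs = img.cluster_size;
    let mut refcounts = Vec::new();
    mark(&mut refcounts, 0, cs); // header in cluster 0
    mark_range(&mut refcounts, img.l1_table_offset, img.l1_table.len() as u64 * 8, cs);
    mark_range(&mut refcounts, img.refcount_table_offset, img.refcount_table.len() as u64 * 8, cs);
    for &block in img.refcount_table.iter().filter(|&&o| o != 0) {
        mark(&mut refcounts, block, cs);
    }
    for &l2_offset in img.l1_table.iter().filter(|&&o| o != 0) {
        mark(&mut refcounts, l2_offset, cs);
        if let Some(entries) = img.l2_tables.get(&l2_offset) {
            for &data in entries.iter().filter(|&&o| o != 0) {
                mark(&mut refcounts, data, cs);
            }
        }
    }
    refcounts
}
```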

BUG=chromium:899273
TEST=cargo test -p qcow

Change-Id: I93dd00d381d8d33010fddfc10aa18ca32586e1f4
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1327821
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/ef37e2fe15c3bf476932bee0a47a096c126a94c7/qcow/src/qcow.rs

Project Member

Comment 4 by bugdroid1@chromium.org, Dec 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/2ea8f3d0aa7230da41b3e6beba27ff9140f661d3

commit 2ea8f3d0aa7230da41b3e6beba27ff9140f661d3
Author: Daniel Verkamp <dverkamp@chromium.org>
Date: Tue Dec 04 08:11:37 2018

qcow: scan for free clusters at startup

During runtime, we track unreferenced clusters (via unref_clusters and
avail_clusters) and reuse them before extending the disk image.
However, across boots, we did not previously recover the list of
unreferenced clusters, so the disk file could grow beyond the range that
the reference count table can represent.  This patch adds a boot-time scan
for all unreferenced clusters so that they get reused.
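Given refcounts rebuilt from the metadata, the boot-time scan amounts to collecting every zero-refcount cluster that lies inside the current file. The function below is a hypothetical sketch (`find_free_clusters` is not a crosvm function); it treats clusters past the end of the refcount array as unreferenced and returns offsets in ascending order so that reuse starts at the low end of the image.

```rust
/// Hypothetical boot-time scan: byte offsets of all clusters within the
/// current file whose refcount is zero, lowest offset first.
pub fn find_free_clusters(refcounts: &[u16], file_len: u64, cluster_size: u64) -> Vec<u64> {
    let clusters_in_file = (file_len + cluster_size - 1) / cluster_size;
    (0..clusters_in_file)
        // Clusters with no refcount entry at all are also unreferenced.
        .filter(|&i| refcounts.get(i as usize).copied().unwrap_or(0) == 0)
        .map(|i| i * cluster_size)
        .collect()
}
```

Seeding the allocator's free list with these offsets before the first guest write lets new allocations fill holes left by earlier discards instead of extending the file.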

BUG=chromium:899273
TEST=Boot with qcow2 image, fill the disk with dd, delete the dd'd file,
refill with dd, and so on, repeatedly. Ensure that the disk image does
not grow beyond the expected max size and that no clusters beyond the
size of the refcount table are used.

Change-Id: Idd21b08bb4c55b8244e7ecaccafc4ccc46b7b17a
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1327822
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/2ea8f3d0aa7230da41b3e6beba27ff9140f661d3/qcow/src/qcow.rs

Status: Fixed (was: Started)
With the last commit above, qcow2 images should now always stay bounded in size, reusing freed clusters rather than appending new ones at the end of the file, even across shutdown and restart of the VM.
