Issue 908983

Issue metadata

Status: Fixed
Owner:
Closed: Dec 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

ext4 file system corruption in linux-4.19.y

Project Member Reported by groeck@chromium.org, Nov 27

Issue description

Tracking bug. See https://bugzilla.kernel.org/show_bug.cgi?id=201685 for further details.

 
Labels: M-72
chromeos-4.19 images MUST NOT SHIP if ext4 is enabled unless this problem has been resolved. I would mark the bug as ReleaseBlock{Beta,Stable}, but I don't know if that would affect systems running other kernel versions.
Note that it is currently unknown whether this is really an ext4 problem. So far it has only been reported on ext4 volumes, but the ext4 maintainer is not convinced that ext4 is at fault.

Cc: wonderfly@google.com
Status: Started (was: Assigned)
Project Member Comment 4 by bugdroid1@chromium.org, Dec 6

Labels: merge-merged-chromeos-4.19
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e18c4a993c7924506d2cd52ad870949f32494325

commit e18c4a993c7924506d2cd52ad870949f32494325
Author: Jens Axboe <axboe@kernel.dk>
Date: Thu Dec 06 18:37:37 2018

UPSTREAM: blk-mq: fix corruption with direct issue

If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it.

This causes data corruption, and is most often noticed with the file
system complaining about the just read data being invalid:

[  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)

because most of it is garbage...

This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.

Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.

See also:
  https://bugzilla.kernel.org/show_bug.cgi?id=201685

Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit ffe81d45322cc3cb140f0db080a4727ea284661e)

BUG=chromium:908983
TEST=ext4 file system test on ssd

Change-Id: Ib03bb110f575c5d70e2f4466d311e02528b3272b
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1364160
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>

[modify] https://crrev.com/e18c4a993c7924506d2cd52ad870949f32494325/block/blk-mq.c
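
The failure mode described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in block/blk-mq.c: struct toy_request and every toy_* function are hypothetical names invented for illustration, and the "prepared length" stands in for driver state such as the SG tables the SCSI layer may have set up. The nomerge field models the effect of flagging the request as REQ_NOMERGE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block request. NOT the kernel's struct request. */
struct toy_request {
    size_t len;          /* bytes the request currently covers */
    size_t prepared_len; /* bytes covered by driver state built at first issue */
    bool prepared;       /* driver already built per-command state */
    bool nomerge;        /* models the request being flagged REQ_NOMERGE */
};

/*
 * Models a direct issue that the device rejects as busy. By then the
 * driver has already prepared its per-command state for the current
 * length. With apply_fix set, the request is also flagged as unmergeable.
 */
void toy_direct_issue_busy(struct toy_request *rq, bool apply_fix)
{
    rq->prepared = true;
    rq->prepared_len = rq->len;
    if (apply_fix)
        rq->nomerge = true;
}

/* Tries to grow the request by `extra` bytes; returns true if merged. */
bool toy_try_merge(struct toy_request *rq, size_t extra)
{
    if (rq->nomerge)
        return false;
    rq->len += extra;
    return true;
}

/*
 * Models the later re-issue: a prepared request reuses its stale state,
 * so only the originally prepared part is transferred.
 */
size_t toy_issue(const struct toy_request *rq)
{
    return rq->prepared ? rq->prepared_len : rq->len;
}

/* Returns the number of bytes that were requested but never transferred. */
size_t toy_lost_bytes(bool apply_fix)
{
    struct toy_request rq = { .len = 4096 };

    toy_direct_issue_busy(&rq, apply_fix);
    toy_try_merge(&rq, 4096);        /* a second 4 KiB I/O arrives */

    size_t done = toy_issue(&rq);
    assert(done <= rq.len);
    return rq.len - done;
}
```

In this model, toy_lost_bytes(false) reports that the merged 4096 bytes are never transferred, which corresponds to the garbage the file system later reads back. toy_lost_bytes(true) reports 0, because the merge is refused and the second I/O would have to be issued as its own request.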

Status: Fixed (was: Started)
Cc: lakitu-dev@google.com