Issue 905683

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




chromeos-init flaky loopback filesystem unit test

Project Member Reported by zwisler@google.com, Nov 15

Issue description

Failed build:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929799010603371408

Log:

https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8929799010603371408/+/steps/UnitTest/0/stdout

Relevant messages:

chromeos-init-0.0.25-r3699: mke2fs 1.44.1 (24-Mar-2018)
chromeos-init-0.0.25-r3699: Suggestion: Use Linux kernel >= 3.18 for improved stability of the metadata and journal checksum features.
chromeos-init-0.0.25-r3699: Warning: could not erase sector 2: Input/output error
chromeos-init-0.0.25-r3699: Creating filesystem with 2048 1k blocks and 256 inodes
chromeos-init-0.0.25-r3699: 
chromeos-init-0.0.25-r3699: Allocating group tables: 0/1   done                            
chromeos-init-0.0.25-r3699: Warning: could not read block 0: Input/output error
chromeos-init-0.0.25-r3699: Warning: could not erase sector 0: Input/output error
chromeos-init-0.0.25-r3699: Writing inode tables: 0/1   done                            
chromeos-init-0.0.25-r3699: ext2fs_update_bb_inode: Input/output error while setting bad block inode
chromeos-init-0.0.25-r3699: ../../../../../../../tmp/portage/chromeos-base/chromeos-init-0.0.25-r3699/work/chromeos-init-0.0.25/init/tests/clobber_state_test.cc:197: Failure
chromeos-init-0.0.25-r3699: Value of: MakeFilesystem("ext4", 5)
chromeos-init-0.0.25-r3699:   Actual: false
chromeos-init-0.0.25-r3699: Expected: true
chromeos-init-0.0.25-r3699: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: ../../../../../../../tmp/portage/chromeos-base/chromeos-init-0.0.25-r3699/work/chromeos-init-0.0.25/init/tests/clobber_state_test.cc:197: Failure
chromeos-init-0.0.25-r3699: Value of: MakeFilesystem("ext4", 5)
chromeos-init-0.0.25-r3699:   Actual: false
chromeos-init-0.0.25-r3699: Expected: true
chromeos-init-0.0.25-r3699: Error: /var/cache/portage/chromeos-base/chromeos-init/out/Default/clobber_state_test: failed with signal SIGIOT|SIGABRT(6)
chromeos-init-0.0.25-r3699:  * ERROR: chromeos-base/chromeos-init-0.0.25-r3699::chromiumos failed (test phase):

It looks like a disk error is preventing mkfs.ext4 from succeeding, which causes the unit test failure.
 
Cc: mikenichols@chromium.org
That ran on "swarm-cros-525". Checking the serial console at about the time of the failure, I see:


(Normal ARC++ startup stuff...)

Nov 15 04:40:24 swarm-cros-525 sa_jack_unittest: Must specify the JackType for jack 'Headphone Jack' in 'Headphone'.
Nov 15 04:42:05 swarm-cros-525 p2p-http-server: p2p-http-server starting [../../../../../../../tmp/portage/chromeos-base/p2p-0.0.1-r2956/work/p2p-0.0.1/p2p/http_server/main.cc:49]
Nov 15 04:42:05 swarm-cros-525 p2p-http-server: Maximum download rate per connection set to 125000 bytes/sec [../../../../../../../tmp/portage/chromeos-base/p2p-0.0.1-r2956/work/p2p-0.0.1/p2p/http_server/main.cc:75]
Nov 15 04:42:05 swarm-cros-525 p2p-http-server: Sending message {PortNumber: 41024} [../../../../../../../tmp/portage/chromeos-base/p2p-0.0.1-r2956/work/p2p-0.0.1/p2p/http_server/server.cc:267]
Nov 15 04:43:00 swarm-cros-525 kernel: [12908.693606] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
Nov 15 04:43:00 swarm-cros-525 kernel: [12908.843939] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:43:00 swarm-cros-525 kernel: [12908.844894] EXT4-fs (loop1): mounted filesystem without journal. Opts: (null)
Nov 15 04:43:33 swarm-cros-525 drivefs: [139918123202944] core.cc:352:MaybeUpdateExitCode Terminating with error: NON_EMPTY_MOUNT_POINT
Nov 15 04:43:33 swarm-cros-525 drivefs: [139918123202944] core.cc:352:MaybeUpdateExitCode Terminating with error: RESTART_REQUESTED
Nov 15 04:43:34 swarm-cros-525 drivefs: [139918123202944] cello.cc:564:InitializeInternal Failed to initialize content cache: INTERRUPTED
Nov 15 04:43:34 swarm-cros-525 drivefs: [139918123202944] core.cc:1076:InitCelloFsSync CelloFS initialization failed: INTERRUPTED
Nov 15 04:43:34 swarm-cros-525 drivefs: [139918123202944] core.cc:352:MaybeUpdateExitCode Terminating with error: CELLOFS_INIT_INTERRUPTED
Nov 15 04:43:55 swarm-cros-525 kernel: [12964.255575] EXT4-fs (loop0p3): mounted filesystem without journal. Opts: (null)
Nov 15 04:43:59 swarm-cros-525 kernel: [12967.586371] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
Nov 15 04:43:59 swarm-cros-525 kernel: [12968.347551] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
Nov 15 04:44:00 swarm-cros-525 kernel: [12969.286805] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
Nov 15 04:44:04 swarm-cros-525 kernel: [12973.003497] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
Nov 15 04:44:12 swarm-cros-525 kernel: [12980.619337] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
Nov 15 04:44:12 swarm-cros-525 kernel: [12980.637496] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
Nov 15 04:44:14 swarm-cros-525 kernel: [12983.055434] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
Nov 15 04:44:17 swarm-cros-525 kernel: [12986.391329] EXT4-fs (loop0p3): mounting ext2 file system using the ext4 subsystem
Nov 15 04:44:17 swarm-cros-525 kernel: [12986.403595] EXT4-fs (loop0p3): mounted filesystem without journal. Opts: (null)
Nov 15 04:44:17 swarm-cros-525 kernel: [12986.435200] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
Nov 15 04:44:17 swarm-cros-525 kernel: [12986.441113] EXT4-fs (loop0p8): mounted filesystem with ordered data mode. Opts: (null)
[13006.926194] Buffer I/O error on device loop1p1, logical block 4194176
[13006.936896] Buffer I/O error on device loop1p1, logical block 4194177
[13006.943569] Buffer I/O error on device loop1p1, logical block 4194178
[13006.950194] Buffer I/O error on device loop1p1, logical block 4194179
[13006.958752] Buffer I/O error on device loop1p1, logical block 4194180
[13006.979335] Buffer I/O error on device loop1p1, logical block 4194181
[13006.986363] Buffer I/O error on device loop1p1, logical block 4194182
[13006.995732] Buffer I/O error on device loop1p1, logical block 4194183
[13007.007339] Buffer I/O error on device loop1p1, logical block 4194176
[13007.015035] Buffer I/O error on device loop1p1, logical block 4194177
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.926194] Buffer I/O error on device loop1p1, logical block 4194176
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.936896] Buffer I/O error on device loop1p1, logical block 4194177
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.943569] Buffer I/O error on device loop1p1, logical block 4194178
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.950194] Buffer I/O error on device loop1p1, logical block 4194179
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.958752] Buffer I/O error on device loop1p1, logical block 4194180
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.979335] Buffer I/O error on device loop1p1, logical block 4194181
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.986363] Buffer I/O error on device loop1p1, logical block 4194182
Nov 15 04:44:38 swarm-cros-525 kernel: [13006.995732] Buffer I/O error on device loop1p1, logical block 4194183
Nov 15 04:44:38 swarm-cros-525 kernel: [13007.007339] Buffer I/O error on device loop1p1, logical block 4194176
Nov 15 04:44:38 swarm-cros-525 kernel: [13007.015035] Buffer I/O error on device loop1p1, logical block 4194177
Nov 15 04:51:32 swarm-cros-525 kernel: [13420.616145] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:32 swarm-cros-525 kernel: [13421.450030] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:32 swarm-cros-525 kernel: [13421.450435] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:33 swarm-cros-525 kernel: [13421.520732] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:33 swarm-cros-525 kernel: [13421.521069] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:33 swarm-cros-525 kernel: [13421.965745] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:33 swarm-cros-525 kernel: [13421.971532] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:33 swarm-cros-525 kernel: [13422.041508] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:33 swarm-cros-525 kernel: [13422.041949] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.500146] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.500695] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.574230] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.574680] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.853954] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.854762] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.907309] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:34 swarm-cros-525 kernel: [13422.907782] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:35 swarm-cros-525 kernel: [13423.778301] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:35 swarm-cros-525 kernel: [13423.778806] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:35 swarm-cros-525 kernel: [13423.839683] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:35 swarm-cros-525 kernel: [13423.840016] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:35 swarm-cros-525 kernel: [13424.192528] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:35 swarm-cros-525 kernel: [13424.192936] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:35 swarm-cros-525 kernel: [13424.245851] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:35 swarm-cros-525 kernel: [13424.246174] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.538760] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.539140] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.848941] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.849277] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.902537] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:36 swarm-cros-525 kernel: [13424.902899] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:36 swarm-cros-525 kernel: [13425.320777] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:36 swarm-cros-525 kernel: [13425.322198] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:36 swarm-cros-525 kernel: [13425.400118] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:36 swarm-cros-525 kernel: [13425.400593] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:37 swarm-cros-525 kernel: [13425.787669] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:37 swarm-cros-525 kernel: [13425.788431] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:37 swarm-cros-525 kernel: [13425.832590] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:37 swarm-cros-525 kernel: [13425.833192] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:38 swarm-cros-525 kernel: [13426.536818] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:38 swarm-cros-525 kernel: [13426.537381] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:38 swarm-cros-525 kernel: [13426.639473] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:38 swarm-cros-525 kernel: [13426.641023] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:38 swarm-cros-525 kernel: [13427.183953] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:38 swarm-cros-525 kernel: [13427.184469] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:38 swarm-cros-525 kernel: [13427.258430] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:38 swarm-cros-525 kernel: [13427.258776] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:39 swarm-cros-525 kernel: [13427.730655] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:39 swarm-cros-525 kernel: [13427.731103] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:39 swarm-cros-525 kernel: [13427.815951] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:39 swarm-cros-525 kernel: [13427.816342] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:39 swarm-cros-525 kernel: [13428.302387] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:39 swarm-cros-525 kernel: [13428.302918] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:40 swarm-cros-525 kernel: [13429.265600] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:40 swarm-cros-525 kernel: [13429.265999] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:44 swarm-cros-525 kernel: [13432.597806] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:44 swarm-cros-525 kernel: [13432.598364] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:44 swarm-cros-525 kernel: [13432.778673] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:44 swarm-cros-525 kernel: [13432.779030] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.638239] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.638718] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.682422] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.682839] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.708782] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.709207] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.735157] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.735555] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.834080] squashfs: version 4.0 (2009/01/31) Phillip Lougher
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.837922] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.838331] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.861009] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.861323] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.898417] EXT4-fs (loop2): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13433.898839] EXT4-fs (loop2): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13434.166024] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13434.166430] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
Nov 15 04:51:45 swarm-cros-525 kernel: [13434.405039] EXT4-fs (loop1): mounting ext2 file system using the ext4 subsystem
Nov 15 04:51:45 swarm-cros-525 kernel: [13434.405459] EXT4-fs (loop1): mounted filesystem without journal. Opts: 
[13497.795393] Buffer I/O error on device loop1p5, logical block 0
[13497.801495] Buffer I/O error on device loop1p5, logical block 0
[13497.807557] Buffer I/O error on device loop1p5, logical block 0
[13497.813710] Buffer I/O error on device loop1p5, logical block 0
[13497.821138] Buffer I/O error on device loop1p5, logical block 0
[13497.828571] Buffer I/O error on device loop1p5, logical block 0
[13497.836560] Buffer I/O error on device loop1p5, logical block 0
[13497.843429] Buffer I/O error on device loop1p5, logical block 1
[13497.850863] Buffer I/O error on device loop1p5, logical block 2
[13497.858291] Buffer I/O error on device loop1p5, logical block 3
Nov 15 04:52:49 swarm-cros-525 kernel: [13497.795385] attempt to access beyond end of device
Nov 15 04:52:49 swarm-cros-525 kernel: [13497.795390] loop1: rw=0, want=315400, limit=102400
Nov 15 04:52:49 swarm-cros-525 kernel: [13497.795391] quiet_error: 332 callbacks suppressed
Nov 15 04:52:49 swarm-cros-525 kernel: [13497.795393] Buffer I/O error on device loop1p5, logical block 0

Followed by a LOT more loopback errors in multiple variations.


I don't really understand what's going on.
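One possible reading of the `loop1: rw=0, want=315400, limit=102400` line in the log above, sketched under the assumption that the kernel reports both `want` and `limit` in 512-byte sectors: the loop device would be exactly 50 MiB, while the rejected read ends at roughly 154 MiB. The helper names below are hypothetical and only illustrate the arithmetic.

```cpp
#include <cstdint>

// Assumption: `want` and `limit` in the kernel message are counted in
// 512-byte sectors.
constexpr std::uint64_t kSectorSize = 512;

// Converts a sector count taken from the log line into bytes.
constexpr std::uint64_t SectorsToBytes(std::uint64_t sectors) {
  return sectors * kSectorSize;
}

// Returns how many bytes past the end of the device the request reached,
// or 0 if the request fit inside the device.
constexpr std::uint64_t BytesPastEnd(std::uint64_t want_sectors,
                                     std::uint64_t limit_sectors) {
  return want_sectors > limit_sectors
             ? SectorsToBytes(want_sectors - limit_sectors)
             : 0;
}

// Values from the log line: want=315400, limit=102400.
static_assert(SectorsToBytes(102400) == 50ull * 1024 * 1024,
              "limit corresponds to a 50 MiB device");
static_assert(SectorsToBytes(315400) == 154ull * 1024 * 1024 + 4096,
              "the request ends 4 KiB past the 154 MiB mark");
```

If that assumption holds, the partition being read (loop1p5) would start well beyond the end of the 50 MiB backing device, which would fit with the errors on "logical block 0" of that partition.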
Owner: mikenichols@chromium.org
Status: Available (was: Untriaged)
That looks more like an image mount issue than a local drive failure; loop0/loop1 are not local disks. Looking at the logs, it appears that a chroot/image is being mounted. I'm not sure whether something changed there or whether it was a transient mount issue. I'd suggest firing off a tryjob with that config to see whether it runs through successfully (and to validate that it works now).

-- Mike
Potentially related failure on squawks-release:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929798909459031632

Slightly different error message, but also a mkfs failure on a loop device:

chromeos-init-0.0.25-r3699: mke2fs 1.44.1 (24-Mar-2018)
chromeos-init-0.0.25-r3699: The file /dev/loop1p2 does not exist and no size was specified.
chromeos-init-0.0.25-r3699: ../../../../../../../tmp/portage/chromeos-base/chromeos-init-0.0.25-r3699/work/chromeos-init-0.0.25/init/tests/clobber_state_test.cc:195: Failure
chromeos-init-0.0.25-r3699: Value of: MakeFilesystem("ext2", 2)
chromeos-init-0.0.25-r3699:   Actual: false
chromeos-init-0.0.25-r3699: Expected: true
Labels: OS-Chrome
Owner: benchan@chromium.org
Status: Assigned (was: Available)
I think we are convinced at this point that this is a flaky test (based on kernel logs). Over to sheriff to find an owner.
Summary: chromeos-init flaky loopback filesystem unit test (was: banon-release failure due to disk errors)
Another reproduction, this time on cyan-release:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929707885927364240

The previous sighting was on banon-release, so this doesn't seem to be tied to a specific builder.

benchan@, any luck finding an owner?
Cc: benchan@chromium.org zwisler@chromium.org
Owner: fletch...@chromium.org
-> fletcherw@chromium.org
This is a unit test I added in https://crrev.com/c/1297181. Would you like me to roll back the CL while I investigate?

Unclear why it's being flaky; any idea what might be causing the call to mkfs.ext4 to fail?
Cc: fletch...@chromium.org posciak@chromium.org
Issue 905080 has been merged into this issue.
we've long had issues on bots with loopbacks and kernel reliability.  it's why a lot of code has explicit retries & syncs in it.
vapier: any idea why?  Some ideas we had when brainstorming: 1) has someone altered the file we have mounted via loopback?  changed permissions, deleted, moved, changed owner, etc?  2) is the file we are using for loopback failing a block allocation as we try and write to it, either because of a transient error or because the filesystem is full?

Do you have any other ideas of things we could check?  Adding retries seems like a last resort. :-/
Project Member

Comment 13 by bugdroid1@chromium.org, Dec 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/a51b89fa44277358cd1d4ca4b0269e74ddb5151c

commit a51b89fa44277358cd1d4ca4b0269e74ddb5151c
Author: Fletcher Woodruff <fletcherw@chromium.org>
Date: Thu Dec 06 02:28:53 2018

init: tweak clobber-state test to stop flakes

clobber_state_test was failing occasionally on release builders.

Reduce the size of the mock disk image and ensure that all blocks are
allocated in order to (hopefully) ensure that the mkfs calls don't fail
due to I/O errors.

BUG= chromium:905683 
TEST=run unit tests

Change-Id: Icc3407dbb77ce77e26e57324e7681e7abef4bd91
Reviewed-on: https://chromium-review.googlesource.com/1361569
Commit-Ready: Fletcher Woodruff <fletcherw@chromium.org>
Tested-by: Fletcher Woodruff <fletcherw@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>
Reviewed-by: Ross Zwisler <zwisler@chromium.org>
Reviewed-by: Justin TerAvest <teravest@chromium.org>

[modify] https://crrev.com/a51b89fa44277358cd1d4ca4b0269e74ddb5151c/init/tests/clobber_state_test.cc
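The CL itself is not reproduced in this thread, so the following is only a minimal sketch of what "ensure that all blocks are allocated" could look like: writing the whole image with zeros rather than leaving it sparse. `CreateFullyWrittenImage` is a hypothetical name, and whether written zeros really force allocation depends on the underlying filesystem.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper, not the code from the CL: writes `size_bytes` of
// zeros to `path` so that no range of the file is left as a sparse hole.
// Returns false if the file cannot be opened or any write fails.
bool CreateFullyWrittenImage(const std::string& path,
                             std::uint64_t size_bytes) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out)
    return false;

  // Write in 4 KiB chunks; the final chunk may be shorter.
  const std::vector<char> zeros(4096, 0);
  std::uint64_t remaining = size_bytes;
  while (remaining > 0) {
    const std::uint64_t chunk =
        std::min<std::uint64_t>(remaining, zeros.size());
    out.write(zeros.data(), static_cast<std::streamsize>(chunk));
    if (!out)
      return false;
    remaining -= chunk;
  }

  out.flush();
  return static_cast<bool>(out);
}
```

The idea is that an allocation failure would then surface while the image is being created, rather than later as an I/O error inside mkfs.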

Status: Started (was: Assigned)
Still happening: https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8927835311615731024/+/steps/UnitTest/0/stdout

  chromeos-init-0.0.25-r3708: mke2fs 1.44.1 (24-Mar-2018)
  chromeos-init-0.0.25-r3708: Warning: could not erase sector 2: Input/output error
  chromeos-init-0.0.25-r3708: Creating filesystem with 16384 1k blocks and 4096 inodes
  chromeos-init-0.0.25-r3708: Filesystem UUID: 50a937ef-6711-4b76-b8c6-f5f5c55e9780
  chromeos-init-0.0.25-r3708: Superblock backups stored on blocks: 
  chromeos-init-0.0.25-r3708: 	8193
  chromeos-init-0.0.25-r3708: 
  chromeos-init-0.0.25-r3708: Allocating group tables: 0/2   done                            
  chromeos-init-0.0.25-r3708: Warning: could not read block 0: Input/output error
  chromeos-init-0.0.25-r3708: Warning: could not erase sector 0: Input/output error
  chromeos-init-0.0.25-r3708: Writing inode tables: 0/2   done                            
  chromeos-init-0.0.25-r3708: Writing superblocks and filesystem accounting information: 0/2
  chromeos-init-0.0.25-r3708: Warning, had trouble writing out superblocks.
  chromeos-init-0.0.25-r3708: ../../../../../../../tmp/portage/chromeos-base/chromeos-init-0.0.25-r3708/work/chromeos-init-0.0.25/init/tests/clobber_state_test.cc:199: Failure
  chromeos-init-0.0.25-r3708: Value of: MakeFilesystem("ext2", 2)
  chromeos-init-0.0.25-r3708:   Actual: false
  chromeos-init-0.0.25-r3708: Expected: true

How can I go about adding retries here? Or would I be better served by just removing this test and doing everything on-device?
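A minimal sketch of one way retries could be wrapped around a flaky call, assuming the operation reports success as a `bool` (as `MakeFilesystem` appears to, judging from the test output). `RetryWithDelay` is a hypothetical helper, not an existing API in this codebase.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: runs `op` up to `max_attempts` times, sleeping for
// `delay` between failed attempts. Returns true as soon as `op` succeeds,
// and false if every attempt fails (or if `max_attempts` is not positive).
bool RetryWithDelay(const std::function<bool()>& op, int max_attempts,
                    std::chrono::milliseconds delay) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (op())
      return true;
    // Do not sleep after the final attempt.
    if (attempt < max_attempts)
      std::this_thread::sleep_for(delay);
  }
  return false;
}
```

In the test this might be used as `EXPECT_TRUE(RetryWithDelay([&] { return MakeFilesystem("ext4", 5); }, 3, std::chrono::milliseconds(100)));`, where the attempt count and delay are arbitrary illustrative values. A wrapper like this only makes the flake less likely; it does not address whatever causes the I/O errors.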
It seems bound to fail occasionally even with retries, so if the cause can't be found, it's probably best to run it on-device instead (assuming that that works reliably).
Components: OS>Systems
Why would we assume that this would work reliably on device? I don't understand why that should be better than running on a builder.
If we don't understand the source of the flake, the test should be made informational so that it stops killing builds.

I put up a revert CL, but if making it informational would work too, maybe that's better.  How do I set a test as informational? None of the pages under https://www.chromium.org/chromium-os/testing even mention it.
Cc: derat@chromium.org
Labels: -Pri-1 Pri-3
Dan, can you answer the question in #19? I don't remember the details.
Cc: vapier@chromium.org
I don't think there's such a thing as an informational unit test. As I understand it, if something that runs during src_test fails, then the package fails. Mike would know for sure.

You can add a DISABLED_ prefix to the test to make it be skipped, I think: https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#temporarily-disabling-tests

In general, running this test but ignoring its failures probably isn't something that we'd want to do, I think. It's unlikely to ever be fixed, so we'll probably just pay the maintenance and time cost to keep the code compiling without getting any benefits from it.
Status: Fixed (was: Started)
Reverted test.
