gandof-release: Tests failing likely due to DUT malfunction |
||||
Issue descriptionLooks like following DUT has problems with its SSD: chromeos4-row5-rack8-host1 Related build: https://luci-milo.appspot.com/buildbot/chromeos/gandof-release/1393 Logs: https://storage.cloud.google.com/chromeos-autotest-results/135265221-chromeos-test/chromeos4-row5-rack8-host1/sysinfo/var/log/messages?_ga=2.238646848.-1355467645.1491207816 I can see a lot of ata, dm0 and ext4 errors there, for example: 2017-08-15T12:55:13.788896-07:00 ERR kernel: [ 162.853515] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x40000 action 0x0 2017-08-15T12:55:13.788909-07:00 ERR kernel: [ 162.853526] ata1.00: irq_stat 0x40000008 2017-08-15T12:55:13.788911-07:00 ERR kernel: [ 162.853533] ata1: SError: { CommWake } 2017-08-15T12:55:13.788912-07:00 ERR kernel: [ 162.853540] ata1.00: failed command: READ FPDMA QUEUED 2017-08-15T12:55:13.788913-07:00 ERR kernel: [ 162.853551] ata1.00: cmd 60/08:00:90:50:01/00:00:00:00:00/40 tag 0 ncq 4096 in 2017-08-15T12:55:13.789608-07:00 ERR kernel: [ 162.853551] res 41/40:02:96:50:01/00:00:00:00:00/00 Emask 0x409 (media error) <F> 2017-08-15T12:55:13.790279-07:00 ERR kernel: [ 162.853567] ata1.00: status: { DRDY ERR } 2017-08-15T12:55:13.790284-07:00 ERR kernel: [ 162.853573] ata1.00: error: { UNC } 2017-08-15T12:55:13.790285-07:00 INFO kernel: [ 162.853919] ata1.00: configured for UDMA/100 2017-08-15T12:55:13.790286-07:00 ERR kernel: [ 162.853935] ata1: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4 2017-08-15T12:55:13.790287-07:00 ERR kernel: [ 162.853942] ata1: irq_stat 0x40000001 2017-08-15T12:55:13.790288-07:00 INFO kernel: [ 162.853958] sd 0:0:0:0: [sda] Unhandled sense code 2017-08-15T12:55:13.790289-07:00 INFO kernel: [ 162.853965] sd 0:0:0:0: [sda] 2017-08-15T12:55:13.790290-07:00 NOTICE kernel: [ 162.853970] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 2017-08-15T12:55:13.790291-07:00 INFO kernel: [ 162.853977] sd 0:0:0:0: [sda] 2017-08-15T12:55:13.790292-07:00 NOTICE kernel: [ 162.853981] Sense Key : Medium Error [current] [descriptor] 2017-08-15T12:55:13.790293-07:00 NOTICE kernel: [ 162.853992] Descriptor sense data with sense descriptors (in hex): 2017-08-15T12:55:13.790294-07:00 INFO kernel: [ 162.853999] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 2017-08-15T12:55:13.790295-07:00 INFO kernel: [ 162.854023] 00 01 50 96 2017-08-15T12:55:13.790296-07:00 INFO kernel: [ 162.854034] sd 0:0:0:0: [sda] 2017-08-15T12:55:13.790297-07:00 NOTICE kernel: [ 162.854039] Add. Sense: Unrecovered read error - auto reallocate failed 2017-08-15T12:55:13.790298-07:00 INFO kernel: [ 162.854048] sd 0:0:0:0: [sda] CDB: 2017-08-15T12:55:13.790299-07:00 NOTICE kernel: [ 162.854053] Read(10): 28 00 00 01 50 90 00 00 08 00 2017-08-15T12:55:13.790299-07:00 ERR kernel: [ 162.854072] end_request: I/O error, dev sda, sector 86166 2017-08-15T12:55:13.790300-07:00 ERR kernel: [ 162.854081] Buffer I/O error on device sda8, logical block 18 2017-08-15T12:55:13.790301-07:00 INFO kernel: [ 162.854105] ata1: EH complete 2017-08-15T12:55:13.913263-07:00 DEBUG kernel: [ 162.979415] SELinux: initialized (dev proc, type proc), uses genfs_contexts 2017-08-15T12:55:13.954263-07:00 INFO kernel: [ 163.019605] tpm_tis tpm_tis: command 0x41 (size 618) returned code 0x0 2017-08-15T12:55:13.990266-07:00 INFO kernel: [ 163.055520] tpm_tis tpm_tis: command 0xba (size 18) returned code 0x22 2017-08-15T12:55:13.996394-07:00 INFO chapsd[995]: Finished importing 0 pending objects. 2017-08-15T12:55:13.996555-07:00 INFO chapsd[995]: Done waiting for private objects. 2017-08-15T12:55:14.004124-07:00 INFO chapsd[995]: Master key is ready for token at /home/root/80c01837745be64ed78d409abda01e922da7f7df/chaps 2017-08-15T12:55:14.020266-07:00 INFO kernel: [ 163.085576] tpm_tis tpm_tis: command 0x65 (size 18) returned code 0x0 2017-08-15T12:55:14.027265-07:00 DEBUG kernel: [ 163.093007] SELinux: initialized (dev proc, type proc), uses genfs_contexts 2017-08-15T12:55:14.062261-07:00 INFO kernel: [ 163.128179] tpm_tis tpm_tis: command 0x21 (size 14) returned code 0x0 2017-08-15T12:55:14.196266-07:00 INFO kernel: [ 163.261653] tpm_tis tpm_tis: command 0x65 (size 18) returned code 0x0 2017-08-15T12:55:14.201002-07:00 INFO session_manager[5155]: [INFO:policy_store.cc(71)] Persisted policy to disk. 2017-08-15T12:55:14.350439-07:00 INFO session_manager[5155]: [INFO:child_exit_handler.cc(77)] Handling 5744 exit. 2017-08-15T12:55:14.398296-07:00 DEBUG kernel: [ 163.463800] SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts 2017-08-15T12:55:14.409262-07:00 DEBUG kernel: [ 163.475471] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs 2017-08-15T12:55:14.410267-07:00 DEBUG kernel: [ 163.475961] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs 2017-08-15T12:55:14.410277-07:00 DEBUG kernel: [ 163.476247] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs 2017-08-15T12:55:14.411281-07:00 DEBUG kernel: [ 163.476695] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs 2017-08-15T12:55:14.417267-07:00 ERR kernel: [ 163.483296] device-mapper: verity: verification failure occurred: block failure 2017-08-15T12:55:14.417279-07:00 CRIT kernel: [ 163.483343] EXT4-fs error (device dm-0): ext4_find_entry:1442: inode #32122: comm restorecon: reading directory lblock 0 2017-08-15T12:55:14.417280-07:00 ERR kernel: [ 163.483364] Buffer I/O error on device dm-0, logical block 0 2017-08-15T12:55:14.417281-07:00 WARNING kernel: [ 163.483371] lost page write due to I/O error on dm-0
,
Aug 16 2017
After rechecking, it seems to be happening on chromeos4-row5-rack6-host5 too. This raises the probability of a software (kernel?) regression. Let me check...
,
Aug 16 2017
Filed the lab ticket at b/64734174
,
Aug 16 2017
I don't see any relevant commits between 9843 and 9844. Note that I don't see those I/O errors in first failed build (9843) and seems to have failed due to other reasons, likely issue 755461 .
,
Aug 16 2017
Sorry, I didn't look at SoM correctly. The first failed build was 9842 and that one didn't have I/O errors in logs. The first build with I/O errors seems to be 9843 and I rechecked the delta (https://crosland.corp.google.com/log/9842.0.0..9843.0.0), but it still doesn't seem to contain any relevant changes.
,
Aug 18 2017
Is this still happening?
,
Aug 19 2017
We had 4 successful builds in a row, so I'd say it isn't.
,
Jan 22 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by nxia@chromium.org
, Aug 16 2017