New issue
Advanced search Search tips
Starred by 1 user
Status: Fixed
Owner:
Closed: Aug 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment
gandof-release: Tests failing likely due to DUT malfunction
Project Member Reported by tfiga@chromium.org, Aug 16 Back to list
Looks like following DUT has problems with its SSD:
chromeos4-row5-rack8-host1

Related build:
https://luci-milo.appspot.com/buildbot/chromeos/gandof-release/1393

Logs:
https://storage.cloud.google.com/chromeos-autotest-results/135265221-chromeos-test/chromeos4-row5-rack8-host1/sysinfo/var/log/messages?_ga=2.238646848.-1355467645.1491207816

I can see a lot of ata, dm0 and ext4 errors there, for example:

2017-08-15T12:55:13.788896-07:00 ERR kernel: [  162.853515] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x40000 action 0x0
2017-08-15T12:55:13.788909-07:00 ERR kernel: [  162.853526] ata1.00: irq_stat 0x40000008
2017-08-15T12:55:13.788911-07:00 ERR kernel: [  162.853533] ata1: SError: { CommWake }
2017-08-15T12:55:13.788912-07:00 ERR kernel: [  162.853540] ata1.00: failed command: READ FPDMA QUEUED
2017-08-15T12:55:13.788913-07:00 ERR kernel: [  162.853551] ata1.00: cmd 60/08:00:90:50:01/00:00:00:00:00/40 tag 0 ncq 4096 in
2017-08-15T12:55:13.789608-07:00 ERR kernel: [  162.853551]          res 41/40:02:96:50:01/00:00:00:00:00/00 Emask 0x409 (media error) <F>
2017-08-15T12:55:13.790279-07:00 ERR kernel: [  162.853567] ata1.00: status: { DRDY ERR }
2017-08-15T12:55:13.790284-07:00 ERR kernel: [  162.853573] ata1.00: error: { UNC }
2017-08-15T12:55:13.790285-07:00 INFO kernel: [  162.853919] ata1.00: configured for UDMA/100
2017-08-15T12:55:13.790286-07:00 ERR kernel: [  162.853935] ata1: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
2017-08-15T12:55:13.790287-07:00 ERR kernel: [  162.853942] ata1: irq_stat 0x40000001
2017-08-15T12:55:13.790288-07:00 INFO kernel: [  162.853958] sd 0:0:0:0: [sda] Unhandled sense code
2017-08-15T12:55:13.790289-07:00 INFO kernel: [  162.853965] sd 0:0:0:0: [sda]  
2017-08-15T12:55:13.790290-07:00 NOTICE kernel: [  162.853970] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2017-08-15T12:55:13.790291-07:00 INFO kernel: [  162.853977] sd 0:0:0:0: [sda]  
2017-08-15T12:55:13.790292-07:00 NOTICE kernel: [  162.853981] Sense Key : Medium Error [current] [descriptor]
2017-08-15T12:55:13.790293-07:00 NOTICE kernel: [  162.853992] Descriptor sense data with sense descriptors (in hex):
2017-08-15T12:55:13.790294-07:00 INFO kernel: [  162.853999]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
2017-08-15T12:55:13.790295-07:00 INFO kernel: [  162.854023]         00 01 50 96 
2017-08-15T12:55:13.790296-07:00 INFO kernel: [  162.854034] sd 0:0:0:0: [sda]  
2017-08-15T12:55:13.790297-07:00 NOTICE kernel: [  162.854039] Add. Sense: Unrecovered read error - auto reallocate failed
2017-08-15T12:55:13.790298-07:00 INFO kernel: [  162.854048] sd 0:0:0:0: [sda] CDB: 
2017-08-15T12:55:13.790299-07:00 NOTICE kernel: [  162.854053] Read(10): 28 00 00 01 50 90 00 00 08 00
2017-08-15T12:55:13.790299-07:00 ERR kernel: [  162.854072] end_request: I/O error, dev sda, sector 86166
2017-08-15T12:55:13.790300-07:00 ERR kernel: [  162.854081] Buffer I/O error on device sda8, logical block 18
2017-08-15T12:55:13.790301-07:00 INFO kernel: [  162.854105] ata1: EH complete
2017-08-15T12:55:13.913263-07:00 DEBUG kernel: [  162.979415] SELinux: initialized (dev proc, type proc), uses genfs_contexts
2017-08-15T12:55:13.954263-07:00 INFO kernel: [  163.019605] tpm_tis tpm_tis: command 0x41 (size 618) returned code 0x0
2017-08-15T12:55:13.990266-07:00 INFO kernel: [  163.055520] tpm_tis tpm_tis: command 0xba (size 18) returned code 0x22
2017-08-15T12:55:13.996394-07:00 INFO chapsd[995]: Finished importing 0 pending objects.
2017-08-15T12:55:13.996555-07:00 INFO chapsd[995]: Done waiting for private objects.
2017-08-15T12:55:14.004124-07:00 INFO chapsd[995]: Master key is ready for token at /home/root/80c01837745be64ed78d409abda01e922da7f7df/chaps
2017-08-15T12:55:14.020266-07:00 INFO kernel: [  163.085576] tpm_tis tpm_tis: command 0x65 (size 18) returned code 0x0
2017-08-15T12:55:14.027265-07:00 DEBUG kernel: [  163.093007] SELinux: initialized (dev proc, type proc), uses genfs_contexts
2017-08-15T12:55:14.062261-07:00 INFO kernel: [  163.128179] tpm_tis tpm_tis: command 0x21 (size 14) returned code 0x0
2017-08-15T12:55:14.196266-07:00 INFO kernel: [  163.261653] tpm_tis tpm_tis: command 0x65 (size 18) returned code 0x0
2017-08-15T12:55:14.201002-07:00 INFO session_manager[5155]: [INFO:policy_store.cc(71)] Persisted policy to disk.
2017-08-15T12:55:14.350439-07:00 INFO session_manager[5155]: [INFO:child_exit_handler.cc(77)] Handling 5744 exit.
2017-08-15T12:55:14.398296-07:00 DEBUG kernel: [  163.463800] SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
2017-08-15T12:55:14.409262-07:00 DEBUG kernel: [  163.475471] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-08-15T12:55:14.410267-07:00 DEBUG kernel: [  163.475961] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-08-15T12:55:14.410277-07:00 DEBUG kernel: [  163.476247] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-08-15T12:55:14.411281-07:00 DEBUG kernel: [  163.476695] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-08-15T12:55:14.417267-07:00 ERR kernel: [  163.483296] device-mapper: verity: verification failure occurred: block failure
2017-08-15T12:55:14.417279-07:00 CRIT kernel: [  163.483343] EXT4-fs error (device dm-0): ext4_find_entry:1442: inode #32122: comm restorecon: reading directory lblock 0
2017-08-15T12:55:14.417280-07:00 ERR kernel: [  163.483364] Buffer I/O error on device dm-0, logical block 0
2017-08-15T12:55:14.417281-07:00 WARNING kernel: [  163.483371] lost page write due to I/O error on dm-0
 
does the problem only happen on this specific dut? I'll lock this dut for now.
After rechecking, it seems to be happening on chromeos4-row5-rack6-host5 too. This raises the probability of a software (kernel?) regression. Let me check...
Filed the lab ticket at b/64734174
I don't see any relevant commits between 9843 and 9844. Note that I don't see those I/O errors in first failed build (9843) and seems to have failed due to other reasons, likely  issue 755461 .
Sorry, I didn't look at SoM correctly. The first failed build was 9842 and that one didn't have I/O errors in logs. The first build with I/O errors seems to be 9843 and I rechecked the delta (https://crosland.corp.google.com/log/9842.0.0..9843.0.0), but it still doesn't seem to contain any relevant changes.
Owner: tfiga@chromium.org
Is this still happening?
Status: Fixed
We had 4 successful builds in a row, so I'd say it isn't.
Sign in to add a comment