
Issue 772317

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




repair job repaired via rpm, but didn't collect any logs

Reported by akes...@chromium.org (Project Member), Oct 6 2017

Issue description

$ dut-status -f chromeos2-row3-rack2-host15
chromeos2-row3-rack2-host15
    2017-10-05 23:13:38  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack2-host15/1614005-repair/
    2017-10-05 22:32:58  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row3-rack2-host15/1613673-provision/


Provision failed because the DUT didn't return from reboot.

Repair job repaired via rpm:
START	----	repair	timestamp=1507270423	localtime=Oct 05 23:13:43	
	GOOD	----	verify.servo_ssh	timestamp=1507270425	localtime=Oct 05 23:13:45	
	GOOD	----	verify.brd_config	timestamp=1507270426	localtime=Oct 05 23:13:46	
	GOOD	----	verify.ser_config	timestamp=1507270427	localtime=Oct 05 23:13:47	
	GOOD	----	verify.job	timestamp=1507270428	localtime=Oct 05 23:13:48	
	GOOD	----	verify.servod	timestamp=1507270436	localtime=Oct 05 23:13:56	
	GOOD	----	verify.pwr_button	timestamp=1507270436	localtime=Oct 05 23:13:56	
	GOOD	----	verify.lid_open	timestamp=1507270438	localtime=Oct 05 23:13:58	
	GOOD	----	verify.update	timestamp=1507270451	localtime=Oct 05 23:14:11	
	GOOD	----	verify.PASS	timestamp=1507270451	localtime=Oct 05 23:14:11	
	FAIL	----	verify.ssh	timestamp=1507271038	localtime=Oct 05 23:23:58	No answer to ping from chromeos2-row3-rack2-host15
	START	----	repair.rpm	timestamp=1507271038	localtime=Oct 05 23:23:58	
		GOOD	----	verify.ssh	timestamp=1507271077	localtime=Oct 05 23:24:37	
		GOOD	----	verify.power	timestamp=1507271078	localtime=Oct 05 23:24:38	
	END GOOD	----	repair.rpm	timestamp=1507271078	localtime=Oct 05 23:24:38	
	GOOD	----	verify.fwstatus	timestamp=1507271078	localtime=Oct 05 23:24:38	
	GOOD	----	verify.good_au	timestamp=1507271078	localtime=Oct 05 23:24:38	
	GOOD	----	verify.devmode	timestamp=1507271078	localtime=Oct 05 23:24:38	
	GOOD	----	verify.writable	timestamp=1507271079	localtime=Oct 05 23:24:39	
	GOOD	----	verify.tpm	timestamp=1507271080	localtime=Oct 05 23:24:40	
	GOOD	----	verify.ext4	timestamp=1507271081	localtime=Oct 05 23:24:41	
	GOOD	----	verify.rwfw	timestamp=1507271081	localtime=Oct 05 23:24:41	
	GOOD	----	verify.python	timestamp=1507271082	localtime=Oct 05 23:24:42	
	GOOD	----	verify.cros	timestamp=1507271088	localtime=Oct 05 23:24:48	
	GOOD	----	verify.hwid	timestamp=1507271089	localtime=Oct 05 23:24:49	
	GOOD	----	verify.PASS	timestamp=1507271089	localtime=Oct 05 23:24:49	
	START	----	reboot	timestamp=1507271090	localtime=Oct 05 23:24:50	
		GOOD	----	reboot.start	timestamp=1507271090	localtime=Oct 05 23:24:50	
		GOOD	----	reboot.verify	timestamp=1507271110	localtime=Oct 05 23:25:10	
	END GOOD	----	reboot	kernel=4.4.86-11793-ga1bbb5c4f613	localtime=Oct 05 23:25:11	timestamp=1507271111	
	INFO	----	repair	timestamp=1507271111	localtime=Oct 05 23:25:11	Can't repair label 'board:bob'.
	INFO	----	repair	timestamp=1507271111	localtime=Oct 05 23:25:11	Can't repair label 'pool:cq'.
	INFO	----	repair	timestamp=1507271111	localtime=Oct 05 23:25:11	Can't repair label 'arc'.
	INFO	----	repair	timestamp=1507271111	localtime=Oct 05 23:25:11	Can't repair label 'cros-version:bob-paladin/R63-10006.0.0-rc1'.
END GOOD	----	repair	timestamp=1507271111	localtime=Oct 05 23:25:11	chromeos2-row3-rack2-host15 repaired successfully


However, the repair job collected no "after rpm repair" logs, so there were no breadcrumbs with which to diagnose the failure to reboot.

How come?
 
Owner: jrbarnette@chromium.org
Looks to me like the "collect after" behavior is defined in cros_repair:_ResetRepairAction, but RPMCycleRepair does not inherit from it.

Is this desired? Why?
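
For illustration, a rough sketch of the structure being described (hedged: the class names come from this report, while the import and the _collect_logs helper name are assumptions; the real autotest code differs in detail):

# Rough sketch only -- not the actual autotest source.  The point is the
# hierarchy: the "collect after" hook hangs off _ResetRepairAction, so a
# repair action that does not inherit from it never gathers logs.
from autotest_lib.client.common_lib import hosts


class _ResetRepairAction(hosts.RepairAction):
    """Repair actions that reset the DUT and then collect logs."""

    def _collect_logs(self, host):
        # Assumed helper: gathers the "after repair" logs referenced above.
        pass


class RPMCycleRepair(hosts.RepairAction):
    """Power-cycle the DUT via RPM."""
    # Note: derives from the generic RepairAction, not _ResetRepairAction,
    # which is why the RPM repair path leaves no breadcrumbs.
    pass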

Comment 2 by cindyb@chromium.org, May 31 2018

Hi, this bug has not been updated recently and remains untriaged. Please acknowledge the bug and provide status within two weeks (6/8/2018), or the bug will be closed. Thank you.
Owner: ----
Status: Available (was: Untriaged)
Yes, I suspect that RPMCycleRepair should inherit from _ResetRepairAction.
The one caution is that RPMCycleRepair is shared with servo repair, where
gathering logs is not useful. Probably, though, it wouldn't be harmful...

So, I think that means we should do the following:
 1) Move _ResetRepairAction to server/hosts/repair.py,
    and rename it ResetRepairAction.
 2) Change RPMCycleRepair to inherit, and make it call
    `self._check_reset_success()`.

That said, I'm not sure when I'll have time to actually make the change and
test it out.
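
A rough sketch of what that change might look like (hedged: only the class names, the target file, and _check_reset_success() are taken from the proposal above; everything else is a placeholder, not a tested autotest patch):

# server/hosts/repair.py -- after moving and renaming _ResetRepairAction:
from autotest_lib.client.common_lib import hosts


class ResetRepairAction(hosts.RepairAction):
    """Base for repair actions that reset the DUT and collect logs after."""

    def _check_reset_success(self, host):
        # Wait for the DUT to come back up, then gather the "after repair"
        # logs so a failed reboot leaves breadcrumbs behind.
        pass


# server/hosts/cros_repair.py -- RPMCycleRepair picks up the new base class:
class RPMCycleRepair(ResetRepairAction):
    """Power-cycle the DUT via RPM, then verify and collect logs."""

    def repair(self, host):
        # ... existing RPM power-cycle call stays here unchanged ...
        self._check_reset_success(host)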
