New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 803854 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

HWTest [provision] fails after provision_AutoUpdate with "The host has wrong cros-version label."

Project Member Reported by derat@chromium.org, Jan 19 2018

Issue description

Many informational PFQ builders started failing in the HWTest [provision] stage with "[Test-Logs]: provision: FAIL: The host has wrong cros-version label., completed successfully", starting at these builds:

http://uberchromegw/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/7737
http://uberchromegw/i/chromeos.chrome/builders/peach_pit-tot-chrome-pfq-informational/builds/9377
http://uberchromegw/i/chromeos.chrome/builders/caroline-tot-chrome-pfq-informational/builds/791
http://uberchromegw/i/chromeos.chrome/builders/veyron_minnie-tot-chrome-pfq-informational/builds/7096

The last successful run I see was at Jan 18 22:59 and the first failure was at Jan 19 00:50.

Here's a failing status:

START	----	provision	timestamp=1516356187	localtime=Jan 19 02:03:07	
	GOOD	----	verify.servo_ssh	timestamp=1516356191	localtime=Jan 19 02:03:11	
	GOOD	----	verify.update	timestamp=1516356195	localtime=Jan 19 02:03:15	
	GOOD	----	verify.brd_config	timestamp=1516356195	localtime=Jan 19 02:03:15	
	GOOD	----	verify.ser_config	timestamp=1516356195	localtime=Jan 19 02:03:15	
	GOOD	----	verify.job	timestamp=1516356196	localtime=Jan 19 02:03:16	
	GOOD	----	verify.servod	timestamp=1516356202	localtime=Jan 19 02:03:22	
	GOOD	----	verify.pwr_button	timestamp=1516356202	localtime=Jan 19 02:03:22	
	GOOD	----	verify.lid_open	timestamp=1516356203	localtime=Jan 19 02:03:23	
	GOOD	----	verify.PASS	timestamp=1516356203	localtime=Jan 19 02:03:23	
	START	provision_AutoUpdate	provision_AutoUpdate	timestamp=1516356203	localtime=Jan 19 02:03:23	
		START	----	----	timestamp=1516356212	localtime=Jan 19 02:03:32	
			GOOD	----	sysinfo.before	timestamp=1516356212	localtime=Jan 19 02:03:32	
		END GOOD	----	----	timestamp=1516356212	localtime=Jan 19 02:03:32	
		GOOD	provision_AutoUpdate	provision_AutoUpdate	timestamp=1516356609	localtime=Jan 19 02:10:09	completed successfully
	END GOOD	provision_AutoUpdate	provision_AutoUpdate	timestamp=1516356609	localtime=Jan 19 02:10:09	
	GOOD	----	verify.ssh	timestamp=1516356609	localtime=Jan 19 02:10:09	
	GOOD	----	verify.devmode	timestamp=1516356609	localtime=Jan 19 02:10:09	
	GOOD	----	verify.hwid	timestamp=1516356610	localtime=Jan 19 02:10:10	
	GOOD	----	verify.power	timestamp=1516356611	localtime=Jan 19 02:10:11	
	GOOD	----	verify.ext4	timestamp=1516356611	localtime=Jan 19 02:10:11	
	GOOD	----	verify.writable	timestamp=1516356611	localtime=Jan 19 02:10:11	
	GOOD	----	verify.tpm	timestamp=1516356612	localtime=Jan 19 02:10:12	
	GOOD	----	verify.good_au	timestamp=1516356612	localtime=Jan 19 02:10:12	
	GOOD	----	verify.fwstatus	timestamp=1516356612	localtime=Jan 19 02:10:12	
	GOOD	----	verify.rwfw	timestamp=1516356613	localtime=Jan 19 02:10:13	
	GOOD	----	verify.python	timestamp=1516356613	localtime=Jan 19 02:10:13	
	FAIL	----	verify.cros	timestamp=1516356616	localtime=Jan 19 02:10:16	The host has wrong cros-version label.
END FAIL	----	provision	timestamp=1516356616	localtime=Jan 19 02:10:16	
INFO	----	----	timestamp=1516356616	job_abort_reason=	localtime=Jan 19 02:10:16

----

Here's what I see at the end of the test logs for https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/7737:

...
01/19 02:04:20.319 INFO |        dev_server:1494| Name: /home/chromeos-test/images/tricky-tot-chrome-pfq-informational/R65-10321.0.0-c2223837/.test_suitesDOTtarDOTbz2 Accessed: 2018-01-19 02:01:54.592407 Modified: 2018-01-19 02:01:49.784417 Size: 24964 bytes.
01/19 02:04:21.288 INFO |        dev_server:0707| Getting devservers for host: chromeos4-row2-rack3-host15
01/19 02:04:25.102 INFO |        dev_server:1856| Received response from devserver for cros_au call: '[true, 163027]'
01/19 02:10:04.870 INFO |         afe_store:0099| adding labels: ['cros-version:tricky-tot-chrome-pfq-informational/R65-10321.0.0-c2223837']

----

dshi@, you wrote this code (in 2015), right? Do you have any ideas about what's going wrong?

Issue 540757 and  issue 607601  are about the same error message.
 

Comment 1 by derat@chromium.org, Jan 19 2018

Cc: davidri...@chromium.org
Owner: dgarr...@chromium.org
davidriley@ investigated this a bit (thanks!) and thinks that https://crrev.com/c/869450 is to blame and should be reverted.

Comment 2 by derat@chromium.org, Jan 19 2018

Created a revert at https://crrev.com/c/876243.
Note: I'm confident that my CL caused this.

And I now understand why the CQ let this through. It generates build versions differently.

I don't yet understand WHY the lab breaks with the new format of string, but that's a different issue.

Comment 4 by derat@chromium.org, Jan 22 2018

Is it possible to add a test to the CQ that would've caught this?
If we clean up/simplify how labels are generated/used then yes. As it is, no. There are way, way to many special cases.

I have improved the relevant lab unittest with my proposed quick lab fix, but it will still leave things pretty fragile.
Status: Fixed (was: Assigned)
Richard's "right fix" proposal (see  crbug.com/800562 ) would remove many special cases, and so this would then have been caught in the CQ's hwtests.

This outage is fixed, so closing and moving work back to the before mentioned bug.

Comment 7 by khmel@chromium.org, Feb 15 2018

Cc: pho...@chromium.org
Status: Assigned (was: Fixed)
Reopen this issue.

We have the same issue in TreeHugger presubmit.
b/73367536

phobbs@ thinks this may be the same.


Could you please take a look?


For example:
https://android-build.googleplex.com/builds/where-is-my-cl/googleplex-android-review/3609503?revision=6&revisionAttempt=1
        
F
provision.provision     13:02.000
 The host has wrong cros-version label., completed successfully, servod not running on chromeos6-row4-rack5-labstation1 port 9993
        
F
provision.provision     14:25.000
 Servo initialize timed out., The host has wrong cros-version label., completed successfully
        
F
provision.provision     14:54.000
 The host has wrong cros-version label., completed successfully, servod not running on chromeos6-row2-rack24-labstation1 port 9994





https://atp.googleplex.com/test_runs/15167185
My changes have been relanded, so probably.

We need someone with expertise in the autotest adb_host to help resolve.
Owner: jrbarnette@chromium.org
Well.... this is the CL that is probably related.

https://crrev.com/c/907326
Status: Fixed (was: Assigned)
That CL is the cause of the difficulty, but the new arc-presubmit failures
aren't this bug.  I've opened  bug 812467 ; we'll put the discussion
there.

Sign in to add a comment