kevin-paladin failures plagued by flaky provisioning
Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin

Feb 14 05:56  failure #205  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]
Feb 14 03:04  failure #204  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-cq]
Feb 14 00:17  failure #203  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]
Feb 13 21:30  failure #202  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]
Feb 13 18:37  failure #201  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-cq]
Feb 13 15:48  failure #200  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline], failed hwtest [bvt-cq]
Feb 13 12:49  failure #199  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-cq]
Feb 13 10:10  failure #198  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]
Feb 13 07:41  failure #197  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]
Feb 13 05:14  failure #196  Failed steps: failed cbuildbot [kevin-paladin], failed hwtest [bvt-inline]

#206
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fkevin-paladin%2F206%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout

host: chromeos2-row8-rack8-host3, status: Ready, locked: False diagnosis: Working
labels: ['board:kevin', 'arc', 'ec:cros', 'hw_video_acc_enc_vp8', 'audio_loopback_dongle', 'os:cros', 'power:battery', 'cts_abi_arm', 'webcam', 'hw_video_acc_enc_h264', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'storage:mmc', 'kevin', 'internal_display', 'servo', 'phase:PVT', 'touchpad', 'variant:kevin', 'sku:kevin_rk3399_4Gb', 'touchscreen', 'bluetooth', 'pool:cq']
Last 10 jobs within 1:48:00:
59984564 Repair started on: 2017-02-14 10:04:59 status PASS
59983986 Provision started on: 2017-02-14 09:25:39 status FAIL

I'm not sure what job this provision shows up under.

#204
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/204/steps/HWTest%20%5Bbvt-cq%5D/logs/stdio

Some background job failed. I have no idea which one. All the hosts listed pass...

05:43:29: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.StepFailure'>:
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 440, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 629, in Run
    raise failures_lib.StepFailure()
StepFailure

Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/generic_builders.py", line 118, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 677, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 674, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 560, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.failures_lib.StepFailure'>:
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 440, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 629, in Run
    raise failures_lib.StepFailure()
StepFailure

#203:
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/203/steps/HWTest%20%5Bbvt-inline%5D/logs/stdio

host: chromeos2-row8-rack9-host14, status: Ready, locked: False diagnosis: Working
labels: ['board:kevin', 'arc', 'hw_video_acc_enc_h264', 'hw_video_acc_enc_vp8', 'os:cros', 'power:battery', 'ec:cros', 'hw_video_acc_h264', 'servo', 'hw_video_acc_vp8', 'cts_abi_arm', 'storage:mmc', 'webcam', 'kevin', 'audio_loopback_dongle', 'internal_display', 'bluetooth', 'pool:cq', 'phase:PVT', 'touchpad', 'variant:kevin', 'sku:kevin_rk3399_4Gb', 'touchscreen', 'cros-version:kevin-paladin/R58-9280.0.0-rc2']
Last 10 jobs within 1:48:00:
59979040 Repair started on: 2017-02-14 02:42:58 status PASS
59978033 Provision started on: 2017-02-14 01:08:35 status FAIL

I think the logs for that are here:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/101053570-chromeos-test/chromeos2-row8-rack9-host14/
But I have no idea how to correlate a job on a host with the metajobs.

This host needs a devserver, but it never succeeds.

The host chromeos2-row8-rack9-host14 (100.115.231.55) is in a restricted subnet. Try to locate a devserver inside subnet 100.115.224.0:19
02/13 09:25:23.462 DEBUG| base_utils:0185| Running 'ssh 100.115.245.197 'curl "http://100.115.245.197:8082/check_health?"''
02/13 09:25:38.586 DEBUG| dev_server:0892| Error occurred with exit_code 255 when executing the ssh call: ssh: connect to host 100.115.245.197 port 22: Connection timed out.
02/13 09:25:38.589 WARNI| retry:0221| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.245.197 'curl "http://100.115.245.197:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
* Command:
    ssh 100.115.245.197 'curl "http://100.115.245.197:8082/check_health?"'
Exit status: 255
Duration: 15.0399751663

#202:
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fkevin-paladin%2F202%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout

host: chromeos2-row8-rack9-host14, status: Ready, locked: False diagnosis: Working
labels: ['board:kevin', 'arc', 'hw_video_acc_enc_h264', 'hw_video_acc_enc_vp8', 'os:cros', 'power:battery', 'ec:cros', 'hw_video_acc_h264', 'servo', 'hw_video_acc_vp8', 'cts_abi_arm', 'storage:mmc', 'webcam', 'kevin', 'audio_loopback_dongle', 'internal_display', 'bluetooth', 'pool:cq', 'phase:PVT', 'touchpad', 'variant:kevin', 'sku:kevin_rk3399_4Gb', 'touchscreen', 'cros-version:kevin-paladin/R58-9280.0.0-rc2']
Last 10 jobs within 1:48:00:
59977262 Repair started on: 2017-02-13 23:56:02 status PASS
59976051 Provision started on: 2017-02-13 22:25:42 status FAIL

Same host.

I really don't know how to get the details of the job for the actual host. If someone could teach me to fish better, I could dig out more info.

I'm also not clear what's triggering the failure for the whole paladin. Does it take one host to fail provisioning to fail the whole run? And, if so, how do I get the information on the specific provisioning logs?
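One quick sanity check on the devserver log above is whether the devserver it tried actually sits inside the restricted subnet. The sketch below uses only the addresses quoted in the log and assumes that "100.115.224.0:19" denotes a 19-bit prefix (that reading of the notation is an assumption, as is the helper name in_subnet).

```python
import ipaddress

# Values copied from the provision log above. Reading "100.115.224.0:19"
# as the CIDR block 100.115.224.0/19 is an assumption about the notation.
RESTRICTED_SUBNET = ipaddress.ip_network("100.115.224.0/19")
DUT_IP = ipaddress.ip_address("100.115.231.55")         # chromeos2-row8-rack9-host14
DEVSERVER_IP = ipaddress.ip_address("100.115.245.197")  # devserver the log tried


def in_subnet(addr, subnet):
    """Return True if addr falls inside subnet (hypothetical helper)."""
    return addr in subnet


if __name__ == "__main__":
    for name, addr in (("DUT", DUT_IP), ("devserver", DEVSERVER_IP)):
        print(name, addr, "in", RESTRICTED_SUBNET, "->",
              in_subnet(addr, RESTRICTED_SUBNET))
```

Under that assumption both addresses fall inside the block, which would suggest the devserver selection itself was consistent with the subnet rule and that the ssh timeout to 100.115.245.197 is the part worth chasing.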
Feb 15 2017
Should be fixed; please re-open if it is still happening.
Feb 15 2017
This still seems to be happening and has been blocking the CQ for 2 days.
Feb 15 2017
master-paladin 13669 = kevin-paladin 213: chromeos2-row8-rack9-host1 fails
master-paladin 13668 = kevin-paladin 212: chromeos2-row8-rack9-host1 fails
master-paladin 13667 = kevin-paladin 211: chromeos2-row8-rack9-host1 fails
master-paladin 13666 = kevin-paladin 210: chromeos2-row8-rack9-host1 fails

For kevin-paladin 213:
59999883 Provision started on: 2017-02-15 03:47:08 status FAIL

What does that first identifier mean? Is it anything remotely meaningful to link into autotest results? It looks nothing like an autotest id. I don't know if the failure times reported here are in the same timezone as the autotest results for jobs on that machine.

I'm assuming it's this job:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row8-rack9-host1/59999883-provision?pli=1

That job's logs just appear to fall off a cliff. Did the autotest job just blow up? I guess I'll go look at the next ones, but there's not much to see unless someone would like to teach me to fish better.
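If the guess in the comment above is right, the leading number on each "Last 10 jobs" entry is a per-host job id that maps onto a results directory. The sketch below parses those entries and builds candidate links. The URL layout is inferred from the single link quoted above and may not hold in general; JOB_RE, RESULTS_URL and failed_job_urls are hypothetical names.

```python
import re

# Matches one entry of the "Last 10 jobs" report, e.g.
# "59999883 Provision started on: 2017-02-15 03:47:08 status FAIL"
JOB_RE = re.compile(
    r"(?P<id>\d+) (?P<kind>\w+) started on: "
    r"(?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) status (?P<status>\w+)"
)

# Assumed layout, inferred from the one link in the comment above.
RESULTS_URL = (
    "https://pantheon.corp.google.com/storage/browser/"
    "chromeos-autotest-results/hosts/{host}/{job_id}-{kind}"
)


def failed_job_urls(host, report):
    """Return guessed results URLs for every FAIL entry in report."""
    urls = []
    for match in JOB_RE.finditer(report):
        if match.group("status") == "FAIL":
            urls.append(RESULTS_URL.format(
                host=host,
                job_id=match.group("id"),
                kind=match.group("kind").lower(),
            ))
    return urls
```

Feeding it the host name and the report text pasted from a failed HWTest step would give a list of links to try, one per failed special task. It does not resolve the timezone question.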
Feb 15 2017
master-paladin 13664 = kevin-paladin 208: chromeos2-row8-rack9-host1 fails
master-paladin 13663 = kevin-paladin 207: passes; chromeos2-row8-rack9-host1 isn't used.
Feb 15 2017
Same results for kevin-paladin 208. Logs are clipped. No other information about the error. Is there some magic where this kicks off another job that I can't find the linkage to?
Feb 15 2017
master-paladin 13670 = kevin-paladin 214: chromeos2-row8-rack9-host1 fails

How am I supposed to kick this device out of the pool?
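Before pulling a device, it may help to confirm that one host really accounts for the failures. The following is a minimal sketch that tallies failing hosts from notes written in the "master-paladin N = kevin-paladin M: host fails" style used in these comments. The line format is just the ad hoc one from this thread, and LINE_RE and failures_by_host are hypothetical names.

```python
import re
from collections import Counter

# One note per failing build, in the ad hoc format used in this thread:
# "master-paladin 13669 = kevin-paladin 213: chromeos2-row8-rack9-host1 fails"
LINE_RE = re.compile(
    r"master-paladin (?P<master>\d+) = kevin-paladin (?P<build>\d+): "
    r"(?P<host>chromeos\S+) fails"
)


def failures_by_host(notes):
    """Count how many noted builds each host failed in."""
    return Counter(match.group("host") for match in LINE_RE.finditer(notes))


if __name__ == "__main__":
    notes = """
    master-paladin 13669 = kevin-paladin 213: chromeos2-row8-rack9-host1 fails
    master-paladin 13668 = kevin-paladin 212: chromeos2-row8-rack9-host1 fails
    """
    for host, count in failures_by_host(notes).most_common():
        print(host, count)
```

A host that dominates the tally is a reasonable candidate to lock or remove; builds that pass are deliberately not matched by the pattern.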
Feb 15 2017
This device was locked by xixuan@. Assigning over there.
Feb 15 2017
Comment 1 by akes...@chromium.org, Feb 14 2017