platform_PrinterPpds fails on daisy-skate (and daisy-spring)
Issue description
Dashboard view: https://stainless.corp.google.com/search?col=board&board=daisy&test=platform_PrinterPpds.100&exclude_not_run=true&view=matrix&row=build&days=7
Screenshot: https://screenshot.googleplex.com/TvtczRp7bnZ
daisy: all PASS
daisy-skate: mostly failing
daisy-spring: flaky
I'll get a daisy-skate board for pawliczek@ to take a closer look.
,
Sep 27
These are the hosts I locked:
chromeos4-row9-rack7-host5
chromeos4-row9-rack7-host1
chromeos4-row9-rack6-host1
chromeos4-row9-rack5-host5
https://screenshot.googleplex.com/48mWfG5nmks
Will pick them up today and bring them to the desk.
,
Sep 27
I brought 3 DUTs, from hosts:
chromeos4-row9-rack7-host1
chromeos4-row9-rack6-host1
chromeos4-row9-rack5-host5
Piotr will re-run the tests and diagnose next week.
,
Sep 27
,
Sep 27
Rebalanced the BVT pool since we left it dangerously under supported.

johndhong@phobrz:~$ balance_pool -t 6 bvt daisy_skate
daisy_skate bvt pool: Target of 6 is above minimum.
Balancing ['model:daisy_skate'] bvt pool:
Total 6 DUTs, 2 working, 4 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 4 DUTs.
['model:daisy_skate'] suites pool has 19 spares available for balancing pool bvt
['model:daisy_skate'] bvt pool will return 4 broken DUTs, leaving 0 still in the pool.
ERROR: ['model:daisy_skate'] bvt pool: Refusing to act on pool with 4 broken DUTs.
ERROR: Please investigate this model to for a bug
ERROR: that is bricking devices. Once you have finished your
ERROR: investigation, you can force a rebalance with
ERROR: --force-rebalance
Transferring 0 DUTs from bvt to suites.
Transferring 0 DUTs from suites to bvt.

johndhong@phobrz:~$ balance_pool --force-rebalance -t 6 bvt daisy_skate
daisy_skate bvt pool: Target of 6 is above minimum.
Balancing ['model:daisy_skate'] bvt pool:
Total 6 DUTs, 2 working, 4 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 4 DUTs.
['model:daisy_skate'] suites pool has 19 spares available for balancing pool bvt
['model:daisy_skate'] bvt pool will return 4 broken DUTs, leaving 0 still in the pool.
Transferring 4 DUTs from bvt to suites.
Updating host: chromeos4-row9-rack5-host5.
Removing labels ['pool:bvt'] from host chromeos4-row9-rack5-host5
Adding labels ['pool:suites'] to host chromeos4-row9-rack5-host5
Updating host: chromeos4-row9-rack6-host1.
Removing labels ['pool:bvt'] from host chromeos4-row9-rack6-host1
Adding labels ['pool:suites'] to host chromeos4-row9-rack6-host1
Updating host: chromeos4-row9-rack7-host1.
Removing labels ['pool:bvt'] from host chromeos4-row9-rack7-host1
Adding labels ['pool:suites'] to host chromeos4-row9-rack7-host1
Updating host: chromeos4-row9-rack7-host5.
Removing labels ['pool:bvt'] from host chromeos4-row9-rack7-host5
Adding labels ['pool:suites'] to host chromeos4-row9-rack7-host5
Transferring 4 DUTs from suites to bvt.
Updating host: chromeos4-row9-rack6-host3.
Removing labels ['pool:suites'] from host chromeos4-row9-rack6-host3
Adding labels ['pool:bvt'] to host chromeos4-row9-rack6-host3
Updating host: chromeos4-row9-rack6-host5.
Removing labels ['pool:suites'] from host chromeos4-row9-rack6-host5
Adding labels ['pool:bvt'] to host chromeos4-row9-rack6-host5
Updating host: chromeos4-row9-rack5-host9.
Removing labels ['pool:suites'] from host chromeos4-row9-rack5-host9
Adding labels ['pool:bvt'] to host chromeos4-row9-rack5-host9
Updating host: chromeos4-row9-rack5-host11.
Removing labels ['pool:suites'] from host chromeos4-row9-rack5-host11
Adding labels ['pool:bvt'] to host chromeos4-row9-rack5-host11
,
Sep 28
,
Oct 10
Just to be sure: is there any plan to return them to the lab, or do we suspect they are broken beyond repair? If so, please fill out go/atl-device-failure so that the lab team does not need to track them anymore.
,
Oct 10
Devices are at my desk and I'll bring them in 2081 tomorrow.
,
Oct 10
It is fine if you still need them as I placed a replacement request.
,
Oct 11
For daisy_skate, ONLY one host (chromeos4-row9-rack5-host11) is passing the test:
https://screenshot.googleplex.com/STaPojrNAnP
For daisy-spring, ONLY one host (chromeos6-row2-rack16-host16) is failing the test:
https://screenshot.googleplex.com/m3eKQANKP4F

I locked the host (at R71-11145.0.0) and re-ran the test (platform_PrinterPpds.ppd) from a local chroot against this host. The test passed within 7 minutes. Most failures take more than 40 minutes to fail.

The time is lost in the logs at:

10/04 09:48:24.846 DEBUG| autotest:1281| Finished waiting on autotestd to start.
10/04 09:48:24.846 INFO | autotest:1340| Finished waiting on autotestd to start.
10/04 09:48:25.424 DEBUG| autotest:1281| AUTOTEST_STATUS::START ---- ---- timestamp=1538671704 localtime=Oct 04 09:48:24
10/04 09:48:25.425 INFO | server_job:0217| START ---- ---- timestamp=1538671704 localtime=Oct 04 09:48:24
10/04 09:48:25.538 DEBUG| autotest:1281| AUTOTEST_STATUS:: START platform_PrinterPpds platform_PrinterPpds timestamp=1538671704 localtime=Oct 04 09:48:24
10/04 09:48:25.539 INFO | server_job:0217| START platform_PrinterPpds platform_PrinterPpds timestamp=1538671704 localtime=Oct 04 09:48:24
>>>> H E R E <<<<
10/04 10:19:19.044 DEBUG| autotest:0956| Result exit status is 255.
10/04 10:19:19.049 DEBUG| utils:0219| Running 'ping chromeos4-row9-rack5-host9 -w1 -c1'
10/04 10:19:20.336 DEBUG| utils:0287| [stdout] PING chromeos4-row9-rack5-host9.cros.corp.google.com (100.115.200.57) 56(84) bytes of data.
10/04 10:19:20.336 DEBUG| utils:0287| [stdout]
10/04 10:19:20.336 DEBUG| utils:0287| [stdout] --- chromeos4-row9-rack5-host9.cros.corp.google.com ping statistics ---
10/04 10:19:20.336 DEBUG| utils:0287| [stdout] 1 packets transmitted, 0 received, 100% packet loss, time 0ms
10/04 10:19:20.337 DEBUG| utils:0287| [stdout]
10/04 10:19:20.341 INFO | server_job:0217| FAIL ---- ---- timestamp=1538673560 localtime=Oct 04 10:19:20 Autotest client terminated unexpectedly: DUT is no longer pingable, it may have rebooted or hung.

The client test log itself could not be collected.

What would make the test stall for so long? My guess is that during that time the devices become so loaded that they hang, and the test is finally thrown out.
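For anyone repeating this on other failed runs, the stall can be located mechanically instead of by eye. Below is a minimal sketch that finds the largest jump between consecutive timestamps in a log of this shape. The helper names and the `year` parameter are hypothetical (the log lines carry no year), and it assumes every timestamped line starts with `MM/DD HH:MM:SS.mmm` as in the excerpt above.

```python
import re
from datetime import datetime

# Matches the leading "MM/DD HH:MM:SS.mmm" stamp seen in the excerpt above.
TIMESTAMP_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def parse_timestamp(line, year=2018):
    """Return the line's timestamp, or None if the line has no leading stamp.

    The year is an assumption supplied by the caller; the log omits it.
    """
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %m/%d %H:%M:%S.%f")


def largest_gap(lines, year=2018):
    """Return (gap_seconds, line_before, line_after) for the biggest jump.

    Lines without a timestamp are skipped. Returns None when fewer than
    two timestamped lines are present.
    """
    best = None
    prev_time = None
    prev_line = None
    for line in lines:
        stamp = parse_timestamp(line, year)
        if stamp is None:
            continue
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best
```

Fed the lines quoted above, `largest_gap` should point at the pair around the H E R E marker, a gap of roughly 31 minutes between the `START platform_PrinterPpds` line and `Result exit status is 255`.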
,
Oct 11
,
Oct 18
pawliczek@ are we still investigating this?
,
Oct 19
No, I returned the devices a week ago. The problem was caused by overheating. I sent Kalin the following (see attachments):
1. A simple script for testing the device (copy to /var/log/ and run for 3-5 minutes)
2. A plot of the CPU temperature obtained by the script for the 3 tested devices (2 of them always reboot after 2-3 minutes, 1 always survived; the Y axis is temperature given as Celsius*1000)
As far as I know, the problem was redirected to the infrastructure team. It looks like a hardware problem with CPU cooling.
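The attached script is not reproduced in this thread. As a rough illustration of how such a temperature trace can be collected, here is a minimal sketch that only records readings and does not generate any load. It assumes the CPU sensor is exposed through the standard Linux sysfs thermal interface, which reports millidegrees Celsius (matching the Celsius*1000 axis of the plot). The zone index, the output file name, and the function names are all hypothetical.

```python
import os
import time
from pathlib import Path

# Assumption: thermal_zone0 is the CPU sensor; the index varies by board.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millicelsius(raw):
    """Convert sysfs text such as '45000' (Celsius * 1000) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def log_temperatures(sensor=THERMAL_ZONE, out_path="/var/log/cpu_temp.log",
                     duration_s=300, interval_s=1.0):
    """Append '<unix time> <celsius>' lines to out_path for duration_s seconds."""
    deadline = time.monotonic() + duration_s
    with open(out_path, "a") as out:
        while time.monotonic() < deadline:
            celsius = parse_millicelsius(sensor.read_text())
            out.write(f"{time.time():.0f} {celsius:.1f}\n")
            # Push every sample to disk so the trace survives a sudden reboot.
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval_s)
```

Calling `log_temperatures()` with the defaults would run for 5 minutes, in line with the 3-5 minute window mentioned above. Syncing after each sample costs some I/O, but it matters here because the failing devices reboot without warning.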
,
Oct 24
,
Oct 24
The bad DUTs were decommed and new DUTs installed:
https://b.corp.google.com/issues/117520498
https://b.corp.google.com/issues/116797710
,
Nov 5
Comment 1 by ka...@chromium.org
, Sep 27