[Daisy] "Missing var or dev_image in stateful payload" when applying stateful in AU tests
Issue description

During signoff for this build:
https://cros-goldeneye.corp.google.com/chromeos/console/qaRelease?releaseName=M71-BETA-CHROMEOS-5

we noticed that one daisy AU test failed 11 times:
https://stainless.corp.google.com/search?view=matrix&row=test&col=build&first_date=2018-11-25&last_date=2018-11-29&suite=%5Epaygen_au_beta&build=R71-11151.45.0&board=daisy&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=false&exclude_acts=true&exclude_retried=true&exclude_non_production=false

Looking at the logs, it failed to apply the stateful.tgz every time. The logs are:

2018/11/27 16:49:38.516 INFO | auto_updater:0813| Updating stateful partition...
2018/11/27 16:49:38.517 DEBUG| cros_build_lib:0586| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpApvXWz/testing_rsa root@chromeos4-row9-rack7-host11 -- sh /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.sSu733jQDY/stateful_update /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.sSu733jQDY/stateful.tgz
Reading local payload /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.sSu733jQDY/stateful.tgz
Successfully retrieved update.
Missing var or dev_image in stateful payload.
2018/11/27 16:51:38.845 ERROR| auto_updater:0817| Stateful update failed.
,
Dec 6
,
Dec 6
Doesn't seem to be happening on recent betas for Daisy. I will leave this bug open in case we see it again.
,
Dec 6
Is it possible we're generating the payload from a recovery image? They don't have those two directories.
,
Dec 7
This is happening on 72 dev now too.
,
Dec 7
Looking at an example log:
https://00e9e64bace9eed4ab467afc3382e4e2252f3a7385f7d70f50-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/261987635-abodeti%2Fchromeos4-row9-rack6-host21%2Fautoupdate_logs%2FCrOS_update_chromeos4-row9-rack6-host21_27597.log?qk=AD5uMEsCHjciPgKRzFRsziZs4JVT7se_nOD354RAhgWfqDnPWYasErjUm0g5V4ifoGl5Occ9---iYv8NC6T8yKmsraWpFsRdL1xa09UzBXL92Bo86CtbTmgwO27jcyjetDpCXi8aojoNo5zGa36CzGtjJwmcJAcpDUCcYUjIQcHqf-VWtvpFLOK7_CwZwwJpGG1CkFV0QqPqRZ2yC6lzWgXy-7ic2GB6O-7bA_5WGcbJmjcZwO8NylaySuP8ikL-hb8-5xOvLBDP-HeV59Z8lZGYURoJxgvDpZ-1PiZ51fcN5AXM_Qxu6LmA2Ase_CRp6IrZt0f25G-G7ulk8ZORk06BGzeLjySbP93rcHfIIdDogFCGV1gpr70ypxZaRLSh09ErSEtD0omL3aU8AYECoU1rqwyXehhr5dkPCskJUmGsBvOgyRSgFM8iU6ZYj-hiIxR4t9UOK-Se_74rpagstY0TlmLIqN5r1w6d8pbbKfO0W9fDfeKDy2uF63WLeAD0hqFZs0Y8e8J0DF1oO0ir8Xm320rj-rAVeVMbba3ONFQ30O_OUYiNA3GeqlrIwHisggUP0qwQFhZAn6GKPsoiLtIbH1lD7-ks956gm8xdEamt6Z9dAf0AdZ_q4vTdDe5Q4s49S-vWwFecI39YLoEnye11-dBWReOV76B4ixMp4-fWOUf2eBZBa8spu4qmzWLk5iJ_dniAXVukqVcIS-EhZNALSIeTGm33ACk38ZQc7P6WyFMHLQG5RZAiimuMapKMqRSmHUaPRU5w2JI0PMJJmPkFgDOpc2in1mB5SEQK50aCoORWaRJQY8J57dSUCUcU8WvlaDQ-MZeY7Naax0aZNunlxdTdOwtQhRWuKa9vXJYw0y-AOfTdZI2xfjT6G2qucqe6ZXSUzFFLDJi6Kv6xvTJXOAqhxJHJCQ

The stateful.tgz is being copied from:

2018/11/28 11:46:21.072 INFO | remote_access:0831| [mode:scp] copy: /home/chromeos-test/images/beta-channel/daisy-skate/11151.45.0/stateful.tgz -> chromeos4-row9-rack6-host21:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.QFAfAncIi0/

I'm assuming that is from the release builders. And looking at:
https://pantheon.corp.google.com/storage/browser/chromeos-releases/beta-channel/daisy-skate/11151.45.0/stateful.tgz

It seems both of those directories exist in the stateful file. Could the stateful be coming from somewhere else?
,
Dec 7
The test starts by staging the payload and stateful.tgz from gs://chromeos-releases/beta-channel/daisy-skate/<build>

11/28 11:32:46.046 INFO | dev_server:1255| Staging artifacts on devserver http://100.115.245.200:8082: build=beta-channel/daisy/11151.45.0, artifacts=None, files=['stateful.tgz'], archive_url=gs://chromeos-releases/beta-channel/daisy/11151.45.0
11/28 11:32:46.048 DEBUG| utils:0219| Running 'ssh 100.115.245.200 'curl "http://100.115.245.200:8082/stage?artifacts=&files=stateful.tgz&async=True&archive_url=gs://chromeos-releases/beta-channel/daisy/11151.45.0"''
11/28 11:32:47.136 DEBUG| dev_server:1197| response for RPC: 'Success'
11/28 11:32:47.137 DEBUG| utils:0219| Running 'ssh 100.115.245.200 'curl "http://100.115.245.200:8082/is_staged?artifacts=&files=stateful.tgz&archive_url=gs://chromeos-releases/beta-channel/daisy/11151.45.0"''
11/28 11:32:48.154 DEBUG| dev_server:1153| whether artifact is staged: 'True'
11/28 11:32:48.155 INFO | dev_server:1273| Finished staging artifacts: build=beta-channel/daisy/11151.45.0, artifacts=None, files=['stateful.tgz'], archive_url=gs://chromeos-releases/beta-channel/daisy/11151.45.0

/home/chromeos-test/images/beta-channel/daisy-skate/11151.45.0/stateful.tgz is the static folder on the devserver where staged files are put. The test then copies it from there to the DUT to use:

2018/11/28 11:45:10.312 INFO | auto_updater:0645| Copying target stateful payload to device...
2018/11/28 11:45:10.313 INFO | remote_access:0831| [mode:scp] copy: /home/chromeos-test/images/beta-channel/daisy/11151.45.0/stateful.tgz -> chromeos2-row4-rack7-host14:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.E1Sz4UEAH4/
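
For anyone re-staging by hand, here is a minimal sketch of how the two devserver URLs in the log above are put together. The endpoint names and query parameters are copied from the logged curl calls; the helper name `devserver_url` and its arguments are made up for illustration and are not part of the autotest code.

```python
from urllib.parse import urlencode


def devserver_url(base, endpoint, archive_url, files, asynchronous=None):
    """Build a devserver request URL shaped like the ones in the log.

    Hypothetical helper: only the parameter names (artifacts, files,
    async, archive_url) come from the logged curl commands.
    """
    params = {"artifacts": "", "files": ",".join(files)}
    if asynchronous is not None:
        params["async"] = str(asynchronous)
    params["archive_url"] = archive_url
    # Keep ":" and "/" unescaped so the gs:// URL reads as in the log.
    return "%s/%s?%s" % (base, endpoint, urlencode(params, safe=":/"))


if __name__ == "__main__":
    base = "http://100.115.245.200:8082"
    archive = "gs://chromeos-releases/beta-channel/daisy/11151.45.0"
    print(devserver_url(base, "stage", archive, ["stateful.tgz"], True))
    print(devserver_url(base, "is_staged", archive, ["stateful.tgz"]))
```

The first URL asks the devserver to stage the file and the second polls whether staging has finished, which matches the order of the two curl calls in the log.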
,
Dec 7
I've staged it again using the call in the test, and you're correct: it has the two directories. http://100.115.245.200:8082/static/beta-channel/daisy/11151.45.0/
,
Dec 10
I reran the test from my desk against daisy in the lab and it failed for the same reason. At each step in the test I checked the stateful.tgz by running tar --list --file=stateful.tgz

1. Staged on devserver: had the two folders
2. Copied to DUT: had the folders
3. During rootfs: had the folders
4. During stateful update: had the folders

But the test still failed. Is it a problem with the stateful_update script? Or did something change recently in the payload that the script doesn't know how to deal with?

I think Daisy is EOL soon, so we probably need this fixed. Upping priority.
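
The same check can be scripted instead of eyeballing the tar listing at each step. This is only a sketch: the directory names passed in are taken from the error message ("var", "dev_image") and are an assumption, since the entries inside a real stateful.tgz may be named differently. The demo archive is built in memory, so no real payload is needed to try it.

```python
import io
import tarfile


def top_level_entries(tar):
    """Return the set of first path components in an open tar archive."""
    names = set()
    for member in tar.getmembers():
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if parts:
            names.add(parts[0])
    return names


def missing_dirs(tar, required):
    """Return the required top-level names that the archive lacks."""
    return sorted(set(required) - top_level_entries(tar))


def build_demo_archive(dir_names):
    """Build a small in-memory .tgz containing only empty directories."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in dir_names:
            info = tarfile.TarInfo(name)
            info.type = tarfile.DIRTYPE
            tar.addfile(info)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Assumed names, taken from the wording of the error message.
    required = ["var", "dev_image"]
    with tarfile.open(fileobj=build_demo_archive(["var"]), mode="r:gz") as tar:
        print(missing_dirs(tar, required))
```

For a file on disk, `tarfile.open("stateful.tgz", mode="r:gz")` would replace the in-memory archive.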
,
Dec 11
The last change to stateful_update was in 2017, and before that in 2015, so it is probably not coming from there. Even if it was, it should have affected possibly all devices. There have been changes to cros_generate_stateful_update_payload, but nothing that could break this. And again, it should have broken many other things. Let me lock a DUT to see if I can reproduce.
,
Dec 11
Boom, found the problem. It's a timeout issue here: https://chromium.git.corp.google.com/chromiumos/platform/dev-util/+/667a0cb6a3f1fa139476166c0abe12e8c1050281/stateful_update#101 I ran a test on a DUT and it took about 2 minutes and 51 seconds to untar the stateful. Probably the stateful has grown, or the daisy has slow disk speed, or a combination of these factors. I'll send a patch increasing the timeout to four minutes so we don't hit this condition anytime soon.
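
A quick arithmetic sketch of the numbers above. The two-minute old limit is inferred from this thread (the failing log shows the error about two minutes after the command started); the constant and function names are made up for illustration.

```python
# Values from the thread, in seconds.
OLD_TIMEOUT_S = 2 * 60      # inferred old limit: two minutes
NEW_TIMEOUT_S = 4 * 60      # proposed new limit: four minutes
MEASURED_S = 2 * 60 + 51    # untar time measured on the DUT: 2m51s


def headroom_s(timeout_s, measured_s):
    """Seconds left before the timeout fires; negative means it fires."""
    return timeout_s - measured_s


if __name__ == "__main__":
    print(headroom_s(OLD_TIMEOUT_S, MEASURED_S))  # -51
    print(headroom_s(NEW_TIMEOUT_S, MEASURED_S))  # 69
```

So the measured untar overshoots the old limit by 51 seconds, and the new limit leaves 69 seconds of slack, or room for the untar to get about 40% slower before the same failure comes back.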
,
Dec 11
,
Dec 12
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/dev-util/+/3e3f5d022658adfed300971592a345a2426f2f10

commit 3e3f5d022658adfed300971592a345a2426f2f10
Author: Amin Hassani <ahassani@chromium.org>
Date: Wed Dec 12 03:33:56 2018

stateful_update: Increase the timeout stateful update timeout

Daisy-stake devices are failing because it takes more than two minutes to unzip the update.tgz and write it to the stateful (It takes 2m51 on a DUT with the payload in /tmp). Increase this timeout so we don't hit this problem again if the size of the stateful update grows or the disk is slow. Also added check for the exit code of the call that does the actual work so we know if it fails for any other reason.

BUG= chromium:912705
TEST=deployed to a DUT and stateful_update passed successfully.

Change-Id: Ib7479f5b46be83fd0c2bdc062e42c211b1229c44
Reviewed-on: https://chromium-review.googlesource.com/1370869
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/3e3f5d022658adfed300971592a345a2426f2f10/stateful_update
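
To illustrate why the exit-code check matters: if a timeout or a failed command is not reported on its own, the only visible symptom is the later "missing directory" check, which is how a good payload produced a misleading error in this bug. The sketch below is not the stateful_update script (that is a shell script); it is a hypothetical Python stand-in that reports the first failure reason it finds. The function name and the directory names are assumptions.

```python
import os
import subprocess
import sys
import tempfile


def apply_step(cmd, workdir, required_dirs, timeout_s):
    """Run cmd in workdir and report the first failure reason found."""
    try:
        result = subprocess.run(cmd, cwd=workdir, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timed out after %ss" % timeout_s
    if result.returncode != 0:
        return "command exited with code %d" % result.returncode
    # Only now is a missing directory a statement about the payload.
    missing = [d for d in required_dirs
               if not os.path.isdir(os.path.join(workdir, d))]
    if missing:
        return "missing directories: " + ", ".join(missing)
    return "ok"


if __name__ == "__main__":
    # Stand-in for the extraction step: a child that sleeps too long.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    with tempfile.TemporaryDirectory() as workdir:
        print(apply_step(slow, workdir, ["var", "dev_image"], 0.5))
```

With the three outcomes kept apart, a slow disk shows up as a timeout rather than as a complaint about the payload contents.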
,
Dec 12
Does this need a merge request or similar for M71? Otherwise the board will miss stable.
,
Dec 12
dhaddock@ do you want me to merge this into M72 and M71?
,
Dec 12
Yes please!
,
Dec 12
,
Dec 12
This bug requires manual review: Request affecting a post-stable build Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 12
Approving merge for M71 and M72 Chrome OS.
,
Dec 12
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/dev-util/+/81df2e5622015ab9386a04a7c106de728d86fa7f

commit 81df2e5622015ab9386a04a7c106de728d86fa7f
Author: Amin Hassani <ahassani@chromium.org>
Date: Wed Dec 12 18:14:14 2018

stateful_update: Increase the timeout stateful update timeout

Daisy-stake devices are failing because it takes more than two minutes to unzip the update.tgz and write it to the stateful (It takes 2m51 on a DUT with the payload in /tmp). Increase this timeout so we don't hit this problem again if the size of the stateful update grows or the disk is slow. Also added check for the exit code of the call that does the actual work so we know if it fails for any other reason.

BUG= chromium:912705
TEST=deployed to a DUT and stateful_update passed successfully.

Change-Id: Ib7479f5b46be83fd0c2bdc062e42c211b1229c44
Reviewed-on: https://chromium-review.googlesource.com/1370869
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit 3e3f5d022658adfed300971592a345a2426f2f10)
Reviewed-on: https://chromium-review.googlesource.com/c/1373953
Reviewed-by: Amin Hassani <ahassani@chromium.org>
Commit-Queue: Amin Hassani <ahassani@chromium.org>

[modify] https://crrev.com/81df2e5622015ab9386a04a7c106de728d86fa7f/stateful_update
,
Dec 12
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/dev-util/+/a6150c8ec028a53182258792eb4ac1632b7b095a

commit a6150c8ec028a53182258792eb4ac1632b7b095a
Author: Amin Hassani <ahassani@chromium.org>
Date: Wed Dec 12 18:14:15 2018

stateful_update: Increase the timeout stateful update timeout

Daisy-stake devices are failing because it takes more than two minutes to unzip the update.tgz and write it to the stateful (It takes 2m51 on a DUT with the payload in /tmp). Increase this timeout so we don't hit this problem again if the size of the stateful update grows or the disk is slow. Also added check for the exit code of the call that does the actual work so we know if it fails for any other reason.

BUG= chromium:912705
TEST=deployed to a DUT and stateful_update passed successfully.

Change-Id: Ib7479f5b46be83fd0c2bdc062e42c211b1229c44
Reviewed-on: https://chromium-review.googlesource.com/1370869
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit 3e3f5d022658adfed300971592a345a2426f2f10)
Reviewed-on: https://chromium-review.googlesource.com/c/1373952
Reviewed-by: Amin Hassani <ahassani@chromium.org>
Commit-Queue: Amin Hassani <ahassani@chromium.org>

[modify] https://crrev.com/a6150c8ec028a53182258792eb4ac1632b7b095a/stateful_update
,
Dec 12
,
Dec 17
We are still hitting this. I guess it needs a devserver push to take effect.
,
Dec 17
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 17
,
Dec 17
,
Dec 18
This bug is tagged as fixed yet #23 indicates otherwise and the merge was aborted. Raising to a P0 since this is blocking boards from being pushed to M71 Chrome OS stable. Please advise on status and next steps. Thanks.
,
Dec 18
Devserver push worked for this change. Since then the last three AU test runs on daisy are green (hasn't happened in forever) https://stainless.corp.google.com/search?view=matrix&row=board_model&col=build&first_date=2018-12-14&last_date=2018-12-18&suite=%5Epaygen_au_dev&board=daisy&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=false&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Dec 18
Comment 1 by dhaddock@google.com
, Dec 6