Stateful encryption related formats breaking FSI testing on selected boards post R53.
Issue description

Where the issue happened: Canary veyron_minnie-release

What the issue was: Canary veyron_minnie-release failed at autoupdate_EndToEndTest.paygen_au_canary_full, where update_stateful (https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/autoupdater.py?rcl=745b8167a5a346742905c7b4d8b74ec722d56314&l=516) failed. The DUT was no longer pingable after the failure of update_stateful().

When the issue started: This failure has been there since build #1184 (see https://chromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release).

Error messages from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/121595781-chromeos-test/chromeos4-row9-rack9-host3/autoupdate_EndToEndTest.paygen_au_canary_full/debug/:

06/05 07:47:32.358 DEBUG| abstract_ssh:0390| Trying scp.
06/05 07:47:32.359 DEBUG| ssh_host:0284| Running (ssh) 'ls "/tmp/sysinfo/autoserv-idJ0Y7/results/default/"*'
06/05 07:47:32.836 DEBUG| utils:0298| [stderr] ls: cannot access /tmp/sysinfo/autoserv-idJ0Y7/results/default/*: No such file or directory
06/05 07:47:32.839 DEBUG| ssh_host:0284| Running (ssh) 'ls "/tmp/sysinfo/autoserv-idJ0Y7/results/default/".[!.]*'
06/05 07:47:33.296 DEBUG| utils:0298| [stderr] ls: cannot access /tmp/sysinfo/autoserv-idJ0Y7/results/default/.[!.]*: No such file or directory
06/05 07:47:33.300 DEBUG| server_job:1372| Client state file /usr/local/autotest/results/121595781-chromeos-test/chromeos4-row9-rack9-host3/control.autoserv.state not found
06/05 07:47:33.304 DEBUG| base_job:0399| Persistent state client.* deleted
06/05 07:47:33.305 DEBUG| autotest:0966| Autotest job finishes.
06/05 07:47:33.306 ERROR| log:0027| post-test iteration server sysinfo error:
06/05 07:47:33.307 ERROR| traceback:0013| Traceback (most recent call last):
06/05 07:47:33.307 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/log.py", line 25, in decorated_func
06/05 07:47:33.308 ERROR| traceback:0013| fn(*args, **dargs)
06/05 07:47:33.309 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 71, in wrapper
06/05 07:47:33.309 ERROR| traceback:0013| func(self, mytest, host, at, outputdir)
06/05 07:47:33.310 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 216, in after_iteration_hook
06/05 07:47:33.311 ERROR| traceback:0013| results_dir=self.job.resultdir)
06/05 07:47:33.312 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 381, in run
06/05 07:47:33.312 ERROR| traceback:0013| client_disconnect_timeout, use_packaging=use_packaging)
06/05 07:47:33.313 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 464, in _do_run
06/05 07:47:33.314 ERROR| traceback:0013| client_disconnect_timeout=client_disconnect_timeout)
06/05 07:47:33.315 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 950, in execute_control
06/05 07:47:33.315 ERROR| traceback:0013| raise error.AutotestRunError(msg)
06/05 07:47:33.316 ERROR| traceback:0013| AutotestRunError: Aborting - unexpected final status message from client on chromeos4-row9-rack9-host3
06/05 07:47:33.317 ERROR| traceback:0013|
06/05 07:47:33.317 DEBUG| test:0396| after_iteration_hooks completed
06/05 07:47:33.318 WARNI| test:0616| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 610, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 818, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 471, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 348, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 381, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 1818, in run_once
    test_platform.prep_device_for_update(test_conf['source_release'])
  File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 1151, in prep_device_for_update
    self._staged_urls.source_stateful_url)
  File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 985, in _install_source_version
    stateful_url, True)
  File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 942, in _update_via_test_payloads
    perform_update(stateful_url, True)
  File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 923, in perform_update
    updater.update_stateful(clobber=clobber)
  File "/usr/local/autotest/client/common_lib/cros/autoupdater.py", line 538, in update_stateful
    raise update_error
StatefulUpdateError: Failed to perform stateful update on chromeos4-row9-rack9-host3
,
Jun 5 2017
,
Jun 5 2017
When, exactly, did the encrypted stateful support land? Is there any chance it wasn't present in R53 8530.96.0?
,
Jun 6 2017
This sort of failure is happening across a bunch of boards:
https://luci-milo.appspot.com/buildbot/chromeos/gandof-release/
https://luci-milo.appspot.com/buildbot/chromeos/quawks-release/
https://luci-milo.appspot.com/buildbot/chromeos/samus-release/
https://luci-milo.appspot.com/buildbot/chromeos/squawks-release/
https://luci-milo.appspot.com/buildbot/chromeos/veyron_jerry-release/
https://luci-milo.appspot.com/buildbot/chromeos/veyron_mickey-release/
https://luci-milo.appspot.com/buildbot/chromeos/veyron_minnie-release/
All of the tests that have failed have been autoupdate_EndToEndTest_paygen_au_dev_full_8530.96.0.
,
Jun 6 2017
Looking into the log:
06/05 07:46:56.192 WARNI|autoupdate_EndToEn:0983| Device has been powerwashed, need to reinstall stateful from http://100.115.219.136:8082/static/stable-channel/veyron-minnie/8530.96.0/stateful.tgz
And it fails, leading to:
06/05 07:47:30.296 DEBUG| site_autotest:0194| bash: /tmp/sysinfo/autoserv-idJ0Y7/bin/autotestd_monitor: /usr/bin/python: bad interpreter: No such file or directory
So indeed, the device has been powerwashed, and we cannot recover from this.
The user-space changes for ext4 crypto are indeed in R53, but the 3.14 kernel changes are not: the basic kernel changes for ext4 in 3.14 are only in R54. The 3.18 changes are in R51, and the 4.4 changes are in R52.
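As a rough illustration of the version reasoning in this comment, the milestones quoted above can be put in a small lookup. This is only a sketch: the table and the function name are hypothetical and not part of autotest, and treating any kernel not named in the comment (for example 3.10) as unsupported is an assumption.

```python
# Hypothetical lookup: first milestone whose kernel carries the ext4 crypto
# changes, restating only the numbers quoted in this comment.
EXT4_CRYPTO_KERNEL_MILESTONE = {
    "3.14": 54,
    "3.18": 51,
    "4.4": 52,
}


def kernel_has_ext4_crypto(kernel: str, milestone: int) -> bool:
    """Return True if a release at `milestone` running `kernel` has the
    kernel-side ext4 crypto changes, per the table above."""
    first_milestone = EXT4_CRYPTO_KERNEL_MILESTONE.get(kernel)
    if first_milestone is None:
        # Assumption: kernels not listed in the comment are treated as
        # lacking support.
        return False
    return milestone >= first_milestone


if __name__ == "__main__":
    # A 3.14 kernel at R53 is the combination described as missing the changes.
    print(kernel_has_ext4_crypto("3.14", 53))
    print(kernel_has_ext4_crypto("4.4", 53))
```

Under these assumptions, a 3.14 board rolled back to R53 would be flagged as unable to handle an encrypted stateful partition, which matches the failure described here.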
,
Jun 6 2017
This is worse on quawks and squawks, where we don't reboot:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/121810418-chromeos-test/chromeos4-row7-rack3-host19/debug/
and
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/121810563-chromeos-test/chromeos4-row10-rack10-host1/debug/
I suspect these machines revert to 3.10 (testing), where we crash very early at boot. We will have to set the stepping stone to a very recent image for these machines.
,
Jun 6 2017
Confirmed that squawks moved from 3.10 to 4.4 on 1/17, then to N (directory encryption) on 3/17. If we revert to 3.10, bad things will happen.
- I can fix 3.10 to fail the mount nicely, allowing a powerwash. It will not fix the issue at hand, but we would fail more cleanly.
- We need to find a stepping stone between 1/17 and 3/17 for these machines.
,
Jun 6 2017
Gwendal: Given that all Bay Trail boards have moved to v4.4, a fix to 3.10 won't actually be deployed anywhere it matters, will it? 3.10 on those boards exists only as a historical point, and we can't exactly push an auto update to M56 and before.
,
Jun 6 2017
,
Jun 6 2017
,
Jun 6 2017
Umbrella bugs have been created for the devices moved from
3.10 to 4.4: https://bugs.chromium.org/p/chromium/issues/detail?id=730141
3.14 to 4.4: https://bugs.chromium.org/p/chromium/issues/detail?id=730134
,
Jun 6 2017
Blocking on the (months-old) original bug where we tried to fix and then work around this problem. I'm still trying to understand exactly what we could do differently in autotest to recover properly from this class of problems. If anyone has a good idea and 15 minutes to describe it to me, please ping me on chat.
,
Jun 8 2017
Comment 1 by bleung@chromium.org, Jun 5 2017
Owner: gwendal@chromium.org