New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 838746 link

Starred by 5 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Sometimes OOBE is not reset after clearing TPM

Project Member Reported by dhadd...@chromium.org, May 2 2018

Issue description

Amin, do you know why this wouldn't work? Or is there another way of resetting update_engines 'has oobe completed' setting?
I think if you look at:
https://cs.corp.google.com/chromeos_public/src/aosp/system/update_engine/hardware_chromeos.cc

lines 132 and 47. It will read a config from "/home/chronos/.oobe_completed". Is that what you are thinking?
I was wondering if there was something else that keeps state because the TPM reset code also deletes that file and others:

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/tpm_utils.py?type=cs&q=ClearTPMServer+autotest&l=12
I only see four conditions here:

https://chromium.git.corp.google.com/aosp/platform/system/update_engine/+/1f899d57c3d10f10f2d1324c116d3edd73ed6d7d/omaha_request_action.cc#1245

- OOBE is enable: I think it is for boards we are testing
- OOBE is completed: The only way this is that 'stat' is returning true which means the file exist.
- deadline is empty which is based on the response.
- app_version=ForcedUpdate which I don't see it in any of the logs.

One of these conditions is getting false. My suspicion is second one maybe somehow cached.

In worse case scenario if we cannot find out the cause, we can add some temporary log to see which one is failing.
I definitely think the state is being cached somehow (or not being properly rest in the first place by the reset TPM code). In the failing result you can see that it prints a current request signature at the start vs no current request signature in the passing result. 

Then later the failing result says:
[0501/094856:INFO:prefs.cc(122)] install-date-days not present in /var/lib/update_engine/prefs
[0501/094856:INFO:omaha_request_action.cc(746)] Set the Omaha InstallDate from OOBE time-stamp to 4137 days

So it seems like OOBE is marked as done somewhere. 

I can add a check in the test too to see if that oobe_complete file is still around

Yeah, that is even better. If it didn't work, we can log the update engine itself to see if there is another root cause.
Project Member

Comment 7 by bugdroid1@chromium.org, May 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/84ef864b058b5cbaa39f1a7a3125470bab8d7370

commit 84ef864b058b5cbaa39f1a7a3125470bab8d7370
Author: David Haddock <dhaddock@chromium.org>
Date: Fri May 04 10:02:46 2018

Check if oobe_completed marker file is actually deleted.

Sometimes this test fails and we're not sure if it is because this
oobe_completed marker file still exists despite TPM being reset.

BUG=chromium:838746
TEST=autoupdate_NonBlockingOOBEUpdate

Change-Id: I853fcd6085685d97ad885e02c8e1470c85492e8a
Reviewed-on: https://chromium-review.googlesource.com/1040908
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: Amin Hassani <ahassani@chromium.org>

[modify] https://crrev.com/84ef864b058b5cbaa39f1a7a3125470bab8d7370/server/site_tests/autoupdate_NonBlockingOOBEUpdate/autoupdate_NonBlockingOOBEUpdate.py

OK so it is as we discussed. Failing tests still have this file: /home/chronos/.oobe_completed 

05/06 23:52:05.260 DEBUG|          ssh_host:0301| Running (ssh) 'ls /home/chronos/.oobe_completed' from 'execute|_call_run_once_with_retry|_call_run_once|run_once|run|run_very_slowly'
05/06 23:52:05.545 DEBUG|             utils:0282| [stdout] /home/chronos/.oobe_completed


Passing tests do not
05/07 13:16:20.194 DEBUG|          ssh_host:0301| Running (ssh) 'ls /home/chronos/.oobe_completed' from 'execute|_call_run_once_with_retry|_call_run_once|run_once|run|run_very_slowly'
05/07 13:16:20.514 DEBUG|             utils:0282| [stderr] ls: cannot access '/home/chronos/.oobe_completed': No such file or directory
Summary: Sometimes OOBE is not reset after clearing TPM (was: OOBE is not reset after clearing TPM )
Log here:
https://storage.cloud.google.com/chromeos-autotest-results/198000146-chromeos-test/chromeos2-row3-rack9-host13/debug/autoserv.DEBUG?_ga=2.9302389.-1792747920.1525742232

The lines that matter showing the files are deleted, we reboot, and the file is back: 


05/06 23:51:33.668 DEBUG|          ssh_host:0301| Running (ssh) 'sudo rm -rf /home/.shadow/* /home/chronos/.oobe_completed /home/chronos/Local\ State /var/cache/app_pack /var/cache/shill/default.profile /var/lib/tpm /var/lib/whitelist/* ' from '_call_run_once|run_once|ClearTPMOwnerRequest|CleanupAndReboot|run|run_very_slowly'
05/06 23:51:33.996 DEBUG|          ssh_host:0301| Running (ssh) 'sync' from '_call_run_once|run_once|ClearTPMOwnerRequest|CleanupAndReboot|run|run_very_slowly'
05/06 23:51:34.798 DEBUG|          ssh_host:0301| Running (ssh) 'cat /etc/lsb-release' from 'CleanupAndReboot|reboot|get_board|parse_cmd_output|run|run_very_slowly'
05/06 23:51:35.144 INFO |        server_job:0218| 	START	----	reboot	timestamp=1525675895	localtime=May 06 23:51:35	
05/06 23:51:35.145 INFO |        server_job:0218| 		GOOD	----	reboot.start	timestamp=1525675895	localtime=May 06 23:51:35	
05/06 23:51:35.158 DEBUG|          ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from 'log_op|run_op|reboot|get_boot_id|run|run_very_slowly'
05/06 23:51:35.461 DEBUG|             utils:0282| [stdout] 8fe90027-36df-4c35-83e2-27094458b610
05/06 23:51:35.509 DEBUG|          ssh_host:0301| Running (ssh) '( sleep 1; reboot & sleep 10; reboot -f ) </dev/null >/dev/null 2>&1 & echo -n $!' from 'log_op|run_op|reboot|run_background|run|run_very_slowly'
05/06 23:51:35.834 DEBUG|             utils:0301| [stdout] 8123
05/06 23:51:35.836 DEBUG|      abstract_ssh:0726| Host chromeos2-row3-rack9-host13 pre-shutdown boot_id is 8fe90027-36df-4c35-83e2-27094458b610
05/06 23:51:35.847 DEBUG|          ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from 'op_func|wait_for_restart|wait_down|get_boot_id|run|run_very_slowly'
05/06 23:51:36.139 DEBUG|             utils:0282| [stdout] 8fe90027-36df-4c35-83e2-27094458b610
05/06 23:51:37.193 DEBUG|          ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from 'op_func|wait_for_restart|wait_down|get_boot_id|run|run_very_slowly'
05/06 23:51:47.221 WARNI|             utils:0925| run process timeout (10) fired on: /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos2-row3-rack9-host13 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::wait_down|get_boot_id|run] -> ssh_run(if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi)\";fi; if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi"
05/06 23:51:48.225 DEBUG|      abstract_ssh:0748| Host chromeos2-row3-rack9-host13 is now unreachable over ssh, is down
05/06 23:51:48.235 DEBUG|          ssh_host:0301| Running (ssh) 'true' from 'wait_for_restart|wait_up|is_up|ssh_ping|run|run_very_slowly'
05/06 23:52:04.586 DEBUG|      abstract_ssh:0670| Host chromeos2-row3-rack9-host13 is now up
05/06 23:52:04.587 INFO |        server_job:0218| 		GOOD	----	reboot.verify	timestamp=1525675924	localtime=May 06 23:52:04	
05/06 23:52:04.604 DEBUG|          ssh_host:0301| Running (ssh) 'ls /bin/uname >/dev/null 2>&1' from 'log_op|run_op|get_kernel_ver|must_be_installed|run|run_very_slowly'
05/06 23:52:04.917 DEBUG|          ssh_host:0301| Running (ssh) '/bin/uname -r' from 'reboot|log_op|run_op|get_kernel_ver|run|run_very_slowly'
05/06 23:52:05.198 DEBUG|             utils:0282| [stdout] 3.14.0
05/06 23:52:05.244 INFO |        server_job:0218| 	END GOOD	----	reboot	kernel=3.14.0	localtime=May 06 23:52:05	timestamp=1525675925	
05/06 23:52:05.260 DEBUG|          ssh_host:0301| Running (ssh) 'ls /home/chronos/.oobe_completed' from 'execute|_call_run_once_with_retry|_call_run_once|run_once|run|run_very_slowly'
05/06 23:52:05.545 DEBUG|             utils:0282| [stdout] /home/chronos/.oobe_completed
Cc: rspangler@chromium.org mruthven@chromium.org
Components: OS>Firmware
Adding some firmware folks for any opinion.

Cc: vbendeb@chromium.org apronin@chromium.org
Ping :) 
Cc: ahass...@chromium.org
Owner: apronin@chromium.org
Please, feel free to assign to someone else on the firmware team if this doesn't apply.

Thanks.
I'm not on a firmware team, but I'm a bit confused: why is it classified as a firmware problem?

Do I get it right that the issue is:
1) delete /home/chronos/.oobe_completed
2) reboot
3) ls /home/chronos/.oobe_completed - and it's suddenly there?

So, either (A) Chrome marks OOBE complete between steps 1 and 2 (or 2 and 3). Or, (B) it's a filesystem issue: a deleted file reappears after remounting on reboot. There is also (C) mount-encrypted that mounts encrypted stateful partition, and after the tpm is cleared it may still mount the old partition if OOBE didn't get to the point yet where it moves the encryption key from disk to tpm. But .oobe_encrypted was deleted on that old partition on step 1, so I can't come up with the scenario for mount-encrypted to be the cause for that. 

I will take a look at the logs to see any hints for (A), (B) or (C), but may be I'm missing something? Are there also some firmware-related symptoms? 

Is there a chance Chrome is still going through OOBE initialization in the background when we decide to delete .oobe_completed and reboot? That might explain it (reason (A)). Do you know if .oobe_completed already exists on step 1?
What's the timestamp on .oobe_completed that we suddenly see on step 3? 
@dhaddock: at what point in the flow we are clearing the tpm? apronin@ might be right if the OOBE is in progress when we clear the tmp owner and it is just a race condition. But that only happens if device delete those files before the reboot. I don't know much about the boot process, but my guess is these files will be deleted after the reboot when the boot process sees clear_tpm_owner_request=1. Is that a correct assumption? If they are deleted right on the reboot command, there might be a slight chance that the file is written by oobe process. But that is an extreme case and doesn't answer the large number of failures in this case.

I tried to do the same process on DUT and cannot reproduce it. Is there any specific board that is failing on this? Any way we can reproduce this locally?
We set clear_tpm_owner_request=1 
Then specifically delete the .oobe_completed file (and other files)
Then reboot
Then check the .oobe_completed file is still deleted
Then try to do an oobe update 

Failures seem to be scattered randomly throughout:
https://stainless.corp.google.com/search?test=%5Eautoupdate%5C_NonBlockingOOBEUpdate%5C.delta%24&exclude_non_release=true&exclude_cts=true&col=build&row=board_model&view=matrix&first_date=2018-05-31&last_date=2018-06-06
Re #16: Ah yes, that's right that typically /home/chronos is cleared after clearing the tpm owner.

But crossystem clear_tpm_owner_request=1 does not automatically lead to wiping  files in /home/chronos/. Those files are wiped if on the post-clear boot mount-encrypted fails to mount encstateful partition and recreates it instead (encstateful includes /var and /home/chronos).

But if encstateful key is still on disk, it will be used for mounting that partition even if the tpm owner is cleared. Having encstateful key on disk should be rare in tpm 2.0 cases (though possible). But for tpm 1.2 cases, the key is on disk until tpm initialization is complete (moved from disk to tpm at install_attributes writing time, after the 1st login).

Will look at the logs next.
Status: Assigned (was: Untriaged)
Any update on this issue?
Cc: alemate@chromium.org
1) Just to clarify since it's not mentioned anywhere on the bug: the tests fail with "Update did not fail with kNonCriticalUpdateInOOBE".

2) I looked at quawks, R68-10635.0.0 failure (https://stainless.corp.google.com/search?view=list&first_date=2018-05-01&last_date=2018-05-01&test=%5Eautoupdate%5C_NonBlockingOOBEUpdate%5C.delta%24&build=%5ER68%5C-10635%5C.0%5C.0%24&board=quawks&status=FAIL&status=ERROR&status=ABORT&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false), which is the one mentioned in the original bug description as the failing test.

In the update_engine.log, I see this:
[0501/094856:INFO:omaha_request_action.cc(746)] Set the Omaha InstallDate from OOBE time-stamp to 4137 days

which means we reached this point:
http://cs/chromeos_public/src/aosp/system/update_engine/omaha_request_action.cc?l=759

So, here (http://cs/chromeos_public/src/aosp/system/update_engine/omaha_request_action.cc?l=738), IsOOBEEnabled() and IsOOBEComplete(0 are both true.


And even before that, comparing the successful & unsuccessful logs from the bug descr, here's where we should have failed with non-critical update before OOBE, but continued instead:
[0501/094837:INFO:update_manager-inl.h(52)] ChromeOSPolicy::UpdateCheckAllowed: START
[0501/094837:INFO:official_build_check_policy_impl.cc(30)] Unofficial build, blocking periodic update checks.
[0501/094837:INFO:policy_utils.h(58)] decision by OnlyUpdateOfficialBuildsPolicyImpl::UpdateCheckAllowed
[0501/094837:INFO:update_manager-inl.h(74)] ChromeOSPolicy::UpdateCheckAllowed: END
[0501/094837:INFO:prefs.cc(122)] attempt-in-progress not present in /var/lib/update_engine/prefs
[0501/094837:INFO:update_manager-inl.h(52)] ChromeOSPolicy::P2PEnabled: START
[0501/094837:INFO:update_manager-inl.h(74)] ChromeOSPolicy::P2PEnabled: END
[0501/094837:INFO:update_attempter.cc(1520)] Not starting p2p at startup since it's not enabled.
[0501/094837:INFO:update_manager-inl.h(52)] ChromeOSPolicy::P2PEnabledChanged: START
[0501/094837:INFO:update_manager-inl.h(74)] ChromeOSPolicy::P2PEnabledChanged: END
[0501/094846:INFO:common_service.cc(95)] Attempt update: app_version="" omaha_url="" flags=0x0 interactive=yes RestrictDownload=no 
[0501/094846:INFO:update_attempter.cc(783)] Forced update check requested.
[0501/094846:INFO:update_attempter.cc(1597)] Non-official build; allowing any update source.
[0501/094846:INFO:update_manager-inl.h(52)] ChromeOSPolicy::UpdateCheckAllowed: START
[0501/094846:INFO:interactive_update_policy_impl.cc(36)] Forced update signaled (interactive), allowing update check.
[0501/094846:INFO:policy_utils.h(58)] decision by InteractiveUpdatePolicyImpl::UpdateCheckAllowed
[0501/094846:INFO:update_manager-inl.h(74)] ChromeOSPolicy::UpdateCheckAllowed: END
[0501/094846:INFO:update_attempter.cc(875)] Running interactive update.


So, iiuc, at ~09:48:37 Chrome thinks that OOBE is done already.


I looked at Chrome log (chrome_20180501-094839) for the session that starts at ~09:48:41, and it also seems to think that OOBE is done and proceeds to login-prompt-visible (though, I'm including better experts to judge):
[3194:3194:0501/094841.374428:VERBOSE1:chrome_session_manager.cc(206)] Starting Chrome with login/oobe screen.
[3194:3194:0501/094841.374478:VERBOSE1:login_display_host_webui.cc(1168)] Showing OOBE screen: unknown
[3194:3194:0501/094841.384827:VERBOSE1:login_display_host_webui.cc(1239)] Current locale: 
[3194:3194:0501/094841.385038:VERBOSE1:login_display_host_webui.cc(1264)] Initial locale: en-USkeyboard layout xkb:us::eng
[3194:3194:0501/094841.388512:VERBOSE1:input_events_blocker.cc(20)] InputEventsBlocker 0x91bdc31edd0 created.
[3194:3194:0501/094841.427189:VERBOSE1:drm_display_host_manager.cc(248)] Got display event ADD for /dev/dri/card0
[3194:3194:0501/094841.432660:VERBOSE1:drm_display_host_manager.cc(248)] Got display event ADD for /dev/dri/card1
[3194:3224:0501/094841.432951:VERBOSE1:drm_device_handle.cc(87)] Succeeded authenticating /dev/dri/card1 in 0 ms with 1 attempt(s)
[3194:3194:0501/094841.476941:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3194:3194:0501/094841.490473:VERBOSE1:login_display_host_webui.cc(483)] Login WebUI >> zero_delay: 0 wait_for_wp_load_: 1 init_webui_hidden_: 0
[3194:3194:0501/094841.490555:VERBOSE1:login_display_host_webui.cc(592)] Login WebUI >> wizard postponed
[3194:3194:0501/094841.490588:VERBOSE1:login_display_host_webui.cc(227)] Initial time zone: America/Los_Angeles
[3194:3194:0501/094841.490651:VERBOSE1:input_events_blocker.cc(29)] InputEventsBlocker 0x91bdc31edd0 destroyed.
[3194:3194:0501/094841.490721:VERBOSE1:automatic_reboot_manager.cc(309)] Rescheduling reboot
[3194:3194:0501/094841.491169:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3235:3240:0501/094841.500319:VERBOSE1:drm_device_manager.cc(53)] Could not initialize DRM device for /sys/devices/platform/vgem/drm/card1
[3194:3194:0501/094841.538756:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3194:3194:0501/094841.552402:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3194:3194:0501/094841.553377:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3194:3194:0501/094841.921820:VERBOSE1:login_display_host_webui.cc(802)] Login WebUI >> wp animation done
[3194:3194:0501/094841.921889:VERBOSE1:login_display_host_webui.cc(977)] Login WebUI >> Init postponed WebUI
[3194:3194:0501/094841.921927:VERBOSE1:login_display_host_webui.cc(595)] Login WebUI >> wizard
[3194:3194:0501/094841.925304:VERBOSE1:login_display_host_webui.cc(1065)] Login WebUI >> show login wnd on create
[3194:3194:0501/094841.954692:VERBOSE1:wizard_controller.cc(307)] Starting OOBE wizard with screen: unknown
[3235:3240:0501/094842.590535:VERBOSE1:drm_display.cc(104)] DRM configuring: device=/sys/devices/pci0000:00/0000:00:02.0/drm/card0 crtc=38 connector=60 origin=0,0 size=1366x768
[3194:3194:0501/094842.721424:VERBOSE1:wizard_controller.cc(480)] Showing network screen.
[3194:3194:0501/094842.721574:VERBOSE1:wizard_controller.cc(1118)] SetCurrentScreenSmooth: connect
[3194:3194:0501/094842.814457:WARNING:CONSOLE(0)] "Styling master document from stylesheets defined in HTML Imports is deprecated. Please refer to https://goo.gl/EGXzpw for possible migration paths.", source:  (0)
[3194:3194:0501/094845.497391:WARNING:statistics_provider.cc(415)] Requested statistic not found: Product_S/N
[3194:3194:0501/094846.252220:VERBOSE1:signin_screen_handler.cc(1418)] Login WebUI >> loginVisible, src: oobe, webui_visible_: 0
[3194:3194:0501/094846.252619:VERBOSE1:webui_login_view.cc(646)] Login WebUI >> login-prompt-visible
[3194:3194:0501/094846.252733:VERBOSE1:arc_session_runner.cc(151)] Session start requested: MINI_INSTANCE
[3194:3194:0501/094846.252768:VERBOSE1:arc_session_runner.cc(224)] Starting ARC instance
[3194:3194:0501/094846.803259:VERBOSE1:wizard_controller.cc(735)] SkipToUpdateForTesting.
[3194:3194:0501/094846.803741:VERBOSE1:auto_enrollment_controller.cc(161)] Auto-enrollment disabled: command line.
[3194:3194:0501/094846.803794:VERBOSE1:auto_enrollment_controller.cc(283)] New auto-enrollment state: 5
[3194:3194:0501/094846.804390:VERBOSE1:wizard_controller.cc(1035)] StartOOBEUpdate


As far, as I understand, Chrome also creates .oobe_completed file here:
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/existing_user_controller.cc?l=992
which happens after guest login. Maybe there are other places like that.


Looking at full logs, I see that there are several Chrome sessions after TPM clear before the one at 09:48:41 (otherwise, we won't have logs from them in encstateful if they happened before tpm clear):


     │        │   │   │   ├── chrome_20180501-092609 [11.7 kB]
     │        │   │   │   ├── chrome_20180501-094034 [8.8 kB]
     │        │   │   │   ├── chrome_20180501-094155 [6.9 kB]
     │        │   │   │   ├── chrome_20180501-094306 [340 Bytes]
     │        │   │   │   ├── chrome_20180501-094308 [9.9 kB]
     │        │   │   │   ├── chrome_20180501-094456 [7.7 kB]
     │        │   │   │   ├── chrome_20180501-094537 [8.6 kB]
     │        │   │   │   ├── chrome_20180501-094744 [7.9 kB]
     │        │   │   │   └── chrome_20180501-094839 [9.4 kB]

And in chrome_20180501-094308  I see 
[4524:4524:0501/094313.742041:VERBOSE1:user_session_manager.cc(518)] Completing guest session login

so we had a guest session.
After that login TPM initialization happens, which means it was not kicked off by some previous OOBE at least. So, probably this guest login is what caused it:

2018-05-01T09:43:08.028440-07:00 INFO session_manager[4442]: [INFO:browser_job.cc(148)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=29.0.0.171 --ui-prioritize-in-gpu-process --use-gl=egl --enable-native-gpu-memory-buffers --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --system-developer-mode --login-profile=user --has-chromeos-keyboard --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/oem_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/oem_small.jpg --default-wallpaper-is-oem --enable-consumer-kiosk --arc-availability=officially-supported --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --enable-net-benchmarking --metrics-recording-only --no-default-browser-check --no-first-run --enable-gpu-benchmarking --deny-permission-prompts --autoplay-policy=no-user-gesture-required --disable-component-extensions-with-background-pages --disable-default-apps --disable-search-geolocation-disclosure --enable-smooth-scrolling --enable-threaded-compositing --remote-debugging-port=0 --start-maximized --ash-disable-system-sounds --oobe-skip-postlogin --disable-logging-redirect --allow-failed-policy-fetch-for-test --disable-gaia-services --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.MtKz38/.org.chromium.Chromium.KC3fR6 --vmodule=arc_kiosk*=2,*arc/*=1,automatic_reboot_manager=1,tablet_power_button_controller=1,*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2,*/chromeos/net/*=2,*/chromeos/login/*=2,chrome_browser_main_posix=2 --show-webui-login
...
2018-05-01T09:43:19.329546-07:00 ERR cryptohomed[1043]: Taking TPM ownership took 2643ms
2018-05-01T09:43:19.330211-07:00 INFO cryptohomed[1043]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID
2018-05-01T09:43:20.312347-07:00 INFO cryptohomed[1043]: Created new cryptohome key.
2018-05-01T09:43:21.635664-07:00 INFO cryptohomed[1043]: Configuring TPM, ownership taken: 1.
2018-05-01T09:43:22.318241-07:00 INFO cryptohomed[1043]: Lockbox created.
2018-05-01T09:43:22.342157-07:00 INFO cryptohomed[1043]: Attestation: Preparing for enrollment...
2018-05-01T09:43:29.343646-07:00 INFO cryptohomed[1043]: Attestation: Prepared successfully (7001ms).

+alemate to confirm my theory.

As to what kicked off this guest login, in /var/log/messages it happened after this:
2018-05-01T09:43:04.818348-07:00 NOTICE root[4273]: autotest starting iteration /usr/local/autotest/results/default/graphics_WebGLAquarium/sysinfo/iteration.1 on quawks_2.4GHz_4GB


So, it seems, autoupdate_NonBlockingOOBEUpdate was not the first test to run on the board after going through TPM clear, so whatever expectations it has re OOBE state, they might be ruined by those previous tests like graphics_WebGLAquarium.
Is this is the place that autoupdate_NonBlockingOOBEUpdate relies upon to call crossystem clear_tpm_owner_request=1 and reboot to ensure OOBE is not done yet:
http://cs/chromeos_public/src/third_party/autotest/files/client/common_lib/cros/tpm_utils.py?l=93 ?

If so, it only happens if the owner password is empty. Which may be not the case if the initialization is still ongoing. Or, if the initialization was kicked off by a user login, and Chrome hasn't explicitly told cryptohomed to claer the owner password. I'm not quite sure if Chrome says the owner password can be discarded for guest logins. So, if we only had a single guest login after tpm clear, the password may still be there, another clear doesn't happen, but OOBE marked as done already.

These are the two places I was able to find it happens in Chrome:
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/login/session/user_session_manager.cc?l=1365
https://cs.chromium.org/chromium/src/chrome/browser/chromeos/settings/install_attributes.cc?l=272

And I indeed have my doubts it happens for guest logins.
No, I see autoupdate_NonBlockingOOBEUpdate calls ClearTPMOwnerRequest: http://cs/chromeos_public/src/third_party/autotest/files/server/site_tests/autoupdate_NonBlockingOOBEUpdate/autoupdate_NonBlockingOOBEUpdate.py?l=33 

Which is the right thing to do. It always calls clear_tpm_owner_request=1.

And looks like it was called:
05/01 09:47:33.728 DEBUG|          ssh_host:0301| Running (ssh) 'crossystem clear_tpm_owner_request=1' from '_call_run_once_with_retry|_call_run_once|run_once|ClearTPMOwnerRequest|run|run_very_slowly'

And, yes, on reboot the TPM is cleared. But since the TPM initialization was not finished on the last boot, the encrypted stateful key is still there, so encstateful is preserved over this TPM clear, and /home/chronos/.oobe_completed with it:

2018-05-01T09:47:50.083248-07:00 NOTICE mount-encrypted[1814]: [pid:192] TPM not Owned, ignoring Lockbox NVRAM area.
2018-05-01T09:47:50.083269-07:00 NOTICE mount-encrypted[1814]: [pid:192] Using NVRAM as system key; finalization needed.
2018-05-01T09:47:50.083288-07:00 NOTICE mount-encrypted[1814]: [pid:192] No usable system key found.
...
0.0.1/platform2/cryptohome/mount_encrypted.cc, 951): Finalization unfinished! Encryption key still on disk!
2018-05-01T09:47:50.083338-07:00 NOTICE mount-encrypted[1814]: [pid:192] Loopback attaching /mnt/stateful_partition/encrypted.block (named encstateful).
2018-05-01T09:47:50.083366-07:00 NOTICE mount-encrypted[1814]: [pid:192] Setting up dm-crypt /dev/loop0 as /dev/mapper/encstateful.
2018-05-01T09:47:50.083387-07:00 NOTICE mount-encrypted[1814]: [pid:192] Mounting /dev/mapper/encstateful onto /mnt/stateful_partition/encrypted.
...
2018-05-01T09:47:50.083491-07:00 NOTICE mount-encrypted[1814]: [pid:192] Bind mounting /mnt/stateful_partition/encrypted/chronos onto /home/chronos.


Owner: alemate@chromium.org
Hm, I also see this:
2018-05-01T09:47:33.603980-07:00 NOTICE ag[6876]: autotest server[stack::ClearTPMOwnerRequest|CleanupAndReboot|run] -> ssh_run(sudo rm -rf /home/.shadow/* /home/chronos/.oobe_completed /home/chronos/Local\ State /var/cache/app_pack /var/cache/shill/default.profile /var/lib/tpm /var/lib/whitelist/* )

So, yes, it happened on the DUT before reboot. Should have been deleted from encstateful. So, Back to the same square again. In general, stateful partition survived TPM clear, but specifically /home/chronos/.oobe_completed should have been deleted.

No ext4 errors. No idea why Chrome thinks it has done OOBE already on boot. Maybe .oobe_completed is not the only signal taken into consideration.

I don't see anything else. alemate@, could you please take a look at the logs to see if you find any hints.

Cc: sdantul...@chromium.org abod...@chromium.org
 Issue 875094  has been merged into this issue.
Components: OS>Systems>Security
Cc: r...@chromium.org
Owner: jdufault@chromium.org
Jacob, could you help finding owner for this?

Sign in to add a comment