Samsung Chromebook 3 devices are losing enrollment
Issue description
ChromeOS version: 70.0.3538.76
ChromeOS device model: Samsung Chromebook 3
Case#: 17753176
Description: Devices are losing WiFi policies and spontaneously losing enrollment
Steps to reproduce: Customer reports that some of their chromebooks are losing WiFi, and going back to the enterprise enrollment screen. They thought it was an issue with the latest Chrome OS update to version 70, but turning off auto updates did not help.
Current Behavior / Reproduction: No specific steps, issue happens sporadically
Expected Behavior: Devices stay connected and do not un-enroll
Drive link to logs: https://drive.google.com/open?id=1LFh_Z-SLVZTrNx2buu7Rzf4XK6fapjdT
,
Dec 20
Customer provided a fresh debug log: https://drive.google.com/open?id=1GTAhNut9ZsvYkSd7hJ_An5MrFXkAmrtY, stating that the issue happened again this morning (12/20/2018). From the log files, it looks like the issue occurred around 10:50 am.
Force enrollment triggered at 10:52:
53:[1162:1162:1219/105215.263064:VERBOSE1:wizard_controller.cc(614)] Showing EULA screen.
54:[1162:1162:1219/105215.263186:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: eula
57:[1162:1162:1219/105224.021710:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: EULA_ACCEPTED
58:[1162:1162:1219/105224.022888:VERBOSE1:auto_enrollment_controller.cc(546)] Auto-enrollment required: flag in VPD.
59:[1162:1162:1219/105224.023034:VERBOSE1:auto_enrollment_controller.cc(555)] Proceeding with FRE
60:[1162:1162:1219/105224.023124:VERBOSE1:auto_enrollment_controller.cc(732)] New auto-enrollment state: 1
62:[1162:1162:1219/105224.023347:VERBOSE1:wizard_controller.cc(1275)] StartOOBEUpdate
63:[1162:1162:1219/105224.023417:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: update
77:[1162:1162:1219/105232.783284:VERBOSE1:wizard_controller.cc(1693)] Hiding error screen.
78:[1162:1162:1219/105232.783322:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: update
83:[1162:1162:1219/105237.936291:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: UPDATE_ERROR_UPDATING
84:[1162:1162:1219/105237.936440:VERBOSE1:wizard_controller.cc(746)] Showing Auto-enrollment check screen.
85:[1162:1162:1219/105237.936531:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: auto-enrollment-check
86:[1162:1162:1219/105250.488051:VERBOSE1:auto_enrollment_controller.cc(685)] Starting auto-enrollment client for FRE.
87:[1162:1162:1219/105250.489027:VERBOSE1:auto_enrollment_controller.cc(732)] New auto-enrollment state: 1
88:[1162:1162:1219/105250.564658:VERBOSE1:auto_enrollment_controller.cc(732)] New auto-enrollment state: 1
89:[1162:1162:1219/105250.649777:VERBOSE1:auto_enrollment_controller.cc(732)] New auto-enrollment state: 1
91:[1162:1162:1219/105251.165345:VERBOSE1:auto_enrollment_controller.cc(732)] New auto-enrollment state: 4
92:[1162:1162:1219/105251.165441:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: ENTERPRISE_AUTO_ENROLLMENT_CHECK_COMPLETED
93:[1162:1162:1219/105251.165534:VERBOSE1:wizard_controller.cc(2073)] Showing enrollment screen. Forcing interactive enrollment: 0.
94:[1162:1162:1219/105251.165609:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: oauth-enrollment
155:[1162:1162:1219/105341.100173:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: ENTERPRISE_ENROLLMENT_COMPLETED
156:[1162:1162:1219/105341.100935:VERBOSE1:wizard_controller.cc(579)] Showing login screen.
Also, around this time there is a failed attempt to update from 70.0.3538.76 to 70.0.3538.110:
[1219/105237:ERROR:update_attempter.cc(1367)] Update failed.
[1219/105237:INFO:payload_state.cc(257)] Updating payload state for error code: 49 (ErrorCode::kNonCriticalUpdateInOOBE)
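For anyone triaging similar logs, the excerpt above can be reduced to a short OOBE timeline with a few lines of Python. This is only a sketch: the regular expression is inferred from the lines quoted in this comment (bracketed pid:tid:MMDD/HHMMSS.micros:LEVEL:file(line) prefix), and the names parse_line, oobe_timeline, and SAMPLE are hypothetical helpers, not part of any Chrome OS tooling.

```python
import re

# Prefix shape inferred from the quoted excerpt (an assumption, not a spec):
# [pid:tid:MMDD/HHMMSS.micros:LEVEL:file.cc(line)] message
LINE_RE = re.compile(
    r"\[(?P<pid>\d+):(?P<tid>\d+):(?P<date>\d{4})/(?P<time>\d{6})\.(?P<micros>\d+):"
    r"(?P<level>[A-Z0-9]+):(?P<src>[\w.]+)\((?P<line>\d+)\)\]\s*(?P<msg>.*)"
)


def parse_line(text):
    """Return a dict for one Chrome log line, or None if it does not match."""
    # search() rather than match() so a leading grep line number ("57:") is skipped.
    m = LINE_RE.search(text)
    if m is None:
        return None
    t = m.group("time")
    return {
        "date": m.group("date"),
        "time": f"{t[0:2]}:{t[2:4]}:{t[4:6]}",
        "source": m.group("src"),
        "message": m.group("msg").strip(),
    }


def oobe_timeline(lines):
    """Keep wizard exit codes and every auto_enrollment_controller.cc message."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        is_exit = rec["message"].startswith("Wizard screen exit code:")
        is_auto_enroll = rec["source"] == "auto_enrollment_controller.cc"
        if is_exit or is_auto_enroll:
            events.append((rec["time"], rec["message"]))
    return events


# Four lines copied from the excerpt in this comment.
SAMPLE = [
    "57:[1162:1162:1219/105224.021710:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: EULA_ACCEPTED",
    "58:[1162:1162:1219/105224.022888:VERBOSE1:auto_enrollment_controller.cc(546)] Auto-enrollment required: flag in VPD.",
    "63:[1162:1162:1219/105224.023417:VERBOSE1:wizard_controller.cc(1363)] SetCurrentScreenSmooth: update",
    "155:[1162:1162:1219/105341.100173:VERBOSE1:wizard_controller.cc(1535)] Wizard screen exit code: ENTERPRISE_ENROLLMENT_COMPLETED",
]

for time, message in oobe_timeline(SAMPLE):
    print(time, message)
```

Filtering on exit codes and the auto-enrollment source file keeps the screen-transition noise out while preserving the sequence that matters here: EULA accepted, FRE required by the VPD flag, enrollment completed.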
,
Jan 11
,
Jan 11
Andrey/Maksim - is this disk corruption causing us to lose the stateful partition? If we think there's a bug here, this should be a higher priority.
,
Jan 17
(5 days ago)
After taking a look at the logs from comment 3, it seems that the device had already been in the bad state for a while. Each boot, starting from the very first one present in the logs, has messages about the TPM not being ready:
2018-12-16T09:23:15.216120+00:00 INFO cryptohomed[1093]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID
2018-12-16T09:23:15.216184+00:00 WARNING cryptohomed[1093]: Canceled creating cryptohome key - TPM is not ready.
2018-12-16T09:23:15.216328+00:00 WARNING cryptohomed[1093]: Could not load the device policy file.
2018-12-16T09:23:15.216617+00:00 ERR cryptohomed[1093]: Creating new salt at /home/.shadow/salt (0, 0)
2018-12-16T09:23:15.231710+00:00 WARNING chapsd[1036]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-12-16T09:23:15.235271+00:00 WARNING chapsd[1036]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-12-16T09:23:15.262651+00:00 ERR cryptohomed[1093]: stat() of /mnt/stateful_partition/unencrypted/preserve/attestation.epb failed.: No such file or directory
2018-12-16T09:23:15.262731+00:00 ERR cryptohomed[1093]: Failed to read db.: No such file or directory
2018-12-16T09:23:15.262758+00:00 INFO cryptohomed[1093]: Attestation: Attestation data not found.
The logs from comment 0 look similar. It's likely that the stateful partition had indeed been corrupted before these logs were collected. I couldn't find any trace of the root cause in the logs. Who would be the right person/team to investigate disk corruption and related issues?
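A quick way to check other devices' logs for the same signature is to count these messages. The sketch below assumes the syslog line shape seen in this comment (ISO timestamp, level, process[pid]: message). The marker strings are copied from the quoted lines, while SYSLOG_RE, MARKERS, and tpm_markers are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Line shape inferred from the excerpt: "<ISO timestamp> <LEVEL> <proc>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"
)

# Substrings taken verbatim from the messages quoted in this comment.
MARKERS = (
    "TPM is not ready",
    "SRK does not exist",
    "Attestation data not found",
    "Creating new salt",
)


def tpm_markers(lines):
    """Count how often each marker substring appears in well-formed syslog lines."""
    hits = Counter()
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if m is None:
            continue
        for marker in MARKERS:
            if marker in m.group("msg"):
                hits[marker] += 1
    return hits


# Lines copied from the excerpt in this comment.
SAMPLE = [
    "2018-12-16T09:23:15.216184+00:00 WARNING cryptohomed[1093]: Canceled creating cryptohome key - TPM is not ready.",
    "2018-12-16T09:23:15.216617+00:00 ERR cryptohomed[1093]: Creating new salt at /home/.shadow/salt (0, 0)",
    "2018-12-16T09:23:15.231710+00:00 WARNING chapsd[1036]: SRK does not exist - this is normal when the TPM is not yet owned.",
    "2018-12-16T09:23:15.235271+00:00 WARNING chapsd[1036]: SRK does not exist - this is normal when the TPM is not yet owned.",
]

for marker, count in tpm_markers(SAMPLE).items():
    print(f"{count}x {marker}")
```

A device that shows these markers on every boot, as described here, is consistent with a TPM and stateful partition that were reset earlier than the collected logs begin. The counts alone do not identify the root cause.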
,
Jan 17
(5 days ago)
Looks like here's where the stateful partition was cleared:
2018/12/16 09:22:57 UTC Self-repair incoherent stateful partition: var and home. History: /home/chronos /var /home
2018/12/16 09:22:57 UTC (preserve log): /sbin/clobber-state fast keepimg
2018/12/16 09:23:00 UTC (restore log): /sbin/clobber-state
And I don't see any kernel crashes or other issues in the event.log.
+gwendal, does it look at all similar to issue 878595 or other recent disk-related issues? I'm not sure which of those issues affected which devices. And if a disk was just full (inodes or otherwise), we'd have some other traces in the logs, right?
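To line the wipe up against the first TPM error from the earlier comment, the two timestamp formats can be normalized and subtracted. This is a minimal sketch assuming the clobber-state log always starts with a 19-character "YYYY/MM/DD HH:MM:SS" stamp in UTC and that the syslog stamp is ISO 8601 with an offset. Both helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_clobber_time(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' stamp; the log labels it UTC."""
    stamp = line[:19]
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S").replace(tzinfo=timezone.utc)


def parse_syslog_time(line):
    """Parse the leading ISO 8601 timestamp (with UTC offset) of a syslog line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


# Both lines are copied from log excerpts quoted in this thread.
wipe = (
    "2018/12/16 09:22:57 UTC Self-repair incoherent stateful partition: "
    "var and home. History: /home/chronos /var /home"
)
first_tpm_error = (
    "2018-12-16T09:23:15.216120+00:00 INFO cryptohomed[1093]: TPM error 0x2020 "
    "(Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID"
)

gap = parse_syslog_time(first_tpm_error) - parse_clobber_time(wipe)
print(f"{gap.total_seconds():.1f} s between the self-repair wipe and the first TPM error")
```

With these two lines the gap is about 18 seconds, which fits the reading that the TPM errors come from the boot immediately after the self-repair wipe.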
,
Jan 18
(5 days ago)
Another Enterprise customer has reported a similar case.
ChromeOS version: 71.0.3578.94
ChromeOS device model: banon Acer Chromebook 15 (CB3-532)
Case#: 18055655
Note: Chromebooks are losing enrollment after being rebooted, and they are not auto re-enrolling. Auto-enrollment is not working whether devices are wiped manually or lose enrollment due to the issue. Devices can still be manually re-enrolled. The customer shared only one device log, but explained that more devices are affected.
Drive link to logs: https://drive.google.com/open?id=1UvR4FzT5K3Pxv8JlEbh6ng7VstbHkwEz
2018-06-20T12:25:54.959094+00:00 INFO cryptohomed[1220]: TPM error 0x2020 [Reason: info:TPM error codes] (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID
2018-06-20T12:25:54.959130+00:00 WARNING cryptohomed[1220]: Canceled creating cryptohome key - TPM is not ready.
2018-06-20T12:25:54.959245+00:00 WARNING cryptohomed[1220]: Could not load the device policy file.
2018-06-20T12:25:54.964480+00:00 WARNING chapsd[918]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-06-20T12:25:54.965314+00:00 WARNING chapsd[918]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-06-20T12:25:55.060328+00:00 INFO cryptohomed[1220]: TPM error 0x2020 [Reason: info:TPM error codes] (Key not found in persistent storage): Unseal: Failed to load SRK.
2018-06-20T12:25:55.060765+00:00 ERR cryptohomed[1220]: Cannot unseal aes key.
2018-06-20T12:25:55.060798+00:00 ERR cryptohomed[1220]: Attestation: Could not unseal decryption key.
2018-06-20T12:25:55.060820+00:00 WARNING cryptohomed[1220]: Attestation: Attestation data invalid. This is normal if the TPM has been cleared.
2018-06-20T12:25:55.147232+00:00 INFO cryptohomed[1220]: Cannot read boot lockbox files.
2018-06-20T12:25:55.147328+00:00 INFO cryptohomed[1220]: The TPM chip does not support GetAlertsData. Stop UploadAlertsData task.
,
Jan 18
(5 days ago)
,
Jan 21
(2 days ago)
,
Today
(19 hours ago)
Removing the Enterprise component, as this is just stateful-partition corruption, which is not enterprise-specific. Keeping it on Hotlist-Enterprise so it can be tracked.
Auto-re-enrollment is not currently enabled, I believe, so it's WAI that you're not seeing re-enrollment. FRE should force you to manually enroll, though.
I'm not clear on exactly how we want to deal with these kinds of corrupted-disk issues. apronin/gwendal, should one of you own this?
Also, those TPM errors in the logs are weird, but they are from June, so I'm not sure how relevant they are.
Comment 1 by vkasatkin@google.com
, Dec 17