ChromeOS issue: Devices losing enrollment after updating to OS 63 |
||||||||||||||||
Issue descriptionChromeOS version: 63.0.3239.140 ChromeOS device model: HP Chromebook 11 G5 EE Case#: 14753835 Description: Devices losing enrollment after updating to OS 63 -Customer reported that devices started to lose enrollment after being updated to version 63 -Customer hasn't identify the action that force the device to lose enrollment (rebooting the device, closing the lid) -Customer is just notified when a device lost enrollment and issue can't be reproduce on demand -Devices that are re-enrolled don't lose the enrollment again -Not all devices are being affected -Amount of the affected devices so far: 50-70 Steps to reproduce: N/A autoupdate Current Behavior / Reproduction: device loses enrollment Expected Behavior: don't lose enrollment Drive link to logs: https://drive.google.com/open?id=1HswXRNzGyYN-kNqSUmpTKPkxyh6igBp1 https://drive.google.com/open?id=1Jz7U7RTRBXpt_NmuDZjsLUl6P29C7YoF In the log I see the same errors as crbug.com/685370 messages:2018-02-06T17:48:30.741429+00:00 INFO session_manager[1252]: [INFO:policy_key.cc(51)] No policy key on disk at /var/lib/whitelist/owner.key and chrome/chrome_20180207-082803:[1323:1323:0207/082804.542895:ERROR:device_cloud_policy_store_chromeos.cc(233)] Device policy read on enrolled device yields no DM token! Status: 1.
,
Feb 9 2018
Customer Info and date&time of the issue: https://drive.google.com/open?id=1md9OCfRdcO8sT3xXMRV0jShOEd_LWNHx4ScjsIDb_Qk
,
Feb 9 2018
I've another customer with the same issue, but different HW: KEFKA: Dell Chromebook 11 Model 3180 case: 14842246 customer info: https://drive.google.com/open?id=1swuwNB_DakBVI8ybPiMyHPGlf5ivhp-Kf4DN0ayaSMw debug logs: https://drive.google.com/open?id=1o1BBqiLaY0Hvo4BCZ5dAFPOSSSpyWnZJ https://drive.google.com/open?id=1_qGCe_pTSrbrrKdM3ogJwItKBgdnX1jW
,
Feb 16 2018
Hi! Is there any update on this? Thanks!
,
Feb 19 2018
I have another customer with the same HW as #1: case: 14983742 HW: HP Chromebook 11 G5 EE relm version: 63.0.3239.140 Platform Version 10032.86.0 (Official Build) stable-channel relm Firmware Version Google_Relm.7287.370.0 debug logs: https://drive.google.com/open?id=1lISxrCEoxdInjxcEFRjbpT3Ki-WgDOp_ https://drive.google.com/open?id=1M9EuL4z0urcyqUo1vK7xCHY5LQ86Gqwg Is there something else we can gather to help the investigation?
,
Feb 20 2018
Another customer reports issue on Terra device and v63. Raising priority
,
Feb 20 2018
I have this same issue on many devices, all ASUS C202SA, 63.0.3239.140 Current Case: 14993413
,
Feb 21 2018
Is there any update to this ticket?
,
Feb 22 2018
Can you please update to M64 and see if the issue still exists? M63 is now a dead branch and the stable channel is M64. The issue was likely fixed there.
,
Feb 22 2018
+ Kevin M64 TPM
,
Feb 22 2018
I'm not aware of any issues in M64. Hasn't been raise as an issue / blocker / etc.
,
Feb 22 2018
@gkihumba ..... I have no way of knowing if the issue still exists after updating to M64.... When this occurs to the devices, we go through the re-enrollment steps and the device keeps its enrollment....but I do not know for how long, because it is random as to what devices this occurs on.... Some devices that are on M63 or lower do not lose their enrollment, while some do.
,
Feb 23 2018
Could you give exact repro steps? Without that it's difficult to triage.
,
Feb 23 2018
,
Feb 23 2018
Matt, is this a known issue?
,
Feb 23 2018
Customers didn't find any particular steps to trigger this, looks random in nature and some even wanted to know if it was done by users themselves. Only thing in common is either closing the lid or powering off device.
,
Feb 28 2018
any update to this? Please!?
,
Mar 1 2018
Lets see if I can jumpstart some ideas on possible resolutions.... Lets say, hypothetically, we confirm the issue happens when the device updates from 63 to 64, and the user closes the lid (and the user policy is set to logout the user/restart the device on the lid closing) during the upgrade process before it has a chance to finish. (again, just being hypothetical here) Would it be safe to assume that the problem could be avoided by turning off the policy which causes the device to logout/restart on lid closing? Also, if we have a large majority of our devices (715 of them) still on version 63, is there any way to force all of them to update to 64? I am pretty sure there is a spot somewhere in the Admin console that does this, and I am also pretty sure we have it set to force an update if the user doesnt do it, after 1 day, and yet the vast majority of our devices remain on version 63.... I don't think that is off topic, since it relates to the issue at hand. Thoughts?
,
Mar 6 2018
Apologies for the delay and thank you for the relevant details of what you've seen happening with your devices. re#18: if the issue does occur when the device reboots after pulling down the update, then I'm not sure changing the functionality on lid closing would help other than preventing the update from actually taking affect (i.e. would not take the update unless a user signs out, reboots, or hits the update arrow). There isn't a policy that forces the update, but so long as you aren't blocking updates via policy then all devices will take the update shortly after it is released and then stage the update awaiting next reboot. @atwilson, can you help triage to have someone investigate the logs in c#5? Only thought that comes to mind is that this is related to the FS corruption we saw previously, but given the sporadic reproduction, not sure if that could be related.
,
Mar 6 2018
Andrey, looks like FS corruption again - WDYT?
,
Mar 6 2018
+igor in case he sees anything else suspicious in the logs
,
Mar 7 2018
Looking at the 1st log: Did the problems start on Feb 06? The issue is there's no tpm device created by the driver. So, all further communications fail: 2018-02-06T18:27:30.076234+00:00 NOTICE tpm-probe[526]: tpm is non-functional or tpm driver is missing 2018-02-06T18:27:30.359947+00:00 ERR tcsd[714]: TSS: Could not find a device to open! The tpm driver's probe seems to fail at 2018-02-06T18:27:29.816063+00:00 ERR kernel: [ 1.067868] tpm_tis tpm_tis: Unable to read header Apparently, the tpm is unresponsive. It comes back ok after the next reboot. But by that time mount-encrypted should have received an error attempting to communicate with the TPM and reset encrypted stateful, causing the loss of enrollment and local data. I also see one ext4 repair: 2018-02-06T11:12:34.434506-08:00 INFO kernel: [ 1.518283] EXT4-fs (mmcblk0p1): recovery complete 2018-02-06T11:12:34.434534-08:00 INFO kernel: [ 1.827077] EXT4-fs (dm-1): recovery complete which happened after this on the previous boot: 2018-02-06T18:27:31.537079+00:00 INFO kernel: [ 5.567383] EXT4-fs (dm-1): resizing filesystem from 440498 to 804909 blocks 2018-02-06T18:27:31.537283+00:00 INFO kernel: [ 5.568072] EXT4-fs (dm-1): resized filesystem to 804909 It's unclear if it led to filesystem corruption. Possible, but that issue with non-communicating TPM chip seems like a more serious issue and more likely root cause. Unfortunately, no indication of why that happened. So, no further alley for investigation. mount-encrypted overreaction in case of possibly transient tpm errors is tracked separately in issue 702724.
,
Mar 7 2018
Same in the 2nd log: 2018-02-06T17:48:27.108639+00:00 ERR kernel: [ 1.070472] tpm_tis tpm_tis: Unable to read header 2018-02-06T17:48:27.624183+00:00 ERR tcsd[704]: TSS: Could not find a device to open! then 2018-02-06T17:48:28.795488+00:00 INFO kernel: [ 4.826147] EXT4-fs (dm-1): resizing filesystem from 209661 to 804909 blocks 2018-02-06T17:48:28.808072+00:00 INFO kernel: [ 4.839177] EXT4-fs (dm-1): resized filesystem to 804909 then 2018-02-06T09:49:51.445561-08:00 INFO kernel: [ 1.489621] EXT4-fs (mmcblk0p1): recovery complete 2018-02-06T09:49:51.445709-08:00 INFO kernel: [ 1.825991] EXT4-fs (dm-1): recovery complete
,
Mar 7 2018
In both logs I also see warnings and errors on every boot seemingly related to enumeraton of devices: 2018-02-06T17:48:27.106837+00:00 WARNING kernel: [ 0.170074] ------------[ cut here ]------------ 2018-02-06T17:48:27.106838+00:00 WARNING kernel: [ 0.170082] WARNING: CPU: 0 PID: 1 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/base/power/common.c:156 dev_pm_domain_set+0x58/0x5b() 2018-02-06T17:48:27.106840+00:00 WARNING kernel: [ 0.170084] PM domains can only be changed for unbound devices 2018-02-06T17:48:27.106841+00:00 WARNING kernel: [ 0.170086] Modules linked in: 2018-02-06T17:48:27.106849+00:00 WARNING kernel: [ 0.170092] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0-16288-g64d05cf80004 #1 2018-02-06T17:48:27.106851+00:00 WARNING kernel: [ 0.170095] Hardware name: GOOGLE Relm, BIOS Google_Relm.7287.370.0 08/06/2017 2018-02-06T17:48:27.106853+00:00 WARNING kernel: [ 0.170098] 0000000000000000 000000006ec2bee4 ffff88017b227b78 ffffffff9daa3024 2018-02-06T17:48:27.106855+00:00 WARNING kernel: [ 0.170102] 0000000000000000 ffff88017b227bd0 ffff88017b227bb8 ffffffff9d4647a7 2018-02-06T17:48:27.106856+00:00 WARNING kernel: [ 0.170107] 000000000000000f ffffffff9d80cb06 ffff88017a933c10 ffffffff9e095140 2018-02-06T17:48:27.106858+00:00 WARNING kernel: [ 0.170111] Call Trace: 2018-02-06T17:48:27.106864+00:00 WARNING kernel: [ 0.170120] [<ffffffff9daa3024>] dump_stack+0x4e/0x71 2018-02-06T17:48:27.106867+00:00 WARNING kernel: [ 0.170127] [<ffffffff9d4647a7>] warn_slowpath_common+0x81/0x9b 2018-02-06T17:48:27.106869+00:00 WARNING kernel: [ 0.170131] [<ffffffff9d80cb06>] ? dev_pm_domain_set+0x58/0x5b 2018-02-06T17:48:27.106870+00:00 WARNING kernel: [ 0.170136] [<ffffffff9d464816>] warn_slowpath_fmt+0x55/0x6b 2018-02-06T17:48:27.106872+00:00 WARNING kernel: [ 0.170142] [<ffffffff9d6f2aea>] ? acpi_get_data_full+0x7d/0x90 2018-02-06T17:48:27.106873+00:00 WARNING kernel: [ 0.170146] [<ffffffff9d80cb06>] dev_pm_domain_set+0x58/0x5b 2018-02-06T17:48:27.106875+00:00 WARNING kernel: [ 0.170152] [<ffffffff9d6db2bc>] acpi_lpss_platform_notify+0x139/0x15b 2018-02-06T17:48:27.106882+00:00 WARNING kernel: [ 0.170158] [<ffffffff9d47eccc>] notifier_call_chain+0x40/0x64 2018-02-06T17:48:27.106884+00:00 WARNING kernel: [ 0.170163] [<ffffffff9d47ef8b>] __blocking_notifier_call_chain+0x4d/0x66 2018-02-06T17:48:27.106886+00:00 WARNING kernel: [ 0.170167] [<ffffffff9d47efb8>] blocking_notifier_call_chain+0x14/0x16 2018-02-06T17:48:27.106888+00:00 WARNING kernel: [ 0.170172] [<ffffffff9d806077>] driver_bound+0x93/0x97 2018-02-06T17:48:27.106889+00:00 WARNING kernel: [ 0.170176] [<ffffffff9d8061e5>] driver_probe_device+0x13b/0x2b1 2018-02-06T17:48:27.106891+00:00 WARNING kernel: [ 0.170180] [<ffffffff9d806421>] __driver_attach+0x5e/0x81 2018-02-06T17:48:27.106898+00:00 WARNING kernel: [ 0.170184] [<ffffffff9d8063c3>] ? __device_attach_driver+0x68/0x68 2018-02-06T17:48:27.106907+00:00 WARNING kernel: [ 0.170188] [<ffffffff9d805284>] bus_for_each_dev+0x8c/0xaf 2018-02-06T17:48:27.106909+00:00 WARNING kernel: [ 0.170192] [<ffffffff9d805c41>] driver_attach+0x1e/0x20 2018-02-06T17:48:27.106910+00:00 WARNING kernel: [ 0.170196] [<ffffffff9d8058c9>] bus_add_driver+0xeb/0x1e3 2018-02-06T17:48:27.106912+00:00 WARNING kernel: [ 0.170200] [<ffffffff9d806c04>] driver_register+0x8f/0xcc 2018-02-06T17:48:27.106913+00:00 WARNING kernel: [ 0.170204] [<ffffffff9e155297>] ? spidev_init+0xab/0xab 2018-02-06T17:48:27.106915+00:00 WARNING kernel: [ 0.170209] [<ffffffff9d807a1e>] __platform_driver_register+0x4a/0x4c 2018-02-06T17:48:27.106923+00:00 WARNING kernel: [ 0.170213] [<ffffffff9e1552a9>] pxa2xx_spi_init+0x12/0x14 2018-02-06T17:48:27.106925+00:00 WARNING kernel: [ 0.170218] [<ffffffff9d4003b5>] do_one_initcall+0x188/0x19d 2018-02-06T17:48:27.106927+00:00 WARNING kernel: [ 0.170224] [<ffffffff9e11c09c>] kernel_init_freeable+0x194/0x234 2018-02-06T17:48:27.106935+00:00 WARNING kernel: [ 0.170230] [<ffffffff9da9f920>] ? rest_init+0x84/0x84 2018-02-06T17:48:27.106937+00:00 WARNING kernel: [ 0.170234] [<ffffffff9da9f92e>] kernel_init+0xe/0xda 2018-02-06T17:48:27.107225+00:00 WARNING kernel: [ 0.170238] [<ffffffff9daa837c>] ret_from_fork+0x7c/0xb0 2018-02-06T17:48:27.107229+00:00 WARNING kernel: [ 0.170243] [<ffffffff9da9f920>] ? rest_init+0x84/0x84 2018-02-06T17:48:27.107238+00:00 WARNING kernel: [ 0.170256] ---[ end trace 64a422b868e883d9 ]--- and 2018-02-06T17:48:27.107262+00:00 WARNING kernel: [ 0.177304] ------------[ cut here ]------------ 2018-02-06T17:48:27.107264+00:00 WARNING kernel: [ 0.177311] WARNING: CPU: 0 PID: 1 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/base/power/common.c:156 dev_pm_domain_set+0x58/0x5b() 2018-02-06T17:48:27.107272+00:00 WARNING kernel: [ 0.177313] PM domains can only be changed for unbound devices 2018-02-06T17:48:27.107274+00:00 WARNING kernel: [ 0.177314] Modules linked in: 2018-02-06T17:48:27.107276+00:00 WARNING kernel: [ 0.177319] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.18.0-16288-g64d05cf80004 #1 2018-02-06T17:48:27.107278+00:00 WARNING kernel: [ 0.177321] Hardware name: GOOGLE Relm, BIOS Google_Relm.7287.370.0 08/06/2017 2018-02-06T17:48:27.107279+00:00 WARNING kernel: [ 0.177324] 0000000000000000 000000006ec2bee4 ffff88017b227b78 ffffffff9daa3024 2018-02-06T17:48:27.107280+00:00 WARNING kernel: [ 0.177328] 0000000000000000 ffff88017b227bd0 ffff88017b227bb8 ffffffff9d4647a7 2018-02-06T17:48:27.107282+00:00 WARNING kernel: [ 0.177333] 000000000000000f ffffffff9d80cb06 ffff88017a933010 ffffffff9e095140 2018-02-06T17:48:27.107289+00:00 WARNING kernel: [ 0.177337] Call Trace: 2018-02-06T17:48:27.107292+00:00 WARNING kernel: [ 0.177342] [<ffffffff9daa3024>] dump_stack+0x4e/0x71 2018-02-06T17:48:27.107294+00:00 WARNING kernel: [ 0.177348] [<ffffffff9d4647a7>] warn_slowpath_common+0x81/0x9b 2018-02-06T17:48:27.107295+00:00 WARNING kernel: [ 0.177352] [<ffffffff9d80cb06>] ? dev_pm_domain_set+0x58/0x5b 2018-02-06T17:48:27.107297+00:00 WARNING kernel: [ 0.177356] [<ffffffff9d464816>] warn_slowpath_fmt+0x55/0x6b 2018-02-06T17:48:27.107298+00:00 WARNING kernel: [ 0.177361] [<ffffffff9d6f2aea>] ? acpi_get_data_full+0x7d/0x90 2018-02-06T17:48:27.107300+00:00 WARNING kernel: [ 0.177366] [<ffffffff9d80cb06>] dev_pm_domain_set+0x58/0x5b 2018-02-06T17:48:27.107308+00:00 WARNING kernel: [ 0.177371] [<ffffffff9d6db2bc>] acpi_lpss_platform_notify+0x139/0x15b 2018-02-06T17:48:27.107311+00:00 WARNING kernel: [ 0.177376] [<ffffffff9d47eccc>] notifier_call_chain+0x40/0x64 2018-02-06T17:48:27.107313+00:00 WARNING kernel: [ 0.177380] [<ffffffff9d47ef8b>] __blocking_notifier_call_chain+0x4d/0x66 2018-02-06T17:48:27.107315+00:00 WARNING kernel: [ 0.177385] [<ffffffff9d47efb8>] blocking_notifier_call_chain+0x14/0x16 2018-02-06T17:48:27.107316+00:00 WARNING kernel: [ 0.177389] [<ffffffff9d806077>] driver_bound+0x93/0x97 2018-02-06T17:48:27.107317+00:00 WARNING kernel: [ 0.177393] [<ffffffff9d8061e5>] driver_probe_device+0x13b/0x2b1 2018-02-06T17:48:27.107326+00:00 WARNING kernel: [ 0.177397] [<ffffffff9d806421>] __driver_attach+0x5e/0x81 2018-02-06T17:48:27.107328+00:00 WARNING kernel: [ 0.177401] [<ffffffff9d8063c3>] ? __device_attach_driver+0x68/0x68 2018-02-06T17:48:27.107330+00:00 WARNING kernel: [ 0.177405] [<ffffffff9d805284>] bus_for_each_dev+0x8c/0xaf 2018-02-06T17:48:27.107332+00:00 WARNING kernel: [ 0.177409] [<ffffffff9d805c41>] driver_attach+0x1e/0x20 2018-02-06T17:48:27.107333+00:00 WARNING kernel: [ 0.177413] [<ffffffff9d8058c9>] bus_add_driver+0xeb/0x1e3 2018-02-06T17:48:27.107334+00:00 WARNING kernel: [ 0.177417] [<ffffffff9d806c04>] driver_register+0x8f/0xcc 2018-02-06T17:48:27.107336+00:00 WARNING kernel: [ 0.177421] [<ffffffff9e1586d0>] ? i2c_i801_init+0xc9/0xc9 2018-02-06T17:48:27.107343+00:00 WARNING kernel: [ 0.177426] [<ffffffff9d807a1e>] __platform_driver_register+0x4a/0x4c 2018-02-06T17:48:27.107346+00:00 WARNING kernel: [ 0.177430] [<ffffffff9e1586e2>] dw_i2c_init_driver+0x12/0x14 2018-02-06T17:48:27.107347+00:00 WARNING kernel: [ 0.177434] [<ffffffff9d4003b5>] do_one_initcall+0x188/0x19d 2018-02-06T17:48:27.107349+00:00 WARNING kernel: [ 0.177439] [<ffffffff9e11c09c>] kernel_init_freeable+0x194/0x234 2018-02-06T17:48:27.107350+00:00 WARNING kernel: [ 0.177444] [<ffffffff9da9f920>] ? rest_init+0x84/0x84 2018-02-06T17:48:27.107352+00:00 WARNING kernel: [ 0.177448] [<ffffffff9da9f92e>] kernel_init+0xe/0xda 2018-02-06T17:48:27.107359+00:00 WARNING kernel: [ 0.177453] [<ffffffff9daa837c>] ret_from_fork+0x7c/0xb0 2018-02-06T17:48:27.107362+00:00 WARNING kernel: [ 0.177457] [<ffffffff9da9f920>] ? rest_init+0x84/0x84 2018-02-06T17:48:27.107363+00:00 WARNING kernel: [ 0.177460] ---[ end trace 64a422b868e883da ]--- and 2018-02-06T17:48:27.107247+00:00 ERR kernel: [ 0.175902] ERROR: Unable to locate IOAPIC for GSI 184 2018-02-06T17:48:27.107255+00:00 WARNING kernel: [ 0.175905] Failed to set pin attr for GSI184 2018-02-06T17:48:27.107258+00:00 ERR kernel: [ 0.176532] i2c i2c-0: failed to add I2C device MLFS0000:00 from ACPI 2018-02-06T17:48:27.107259+00:00 ERR kernel: [ 0.176579] ERROR: Unable to locate IOAPIC for GSI 184 2018-02-06T17:48:27.107261+00:00 WARNING kernel: [ 0.176582] Failed to set pin attr for GSI184 and 2018-02-06T17:48:27.107828+00:00 ERR kernel: [ 0.225139] ERROR: Unable to locate IOAPIC for GSI 182 2018-02-06T17:48:27.107830+00:00 WARNING kernel: [ 0.225142] Failed to set pin attr for GSI182 Those are seen both on boots when the tpm device is successfully created and when we have no /dev/tpm0, so unclear if can affect it. But overall sounds not very healthy.
,
Mar 7 2018
What was the Chrome OS version before the AU to 63? Do we still have un-updated devices? Is it possible to gather logs from them to compare?
,
Mar 8 2018
I looked at other logs from #3 and #5. In all cases I see "tpm_tis tpm_tis: Unable to read header" errors from the tpm driver, leading to no /dev/tpm0 device and all tpm-communicating utilities/daemons getting only errors from it. And that'd typically lead to losing enrollment. Also, re #24: I see the same warnings on much earlier versions, so they didn't start after AU. So, probably unrelated.
,
Mar 8 2018
Yeah, and if those TPM errors are persistent, it makes sense to wipe everything and re-enroll, re-own the TPM, etc. But if it's some transient thing, it's a shame to do the wipe as opposed to (say) rebooting or blocking at an error screen or something. Not clear to me that "can't reach the TPM" is something Chrome OS should try to silently recover from, rather than throwing up an error and letting the user intervene (power cycle, powerwash, etc).
,
Mar 8 2018
Re #27: Absolutely correct. I'm not saying that's how it should behave, only that this is how mount-encrypted behaves now. Those potentially transient errors shouldn't automatically lead to wiping stateful. The bug for fixing this mount-encrypted behavior is issue 702724. In many cases, we can't silently recover from those errors. Some retries are done at the driver level already to recover from i2c/spi level issues. There are still some semi-exotic situations when a tpm can return a recoverable error, which can be silently dealt with at Tlcl library level or mount-encrypted level. Those were not reported in practice, but possible. Still, in most cases, we will have to show an error and make the user reboot, hoping that the error will go away. A reboot loop in case of a persistent error is better than suddenly losing stateful on reboot or falling back to tpm-less mechanisms.
,
Mar 9 2018
OK, so not sure what next steps are here. Clearly something weird is happening with TPMs, but not sure who would be able to look at that. We have a clear AI to not clear stateful on TPM errors, but not sure when we'd move forward on that either (and requires UX review, etc). Andrey, are you comfortable keeping this on your plate, or is there someone else who should also look at this?
,
Mar 9 2018
Yes, let me take care of it. I deal with a similar issue in b/73351086. And fixing the tpm errors processing in mount-encrypted and tlcl library in general is what I wanted to do for a long time already. The next steps are along two parallel trucks, and some are beyond the scope of this bug, though help to address its consequences: 1) Reproduce and find the reason for tpm communication issues. I started looking, but no repro yet. Also, I may backport to 3.18 some of the driver patches that improve error processing and log messages in case of errors. As of now, some communication timeouts (possibly what we actually get here) can lead to confusing messages and passing garbage data. 2) Fix how mount-encrypted/tlcl deal with these errors: (a) better propagate the errors to the level that can make a decision on retry, (b) silently retry a few times in case of comm issues, (c) agree on the UX in case we still get an error, (d) implement.
,
Mar 16 2018
We are seeing the same issue on version 64 now as well on HP Chromebook 11 G5 EE... This is starting to be a serious problem.
,
Mar 16 2018
Next in my list after issue 702724. Fixing which should also help here - when hitting this issue, users may see an error screen and have to reboot, but not lose data/enrollment.
,
Mar 23 2018
Having this issue on Samsung Chromebooks in our fleet. Hope a fix is coming soon - particularly with State Testing coming up in April.
,
Mar 29 2018
Have the same on celes devices 2018-03-26T12:25:31.054929+00:00 ERR kernel: [ 0.146981] ERROR: Unable to locate IOAPIC for GSI 183 I see recent commits in b/73351086 but not on any build yet.
,
Apr 2 2018
We have another customer having this issue on Relm, running 64.0.3282.190 Drive link to debug logs: https://drive.google.com/open?id=17CeBO3uMuWWi6t53W-PYUT-PwVCTcFNj
,
Apr 2 2018
Thanks! For all 3 cases in comment #35 I also see 2018-03-26T12:43:44.398082+00:00 ERR kernel: [ 1.051564] tpm_tis tpm_tis: Unable to read header followed by 2018-03-26T12:43:44.500586+00:00 NOTICE tpm-probe[519]: tpm is non-functional or tpm driver is missing As a result, the key for mounting encrypted stateful cannot be read from the TPM, and that partition is wiped.
,
Apr 2 2018
I'm not sure if the IOAPIC stuff from comment #34 is directly related. We have these messages on every boot for all 3 devices: 2018-03-26T12:43:44.397508+00:00 ERR kernel: [ 0.171995] ERROR: Unable to locate IOAPIC for GSI 184 2018-03-26T12:43:44.397510+00:00 WARNING kernel: [ 0.171999] Failed to set pin attr for GSI184 ... 2018-03-23T18:08:46.284177+00:00 WARNING kernel: [ 0.200283] Expanded resource reserved due to conflict with PCI Bus 0000:00 2018-03-23T18:08:46.284178+00:00 DEBUG kernel: [ 0.200287] e820: reserve RAM buffer [mem 0x7cc13000-0x7fffffff] ... 2018-03-23T18:08:46.284477+00:00 ERR kernel: [ 0.219436] ERROR: Unable to locate IOAPIC for GSI 182 2018-03-23T18:08:46.284479+00:00 WARNING kernel: [ 0.219439] Failed to set pin attr for GSI182 But only on the 1st boot the TPM is unreachable.
,
Apr 2 2018
More re #34: > I see recent commits in b/73351086 but not on any build yet. The fixes for b/73351086 won't help this issue since they affect only devices with different TPM models.
,
Apr 2 2018
RE #33: > Having this issue on Samsung Chromebooks in our fleet. Do you use different Samsung Chromebook models in your fleet? Is the issue limited to one particular model?
,
Apr 3 2018
I see the same lines about IOAPIC and Expanded resource reserved due to conflict with PCI Bus on M54. So, that's not something, which was added with M63/M64. Another observation: One of the byproducts of tpm not being present is that chromeos reports its dictionary attack counter as 100 (special value meaning "no tpm"). Looking at the last month data only, I see that although the occurrence of counter=100 for relm, kefka and similar boards at boot remained at the level of thousandths of a percent, it's frequency has increased ~2x from M61/62 to M63/64, so it is possible that there was some change between 62 and 63. Note that this is the analysis of the recent situation only, so this 2x is hardly a statistical fact at this point. And it's hard to compare to M62 and earlier, since as we go down with revision numbers, the number of data point quickly decreases.
,
Apr 3 2018
General question: in many of the reports above, seeing the lost enrollment cases is tied to migration to 63/64. What was the previous Chrome OS revision running on those boards, for which lost enrollment was not seen with this frequency? M61/62 or earlier?
,
Apr 3 2018
,
Apr 3 2018
In one of reports, it started in 63* but was fine in 62*, but no more specific versions.
,
Apr 10 2018
I haven't been able to reproduce the issue by running a stress boot test so far, but it's possible that 1500 reboots is not sufficient yet - the frequency may be very low (supported by UMA figures that show much lower frequency for tpm issues in general, as commented in #40). One of the changes that landed in coreboot between M62 (9901.79.0, with firmware 7287.341.0) and M63 (10032.86.0, with firmware 7287.370.0 - that's what we see on devices with issues) is https://crrev.com/i/342463 (plus https://crrev.com/i/378309 and https://crrev.com/i/412088) for b/36790283. It is possible it's causing these issues - similar to b/68224689, but at boot time. The outcome at the tpm driver level is similar. Then the tpm driver changes being made for b/68224689 should also help here.
,
Apr 11 2018
I've another customer with this issue and different HW: case: 15132165 model: Asus Chromebook C202 version: 63.0.3239.140 / 10032.86.0 customer info & serial#: https://drive.google.com/open?id=1j-nSYUYxGvHNesiUM4D1pstSs-qscdkK0ZW7-DlNatQ debug logs: https://drive.google.com/open?id=1ZenXe2KZVn9qJB4AdJgJwXgVE2uDQ6h8 but in the log I have: messages.5:2018-02-20T13:04:48.022483-08:00 INFO session_manager[1138]: [INFO:policy_key.cc(51)] No policy key on disk at /home/root/f394d6c960ed35d8d736e60486726787d1936532/session_manager/policy/key chrome/chrome_20180227-124455:[1162:1162:0227/124457.038302:ERROR:device_cloud_policy_store_chromeos.cc(233)] Device policy read on enrolled device yields no DM token! Status: 5. Status 5 == STORE_VALIDATION_ERROR, // Policy validation failure. instead of 1 STORE_KEY_UNAVAILABLE, // Owner key not yet configured. but I've also the same crashes found at #24
,
Apr 12 2018
Re #45: what are the symptoms in this case, btw? Unlike other cases on this bug, there were no tpm errors that led to losing encstateful - the device haven't gone through wiping it since at least 02/13, so it's a different root cause and there should not have been unexpected OOBE screens on boot. I do see policy errors: 2018-02-27T11:04:09.014158-08:00 ERR shill[1104]: [ERROR:device_policy_impl.cc(132)] Policy on disk could not be parsed! ... 2018-02-27T11:04:09.116916-08:00 ERR session_manager[1132]: [ERROR:device_policy_impl.cc(132)] Policy on disk could not be parsed! which started after ext4 errors. I'm splitting it to a separate bug - see issue 832204 .
,
Apr 26 2018
Have one more model, case: 15132165 model: Edugear CMT Chromebook (wizpig) version: 65.0.3325.209 / 10323.67.0 with 2018-04-19T16:23:23.192750+00:00 ERR kernel: [ 1.068344] tpm_tis tpm_tis: Unable to read header and 2018-04-19T10:45:20.301391-07:00 INFO kernel: [ 1.530381] EXT4-fs (mmcblk0p1): recovery complete 2018-04-19T10:45:20.301414-07:00 INFO kernel: [ 1.813420] EXT4-fs (dm-1): recovery complete and 2018-04-19T16:23:23.192096+00:00 ERR kernel: [ 0.176149] ERROR: Unable to locate IOAPIC for GSI 184 2018-04-19T16:23:23.192097+00:00 WARNING kernel: [ 0.176153] Failed to set pin attr for GSI184 SN and logs https://drive.google.com/open?id=1XuT2e2aTFDh5mK-NFtnxUDOtqELWBeo2
,
May 4 2018
Any updates? Please let me know if anything else is needed from customer regarding this issue.
,
May 4 2018
No updates since comment 44. The fix for the suspected root cause there is in R67.
,
May 10 2018
Same issue: HP Chromebook 11 G5 65.0.3325.209 and an Asus
,
May 10 2018
Issue 838033 has been merged into this issue.
,
May 23 2018
Same issue with our fleet of 3000 Dell 3180. Sent sample device to Dell, they confirmed its an issue with ChromeOS. The said to put pressure on Google for a fix. We've upgraded from 63, to 64, and now 65. Still occurs randomly; we get 10-20 reports per week.
,
May 24 2018
"tpm_tis tpm_tis: Unable to read header" is what led to lost enrollment in reported cases with logs. There is an known issue that causes occasional "tpm_tis tpm_tis: Unable to read header" on strago family since R63. The fix was made in R67. Plus, additional hardening from similar low-level issues at encstateful mount time is to be included in R68. Marking as Fixed. Please report if seen again on 67 or later. Note that once seen on a device, fixing the problem includes powerwash. W/o powerwash, the issue will re-occur on reboots and will still be there after AU to 67+. |
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by marcore@chromium.org
, Feb 9 2018