GetSystemNSSKeySlot never returns when TPM is disabled (e.g. chromeos-on-linux)
Issue description
When the TPM is disabled, GetSystemNSSKeySlot never returns. This affects e.g. client certs on the sign-in screen, or the enterprise.platformKeys API. GetSystemNSSKeySlot hangs in IsTPMTokenReady.
We should have a mechanism similar to crypto::InitializePrivateSoftwareSlotForChromeOSUser[1] for the system token. Probably we can trigger this on TPMTokenLoader going into state TPM_DISABLED.
[1] https://cs.chromium.org/chromium/src/chrome/browser/profiles/profile_io_data.cc?rcl=211e27c281bdd7c94d235011b07421904e6d48be&l=393
,
May 17 2018
> Is the TPM ever disabled on ChromeOS, other than chromeos-on-linux?
This is a good question, and I don't know yet :-) I'll find out what is supposed to happen in VMs.
> I'm not sure this is a bug vs WAI.
I got notified of this because testing chromeos-on-linux with staging gaia is currently broken. The reason is that staging gaia requests a client cert, so we try to list certs from the system token, which hangs. System token certs on the sign-in screen can be disabled by a command-line switch, and this helps. What I could do is simply disable the feature on chromeos-on-linux, which would be simpler. VMs supposedly show the same behavior, but I haven't had time yet to investigate whether we try loading the system token there and fail, or whether something else happens.
,
May 17 2018
+apronin Specifically, I'll check if TPMTokenLoader ends up in TPM_DISABLED state on a VM or in some other state. I *do* think there would be value in emulating a system token in VMs. (not so much on chromeos-on-linux, but maybe we'll get that for free)
,
May 17 2018
Gotcha, thanks! I agree we should try to get tests working, and chromeos-on-linux not being able to use client certs is definitely a regression that we should figure out, as it was 'supposed' to just fall back to the user slot without the TPM init (at least, from memory of ~4 years ago, so... :D) Breaking VMs is also definitely not good. It sounds like we've got code with implicit/explicit dependencies on GetSystemNSSKeySlot - can we identify and remove those, to handle there not being a SystemNSSKeySlot? I take it this is due to the recent(ish) support for such client certs? It sounds like we might be more robust like that anyways.
,
May 17 2018
I think the core question is how GetSystemNSSKeySlot should behave. I think we have five cases:
(1) There is (supposed to be) a TPM but loading the system slot has not finished yet -> GetSystemNSSKeySlot returns nullptr and promises to call back the callback. This is OK.
(2) There is a TPM and loading the system slot finished successfully -> GetSystemNSSKeySlot returns the system slot. This is OK.
(3) There is (supposed to be) a TPM but loading the system slot failed -> Not sure what happens here.
(4) There is no real TPM (chromeos-on-linux), so there will be no system slot -> It looks like GetSystemNSSKeySlot behaves as in (1) here, but never calls the callback.
(5) There is no real TPM (VM), so there will be no system slot -> I don't know yet what happens here.
I'd like (4) to either return a software-emulated system slot, or something to indicate that there never will be a system slot. Letting the caller wait forever is not good behavior :)
(5) should definitely return a software-emulated system slot IMO.
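To make the "something to indicate that there never will be a system slot" option concrete, here is a minimal standalone sketch. It is not the real //crypto or TPMTokenLoader code: `SystemSlotState`, `SystemSlotTracker`, `Get` and `Resolve` are hypothetical names invented for illustration, and the actual slot handle is left out. The idea is only that a pending callback is always run once the state is resolved, and that "load failed" (case 3) stays distinguishable from "no TPM" (cases 4 and 5).

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical states for the system slot; not taken from Chromium.
enum class SystemSlotState {
  kPending,      // case (1): loading has not finished yet
  kReady,        // case (2): slot loaded successfully
  kFailed,       // case (3): a TPM is expected but loading failed
  kUnavailable,  // cases (4)/(5): there will never be a system slot
};

// Hypothetical tracker: callers either get a resolved state right away,
// or their callback is queued and run exactly once on resolution.
class SystemSlotTracker {
 public:
  using Callback = std::function<void(SystemSlotState)>;

  // Returns the current state. The callback is queued only while the
  // state is still kPending; otherwise the return value is the answer.
  SystemSlotState Get(Callback callback) {
    if (state_ == SystemSlotState::kPending)
      pending_.push_back(std::move(callback));
    return state_;
  }

  // Moves out of kPending and runs every queued callback, so no caller
  // waits forever. Later calls are ignored.
  void Resolve(SystemSlotState final_state) {
    assert(final_state != SystemSlotState::kPending);
    if (state_ != SystemSlotState::kPending)
      return;
    state_ = final_state;
    std::vector<Callback> callbacks;
    callbacks.swap(pending_);
    for (auto& callback : callbacks)
      callback(state_);
  }

 private:
  SystemSlotState state_ = SystemSlotState::kPending;
  std::vector<Callback> pending_;
};
```

In this sketch, whatever observes the TPM being disabled would call `Resolve(SystemSlotState::kUnavailable)`, and a caller such as the client cert lister could then skip the system token instead of hanging.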
,
May 17 2018
Ah, or were you thinking of mandating that GetSystemNSSKeySlot may only be called when the caller knows that there is a system slot? So that there would be no need for GetSystemNSSKeySlot to indicate a failure to its caller?
,
May 17 2018
Yeah, I'm not sure what happens with (3) either, which is why it sounds good to resolve. I'm a bit mixed on (4) / (5), mostly because I'm concerned about the risk of a TPM failure (#3) being interpreted as #4/#5, and causing the wrong thing (a fallback to a non-TPM rather than a failure to work). In the past, the failure mode of #3 - that everything stops working and is clearly broken (at least as I recall) - helped find real bugs in TPM init. But if you have a path that can distinguish "blow up if a TPM isn't there" versus "try to simulate a TPM if one isn't there", awesome. Bonus if you can make //crypto completely ignorant of all of that :) The alternative - GetSystemNSSKeySlot - or more generally, the expectation that the TPM never fails - is one way to accomplish that, but understandably hampers testing in the non-TPM-present cases.
,
May 17 2018
For (4), wouldn't it be better to emulate the TPM at a lower level, as part of VM-specific setup? Having an emulated TPM at kernel level for VMs sounds like a good goal. We even have it in a long-term projects list.
I'm reluctant to make it a run-time decision at this level. How do we know if it is case (3) or case (4)? Having code in production Chrome OS that emulates a TPM if it doesn't see one may open new attack vectors on chromebooks that are supposed to have TPMs.
If we follow a special path for VMs, what would we base such a decision on? Absence of a TPM device: that can happen on chromebooks in case of errors. Some board ID: do we have one that we can trust? Or is it possible to turn on such emulation at compile time?
,
May 17 2018
Looks like #7, which I missed while writing this, is expressing the same concerns as #8. :)
,
May 18 2018
I understand the concern now, thanks! It sounds like emulating the TPM in the VM would be significantly cleaner and better than what we do now even for private slots (or special vm-specific code in cryptohome emaxx@ mentioned to me once -- I guess that could go away too?). If that doesn't work for some reason, we can recognize chromeos-on-linux by using base::SysInfo::IsRunningOnChromeOS() in chrome (checks "lsb-release" IIRC) and a VM by using StatisticsProvider::IsRunningOnVm() (checks VPD). Not sure if it's a good idea to base decisions about the TPM init sequence on this though :-) I'd prefer not to after reading your arguments. I'll do a quickfix to disable client certs on the sign-in screen (which would only use a system slot as there is no private slot yet, so no point enabling it) if one of these conditions is detected, to unblock the use case which triggered this investigation.
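A rough standalone sketch of the quickfix condition, for illustration only. The two booleans are stand-ins for the results of base::SysInfo::IsRunningOnChromeOS() and StatisticsProvider::IsRunningOnVm(); `Environment`, `ShouldEnableSignInClientCerts` and `CheckExamples` are hypothetical names, and the real change may be wired up differently.

```cpp
#include <cassert>

// Hypothetical snapshot of the two checks mentioned above. In Chrome the
// values would come from base::SysInfo::IsRunningOnChromeOS() and
// StatisticsProvider::IsRunningOnVm(); here they are plain inputs.
struct Environment {
  bool running_on_chromeos;  // false on chromeos-on-linux
  bool running_on_vm;        // true when running in a VM
};

// Sign-in screen client certs only use the system slot, so enable them
// only where a real TPM-backed system slot is expected to exist.
constexpr bool ShouldEnableSignInClientCerts(const Environment& env) {
  return env.running_on_chromeos && !env.running_on_vm;
}

// Small usage example covering the three environments discussed here.
inline void CheckExamples() {
  assert(ShouldEnableSignInClientCerts({true, false}));    // real device
  assert(!ShouldEnableSignInClientCerts({false, false}));  // chromeos-on-linux
  assert(!ShouldEnableSignInClientCerts({true, true}));    // VM
}
```

This only gates the sign-in screen feature; it deliberately does not touch the TPM init sequence, so a genuine TPM failure on a real device still surfaces as a failure.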
,
Sep 28
Triage nag: This Chrome OS bug has an owner but no component. Please add a component so that this can be tracked by the relevant team.
,
Sep 28
Comment 1 by rsleevi@chromium.org, May 17 2018