
Issue 844022


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




GetSystemNSSKeySlot never returns when TPM is disabled (e.g. chromeos-on-linux)

Project Member Reported by pmarko@chromium.org, May 17 2018

Issue description

When the TPM is disabled, GetSystemNSSKeySlot never returns.
This affects e.g. client certs on the sign-in screen, or enterprise.platformKeys API.

GetSystemNSSKeySlot hangs in IsTPMTokenReady.

We should have a mechanism similar to crypto::InitializePrivateSoftwareSlotForChromeOSUser[1] for the system token. Probably we can trigger this on TPMTokenLoader going into state TPM_DISABLED.

[1] https://cs.chromium.org/chromium/src/chrome/browser/profiles/profile_io_data.cc?rcl=211e27c281bdd7c94d235011b07421904e6d48be&l=393
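A minimal sketch of the proposal above, assuming a provider that queues callers until the token loader reports a final state. Every name here (`SystemSlotProvider`, `FakeSlot`, `TpmTokenState`, the slot labels) is hypothetical and stands in for the real //crypto and TPMTokenLoader code, which is not reproduced. It only illustrates the idea that entering a TPM_DISABLED-like state could resolve pending callbacks with a software slot instead of leaving them waiting.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real Chromium types.
enum class TpmTokenState { kNotStarted, kTpmDisabled, kTokenReady };

struct FakeSlot {
  std::string name;
};

class SystemSlotProvider {
 public:
  using SlotCallback = std::function<void(const FakeSlot*)>;

  // Returns the slot if one is already available. Otherwise returns nullptr
  // and queues |callback| to run once a slot (real or software) exists.
  const FakeSlot* GetSystemSlot(SlotCallback callback) {
    if (has_slot_)
      return &slot_;
    pending_.push_back(std::move(callback));
    return nullptr;
  }

  // Called by the (hypothetical) token loader whenever its state changes.
  void OnTpmTokenStateChanged(TpmTokenState state) {
    if (has_slot_)
      return;
    if (state == TpmTokenState::kTokenReady)
      Resolve("tpm-system-slot");
    else if (state == TpmTokenState::kTpmDisabled)
      Resolve("software-system-slot");  // Assumed fallback, as proposed above.
  }

 private:
  void Resolve(std::string name) {
    slot_.name = std::move(name);
    has_slot_ = true;
    // Move the queue out first so callbacks may safely call GetSystemSlot.
    std::vector<SlotCallback> callbacks;
    callbacks.swap(pending_);
    for (auto& callback : callbacks) {
      if (callback)
        callback(&slot_);
    }
  }

  FakeSlot slot_;
  bool has_slot_ = false;
  std::vector<SlotCallback> pending_;
};
```

With this shape a caller that asked for the slot before the loader finished is called back in both the TPM-ready and the TPM-disabled case, so nothing waits forever.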
 
I'm not sure this is a bug vs WAI. TPM interactions on ChromeOS-on-linux have never been expected to work.

I'm concerned about having a solution like the one described, because the ChromeOS initialization sequence is already rather difficult to trace and understand. Would this be a good opportunity to try to move more of the ChromeOS bits out of //crypto and the specific TPM notions baked in there, and instead have those provided as dependencies through.

Is the TPM ever disabled on ChromeOS, other than chromeos-on-linux? 

Comment 2 by pmarko@chromium.org, May 17 2018

> Is the TPM ever disabled on ChromeOS, other than chromeos-on-linux? 
This is a good question, and I don't know yet :-) I'll find out what is supposed to happen in VMs.

> I'm not sure this is a bug vs WAI.
I got notified of this because testing chromeos-on-linux with staging gaia is currently broken.
The reason is that staging gaia requests a client cert, so we try to list certs from the system token, which hangs. System token certs on the sign-in screen can be disabled by a command-line switch, and this helps. What I could do is simply disable the feature on chromeos-on-linux; that'd be simpler.

It's supposedly also showing the same behavior on VMs, but I didn't have time yet to investigate if we try loading the system token there and fail or if something else happens.

Comment 3 by pmarko@chromium.org, May 17 2018

Cc: apronin@chromium.org
+apronin

Specifically, I'll check if TPMTokenLoader ends up in TPM_DISABLED state on a VM or in some other state.
I *do* think there would be value in emulating a system token in VMs.
(not so much on chromeos-on-linux, but maybe we'll get that for free)
Gotcha, thanks! I agree we should try to get tests working, and chromeos-on-linux not being able to use client certs is definitely a regression that we should figure out, as it was 'supposed' to just fall back to the user slot without the TPM init (at least, from memory of ~4 years ago, so... :D)

Breaking VMs is also definitely not good.

It sounds like we've got code with implicit/explicit dependencies on GetSystemNSSKeySlot - can we identify and remove those, to handle there not being a SystemNSSKeySlot? I take it this is due to the recent(ish) support for such client certs? It sounds like we might be more robust like that anyways.

Comment 5 by pmarko@chromium.org, May 17 2018

I think the core question is how GetSystemNSSKeySlot should behave -- I think we have five cases:
(1) There is (supposed to be) a TPM but loading the system slot has not finished yet
  -> GetSystemNSSKeySlot returns nullptr and promises to call back the callback. This is OK.

(2) There is a TPM and loading the system slot finished successfully.
  -> GetSystemNSSKeySlot returns the system slot, this is OK.

(3) There is (supposed to be) a TPM but loading the system slot failed
  -> Not sure what happens here.

(4) There is no real TPM (chromeos-on-linux), so there will be no system slot.
  -> It looks like GetSystemNSSKeySlot behaves as in (1) here, but never calls the callback.

(5) There is no real TPM (VM), so there will be no system slot.
  -> I don't know yet what happens here.

I'd like (4) to either return a software-emulated system slot, or something to indicate that there never will be a system slot. Letting the caller wait forever is not good behavior :)

(5) should definitely return a software-emulated system slot IMO.
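The five cases could be made explicit to callers with a status value instead of a bare nullptr. The sketch below is only an illustration under assumptions: `SystemSlotStatus`, `SystemSlotResult`, `Environment` and `ClassifySystemSlot` are hypothetical names, and the current GetSystemNSSKeySlot does not return anything like this. Cases (4) and (5) are folded into one "never available" status here; a software-emulated slot would instead report the ready status.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical status a caller could switch on.
enum class SystemSlotStatus {
  kPending,         // Case (1): TPM expected, load still in progress.
  kReady,           // Case (2): slot loaded successfully.
  kLoadFailed,      // Case (3): TPM expected, but loading failed.
  kNeverAvailable,  // Cases (4)/(5): no real TPM, no slot will appear.
};

struct SystemSlotResult {
  SystemSlotStatus status;
  std::optional<std::string> slot_name;  // Set only when status is kReady.
};

// Hypothetical description of what the platform reported so far.
struct Environment {
  bool tpm_expected;    // False on chromeos-on-linux or a VM without a TPM.
  bool load_finished;   // True once the system slot load has completed.
  bool load_succeeded;  // Meaningful only when load_finished is true.
};

SystemSlotResult ClassifySystemSlot(const Environment& env) {
  if (!env.tpm_expected)
    return {SystemSlotStatus::kNeverAvailable, std::nullopt};
  if (!env.load_finished)
    return {SystemSlotStatus::kPending, std::nullopt};
  if (!env.load_succeeded)
    return {SystemSlotStatus::kLoadFailed, std::nullopt};
  return {SystemSlotStatus::kReady, std::string("system-slot")};
}
```

A caller would then wait only on `kPending` and could treat `kLoadFailed` and `kNeverAvailable` differently, which keeps a real TPM failure distinct from a platform that has no TPM at all.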

Comment 6 by pmarko@chromium.org, May 17 2018

Ah, or were you thinking of mandating that GetSystemNSSKeySlot may only be called when the caller knows that there is a system slot? So that there would be no need for GetSystemNSSKeySlot to indicate a failure to its caller?
Yeah, I'm not sure what happens with (3) either, which is why it sounds good to resolve.

I'm a bit mixed on (4) / (5), mostly because I'm concerned about the risk of a TPM failure (#3) being interpreted as #4/#5, and causing the wrong thing (a fallback to a non-TPM rather than a failure to work)

In the past, the failure mode of #3 - that everything stops working and is clearly broken (at least as I recall) - helped find real bugs in TPM init. But if you have a path that can distinguish "blow up if a TPM isn't there" versus "try to simulate a TPM if one isn't there", awesome. Bonus if you can make //crypto completely ignorant of all of that :)

The alternative - GetSystemNSSKeySlot - or more generally, the expectation that the TPM never fails - is one way to accomplish that, but understandably hampers testing in the non-TPM-present cases.
For (4), should we then better emulate the TPM at a lower level as a part of VM-specific setup? Having an emulated TPM at kernel level for VMs sounds like a good goal. We even have it in a long-term projects list.

I'm reluctant to make it a run-time decision at this level. How do we know if it is case (3) or case (4)? Having code in production Chrome OS that emulates a TPM when it doesn't see one may open new attack vectors on chromebooks that are supposed to have TPMs. If we follow a special path for VMs, what would we base such a decision on? Absence of a TPM device - that can happen on chromebooks in case of errors. Some board ID - do we have one that we can trust?

Or is it possible to turn on such emulation at compile time?
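As a rough illustration of the compile-time option, the sketch below selects the behavior for a missing TPM from a build flag, so a production build cannot fall back to a software token at run time. The macro `HYPOTHETICAL_EMULATE_SYSTEM_TOKEN` and all other names are invented for this example; no such flag is known to exist in Chromium.

```cpp
#include <cassert>

// Hypothetical build flag, set only for VM or test images.
#if defined(HYPOTHETICAL_EMULATE_SYSTEM_TOKEN)
constexpr bool kEmulateSystemToken = true;
#else
constexpr bool kEmulateSystemToken = false;
#endif

enum class MissingTpmAction {
  kFailLoudly,        // Production: a missing TPM stays a visible error.
  kUseSoftwareToken,  // Emulation builds: substitute a software token.
};

// The parameter defaults to the build-time constant; it is exposed only so
// both branches can be exercised in a single build.
constexpr MissingTpmAction ActionWhenTpmMissing(
    bool emulate = kEmulateSystemToken) {
  return emulate ? MissingTpmAction::kUseSoftwareToken
                 : MissingTpmAction::kFailLoudly;
}
```

Because the default comes from a constant fixed at build time, a chromebook image built without the flag would always take the fail-loudly branch, whatever the device reports at run time.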
Looks like #7, which I missed while writing this, is expressing the same concerns as #8. :)
Cc: alemate@chromium.org feiling@chromium.org
I understand the concern now, thanks!
It sounds like emulating the TPM in the VM would be significantly cleaner and better than what we do now even for private slots (or the special vm-specific code in cryptohome that emaxx@ mentioned to me once -- I guess that could go away too?).

If that doesn't work for some reason, we can recognize chromeos-on-linux by using base::SysInfo::IsRunningOnChromeOS() in chrome (checks "lsb-release" IIRC) and a VM by using StatisticsProvider::IsRunningOnVm() (checks VPD). Not sure if it's a good idea to decide around TPM init sequence on this though :-) I'd prefer not to after reading your arguments.

I'll do a quickfix to disable client certs on the sign-in screen (which would only use a system slot as there is no private slot yet, so no point enabling it) if one of these conditions is detected to unblock the usecase which triggered this investigation.
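The quickfix condition might look roughly like the following. This is a sketch under assumptions: `PlatformInfo` and `ShouldUseSystemSlotCertsOnSigninScreen` are hypothetical names, and the two booleans are plain inputs here. In Chromium they would presumably be filled from base::SysInfo::IsRunningOnChromeOS() and StatisticsProvider::IsRunningOnVm(), as mentioned above, but those calls are not made in this example.

```cpp
#include <cassert>

// Hypothetical snapshot of the two platform checks named above.
struct PlatformInfo {
  bool is_running_on_chromeos;  // False on chromeos-on-linux.
  bool is_running_on_vm;        // True when running in a VM.
};

// Returns true only when a real system slot can be expected, i.e. on real
// Chrome OS hardware. On chromeos-on-linux or in a VM the sign-in screen
// would skip system-slot client certs instead of waiting on a slot that
// never arrives.
bool ShouldUseSystemSlotCertsOnSigninScreen(const PlatformInfo& info) {
  if (!info.is_running_on_chromeos)
    return false;
  if (info.is_running_on_vm)
    return false;
  return true;
}
```

This only gates the sign-in screen feature and leaves the TPM init sequence itself untouched.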
Triage nag: This Chrome OS bug has an owner but no component. Please add a component so that this can be tracked by the relevant team.
Components: OS>Systems>Network Enterprise
