TPM2 does not work inside VMTest: eve-pre-cq VMTests are failing, apparently unrelated to CLs being tested. CLs are blocked.
Issue description

For example:
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/27121
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/25958

In both cases the failure looks like:

FAIL cheets_CTSHelper.sanity cheets_CTSHelper.sanity timestamp=1491898847 localtime=Apr 11 03:20:47 Unhandled LoginException: Timed out going through login screen. OOBE not dismissed.

It looks the same as Issue 708715. That issue was marked as fixed, but only because caroline-pre-cq was removed, not because the underlying problem was fixed. This is preventing us from submitting Eve CLs, since the pre-cq is always (likely spuriously) failing.
,
Apr 11 2017
gwendal, any chance this has to do with ext4 instead of ecryptfs? The fact that cryptohome is failing and it's only happening on caroline and eve makes me suspicious.
,
Apr 11 2017
The 2nd login attempt works btw. But then after rebooting, same thing: 1st login attempt times out, 2nd one works.
,
Apr 11 2017
,
Apr 11 2017
Ah, Gwendal is on holiday this week. Junichi, can you help find an owner? Otherwise we're going to have to disable VMTests for eve, since the bug is blocking its Pre-CQ.
,
Apr 12 2017
hmmm.. this sounds familiar
,
Apr 12 2017
is this b/37189798 ?
,
Apr 12 2017
It's not immediately clear to me whether this is the same as b/37189798. /var/log/messages and ui.LATEST pasted above both show failures to connect to or find D-Bus services, while b/37189798 shows more filesystem/cryptohome/tpm corruption logs.

https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/trybot-eve-pre-cq/R59-9451.0.0-b27121/vm_test_results_1/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-01-cheets_CTSHelper.smoke/cheets_CTSHelper.sanity/sysinfo/var/log/

2017-04-11T07:33:29.799593+00:00 ERR cryptohomed[2667]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.NoReply, Message=Message did not receive a reply (timeout by message bus)
2017-04-11T07:33:29.799916+00:00 ERR cryptohomed[2667]: SetIsInitialized: Not Implemented.
2017-04-11T07:33:29.800562+00:00 ERR cryptohomed[2667]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.TpmManager was not provided by any .service files
2017-04-11T07:33:29.800716+00:00 ERR cryptohomed[2667]: SetIsOwned: Not Implemented.
2017-04-11T07:33:29.800736+00:00 ERR cryptohomed[2667]: SetIsEnabled: Not Implemented.
2017-04-11T07:33:29.801286+00:00 ERR cryptohomed[2667]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.TpmManager was not provided by any .service files
2017-04-11T07:33:59.987946+00:00 WARNING cryptohomed[2667]: Failed to initialize the trunks IPC proxy; trunksd is not ready.
2017-04-11T07:33:59.988005+00:00 ERR cryptohomed[2667]: Failed to initialize trunks factory.
2017-04-11T07:33:59.988082+00:00 ERR cryptohomed[2667]: Couldn't wrap cryptohome key
2017-04-11T07:33:59.988176+00:00 WARNING cryptohomed[2667]: Could not load the device policy file.
2017-04-11T07:33:59.990891+00:00 ERR cryptohomed[2667]: Creating new salt at /home/.shadow/salt (0, 0)
,
Apr 12 2017
2017-04-11T07:32:58.215858+00:00 ERR trunksd[1261]: TPM: Error opening tpm0 file descriptor at /dev/tpm0: No such file or directory
2017-04-11T07:32:58.216112+00:00 CRIT trunksd[1261]: Check failed: low_level_transceiver->Init(). Error initializing TPM communication.#012/usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7d94e4e33da3]#012
2017-04-11T07:32:58.226715+00:00 INFO tpm_managerd[1349]: Starting TPM Manager...
2017-04-11T07:32:58.262508+00:00 WARNING crash_reporter[1377]: Could not load the device policy file.
2017-04-11T07:32:58.262716+00:00 WARNING crash_reporter[1377]: [user] Received crash notification for trunksd[1261] sig 6, user 0 (developer build - not testing - always dumping)
2017-04-11T07:32:58.265120+00:00 INFO crash_reporter[1377]: State of crashed process [1261]: S (sleeping)

trunksd fails to open /dev/tpm0 and dies; as a result tpm_managerd dies, and as a result cryptohomed dies?
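To check whether a daemon is crashing once or repeatedly, one option is to count the crash_reporter notifications in /var/log/messages. The following is a minimal sketch, assuming the notification format matches the line quoted above; the helper name `count_crashes` and the regular expression are illustrative and not part of any Chrome OS tool.

```python
import re
from collections import Counter

# Sample line copied from the log excerpt above.
SAMPLE = (
    "2017-04-11T07:32:58.262716+00:00 WARNING crash_reporter[1377]: [user] "
    "Received crash notification for trunksd[1261] sig 6, user 0 "
    "(developer build - not testing - always dumping)\n"
)

# Assumption: every crash notification follows the wording of the sample line.
CRASH_RE = re.compile(r"Received crash notification for ([\w.-]+)\[(\d+)\] sig (\d+)")


def count_crashes(log_text):
    """Count crash notifications per process name in a syslog excerpt."""
    counts = Counter()
    for match in CRASH_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for name, count in count_crashes(SAMPLE).most_common():
        print(f"{name}: {count} crash notification(s)")
```

Running it over the full messages file rather than the single sample line would show whether trunksd, attestationd and tpm_managerd each appear many times, which is what a crash loop would look like.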
,
Apr 12 2017
Based on https://storage.cloud.google.com/chromeos-image-archive/trybot-eve-pre-cq/R59-9451.0.0-b27121/vm_test_results_2/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-01-cheets_CTSHelper.smoke/cheets_CTSHelper.sanity/sysinfo/var/log/messages , it's not the homedir removal issue: there are no "No valid keysets on disk" or "Fatal decryption error, but unable to remove cryptohome" errors in the log.

From what I see in the logs, the image is built with USE=tpm2 and attempts to run on a VM without an actual physical tpm chip (right?). So the tpm-related daemons (trunksd, attestationd, tpm_managerd) continuously crash in a loop, unable to connect to /dev/tpm0. But that's probably expected for a VM.

cryptohomed seems to work fine. But now it waits for trunksd on each restart (and it is restarted by the tests for each browser session, iirc) before giving up. The test might time out because of that.
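The wait is visible in the cryptohomed lines quoted earlier in this thread: there is a pause of roughly 30 seconds before "trunksd is not ready" is logged. A minimal sketch for finding such pauses in a log excerpt, assuming every line starts with an ISO 8601 timestamp as in those logs; the helper name `largest_gap` is made up for this illustration.

```python
from datetime import datetime

# Three consecutive cryptohomed lines copied from the excerpt earlier in this thread.
SAMPLE = """\
2017-04-11T07:33:29.801286+00:00 ERR cryptohomed[2667]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.TpmManager was not provided by any .service files
2017-04-11T07:33:59.987946+00:00 WARNING cryptohomed[2667]: Failed to initialize the trunks IPC proxy; trunksd is not ready.
2017-04-11T07:33:59.988005+00:00 ERR cryptohomed[2667]: Failed to initialize trunks factory.
"""


def largest_gap(log_text):
    """Return (gap_seconds, line_after_gap) for the longest pause between lines."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Assumption: the timestamp is the first space-separated field.
        stamp, _, rest = line.partition(" ")
        entries.append((datetime.fromisoformat(stamp), rest))

    best_gap, best_line = 0.0, ""
    for (earlier, _), (later, message) in zip(entries, entries[1:]):
        gap = (later - earlier).total_seconds()
        if gap > best_gap:
            best_gap, best_line = gap, message
    return best_gap, best_line


if __name__ == "__main__":
    gap, line = largest_gap(SAMPLE)
    print(f"{gap:.3f}s before: {line}")
```

If cryptohomed is restarted for each browser session, a pause of this size would be paid on every restart, which fits the login timeout seen in the tests.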
,
Apr 12 2017
I'm not sure what the implications of USE=tpm2 are, but yes, your explanation makes sense since builders first and foremost build images for the hardware (i.e. with the TPM included) and then in the last stage tweak the image at the margin to make it bootable in a VM. But that doesn't change the installed packages or the expectations of devices being present. I don't know how the lack of TPM is handled in samus/cyan VMs, and if the same logic could work on eve/caroline?
,
Apr 12 2017
Re #11: Unfortunately, the tpm communication stack is completely different for tpm 1.2 (samus, cyan, caroline, kevin) compared to tpm 2.0 (eve, gru, reef). And, based on the logs, cryptohome seems to be working without the tpm (though nobody has tested it in that mode in the 2.0 case until now, so there can still be surprises). It is just much slower: it is significantly delayed by attempts to communicate with the more tpm-reliant daemons (trunksd, which talks directly to the tpm; attestationd, which it relies on in the 2.0 case to do all the attestation work; and tpm_managerd, which it uses to work with nvram spaces, where it keeps things like install-attributes).

I will need to do a pass through cryptohomed and the other daemons to see how we can minimize these delays in the tpm-less case. But it will take time, and it only matters for being able to run unchanged images in a VM.

Meanwhile, as an interim solution:
1) Is it possible to increase the timeouts when running tests on a VM? That will slow down the pre-cq tests, though.
2) Is it possible to build a special image for the VM, with different USE flags? Or at least to install a special trunks package that emulates the TPM? We have one and it used to work at some point, but it requires a different trunksd binary and upstart script.
3) As a last resort, turn the eve-pre-cq VM into an experimental builder indeed?
4) Can we run the pre-cq tests on the actual hardware instead of a VM?

I don't think there is a good way to reduce the set of tests: even if a test is not focused on logging in, working with the keystore, or other user actions, it often needs those to exercise other features.

Still, as a side note: running login-related tests on a tpm-less VM is a poor predictor of how the actual system with a physical tpm will behave. Even if we address the delays, the code paths inside cryptohomed, chapsd, and other daemons will be significantly different, as will the timing and sequencing of all operations there.
,
Apr 12 2017
#c12:
1) I don't think that will work: the first login always fails, after 4 minutes iirc.
2) That's an option, see Issue 710629.
3) I'll file a different bug to remove the VMTests from eve-pre-cq, at least to unblock eve for the time being.
4) Probably not in the Pre-CQ, but there should be a paladin that tests TPM 2.0. See Issue 709696 to add caroline-paladin; that should do it.
,
Apr 12 2017
,
Apr 13 2017
Hi norvez@, so it sounds like we need to fix how the binary runs inside the VM and how the TPM is (not) set up. Do you want to take this? (I doubt that the ability to run an unchanged binary in a VM image adds much value here, given that it comes at the cost of correct TPM error handling.)
,
Apr 13 2017
I'll talk offline to gwendal@ and apronin@ to see what we can do. In the meantime we're going to disable VMTests for eve to unblock the situation (Issue 710954). Lowering the priority to P2.
,
Apr 13 2017
,
Apr 17 2017
Making the title a bit more descriptive now that we know what's failing.
,
Apr 27 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/7659817ef39e201f3b7f64aee64db585fbee8a48

commit 7659817ef39e201f3b7f64aee64db585fbee8a48
Author: Nicolas Norvez <norvez@chromium.org>
Date: Thu Apr 27 16:42:37 2017

eve: don't run VMTests in Pre-CQ

Eve images won't run in a VM because they require TPM 2.0. This CL removes VMTests from the Pre-CQ since they're expected to fail.

BUG= chromium:710954
BUG= chromium:710492
TEST=None

Change-Id: If05450ffaf71a82a9dba3fabda0fd7cf5cbb4187
Reviewed-on: https://chromium-review.googlesource.com/477090
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Nicolas Norvez <norvez@chromium.org>

[modify] https://crrev.com/7659817ef39e201f3b7f64aee64db585fbee8a48/overlay-eve/COMMIT-QUEUE.ini
,
Apr 28 2017
FYI, reef & poppy have TPM2.0 and are running vmtests as part of pre-cq AFAICT:

baseboard-poppy/COMMIT-QUEUE.ini:pre-cq-configs: poppy-pre-cq
baseboard-reef/COMMIT-QUEUE.ini:pre-cq-configs: reef-pre-cq
overlay-poppy/COMMIT-QUEUE.ini:pre-cq-configs: poppy-pre-cq
overlay-reef/COMMIT-QUEUE.ini:pre-cq-configs: reef-pre-cq
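For anyone who wants to list the pre-cq configs of an overlay without grepping, here is a minimal sketch using Python's configparser. It assumes COMMIT-QUEUE.ini is a plain INI file with the `pre-cq-configs` key under some section; the `[GENERAL]` section name in the sample and the helper name `pre_cq_configs` are assumptions made for this illustration.

```python
import configparser

# Hypothetical file contents; only the pre-cq-configs key is taken from the grep output above.
SAMPLE_INI = """\
[GENERAL]
pre-cq-configs: reef-pre-cq
"""


def pre_cq_configs(ini_text):
    """Return every config name listed under a pre-cq-configs key."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    configs = []
    for section in parser.sections():
        # A key may hold several whitespace-separated config names.
        value = parser.get(section, "pre-cq-configs", fallback="")
        configs.extend(value.split())
    return configs


if __name__ == "__main__":
    print(pre_cq_configs(SAMPLE_INI))
```

Note that the config name alone does not tell you whether VMTests run; that decision may be made elsewhere, so this only shows which builder configs an overlay requests.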
,
Apr 28 2017
,
Apr 28 2017
,
Apr 28 2017
Apologies for the cc removals; adding to the front of the cc list is problematic (at least for me).

While reef doesn't have the same no-vmtest-pre-cq config, I do see it's skipping VM tests in this build: https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/30042

Not sure how it's determining that, though.
,
Apr 28 2017
It looks like it's configured in chromite. reef is part of _cheets_x86_boards, which is then added to a 'no_vmtest_boards' list: https://chromium.googlesource.com/chromiumos/chromite/+/master/cbuildbot/chromeos_config.py#773

eve should probably be treated the same way, i.e. added to _cheets_x86_boards; not sure why it wasn't. I'll do it, sounds like a good task for the bus ride back home :-)
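To illustrate why reef skips VMTests while eve does not, here is a simplified model of that mechanism. It is a sketch only, not the actual chromeos_config.py code: the function names and the way the sets are combined are assumptions; only the names `_cheets_x86_boards` and `no_vmtest_boards` and the boards reef and eve come from this thread.

```python
# Simplified, hypothetical model of the board lists; not copied from chromite.
CHEETS_X86_BOARDS_BEFORE = frozenset(["reef"])
CHEETS_X86_BOARDS_AFTER = CHEETS_X86_BOARDS_BEFORE | frozenset(["eve"])


def no_vmtest_boards(cheets_x86_boards, extra_boards=frozenset()):
    """Boards whose builders skip VMTests: all cheets boards plus any extras."""
    return frozenset(cheets_x86_boards) | frozenset(extra_boards)


def runs_vmtests(board, cheets_x86_boards):
    """True if the board's pre-cq would still run VMTests in this model."""
    return board not in no_vmtest_boards(cheets_x86_boards)


if __name__ == "__main__":
    for label, boards in (("before", CHEETS_X86_BOARDS_BEFORE),
                          ("after", CHEETS_X86_BOARDS_AFTER)):
        print(label, {b: runs_vmtests(b, boards) for b in ("reef", "eve")})
```

In this model, adding eve to the cheets list is enough to make its pre-cq skip VMTests, with no per-board COMMIT-QUEUE.ini change needed.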
,
Apr 29 2017
Right, CL:490522 is up for review, it should bring eve in line with reef and friends. Once it's landed we can revert CL:477090 (that specifically modifies eve's pre-cq config) since it will be obsolete and won't be consistent any more. Then we can fix the underlying cause, or more likely punt until we have dedicated VM builders (crbug.com/710629)
,
May 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/357c3ed80392d6fbfc9f75f82779dfc1fdf0e1eb

commit 357c3ed80392d6fbfc9f75f82779dfc1fdf0e1eb
Author: Nicolas Norvez <norvez@chromium.org>
Date: Tue May 02 02:18:15 2017

chromeos_config: add eve to cheets boards

eve had not been added to _cheets_x86_boards and the behaviour of the pre-cq is different.

cheets boards do not currently run VMTests because ARC++ wasn't working in VMs at one point. This has since been fixed, but other VM-specific issues arise from time to time and it's confusing why some boards (e.g. reef) do not run VMTests while eve does. This CL fixes the inconsistent behaviour between supposedly similar boards.

Still TODO:
- update the VMTests blacklist with the real reason VMTests don't always work (not ARC++, likely TPM2)
- remove VMTests from most boards and use dedicated VM builders

BUG= chromium:710492
BUG= chromium:710954
TEST=chromeos_config_unittest

Change-Id: I67914956f61eeaee47f79ea65952dba4922d389e
Reviewed-on: https://chromium-review.googlesource.com/490522
Commit-Ready: Nicolas Norvez <norvez@chromium.org>
Tested-by: Nicolas Norvez <norvez@chromium.org>
Reviewed-by: Bernie Thompson <bhthompson@chromium.org>

[modify] https://crrev.com/357c3ed80392d6fbfc9f75f82779dfc1fdf0e1eb/cbuildbot/config_dump.json
[modify] https://crrev.com/357c3ed80392d6fbfc9f75f82779dfc1fdf0e1eb/cbuildbot/waterfall_layout_dump.txt
[modify] https://crrev.com/357c3ed80392d6fbfc9f75f82779dfc1fdf0e1eb/cbuildbot/chromeos_config.py
,
May 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/b6a4235103f0710c40d39c170da04abe0668b0be

commit b6a4235103f0710c40d39c170da04abe0668b0be
Author: Nicolas Norvez <norvez@chromium.org>
Date: Fri May 19 20:57:41 2017

Revert "eve: don't run VMTests in Pre-CQ"

This reverts commit 7659817ef39e201f3b7f64aee64db585fbee8a48.

Reason for revert: CL:490522 has landed, it makes the eve pre-cq skip VMTests the same way as other similar boards and makes CL:477090 redundant. For consistency and clarity, revert back to the original COMMIT-QUEUE.ini so all similar boards have the same behaviour (skipping VMTests through config in chromite) and we only have to fix/tweak/change the behaviour in one place.

Original change's description:
> eve: don't run VMTests in Pre-CQ
>
> Eve images won't run in a VM because they require TPM 2.0. This CL
> removes VMTests from the Pre-CQ since they're expected to fail.
>
> BUG= chromium:710954
> BUG= chromium:710492
> TEST=None
>
> Change-Id: If05450ffaf71a82a9dba3fabda0fd7cf5cbb4187
> Reviewed-on: https://chromium-review.googlesource.com/477090
> Reviewed-by: Dylan Reid <dgreid@chromium.org>
> Reviewed-by: Ningning Xia <nxia@chromium.org>
> Reviewed-by: Chad Versace <chadversary@chromium.org>
> Tested-by: Nicolas Norvez <norvez@chromium.org>
>

TBR=jrbarnette@chromium.org,dgreid@chromium.org,nxia@chromium.org,norvez@chromium.org,chadversary@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

BUG= chromium:710954

Change-Id: Id3f7875568c228ac067bfc77b4b32f816e916e89
Reviewed-on: https://chromium-review.googlesource.com/498857
Commit-Ready: Nicolas Norvez <norvez@chromium.org>
Tested-by: Nicolas Norvez <norvez@chromium.org>
Reviewed-by: Nicolas Norvez <norvez@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/b6a4235103f0710c40d39c170da04abe0668b0be/overlay-eve/COMMIT-QUEUE.ini
,
May 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/f4bef4d7e09c1b02093051604bf8e37d1ac650fa

commit f4bef4d7e09c1b02093051604bf8e37d1ac650fa
Author: Nicolas Boichat <drinkcat@google.com>
Date: Mon May 22 08:39:53 2017

chromeos_config: add soraka to cheets boards

Mirrors https://chromium-review.googlesource.com/c/490522/ for soraka, as we see the same issue (VMTests failing because of tpm2).

Adds soraka to both _x86_internal_release_boards and _cheets_x86_boards, then run:
bin/cros_show_waterfall_layout > cbuildbot/waterfall_layout_dump.txt
cbuildbot/chromeos_config_unittest --update

BUG= chromium:710492
BUG=b:38477401
TEST=cbuildbot/chromeos_config_unittest
TEST=cbuildbot/run_tests
TEST=cbuildbot --remote -g XXX soraka-release

Change-Id: I48a579d6a88ee97487a8ea7324cfb870c2477a9c
Reviewed-on: https://chromium-review.googlesource.com/509649
Reviewed-by: Nicolas Norvez <norvez@chromium.org>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Commit-Queue: Nicolas Boichat <drinkcat@chromium.org>

[modify] https://crrev.com/f4bef4d7e09c1b02093051604bf8e37d1ac650fa/cbuildbot/config_dump.json
[modify] https://crrev.com/f4bef4d7e09c1b02093051604bf8e37d1ac650fa/cbuildbot/waterfall_layout_dump.txt
[modify] https://crrev.com/f4bef4d7e09c1b02093051604bf8e37d1ac650fa/cbuildbot/chromeos_config.py
,
Nov 20 2017
Obsolete. VMTests are now only run on the dedicated target, betty.
Comment 1 by norvez@chromium.org
, Apr 11 2017