Cryptohome could not remove the user's vault. |
||||||||||||||||||||||||||
Issue descriptionTracking bug for occurences of this flake.
,
Jun 2 2016
,
Jun 2 2016
Seems to be a sheriff issue, please find an owner.
,
Jun 2 2016
I don't understand why this particular failure has been assigned as an issue. The linked failure on 'peach-pit-chrome-pfg' was for build 2109. Build 2110 showed the same issue. However, subsequent builds have either failed for different reasons or have been successful.
,
Jun 2 2016
Please feel free to Wont' Fix or Obsolete it.
,
Jun 3 2016
I filed this because I need some tracking bug for the historical pfq annotating that I'm doing. If this issue is no longer relevant then feel free to close.
,
Jul 12 2016
This is still failing frequently (4 times today) in issue 615475. The problem is always the same: "Remove call failed: Message did not receive a reply (timeout by message bus)." Is this just a flake we have to live with, or a symptom of a bug? 07/12 10:47:38.706 DEBUG| cryptohome:0180| Removing vault for user test@test.test with hash 19cc1a8c867fe7ddec4f09764ccf7de59cd45978 07/12 10:47:38.707 DEBUG| base_utils:0185| Running '/usr/sbin/cryptohome --action=remove --force --user=test@test.test 2>&1' 07/12 10:47:39.508 DEBUG| base_utils:0278| [stdout] Remove call failed: Message did not receive a reply (timeout by message bus). 07/12 10:47:39.510 DEBUG| base_utils:0278| [stdout] Remove failed. 07/12 10:47:39.517 DEBUG| base_utils:0185| Running 'mkdir -p /usr/local/autotest/results/default/login_Cryptohome/sysinfo/iteration.1/var/spool' 07/12 10:47:39.547 DEBUG| base_utils:0185| Running 'rsync --no-perms --chmod=ugo+r -a --exclude=autoserv* /var/spool/crash /usr/local/autotest/results/default/login_Cryptohome/sysinfo/iteration.1/var/spool' 07/12 10:47:39.986 DEBUG| base_utils:0185| Running 'rm -rf /var/spool/crash/*' 07/12 10:47:40.108 DEBUG| base_utils:0185| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/login_Cryptohome/sysinfo/iteration.1"' 07/12 10:47:40.157 WARNI| test:0606| Autotest caught exception when running test: Traceback (most recent call last): File "/usr/local/autotest/common_lib/test.py", line 600, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/common_lib/test.py", line 461, in execute dargs) File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/tests/login_Cryptohome/login_Cryptohome.py", line 35, in run_once cryptohome.remove_vault(username) File "/usr/local/autotest/cros/cryptohome.py", line 185, in remove_vault raise ChromiumOSError('Cryptohome could not remove the user\'s vault.') ChromiumOSError: Cryptohome could not remove the user's vault.
,
Jul 13 2016
,
Jul 14 2016
+wad@ This timeout has happened in the pfq 9 times since Monday. I don't see any recent changes to the test, so this may be a regression or an infra problem if the devices are overloaded. Can anyone shed some light on what's happening here? When this cropped up in 2014 (issue 344862), increasing the timeout helped, but seemed like a stopgap measure. Trying to understand what has happened since then.
,
Jul 20 2016
,
Jul 20 2016
I'm sheriff this week. This happened three times today (see issue 615475).
,
Jul 20 2016
,
Jul 20 2016
Darren, do you know who would be the right person to look at this? This seems to be a new flaky regression.
,
Jul 20 2016
https://bugs.chromium.org/p/chromium/issues/detail?id=612797 is probably also related
,
Jul 20 2016
,
Jul 20 2016
,
Jul 20 2016
This bug started happening May 27, and was quite infrequent until Jul 11, when it started happening multiple times a day. What do these CLs do? https://chromium-review.googlesource.com/#/c/357971/ https://chromium-review.googlesource.com/#/c/359049/ Assigning to Darren to find an owner.
,
Jul 20 2016
https://pantheon.corp.google.com/m/cloudstorage/b/chromeos-autotest-results/o/70182206-chromeos-test/chromeos4-row2-rack4-host3/login_Cryptohome/sysinfo/var/log/messages 2016-07-20T08:15:21.183147+00:00 INFO cryptohomed[744]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID 2016-07-20T08:15:21.836353+00:00 ERR cryptohomed[744]: Couldn't wrap cryptohome key 2016-07-20T08:15:21.836421+00:00 WARNING cryptohomed[744]: Could not load the device policy file. 2016-07-20T08:15:21.836554+00:00 ERR cryptohomed[744]: Creating new salt at /home/.shadow/salt (0, 0) 2016-07-20T08:15:22.350180+00:00 ERR shill[803]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-07-20T08:15:22.350608+00:00 ERR shill[803]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-07-20T08:15:22.360562+00:00 WARNING chapsd[736]: SRK does not exist - this is normal when the TPM is not yet owned. 2016-07-20T08:15:22.361106+00:00 WARNING chapsd[736]: SRK does not exist - this is normal when the TPM is not yet owned. 2016-07-20T08:15:22.394026+00:00 INFO kernel: [ 11.399371] tpm_tis tpm_tis: command 0x15 (size 14) returned code 0x0 2016-07-20T08:15:22.397194+00:00 WARNING cryptohomed[744]: Failed to read PCR1. 2016-07-20T08:15:22.501176+00:00 INFO cryptohomed[744]: TPM error 0x2020 (Key not found in persistent storage): Unseal: Failed to load SRK. 2016-07-20T08:15:22.501326+00:00 ERR cryptohomed[744]: Cannot unseal aes key. 2016-07-20T08:15:22.501336+00:00 ERR cryptohomed[744]: Attestation: Could not unseal decryption key. 2016-07-20T08:15:22.501341+00:00 WARNING cryptohomed[744]: Attestation: Attestation data invalid. This is normal if the TPM has been cleared. 2016-07-20T08:15:22.526024+00:00 INFO kernel: [ 11.531450] tpm_tis tpm_tis: command 0x46 (size 14) returned code 0x0 2016-07-20T08:15:22.575027+00:00 INFO kernel: [ 11.581072] tpm_tis tpm_tis: command 0x7c (size 30) returned code 0x0 2016-07-20T08:15:22.580184+00:00 INFO cryptohomed[744]: Cannot read boot lockbox files.
,
Jul 20 2016
First, there was a regression, later fixed by https://chromium-review.googlesource.com/#/c/359986/. But we still see it now. Second, based on two logs in issue 615475 and issue 612797, the symptoms in /var/log/messages there are similar to issue 618392. So, all three may be related. Not identical (in issue 618392 chrome dies first), but all the rest is the same: WARNING chapsd[21545]: SRK does not exist - this is normal when the TPM is not yet owned. WARNING cryptohomed[21554]: Could not load the device policy file. ... INFO cryptohomed[21554]: Migrated (or created) user directory: ... ... WARNING cryptohomed[21554]: TPM was not owned. TPM initialization call back will handle PKCS#11 initialization. ... ERR cryptohomed[21554]: TPM initialization took 1998ms ... INFO cryptohomed[21554]: Created new cryptohome key. ... WARNING chapsd[21545]: Unload Token event received for unknown path ... WARNING cryptohomed[21554]: PKCS#11 initialization requested but cryptohome is not mounted ... WARNING crash_reporter[23124]: [user] Received crash notification for cryptohomed[21554] sig 11, user 0 And then the crashed cryptohomed manifests itself in various ways: failed commands, dbus timeouts, etc
,
Jul 20 2016
#18: Those CLs are for TPM 2.0 support. They should not effect existing devices. Looks like apronin@ is looking at this :). +gwendal too.
,
Jul 20 2016
Yes, I'm looking. Just to keep records on the investigation: At least from two of the logs where I could match dump to symbols, looks like cryptohome is crashing in Service::InitializeTpmComplete, while trying to go through the set of mounts. The session has ended while crypohome was still initializing tpm. And looks like Service::InitializeTpmComplete runs in parallel with Service::RemoveMountForUser or Service::RemoveAllMounts. And somewhere there's not enough synchronization. Consistent with observations for issue 618392.
,
Jul 21 2016
,
Jul 21 2016
I'm adding 627367 which I imagine has the same root cause of cryptohome crashing. This failed in the PFQ again today.
,
Jul 21 2016
apronin@ we really need to get this issue to conclusion asap. This is affecting a number of boards and a variety of tests. Do you have any CL or solution in mind already or are we still investigating this?
,
Jul 22 2016
While I'm working on a proper fix (won't be today), this workaround may buy some time: https://chromium-review.googlesource.com/#/c/362513/. Can't really test if it helps, as I can't reproduce the issue, but based on what I see about the bug, it might help temporarily. The problem is with the session being closed and the new started before the tpm initialization initiated during this session is still ongoing. Then, when the tpm initialization is complete, it tries to access the objects which are not there anymore. Fixing it will take time, so I'm changing the test to delay the moment when the session is restarted.
,
Jul 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/32820b4ed32d388e7a5a00f4c67564ee33b49d5d commit 32820b4ed32d388e7a5a00f4c67564ee33b49d5d Author: Andrey Pronin <apronin@google.com> Date: Fri Jul 22 02:19:59 2016 WORKAROUND: autotest: login_Cryptohome: let tpm finish initialization Workaround: the test seems to fail because cryptohome crashes if a session is closed while tpm is still being initialized. While a proper fix is being implemented, workaround by waiting before closing the session in case we need time to finish tpm initialization. BUG= chromium:616854 TEST=none Change-Id: Ied5a20efcbfa2105f40ed0720015dba5d680871d Reviewed-on: https://chromium-review.googlesource.com/362513 Commit-Ready: Achuith Bhandarkar <achuith@chromium.org> Tested-by: Achuith Bhandarkar <achuith@chromium.org> Reviewed-by: Achuith Bhandarkar <achuith@chromium.org> [modify] https://crrev.com/32820b4ed32d388e7a5a00f4c67564ee33b49d5d/client/site_tests/login_Cryptohome/login_Cryptohome.py
,
Jul 23 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/cd07fc15c10755078dceb8560a3dd83fd4f259e9 commit cd07fc15c10755078dceb8560a3dd83fd4f259e9 Author: Chirantan Ekbote <chirantan@chromium.org> Date: Fri Jul 22 23:53:19 2016 cryptohome: Add missing lock when accessing mounts The cryptohome mounts can be accessed concurrently by multiple threads and have a lock (mounts_lock_) that must be held before they can be accessed. Unfortunately, when iterating over the mounts after successfully initializing the TPM, the Service::InitializeTpmComplete function did not acquire the lock. Since the time this bug was introduced (commit b009d07 in March, 2013) the tpm initialization flow has apparently been fast enough that it never overlapped with the session teardown phase of the login_Cryptohome test. This changed recently and the race condition is frequently triggered. Add the missing lock acquisition around the mount iteration code in Service::InitializeTpmComplete. BUG= chromium:616854 TEST=login_Cryptohome on peach_pit Change-Id: I8ab4e4a6e5b17beb341cb86b13dc0dc258e04f19 Signed-off-by: Chirantan Ekbote <chirantan@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/362609 Reviewed-by: Andrey Pronin <apronin@chromium.org> [modify] https://crrev.com/cd07fc15c10755078dceb8560a3dd83fd4f259e9/cryptohome/service.cc
,
Jul 23 2016
We've had 6 successful PFQ runs since yesterday, and no cryptohome failures. We could try reverting the WAR to see if the missing-lock fix works.
,
Jul 25 2016
,
Jul 26 2016
,
Aug 26 2016
,
Sep 2 2016
Some fixes landed - tracked in issue 641369 , issue 641372 , issue 641366 .
,
Sep 15 2016
Do we still see the issue? The last auto-generated bug report in issue 615475 is from Sep 7.
,
Sep 24 2016
Not seen anymore, marking as Fixed
,
Oct 7 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
||||||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||||||
Comment 1 by akes...@chromium.org
, Jun 2 2016