New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 627115 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Long delays at login due to sync() calls in cryptohome

Project Member Reported by derat@chromium.org, Jul 11 2016

Issue description

I'm sometimes seeing abysmal login times when doing development on a ToT lumpy. Digging around a bit, it looks like cryptohomed may be blocking on calls to sync():

-----

chrome:

[14122:14122:0711/090956:VERBOSE1:existing_user_controller.cc(398)] Setting flow from PerformLogin
[14122:14122:0711/090956:VERBOSE1:user_flow.cc(20)] Flow 0x1e7fa45c1570 got host 0x1e7fa4aa2420
[14122:14122:0711/090956:VERBOSE1:login_performer.cc(278)] Offline auth started.
[14122:14122:0711/091021:VERBOSE1:gaia_screen_handler.cc(392)] OnPortalDetectionCompleted Online
[14122:14122:0711/091026:VERBOSE1:cryptohome_authenticator.cc(706)] Resolved state to: 12
[14122:14122:0711/091026:VERBOSE1:cryptohome_authenticator.cc(613)] Login success
[14122:14122:0711/091026:VERBOSE1:login_performer.cc(82)] LoginSuccess hash: e2d8d554ca4e43218a5acb5121bd3763e5231fb2
[14122:14122:0711/091026:VERBOSE1:user_session_manager.cc(485)] Starting session for derattest2@gmail.com

-----

cryptohomed:

2016-07-11T09:10:26.085372-06:00 WARNING cryptohomed[1342]: Long sync(): 29 seconds
                                                            ^^^^^^^^^^^^^^^^^^^^^^^
2016-07-11T09:10:26.095956-06:00 INFO cryptohomed[1342]: Creating pass-through directories /home/.shadow/e2d8d554ca4e43218a5acb5121bd3763e5231fb2/vault/user/GCache/v1/tmp

-----

After digging around in git, I think that the original sync() call was added by https://chromium-review.googlesource.com/#/c/14485/, but it's grown into multiple calls now:

- Mount::ForceUnmount() calls sync() before unmounting if the mountpoint was busy
- Mount::ForceUnmount() calls sync() after unmounting unconditionally
- Mount::MountCryptohomeInner() calls sync() before clearing the user keyring
- Mount::UnmountCryptohome() calls sync() before clearing the user keyring
- maybe more?

I'm skeptical that cryptohomed should ever be calling sync(). Shouldn't it be doing more-targeted syncs of the files and directories that it actually cares about? (Its Platform class already has methods for this.) In the worst case, can't it at least call syncfs() instead? It seems like using sync() (documented as blocking until the "actual writing is done" since Linux 1.3.20) could cause all sorts of unintended delays -- won't it even block on system updates going on in the background?
 

Comment 1 by derat@chromium.org, Jul 11 2016

I am hopefully hitting pathologically bad cases. I think that the 29-second (!) sync above was likely started soon after I pushed a new version of Chrome to the device.

If these calls can block login, or block the UI coming back after logout, they don't feel much different from calling sync() on Chrome's UI thread, though... which we'd never do.
Owner: alemate@chromium.org
Status: Assigned (was: Untriaged)
Possibly a dupe of bug 625514?

Comment 3 by derat@chromium.org, Jul 11 2016

The timing log on the other bug points at IME rather than cryptohome, doesn't it?

Comment 4 by derat@chromium.org, Jul 12 2016

Cc: gwendal@chromium.org
Owner: derat@chromium.org
Status: Started (was: Assigned)
Here's another one I just saw:

----

[9147:9147:0712/095847:VERBOSE1:login_performer.cc(278)] Offline auth started.
[9147:9147:0712/095925:VERBOSE1:cryptohome_authenticator.cc(706)] Resolved state to: 12
[9147:9147:0712/095925:VERBOSE1:cryptohome_authenticator.cc(613)] Login success

----

2016-07-12T09:58:48.122276-06:00 ERR cryptohomed[1343]: XXX ClearUserKeyring: pre-clear sync start
...
2016-07-12T09:59:25.355928-06:00 WARNING cryptohomed[1343]: Long sync(): 37 seconds
2016-07-12T09:59:25.355966-06:00 ERR cryptohomed[1343]: XXX ClearUserKeyring: pre-clear sync end

----

Gwendal, it looks like you added this call to sync() in https://chromium-review.googlesource.com/353354. I don't understand why it's there:

- In the call from Mount::MountCryptohomeInner() (which I think is what I'm seeing take a long time), I believe that we've already unmounted previously. Why do we need to sync here?
- In the call from Mount::UnmountCryptohome(), ForceUnmount() already called sync() if the filesystem couldn't be unmounted.

What case is the additional call trying to guard against?

I'm working on cleaning this up (so we just sync the directories we care about), unless someone tells me that that's a horrible idea. :-P

Comment 5 by dkrahn@chromium.org, Jul 13 2016

Potential performance issues were called out during the review of https://chromium-review.googlesource.com/353354. FWIW, I'm supportive of rolling back this CL while the performance issues are investigated.
Labels: -Pri-2 Pri-1
Looks like 53 might be affected as well.
#CBC-TC-RS Watchlist

Ditto for Asus Flip
Google Chrome	53.0.2785.4 (Official Build) dev (32-bit)
Revision	6dfaf57f2779c1591d26fe68632c0fd6a4457a4a-refs/branch-heads/2785@{#7}
Platform	8530.6.0 (Official Build) dev-channel veyron_minnie
ARC	3026157

Comment 9 Deleted

Comment 10 by derat@chromium.org, Jul 13 2016

(I'm not familiar with this codebase, but I uploaded a speculative change to make the syncs more targeted at https://chromium-review.googlesource.com/#/c/360132/ .)
#4,#5: do rollback c/353354, let me know if you have a test to reproduce the performance degradation so that I don't make the error again.

I thought that sync would not be a performance issue because as pointed in #4, ClearUserKeyring should be called after an existing explicit sync() [so as I understand, little performance penalty] or unmount [where I think a sync() is implied].
But thinking about it again, if a lot of disk activities happen between logins, we pain the price at MountInner() time. 

Then, I am wondering if have to call ClearUserKeyring in MountInner at all.

On ext4 crypto, a sync and dropping the cache is required for consistency and security reason before removing the key: 
- without sync, when the key is gone, ext4 would not be able to write data to the disk (it would not know how to encrypt it) 
- without dropping the cache, you could read data from encryption directory in clean even after the key has been removed, as long as someone put it in the cache previously.

Similarly, although I haven't recreated it, the flush thread from ecryptfs to ext4 would be confused if it had to write data from the ecryptfs cache to the ext4 cache without having access to the key. 

ClearUserKeyring removes all keys in root user keyring, not only the keys of the user directory cryptohome thinks is mounted.
Looking at the code, when cryptohome crashes and restarts, it tries to recreate the mount object, but I was not sure it could unmount properly stale mounts from the previous cryptohome process.

Comment 12 by derat@chromium.org, Jul 13 2016

#11: I'm able to repro a slow login on ToT lumpy by doing the following:

1. Boot the device.
2. SSH to it as root and run "dd if=/dev/zero of=/mnt/stateful_partition/foo.bin bs=1M count=512" (may need to play with the count).
3. Try to log in via the UI.
4. Look at /var/log/messages and see something like "WARNING cryptohomed[1341]: Long sync(): 17 seconds".

To repeat my earlier question: why is cryptohomed calling sync() instead of calling fsync() on whatever it actually cares about? Sure, we can revert the call that adds the sync() to ClearUserKeyring, but there are still other sync() calls in cryptohomed, and they'll still block login and logout if they're called when there's a lot of data to flush, regardless of whether the data has any relation to encrypted home directories... right?

Unmounting isn't the same as sync() since it's only going to block on the mount point being synced, right?

With https://chromium-review.googlesource.com/#/c/360132/ applied, I no longer see the login delay, but I'm also not sure whether it's still covering whatever the sync() calls were intended to cover.

Comment 13 by derat@chromium.org, Jul 29 2016

Um. So. Opinions? My change is still sitting around at https://chromium-review.googlesource.com/#/c/360132/, although it has a merge conflict now.
Cc: tnagel@chromium.org
I [vaguely] remember various bugs (some worked on by +tnagel) that were caused or exacerbated by a failure to consistently sync to disk. The sync() calls may well have been an over-reaction and I'm supportive of your change. At the same time it's difficult to predict what sort of issues we may see if the sync() calls are removed and the current cryptohome maintainers should be aware / watchful / supportive so I'll leave the +2 to them.
Project Member

Comment 15 by bugdroid1@chromium.org, Aug 11 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8

commit 40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8
Author: Daniel Erat <derat@chromium.org>
Date: Tue Jul 12 17:11:03 2016

cryptohome: Avoid calling sync() when possible.

Make more-targeted fsync() calls on source and destination
mount directories instead of using the big, slow sync()
hammer.

BUG= chromium:627115 
TEST=unit tests pass; no longer see 20+ second hangs when
     logging in after syncing an updated version of chrome
     to the device

Change-Id: Ib4e4477609cd6a1c21768fe0be6b8fb1551ac23d
Reviewed-on: https://chromium-review.googlesource.com/360132
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>

[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mount_stack.cc
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mount.cc
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mock_platform.h
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mount.h
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/platform.cc
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mount_stack_unittest.cc
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/mount_stack.h
[modify] https://crrev.com/40dd95d5b501a5e5dcc6bbdf4ce4be347f3f7fd8/cryptohome/platform.h

Comment 16 by derat@chromium.org, Aug 11 2016

Status: Fixed (was: Started)

Sign in to add a comment