New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 728892 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 729997

Blocking:
issue 719266
issue 724414



Sign in to add a comment

NOT_FOUND error during migration without power off.

Project Member Reported by uekawa@google.com, Jun 2 2017

Issue description

forking the thread from
https://bugs.chromium.org/p/chromium/issues/detail?id=713556#c35

Error message is:
listxattr: filename: No such file or directory
Failed to migrate "filename"


Version is 60.0.3112.10 dev which includes the recent multi-threaded migration change.







 

Comment 1 by uekawa@google.com, Jun 2 2017

Blocking: 724414
Owner: dspaid@chromium.org
Status: Started (was: Untriaged)
Looking in to it.
Status: Available (was: Started)
Haven't managed to find out anything interesting.  I'm out sick today so unassigning in case someone else has a chance to look into this before I'm back.

Comment 4 by uekawa@google.com, Jun 5 2017

Owner: ----
Cc: oka@chromium.org
+oka@, who looks to knowledeable on the GCache tmp dir, according to the blame.




I think I got the repro step:

(1) create an ecryptfs user
(2) log-in to ecryptfs (= skip migration)
(3) (from VT2) # rm -rf /home/chronos/user/GCache/v1/tmp
(4) chrome://inducebrowsercrashforrealz (= trigger a browser process crash)
(5) log-out
(6) log-in to ecryptfs (= skip migration)
(7) copy a .gdoc file from Google Drive to Downloads by using the file manager
(8) log-out
(9) log-in and do migration.

Then I saw:

2017-06-06T15:21:56.147758+09:00 ERR cryptohomed[1528]: listxattr: /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/temporary_mount/user/GCache/v1/tmp/8VrYUz/tmp.gdoc: No such file or directory
2017-06-06T15:21:56.148396+09:00 ERR cryptohomed[1528]: Failed to migrate "user/GCache/v1/tmp/8VrYUz/tmp.gdoc"
2017-06-06T15:21:56.401902+09:00 ERR cryptohomed[1528]: Migration Failed, aborting.
2017-06-06T15:21:57.108554+09:00 ERR cryptohomed[1528]: Failed to migrate.
2017-06-06T15:21:57.108629+09:00 ERR cryptohomed[1528]: Failed to migrate.



The step 3 and 4 generates an encrypted version of user/GCache/v1/tmp:
https://chromium.googlesource.com/chromium/src/+/28efe6a5918b0a0cb890b5f93c09f5d1d1216cdf/chrome/browser/chromeos/drive/drive_integration_service.cc#119

The (unwanted) encrypted directory is expected to be deleted and replaced with the pass-through one:
https://chromium.git.corp.google.com/chromiumos/platform2/+/dd83e49293e8d3e079df518c9ea47ed6ef015dfb/cryptohome/mount.cc#991

but for some reason, with the steps above, both the encrypted and unencrypted one kept in the vault:
/home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/tmp/8VrYUz/tmp.gdoc
/home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/tmp
/home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/<Encrypted>

which looks to be confusing cryptohome and/or ecryptfs.



The feedback log has contains the following snippet (before doing the migration),
which has only single line of "Creating pass-through directories" suggesting that the above mentioned recreation has happened.

2017-06-01T11:24:01.913740-07:00 INFO cryptohomed[1253]: Mount attempt with force_dircrypto on ecryptfs.
2017-06-01T11:24:01.914074-07:00 WARNING cryptohomed[1253]: PKCS#11 initialization requested but cryptohome is not mounted.
2017-06-01T11:24:04.635390-07:00 DEBUG kernel: [  128.270447] SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr
2017-06-01T11:24:04.640124-07:00 INFO cryptohomed[1253]: Creating pass-through directories /home/.shadow/ad44920984a00b2df6128fc3fcc97417578b469c/vault/user/GCache/v1/tmp
2017-06-01T11:24:04.665090-07:00 INFO cryptohomed[1253]: Putting a Pkcs11_Initialize on the mount thread.
2017-06-01T11:24:04.667512-07:00 INFO session_manager[852]: [INFO:policy_key.cc(53)] No policy key on disk at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/session_manager/policy/key
2017-06-01T11:24:04.668268-07:00 ERR session_manager[852]: [ERROR:policy_store.cc(34)] Could not read policy off disk at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/session_manager/policy/policy: No such file or directory
2017-06-01T11:24:04.668402-07:00 WARNING session_manager[852]: [WARNING:user_policy_service_factory.cc(88)] Failed to load user policy data, continuing anyway.
2017-06-01T11:24:04.691431-07:00 INFO chapsd[905]: Opening database in: /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/chaps
2017-06-01T11:24:04.710144-07:00 INFO chapsd[905]: Slot 1 ready for token at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/chaps
2017-06-01T11:24:04.711250-07:00 INFO cryptohomed[1253]: A Pkcs11_Init event got finished.
2017-06-01T11:24:04.711399-07:00 INFO cryptohomed[1253]: PKCS#11 initialization succeeded.
2017-06-01T11:24:04.736033-07:00 INFO session_manager[852]: [INFO:session_manager_impl.cc(440)] Starting user session
What's not working as intended is this code:

https://chromium.git.corp.google.com/chromiumos/platform2/+/dd83e49293e8d3e079df518c9ea47ed6ef015dfb/cryptohome/mount.cc#991
const FilePath user_home(GetMountedUserHomePath(obfuscated_username));
...
const FilePath userside_dir = user_home.Append(tracked_dir);



It generates duplicated "user/user" path like

/home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/mount/user/user/GCache/v1

because both |user_home| and |tracked_dir| contains.
Not immediately clear for me if just fixing this is ok... (does it automatically resolve the already broken dir before migration?)
I have to go home now today. I'll think about it tomorrow.
Blockedon: 729997
On the broken cases I found, there can be two "user/GCache/v1/tmp" directories on ecryptfs homedir.
The path is duped. One of the 'tmp' is a pass-through (filename-unencrypted) and another is not (encrypted.)
Since there are two 'tmp' entries, multi-threaded migrator simultaneously runs two jobs for the 'tmp' path.
Depending on the timing, faster thread deletes the file and the slower thread gets not-found error.


What's not fully clear is why the dup is there.
Well, there's one scenario I can provide, as the combination of the following (1) and (2):

(1) https://bugs.chromium.org/p/chromium/issues/detail?id=341719#c6

I added (Mar 2014) a code to delete 'tmp' and recreate each time at sign-in.
Until oka@'s change (Aug 2016) comes in, therefore, 'tmp' is re-created as an encrypted path.

(2) https://bugs.chromium.org/p/chromium/issues/detail?id=729997

I guess this is a regression at https://chromium-review.googlesource.com/#/c/433481/ (Jan 2017)
that makes a dup if there already is an encrypted path for the pass-through path.
Before this point, the encrypted path should have been deleted and replaced with the pass-through version.

So, if an user once logged in before the oka@'s change and slept 5 months, and re-started
using Chrome OS after the (2) regression, the broken state arises. I'm not sure if this can explain
the current percentage of failure fully.
Simple fixes for  Bug 729997  can prevent this to happen, but I'm not sure if any fix
can recover an already broken state.

If the issue is specific to this 'GCache/v1/tmp' path, just skipping this path for
migration might be the easiest way.

Comment 10 by oka@chromium.org, Jun 7 2017

I'm not aware of any place which removes GCache/v1/tmp.
HomeDirs::DeleteCacheCallback will not remove the entire directory, but its contents.
Owner: kinaba@chromium.org
Status: Started (was: Available)
> Depending on the timing, faster thread deletes the file and the slower thread gets not-found error.
Migrator threads don't try to delete one file from multiple threads. Why can this happen when there are duplicated directories?
`ls` command enumerates two 'tmp' directory in this case, probably the same thing is happening on FileEnumerator (though I haven't checked it actually yet.)

Then, MigrateDir("user/GCache/v1/tmp") is called twice, both runs on the same directory referred by the path string (I haven't checked which directory is preferred), both can push jobs for "user/GCache/v1/tmp/foo/tmp.gdoc", and both can copy and delete the file.
Cc: uekawa@chromium.org
uekawa@,

How large is the portion of this type of (GCache, NOT_FOUND) failure?

If it is small enough that can be explained by the scenario #8 (i.e., an account that was not signed-in around 5 months of 2016 year-end), then we can avoid all of them by simply skipping GCache/v1/tmp from migration.
(This is perfectly safe because the content of this directory is always deleted when Chrome starts up anyway.)

As far as I and oka@ (#10) aware, there's no cases that causes this type of failure,
but if the number of this failure is large, we're still missing something else.

Comment 15 by uekawa@google.com, Jun 8 2017

from a quick look, it's around 0.05%, 
4 instances out of 8700
OK, that's sounds small enough to be explained by #8, I think.

Then, the most simple workaround is just skip migrating GCache/v1/tmp (i.e., MigrateDir("GCache/v1/tmp") just recursively deletes the source dir if existed.)
Rationales:
* The content of this directory is never reused across a session. Having it empty is perfectly fine.
* Leaving the broken state for normal (non-migrating) users will be ok, I hope. If cache matters on which "tmp" dir is chosen ( Bug 729997  #c3 - #c4) I believe it works as intended for all the use cases of this directory. As long as either one of them is consistently chosen during tmp file creation and use, it should just work.
Project Member

Comment 17 by bugdroid1@chromium.org, Jun 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/dda823df34ab0cfe5659a0892b8c4d9763a546bb

commit dda823df34ab0cfe5659a0892b8c4d9763a546bb
Author: Kazuhiro Inaba <kinaba@chromium.org>
Date: Tue Jun 13 01:20:05 2017

cryptohome: Skip migrating GCache/v1/tmp.

This path can become a weird state that confuses the migrator (see the BUG.)
Chrome anyway deletes all the content of this directory, so there's no need
to migrate it anyway.

BUG= chromium:728892 
TEST=Repro step #c5 in the bug
TEST=cros_workon_make --board=reef cryptohome --test

Change-Id: Id8458b3886d2218ba507130d61ea7e9ad61d9f7c
Reviewed-on: https://chromium-review.googlesource.com/527984
Commit-Ready: Kazuhiro Inaba <kinaba@chromium.org>
Tested-by: Kazuhiro Inaba <kinaba@chromium.org>
Reviewed-by: Kazuhiro Inaba <kinaba@chromium.org>

[modify] https://crrev.com/dda823df34ab0cfe5659a0892b8c4d9763a546bb/cryptohome/dircrypto_data_migrator/migration_helper.cc
[modify] https://crrev.com/dda823df34ab0cfe5659a0892b8c4d9763a546bb/cryptohome/dircrypto_data_migrator/migration_helper_unittest.cc

Labels: Merge-Request-60 OS-Chrome
Project Member

Comment 19 by sheriffbot@chromium.org, Jun 14 2017

Labels: -Merge-Request-60 Hotlist-Merge-Review Merge-Review-60
This bug requires manual review: M60 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: josa...@chromium.org
+josafat

The change fixes certain amount of ext4 migration failure due to broken ecryptfs state.
Labels: -Merge-Review-60 Merge-Approved-60
Labels: -Merge-Approved-60 Merge-Merged
Status: Fixed (was: Started)
Thanks. Merged.
https://chromium-review.googlesource.com/c/535216/

Comment 23 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment