NOT_FOUND error during migration without power off. |
||||||||||||||
Issue descriptionforking the thread from https://bugs.chromium.org/p/chromium/issues/detail?id=713556#c35 Error message is: listxattr: filename: No such file or directory Failed to migrate "filename" Version is 60.0.3112.10 dev which includes the recent multi-threaded migration change.
,
Jun 2 2017
Looking in to it.
,
Jun 4 2017
Haven't managed to find out anything interesting. I'm out sick today so unassigning in case someone else has a chance to look into this before I'm back.
,
Jun 5 2017
,
Jun 6 2017
+oka@, who looks to knowledeable on the GCache tmp dir, according to the blame. I think I got the repro step: (1) create an ecryptfs user (2) log-in to ecryptfs (= skip migration) (3) (from VT2) # rm -rf /home/chronos/user/GCache/v1/tmp (4) chrome://inducebrowsercrashforrealz (= trigger a browser process crash) (5) log-out (6) log-in to ecryptfs (= skip migration) (7) copy a .gdoc file from Google Drive to Downloads by using the file manager (8) log-out (9) log-in and do migration. Then I saw: 2017-06-06T15:21:56.147758+09:00 ERR cryptohomed[1528]: listxattr: /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/temporary_mount/user/GCache/v1/tmp/8VrYUz/tmp.gdoc: No such file or directory 2017-06-06T15:21:56.148396+09:00 ERR cryptohomed[1528]: Failed to migrate "user/GCache/v1/tmp/8VrYUz/tmp.gdoc" 2017-06-06T15:21:56.401902+09:00 ERR cryptohomed[1528]: Migration Failed, aborting. 2017-06-06T15:21:57.108554+09:00 ERR cryptohomed[1528]: Failed to migrate. 2017-06-06T15:21:57.108629+09:00 ERR cryptohomed[1528]: Failed to migrate. The step 3 and 4 generates an encrypted version of user/GCache/v1/tmp: https://chromium.googlesource.com/chromium/src/+/28efe6a5918b0a0cb890b5f93c09f5d1d1216cdf/chrome/browser/chromeos/drive/drive_integration_service.cc#119 The (unwanted) encrypted directory is expected to be deleted and replaced with the pass-through one: https://chromium.git.corp.google.com/chromiumos/platform2/+/dd83e49293e8d3e079df518c9ea47ed6ef015dfb/cryptohome/mount.cc#991 but for some reason, with the steps above, both the encrypted and unencrypted one kept in the vault: /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/tmp/8VrYUz/tmp.gdoc /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/tmp /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/vault/user/GCache/v1/<Encrypted> which looks to be confusing cryptohome and/or ecryptfs. The feedback log has contains the following snippet (before doing the migration), which has only single line of "Creating pass-through directories" suggesting that the above mentioned recreation has happened. 2017-06-01T11:24:01.913740-07:00 INFO cryptohomed[1253]: Mount attempt with force_dircrypto on ecryptfs. 2017-06-01T11:24:01.914074-07:00 WARNING cryptohomed[1253]: PKCS#11 initialization requested but cryptohome is not mounted. 2017-06-01T11:24:04.635390-07:00 DEBUG kernel: [ 128.270447] SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr 2017-06-01T11:24:04.640124-07:00 INFO cryptohomed[1253]: Creating pass-through directories /home/.shadow/ad44920984a00b2df6128fc3fcc97417578b469c/vault/user/GCache/v1/tmp 2017-06-01T11:24:04.665090-07:00 INFO cryptohomed[1253]: Putting a Pkcs11_Initialize on the mount thread. 2017-06-01T11:24:04.667512-07:00 INFO session_manager[852]: [INFO:policy_key.cc(53)] No policy key on disk at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/session_manager/policy/key 2017-06-01T11:24:04.668268-07:00 ERR session_manager[852]: [ERROR:policy_store.cc(34)] Could not read policy off disk at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/session_manager/policy/policy: No such file or directory 2017-06-01T11:24:04.668402-07:00 WARNING session_manager[852]: [WARNING:user_policy_service_factory.cc(88)] Failed to load user policy data, continuing anyway. 2017-06-01T11:24:04.691431-07:00 INFO chapsd[905]: Opening database in: /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/chaps 2017-06-01T11:24:04.710144-07:00 INFO chapsd[905]: Slot 1 ready for token at /home/root/ad44920984a00b2df6128fc3fcc97417578b469c/chaps 2017-06-01T11:24:04.711250-07:00 INFO cryptohomed[1253]: A Pkcs11_Init event got finished. 2017-06-01T11:24:04.711399-07:00 INFO cryptohomed[1253]: PKCS#11 initialization succeeded. 2017-06-01T11:24:04.736033-07:00 INFO session_manager[852]: [INFO:session_manager_impl.cc(440)] Starting user session
,
Jun 6 2017
What's not working as intended is this code: https://chromium.git.corp.google.com/chromiumos/platform2/+/dd83e49293e8d3e079df518c9ea47ed6ef015dfb/cryptohome/mount.cc#991 const FilePath user_home(GetMountedUserHomePath(obfuscated_username)); ... const FilePath userside_dir = user_home.Append(tracked_dir); It generates duplicated "user/user" path like /home/.shadow/f61da8200f0336418a4f2daffc506559490b03c5/mount/user/user/GCache/v1 because both |user_home| and |tracked_dir| contains. Not immediately clear for me if just fixing this is ok... (does it automatically resolve the already broken dir before migration?) I have to go home now today. I'll think about it tomorrow.
,
Jun 6 2017
,
Jun 6 2017
On the broken cases I found, there can be two "user/GCache/v1/tmp" directories on ecryptfs homedir. The path is duped. One of the 'tmp' is a pass-through (filename-unencrypted) and another is not (encrypted.) Since there are two 'tmp' entries, multi-threaded migrator simultaneously runs two jobs for the 'tmp' path. Depending on the timing, faster thread deletes the file and the slower thread gets not-found error. What's not fully clear is why the dup is there. Well, there's one scenario I can provide, as the combination of the following (1) and (2): (1) https://bugs.chromium.org/p/chromium/issues/detail?id=341719#c6 I added (Mar 2014) a code to delete 'tmp' and recreate each time at sign-in. Until oka@'s change (Aug 2016) comes in, therefore, 'tmp' is re-created as an encrypted path. (2) https://bugs.chromium.org/p/chromium/issues/detail?id=729997 I guess this is a regression at https://chromium-review.googlesource.com/#/c/433481/ (Jan 2017) that makes a dup if there already is an encrypted path for the pass-through path. Before this point, the encrypted path should have been deleted and replaced with the pass-through version. So, if an user once logged in before the oka@'s change and slept 5 months, and re-started using Chrome OS after the (2) regression, the broken state arises. I'm not sure if this can explain the current percentage of failure fully.
,
Jun 6 2017
Simple fixes for Bug 729997 can prevent this to happen, but I'm not sure if any fix can recover an already broken state. If the issue is specific to this 'GCache/v1/tmp' path, just skipping this path for migration might be the easiest way.
,
Jun 7 2017
I'm not aware of any place which removes GCache/v1/tmp. HomeDirs::DeleteCacheCallback will not remove the entire directory, but its contents.
,
Jun 7 2017
,
Jun 7 2017
> Depending on the timing, faster thread deletes the file and the slower thread gets not-found error. Migrator threads don't try to delete one file from multiple threads. Why can this happen when there are duplicated directories?
,
Jun 7 2017
`ls` command enumerates two 'tmp' directory in this case, probably the same thing is happening on FileEnumerator (though I haven't checked it actually yet.)
Then, MigrateDir("user/GCache/v1/tmp") is called twice, both runs on the same directory referred by the path string (I haven't checked which directory is preferred), both can push jobs for "user/GCache/v1/tmp/foo/tmp.gdoc", and both can copy and delete the file.
,
Jun 7 2017
uekawa@, How large is the portion of this type of (GCache, NOT_FOUND) failure? If it is small enough that can be explained by the scenario #8 (i.e., an account that was not signed-in around 5 months of 2016 year-end), then we can avoid all of them by simply skipping GCache/v1/tmp from migration. (This is perfectly safe because the content of this directory is always deleted when Chrome starts up anyway.) As far as I and oka@ (#10) aware, there's no cases that causes this type of failure, but if the number of this failure is large, we're still missing something else.
,
Jun 8 2017
from a quick look, it's around 0.05%, 4 instances out of 8700
,
Jun 8 2017
OK, that's sounds small enough to be explained by #8, I think.
Then, the most simple workaround is just skip migrating GCache/v1/tmp (i.e., MigrateDir("GCache/v1/tmp") just recursively deletes the source dir if existed.)
Rationales:
* The content of this directory is never reused across a session. Having it empty is perfectly fine.
* Leaving the broken state for normal (non-migrating) users will be ok, I hope. If cache matters on which "tmp" dir is chosen ( Bug 729997 #c3 - #c4) I believe it works as intended for all the use cases of this directory. As long as either one of them is consistently chosen during tmp file creation and use, it should just work.
,
Jun 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/dda823df34ab0cfe5659a0892b8c4d9763a546bb commit dda823df34ab0cfe5659a0892b8c4d9763a546bb Author: Kazuhiro Inaba <kinaba@chromium.org> Date: Tue Jun 13 01:20:05 2017 cryptohome: Skip migrating GCache/v1/tmp. This path can become a weird state that confuses the migrator (see the BUG.) Chrome anyway deletes all the content of this directory, so there's no need to migrate it anyway. BUG= chromium:728892 TEST=Repro step #c5 in the bug TEST=cros_workon_make --board=reef cryptohome --test Change-Id: Id8458b3886d2218ba507130d61ea7e9ad61d9f7c Reviewed-on: https://chromium-review.googlesource.com/527984 Commit-Ready: Kazuhiro Inaba <kinaba@chromium.org> Tested-by: Kazuhiro Inaba <kinaba@chromium.org> Reviewed-by: Kazuhiro Inaba <kinaba@chromium.org> [modify] https://crrev.com/dda823df34ab0cfe5659a0892b8c4d9763a546bb/cryptohome/dircrypto_data_migrator/migration_helper.cc [modify] https://crrev.com/dda823df34ab0cfe5659a0892b8c4d9763a546bb/cryptohome/dircrypto_data_migrator/migration_helper_unittest.cc
,
Jun 14 2017
,
Jun 14 2017
This bug requires manual review: M60 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 14 2017
+josafat The change fixes certain amount of ext4 migration failure due to broken ecryptfs state.
,
Jun 19 2017
,
Jun 20 2017
Thanks. Merged. https://chromium-review.googlesource.com/c/535216/
,
Jan 22 2018
|
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by uekawa@google.com
, Jun 2 2017