Autocomplete unrecoverable error on merge |
|||||||||
Issue description
,
Mar 21 2016
It looks like this happens while we're deleting items older than 60 days, see https://code.google.com/p/chromium/codesearch#chromium/src/components/autofill/core/browser/webdata/autofill_webdata_backend_impl.cc&sq=package:chromium&type=cs&l=474&rcl=1458315077 Still not quite sure how it's happening. It seems like we merge, at the end of which we check to delete, something gets deleted by the service that the processor is currently unaware of, and when the service tells the processor about the deletion things get upset. I've tried a bunch of testing manually with two clients by setting one of their age out windows to be very small. I've tried with various unicode characters. Haven't been able to reproduce the crash yet. I don't quite understand the threading model the the processor uses to access the directory's data.
,
Mar 21 2016
So the way the directory works is that it is accessible from multiple threads (sync thread and all model type threads), but via locks we assure exclusive access, whether that is for read or write. This is done via transactions. Part of the problem here is that this model type/thread operation is spanning multiple transactions, one to create the data to pass into MergeDataAndStartSyncing, ReadTransaction can be seen https://code.google.com/p/chromium/codesearch#chromium/src/components/sync_driver/generic_change_processor.cc&sq=package:chromium&type=cs&l=248&rcl=1458545209 . Then later in ProcessSyncChanges you can see the WriteTransaction here https://code.google.com/p/chromium/codesearch#chromium/src/components/sync_driver/generic_change_processor.cc&sq=package:chromium&type=cs&l=428&rcl=1458545209 . In this case it should be completely valid to have someone come in between these two transactions on another thread (in this case on the Sync thread) and perform modifications (in this case delete items). This is not really a threading/synchronization issue, our thread safety constructs are working perfectly. But rather more of a logical issue, where we're making an assumption that's clearly not safe to assume.
,
Mar 21 2016
We can actually see very similar crashes when sessions are performing garbage collections (deletions) as well https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%22ChromeReportUnrecoverableError%22%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27DoGarbageCollection%27)%20%3D%200%20OR%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27AttemptDelete%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=%3D
,
Mar 21 2016
So the way this is supposed to work is that if an item is deleted in the directory, it synchronously informs the datatype about that deletion (via transactions). As such, if the datatype was planning to delete that item, it would be informed the item was already deleted via ProcessSyncChanges. The datatype should never be in a situation where something changed from under it without being informed yet due to that synchronicity. No non-datatype changes to the directory should be happening while a MergeDataAndStartSyncing call is on the stack.
,
Mar 22 2016
Looked into what the data type was doing. The sync service doesn't hold data in memory. Instead when ProcessSyncChanges/Merge is called, it starts by reading from the DB. The other two mechanisms (besides sync) for affecting the data in the db are https://code.google.com/p/chromium/codesearch#chromium/src/components/autofill/core/browser/webdata/autofill_webdata_backend_impl.cc&l=270&rcl=1458646239 https://code.google.com/p/chromium/codesearch#chromium/src/components/autofill/core/browser/webdata/autofill_webdata_backend_impl.cc&l=90&cl=GROK&rcl=1458646239 They both check that they're on the DB thread, which all of AutocompleteSyncableService also is, so my race condition line of reasoning may be on the wrong track. It's interesting that the first of those two links does not notify observers.
,
Mar 22 2016
Well, I found a different crash. 1. Open firefox, create autofill data for that given form, close firefox. 2. Start up chrome, sign in with a sync account, find an open a web page that has an autofillable form. 3. Import autofill data into your chrome profile (Settings -> Import bookmarks and settings... -> Import) 4. Update the autofill data by submitting to that form in chrome. 5. Go to chrome://sync-internals and you should see that data type Autofill has the error: Error: HandleActionUpdate@../../components/sync_driver/generic_change_processor.cc:587, datatype error was encountered: Failed to load Autofill node. ActOnChanges@../../components/autofill/core/browser/webdata/autocomplete_syncable_service.cc:382, bad entry What's going on here is that we're creating autofill data without informing sync. Assuming we haven't gone through a merge (no restarting our client), then when it's updated and we try to inform sync, we tell the processor that it's an update (but the processor is expecting an add, it doesn't know about this data at all yet). The fix is likely to just notify observers after import. This crash is quite rare, we have only recorded 2 to date, see https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%22ChromeReportUnrecoverableError%22%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27HandleActionUpdate%27)%20%3D%200%20OR%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27ActOnChanges%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=%3D , potentially due to infrequently use of the import feature.
,
Mar 22 2016
,
Mar 22 2016
In addition to the ~300 stacks we have for deleting non-existing items, we also have ~70 stacks for updating items upon merge. This case seems a little bit more simple/isolated, since this is pretty much all interactions between the service and the processor. Again, despite having old clients (there's a bunch that claim M37), none of the dumps were reported before Nov 25th 2015. https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%22ChromeReportUnrecoverableError%22%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27MergeDataAndStartSyncing%27)%20%3D%200%20OR%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27HandleActionUpdate%27)%20%3D%200%20OR%20SUM(CrashedStackTrace.StackFrame.FunctionName%20CONTAINS%20%27AutocompleteSyncableService%27)%20%3D%200&ignore_case=true&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=#-samplereports:5,productversion:1000 Picking one of the client ids, this seems to happen multiple times for the same user. See https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20ClientID%3D%274f179df8-4260-4eb3-a564-fd0f94811ba5%27&ignore_case=true&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=&stbtiq=&reportid=9b2b3b1400000000&index=0#0 This update case is much more simple than the delete case. There's no significant data type logic happening. Essentially the tag [hash] generated by the device is not matching what the processor has. If some bad data someone got onto a client where the autofill key doesn't hash to the tag, then we might expect this. I'm not sure how to naturally get into this state, after lots of poking around. None of the URL escaping or hash logic has been touched in a very long time.
,
Mar 30 2016
,
Oct 4 2016
Issue 651876 has been merged into this issue.
,
Oct 4 2016
Users experienced this crash on the following builds: Mac Canary 55.0.2880.0 - 0.72 CPM, 1 reports, 1 clients (signature [Dump without crash] syncer::DataTypeErrorHandlerImpl::OnUnrecoverableError) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Oct 10 2016
Users experienced this crash on the following builds: Mac Canary 56.0.2885.0 - 0.58 CPM, 1 reports, 1 clients (signature [Dump without crash] syncer::DataTypeErrorHandlerImpl::OnUnrecoverableError) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Oct 17 2016
This looks like it's actually just a dupe of 387672. However, leaving this open since it's currently deduping auto-cuts from stack. reports.
,
Nov 20 2016
Users experienced this crash on the following builds: Mac Canary 57.0.2925.0 - 0.67 CPM, 1 reports, 1 clients (signature [Dump without crash] syncer::DataTypeErrorHandlerImpl::OnUnrecoverableError) Android Dev 56.0.2919.3 - 0.35 CPM, 8 reports, 6 clients (signature [Dump without crash] syncer::DataTypeErrorHandlerImpl::OnUnrecoverableError) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jan 4 2017
Okay, finally was able to reproduce. Not through the UI, but if you hack up the autofill logic to set the value of an entry to contain the null terminating character, you see this behavior. Note that you need to force your std::string/base::string16 to have a length that includes the null terminating character, otherwise it will think it's the end of your input raw literal. net::EscapePath will translate "\0" to "%00". The name/value will have a "\0000" when viewed server side or in chrome://sync-internals. We aaaaalmost round-trip everything successful, but during our merge/apply in the syncable service we use a c_str() and drop the ball. https://cs.chromium.org/chromium/src/components/autofill/core/browser/webdata/autocomplete_syncable_service.cc?l=312 Not quite sure how to fix this. The wrong data is in everyone's db. Autofilling with null terminating characters is also probably not very useful.
,
Jan 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3 commit 8f5740d76f1f7e60af5a9a0ef118e807a506c7c3 Author: sath <sath@yandex-team.ru> Date: Mon Jan 23 10:29:53 2017 Fixed synchronization autocomplete unrecoverable error. I found the same error as described in a bug with our test accounts. And I suppose this problem can be as result not correct import autocomplete entries from the other browsers. For example the import from the firefox browser doesn't validate name and value fields in autofill entry before writing it in an autofill table so the browser can import not valid entry from other browser. Also the browser doesn't need to cut null terminating symbol in a name or a value, because the browser generates differ tag for autocomplete entry and lose link between autocomplete table and local synchronization database entries. BUG= 596170 Review-Url: https://codereview.chromium.org/2646783002 Cr-Commit-Position: refs/heads/master@{#445348} [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/chrome/browser/importer/in_process_importer_bridge.cc [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/components/autofill/core/browser/webdata/autocomplete_syncable_service.cc [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/components/autofill/core/browser/webdata/autofill_entry.cc [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/components/autofill/core/browser/webdata/autofill_entry.h [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/components/autofill/core/browser/webdata/autofill_table_unittest.cc [modify] https://crrev.com/8f5740d76f1f7e60af5a9a0ef118e807a506c7c3/components/browser_sync/profile_sync_service_autofill_unittest.cc
,
Jan 17 2018
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by s...@chromium.org
, Mar 18 2016