
Issue 700603

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression


Participants' hotlists:
IDB-Stability



ConstraintError (Key already exists in the object store.) on CrOS 58.0.3029.6

Project Member Reported by ralp...@google.com, Mar 11 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36

Steps to reproduce the problem:
1. Be a user with offline enabled in Google Docs, running ChromeOS 58.0.3029.6
2. Open a Google Docs document

What is the expected behavior?
Should work.

What went wrong?
ConstraintError (Key already exists in the object store) is thrown when trying to write to IndexedDB.

Did this work before? Yes, 58.0.3027.0

Does this work in other browsers? N/A

Chrome version: 58.0.3029.6  Channel: dev
OS Version: 58.0.3029.6
Flash Version: 

We are working on a better repro case. Basically, we read the database using ObjectStore.get(), which returns a falsy value; we then try to write a new document using ObjectStore.add(), which fails with the constraint error above.
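A minimal sketch of that pattern, for reference (TypeScript; the store and key names are illustrative, not our real schema). get() yields the same falsy result whether no record exists or an existing record's value failed to deserialize, so the follow-up add() collides with the existing key:

function loadOrCreate(db: IDBDatabase, key: string, freshDoc: object): void {
  const tx = db.transaction('docs', 'readwrite');
  const store = tx.objectStore('docs');
  const getReq = store.get(key);
  getReq.onsuccess = () => {
    if (!getReq.result) {
      // The record "looks" absent, but may exist with unreadable contents.
      const addReq = store.add(freshDoc, key);
      addReq.onerror = () => {
        console.error(addReq.error);  // DOMException: ConstraintError
      };
    }
  };
}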

This issue only started appearing on Docs starting 3/8/2017, when Chrome 58.0.3029.6 started rolling out. It only appears on CrOS.
Also, the issue doesn't seem to reproduce for newly created documents, so our suspicion is that something may be going wrong in the update flow between two Chrome versions.

Comment 1 by jsb...@chromium.org, Mar 13 2017

Cc: jbroman@chromium.org
cc jbroman@ in case we mucked up the serialization in some way.

(aside: since you can store falsy values like false, 0, '', null, or undefined, you can't distinguish with get() - but you're probably not storing those.)
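(A sketch of one way around that ambiguity, with illustrative names: count(key) reports key existence without deserializing the value, so it distinguishes "no record" from a stored falsy value. Note that in this bug's failure mode it would still report the record as present, since only value deserialization fails.)

function keyExists(store: IDBObjectStore, key: IDBValidKey): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const req = store.count(key);
    req.onsuccess = () => resolve(req.result > 0);
    req.onerror = () => reject(req.error);
  });
}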

Comment 2 by jsb...@chromium.org, Mar 13 2017

Components: -Blink>Storage Blink>Storage>IndexedDB
This looks to me like it was caused by the V8 version rolling back, https://chromium.googlesource.com/chromium/src/+/a8be5d4dabe8587567639e63831e3fc74ee089c2.

Which would mean the same root cause as bug 698214. Basically, the structured clone code has for some time had an assumption that the wire format version would not go backwards, but the V8 rollback for the branch did just that.

As a result, objects written by newer versions of the serialization logic are being rejected by older versions. Blink handles this, and all serialization errors, by producing null (which is indeed a falsy value).

In that bug, I thought we'd determined that only Canary would see this issue. Is CrOS dev channel unusual in this regard, as compared to the other platform dev channels?

Comment 4 by ralp...@google.com, Mar 13 2017

That theory seems to match what we're seeing.
So we do *not* expect this issue to propagate beyond the dev channel?
If so, I'd like to instruct users that reported this issue to disable and re-enable Docs offline so they are able to use their documents again.

Also, are any mechanisms being put in place to prevent this from happening in the future, especially in prod? What happens if we ever need to roll back in prod? That could potentially brick all Docs users that have offline enabled. Shouldn't wire format changes be backwards compatible?

Comment 5 by jsb...@chromium.org, Mar 13 2017

Changes are backwards compatible (i.e. new versions of the code can read old versions of the data); this is a case where forwards compatibility was violated due to the rollback (i.e. old versions of the code can not read new versions of the data).

As a defense against this, we embed the wire format version in the database and fail the open if it's a "future" version, rather than hitting undefined behavior. In this case I don't think the version number was bumped (jbroman@?).


Comment 6 by jbroman@chromium.org, Mar 13 2017

The format is backward compatible (newer versions can read data written by older versions), but not forward compatible (old versions cannot read data written by newer versions, and in fact don't even try).

I don't know the background behind this decision; it's been this way for some (all?) time. Being perfectly forward compatible would seem impossible (there will be new kinds of values that didn't exist in old versions).

Extreme measures are, of course, possible (like requiring everything to be handled by the deserializer a milestone before it's emitted by the serializer), but that'd be very rough to work with.

jsbell, do you know the storage team's historical thinking about this?

The particular wire format change in this case is very small, so we could cherry-pick the change forward, or cherry-pick a change that made it accept the new version without ever writing it. I'd thought that unnecessary because it hadn't hit anyone but Canary (and Canary quickly moved back forward past the V8 change again, and so once again was able to read the data).
Comment 7 by jbroman@chromium.org, Mar 13 2017

#5: Looking more closely, I was unaware of that copy of the wire format version (it's somewhat detached from the rest of the serialization code). It's a little more complicated now (because there can be wire format version changes on the V8 side as well), so that mechanism should probably be updated to include a combined version.

Though if I had known about that, the failure mode would have changed (database fails to open), but there would still be an issue here, correct? Or do IDB apps recover gracefully?

Comment 8 by jsb...@chromium.org, Mar 13 2017

Web apps using IDB need to handle self-induced version skew - if you previously did indexedDB.open('db', 2), a later indexedDB.open('db', 1) would error out in a similar way. There will be a different error code, but browser-induced failure here obviously isn't something developers can test for; it goes beyond specified behavior.
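From the app side, that skew surfaces as a rejected open. A sketch using a hypothetical helper (in the self-induced case the error would be a VersionError):

function openDb(name: string, version: number): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(name, version);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);  // e.g. VersionError on downgrade
  });
}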

The other option we've considered is blowing away the database, which we do in the case of corruption.

Comment 9 by ralp...@google.com, Mar 13 2017

Yes, sorry, I meant forward compatibility. Usually when we deal with storage at the app level, we push out the reading part first, and only after a while do we start writing data in the new version. This way we are still able to roll back without breaking things. I don't know if that can be done in your case, though.
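(A hypothetical illustration of that rollout pattern - all names invented. Readers that understand version 2 ship a release before any writer emits it, so rolling the writer back never strands unreadable data:)

const MAX_READABLE_VERSION = 2;  // readers for v2 ship first
const WRITE_VERSION = 1;         // bumped to 2 only once v2 readers are everywhere

interface StoredRecord { version: number; payload: string; }

function encode(payload: string): StoredRecord {
  return { version: WRITE_VERSION, payload };
}

function decode(record: StoredRecord): string {
  if (record.version > MAX_READABLE_VERSION) {
    throw new Error('Record from the future: v' + record.version);
  }
  return record.payload;  // per-version decoding elided
}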

Blowing away the database is definitely not an option we prefer, as it can potentially delete users' unsaved changes, causing data loss.

Cc: jsb...@chromium.org jkummerow@chromium.org
For the immediate issue, how do people feel about cherry-picking the read half of 6543519977b2012b58a4ffef28b8527db404fbdb into M58? The patch would look roughly like:

 // Version 12: regexp and string objects share normal string encoding
 static const uint32_t kLatestVersion = 12;
+// Version 13: host objects have an explicit tag (rather than handling all
+//             unknown tags)
+static const uint32_t kLatestVersionForReading = 13;


   //  wasmWireByteLength:uint32_t, then raw data
   //  compiledDataLength:uint32_t, then raw data
   kWasmModule = 'W',
+  // The delegate is responsible for processing all following data.
+  // This "escapes" to whatever wire format the delegate chooses.
+  kHostObject = '\\',
 };


 Maybe<bool> ValueDeserializer::ReadHeader() {
   if (position_ < end_ &&
       *position_ == static_cast<uint8_t>(SerializationTag::kVersion)) {
     ReadTag().ToChecked();
-    if (!ReadVarint<uint32_t>().To(&version_) || version_ > kLatestVersion) {
+    if (!ReadVarint<uint32_t>().To(&version_) || version_ > kLatestVersionForReading) {
       isolate_->Throw(*isolate_->factory()->NewError(


     case SerializationTag::kWasmModule:
       return ReadWasmModule();
-    default:
-      // TODO(jbroman): Introduce an explicit tag for host objects to avoid
-      // having to treat every unknown tag as a potential host object.
-      position_--;
+    case SerializationTag::kHostObject:
       return ReadHostObject();
+    default:
+      // Before there was an explicit tag for host objects, all unknown tags
+      // were delegated to the host.
+      if (version_ < 13) {
+        position_--;
+        return ReadHostObject();
+      }
+      return MaybeHandle<Object>();
   }
 }


Alternatively, we could cherry-pick the whole patch (the writing side is just a bump of kLatestVersion and one line to include SerializationTag::kHostObject before each host object), which changes more but is also exactly the patch that's tested in canary/ToT, whereas I'd have to test this one locally.

Regardless, if this is the issue it shouldn't end up affecting beta/stable users, but dev channel users do deserve a resolution of some kind.
I think I'd lean slightly towards the "read half" patch above to keep it defensive in dev rather than pushing out the writing change, just in case further churn happens.

Sounds like the follow-up actions are:
* Work on the plumbing for getting the serialization version(s) persisted. It's definitely gotten more complicated over time. :(
* Teach the IDB code to handle a failure here if our other defenses fail - i.e. replace the DCHECKs with handling the serialization failure, pass an error through to JS, and force-close the connection (as we do on I/O error).

Labels: Merge-Request-58
Owner: jbroman@chromium.org
Status: Assigned (was: Unconfirmed)
(Merge request note: the merge would be on the V8 side.)
Cc: hablich@chromium.org
There's already a pending merge request of those (entire!) changes for node.js: https://bugs.chromium.org/p/v8/issues/detail?id=6080

If CrOS needs it too, then maybe we should just do that?

(re #3 -- yes, it seems CrOS-Dev is unusual in that it shipped a 3027 build on Dev, which the desktop platforms only shipped on Canary.)
Cc: bhthompson@chromium.org
The last 58 Dev in ChromeOS is scheduled to go live today, and I assume it is already built. Even if we merge back the feature/fix, it wouldn't help.

If ChromeOS does another 58 push on Dev, a merge should be ok. In that case, please merge the whole patch, not only the reading part. The whole patch is at least tested :-).

Bernie, please decide regarding the merge, as you know the actual ChromeOS release plan better.
Labels: -Merge-Request-58 Merge-Approved-58
We will be doing another 58 push on Dev, likely next Tuesday; today's is already built and is not a beta candidate.
Could a v8 committer please cherry-pick 6543519977b2012b58a4ffef28b8527db404fbdb for me, then? (Alas I can only cherry-pick to Chromium branches.)
Labels: -Pri-2 Pri-1

Comment 19 by bugdroid1@chromium.org, Mar 15 2017

Labels: merge-merged-5.8
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/41e12ea699bfb7ab564606b8764b3dcd5d12244e

commit 41e12ea699bfb7ab564606b8764b3dcd5d12244e
Author: Jakob Kummerow <jkummerow@chromium.org>
Date: Wed Mar 15 12:12:55 2017

Merged: ValueSerializer: Add an explicit tag for host objects.

Revision: 6543519977b2012b58a4ffef28b8527db404fbdb

BUG=chromium:686159,chromium:700603,v8:6080
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=hablich@chromium.org

Review-Url: https://codereview.chromium.org/2749263002 .
Cr-Commit-Position: refs/branch-heads/5.8@{#29}
Cr-Branched-From: eda659cc5e307f20ac1ad542ba12ab32eaf4c7ef-refs/heads/5.8.283@{#1}
Cr-Branched-From: 4310cd02d2160b1457baed81a2f40063eb264a21-refs/heads/master@{#43429}

[modify] https://crrev.com/41e12ea699bfb7ab564606b8764b3dcd5d12244e/src/value-serializer.cc
[modify] https://crrev.com/41e12ea699bfb7ab564606b8764b3dcd5d12244e/test/unittests/value-serializer-unittest.cc

Comment 20 by ralp...@google.com, Mar 17 2017

Hi, what's the ETA for the fix to hit the Chrome OS dev channel, please?
#16: bhthompson above is the TPM, so sounds like next Tuesday.
That is correct, Tuesday is the next 58 dev target.

Comment 23 by sheriffbot@chromium.org, Mar 20 2017

Cc: bhthompson@google.com ketakid@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-58 merge-merged-58
Merged to 58 above (on the V8 5.8 branch).

Comment 25 by ralp...@google.com, Mar 21 2017

crbug/698214 suggests that it might have also affected stable versions? I was under the impression that this was caught in the dev channel on CrOS and in Canary on other builds. Please confirm.

Comment 26 by ralp...@google.com, Mar 22 2017

jbroman@, could you confirm which platforms/versions are affected? We've received a user report from Mac 57.0.2987.110.
The number of impacted users is much lower than on CrOS, but I just wanted to double-check.
Also, in which Chrome version is the roll forward included?
The CL we believe to be responsible for this was not in M57. Assuming the diagnosis was correct, an M57 user should only observe this issue if:
- data is corrupted on disk, or
- they were previously running a greater milestone, and downgraded to 57

Comment 28 by ralp...@google.com, Mar 22 2017

Confirmed that the user recently switched from the beta to the stable channel.
jsbell could comment further, but my understanding is that we don't support profile downgrades.

We should be behaving slightly better than this, but I accidentally broke the downgrade detection (and intend to fix it). But even if that were working correctly, I think the failure mode would be that the IDB database would fail to open (and the data would appear to be "gone" to the application).
Correct. After double-checking the code, if we detect "from the future" data on open we report it as a variant of data-loss - the open's upgradeneeded event has nonstandard dataLoss == "total" / dataLossMessage == "I/O error checking schema" properties.
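(A Chrome-specific sketch of checking those properties - the dataLoss/dataLossMessage fields are nonstandard, as noted above, and everything else here is illustrative:)

const openReq = indexedDB.open('docs-db', 1);
openReq.onupgradeneeded = (event) => {
  const e = event as IDBVersionChangeEvent &
      { dataLoss?: string; dataLossMessage?: string };
  if (e.dataLoss === 'total') {
    console.warn('IndexedDB reported data loss:', e.dataLossMessage);
  }
};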

I have issue 703704 filed to track handling this post-open, but we should really never get that far. Feedback welcome!

Comment 31 by ralp...@google.com, Mar 24 2017

We have verified that an affected user upgraded to 58.0.3029.31 and the issue was resolved. Thank you for fixing this.

At this point, I think a postmortem would be appropriate. This issue caused around 4,000 daily crashes, affecting up to 700 users. If the original change had reached the stable channel and been rolled back there, the impact would have been catastrophic. In Docs alone, 20% of prod users (users with offline enabled) would have been unable to access their documents.

I also think that we can draw useful action items from this:
1) How to better handle object deserialization in IndexedDB (fail fast on read?) [i.e. jsbell@'s crbug/703704]
2) How to avoid making changes in Chrome that can break on rollback. This is something that apps that use storage have to deal with all the time, and it should be standard practice in Chrome IMO.
3) Regression tests for Chrome that verify upgrade/downgrade flows (i.e. create a database, upgrade Chrome, verify that all the contents are intact and the database is functional, then downgrade and verify again).
4) How can apps like Docs be more resilient to these problems? (e.g. we will probably try to catch errors during database initialization and fall back to server-only; see the sketch below)
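A rough sketch of what item 4 could look like (hypothetical names; openDocsDb stands in for the app's real open logic):

async function initStorage(
    openDocsDb: () => Promise<IDBDatabase>): Promise<'offline' | 'server-only'> {
  try {
    await openDocsDb();
    return 'offline';
  } catch (err) {
    console.warn('Local storage unavailable, falling back to server-only:', err);
    return 'server-only';
  }
}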

Comment 32 by ralp...@google.com, Apr 18 2017

Hi - was there a postmortem or a list of takeaways from this issue? It seems to me that there is a missing best practice here that could prevent this from happening again, possibly in stable, where it would have caused significant damage.
Thanks for prodding - we're getting started on the PM now
Status: Fixed (was: Assigned)
Postmortem (Google-only, sorry) is at go/chromepostmortem399.

Comment 35 by ralp...@google.com, May 10 2017

Thank you!
Labels: VerifyIn-61

Comment 37 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
