New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 650254 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
no longer working on chrome
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug



Sign in to add a comment

GCM.Database.Open indicates a 14% failure rate

Project Member Reported by peter@chromium.org, Sep 26 2016

Issue description

Opening the GCM Database indicates a failure rate of 14%; 20% on Windows, 10% on Mac, 8% on Linux and almost none (0.02%) on Chrome OS. The vast majority of failures fall into the "invalid argument" bucket.

In comparison, opening the GCMKeyStore succeeds in 99.9% of attempts, the Notification Database succeeds in 99.3% of attempts.

Chris, any idea what this means? A search indicates that it might indicate concurrent access to the database, but that'd presumably be an IO error?
 

Comment 1 by peter@chromium.org, Sep 26 2016

Labels: OS-Chrome OS-Mac OS-Windows

Comment 2 by peter@chromium.org, Sep 26 2016

Labels: -Pri-3 Pri-1
Would this happen when the database does not exist and |create_if_missing| is set to FALSE?

We also see 25.9% of loads failing because the database does not exist. This is a much bigger number (factor of 2.5) than the "Invalid Argument" errors, however.

Comment 3 by peter@chromium.org, Sep 26 2016

We actually check whether the *directory* exists prior to opening the database, and bail out if it doesn't and |create_if_missing| is set to FALSE.

When loading the database fails (tracked through GCM.LoadStatus) for any other reason than the database not existing, we'll "reset" (blow away and start over) the database. This happens in 10.5% (!) of all loads.

Comment 4 by joh...@chromium.org, Sep 26 2016

Locking failures do indeed seem to result in IOError instead. I looked through for the possible causes of InvalidArgument errors:

Ones that can happen at DB Open include:
- "does not exist (create_if_missing is false)",
- "exists (error_if_exists is true)" and
- "does not match existing comparator".

There are some others that don't seem to occur during Open (but I haven't checked all codepaths):
- "Invalid database key ID",
- "unknown file type" (in leveldb::DumpFile),
- "not a dump-able file type",
- "changing comparator while building table".

Finally there are also some InvalidArgument statuses constructed indirectly via DatabaseErrorToStatus, and it's hard to track those, though perhaps they are just deserializations of the ones above.

Comment 5 by joh...@chromium.org, Sep 27 2016

Peter pointed out https://github.com/google/leveldb/issues/215. Since the GCMStore only checks if the directory exists, it could indeed be unintentionally hitting the "does not exist (create_if_missing is false)" case quite frequently.

We should probably add an explicit DeleteDir call and/or check more carefully in GCMStore whether the database exists, in order to narrow down the cause of this.
Discussed offline with johnme@ and it does appear as though an empty directory could result in this error. He's going to add a check for leveldb's "CURRENT" file - in addition to checking for the existence of the leveldb directory.

Comment 7 by joh...@chromium.org, Sep 28 2016

Cc: peter@chromium.org
Owner: joh...@chromium.org
Status: Started (was: Assigned)
I believe this also explains the high rate of "Infinite store reset" errors in GCM.ResetStore (about one Infinite store reset error for every two Invalid argument errors in GCM.Database.Open):

1. DB directory exists but is empty.
2. GCMClientImpl::Start is called with StartMode DELAYED_START.
3. GCMStoreImpl::Backend::OpenStoreAndLoadData thinks the DB already exists since the directory exists, so fails to return STORE_DOES_NOT_EXIST.
4. GCMStoreImpl::Backend::OpenStoreAndLoadData tries to load DB with leveldb::Options.create_if_missing=false, which fails with kInvalidArgument "does not exist (create_if_missing is false)".
5. GCMClientImpl::OnLoadCompleted enters the !result->success block, but does not enter the result->store_does_not_exist block due to #3; instead it calls ResetStore.
6. GCMClientImpl::ResetStore destroys the store without removing its directory, then triggers another load with the same StartMode (in this case DELAYED_START).
7. That load fails for the same reason.
8. GCMClientImpl::ResetStore gets called again, sees that this is the second time it tried to reset the store, logs the GCM.ResetStore UMA, and bails out, setting state_=UNINITIALIZED, which causes any future calls to GCMClientImpl::Start to fail (until the browser is restarted at least).

Devices likely remain persistently stuck in this state, except perhaps if a call to GCMClientImpl::Start with StartMode IMMEDIATE_START wins the race condition and results in a load with leveldb::Options.create_if_missing=true (haven't checked if that's possible, or if the two DELAYED_START load attempts would always happen first).
Project Member

Comment 8 by bugdroid1@chromium.org, Sep 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d6c7809c0fff72b92a8038b7d65909b7449d3da

commit 6d6c7809c0fff72b92a8038b7d65909b7449d3da
Author: johnme <johnme@chromium.org>
Date: Thu Sep 29 22:27:16 2016

GCM Store: Fix invalid argument errors

Opening the GCM leveldb database resulted in a high rate of "does not
exist (create_if_missing is false)" invalid argument errors, since the
GCM Store was incorrectly considering the database to already exist if
its directory exists, when actually the directory sometimes continues
to exist after the database is destroyed.

This is expected to significantly reduce the following UMA buckets:
- GCM.Database.Open "Invalid Argument" percent (from ~20% to < 1%)
- GCM.LoadStatus "Store open failed" percent (from ~10% to < 1%)
- GCM.ResetStore "Infinite store reset" count (from ~10% of GCM.Database.Open
  count to < 1%)

BUG= 650254 

Review-Url: https://codereview.chromium.org/2380003002
Cr-Commit-Position: refs/heads/master@{#421957}

[modify] https://crrev.com/6d6c7809c0fff72b92a8038b7d65909b7449d3da/components/gcm_driver/gcm_client_impl_unittest.cc
[modify] https://crrev.com/6d6c7809c0fff72b92a8038b7d65909b7449d3da/google_apis/gcm/engine/gcm_store_impl.cc
[modify] https://crrev.com/6d6c7809c0fff72b92a8038b7d65909b7449d3da/google_apis/gcm/engine/gcm_store_impl_unittest.cc

Comment 9 by joh...@chromium.org, Oct 26 2016

Labels: M-55
Status: Verified (was: Started)
mattgaunt reproduced this, and his GCM Store directory was not quite empty, instead it contained the following:
- a LOCK file
- a LOG file
- an Encryption directory (itself containing files called 000003.log, CURRENT, LOCK, LOG, LOG.old, and MANIFEST-000001)

Fortunately https://codereview.chromium.org/2380003002 explicitly checks for a CURRENT file (since we weren't sure if the left over directories were actually empty), so it should handle this fine (and I confirmed that unit tests still successfully reset the DB when there are LOCK and LOG files).

Looking at UMA (for Channel == Beta AND Milestone == 55):
- GCM.Database.Open "Invalid Argument" count is now tiny
- GCM.LoadStatus "Store open failed" count is now tiny
- GCM.ResetStore "Infinite store reset" count is now tiny (it's a reasonable fraction of GCM.ResetStore, but the total count for GCM.ResetStore is now tiny)

So it seems pretty clear that this was indeed the primary cause of failure, and that it has indeed been fixed. Woohoo \o/

Sign in to add a comment