New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 795369 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug-Regression



Sign in to add a comment

Increase in IndexedDB open timeouts in Chrome 63

Project Member Reported by ohsnapit...@google.com, Dec 15 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.24 Safari/537.36

Steps to reproduce the problem:
In Google Docs we're seeing our number of IndexedDb database open timeouts more than double with Chrome 63 release.

The database open request has a timeout of 10 seconds.

What is the expected behavior?

What went wrong?
This issue was discovered by monitoring that is in place to track database open timeouts. Breaking down by browser version it is clear that the timeout has increased with the rollout of Chrome 63.

Did this work before? Yes 

Does this work in other browsers? No

Chrome version: 63.0.3239.84  Channel: stable
OS Version: OS X 10.12.6
Flash Version:
 

Comment 1 by gov...@chromium.org, Dec 15 2017

Cc: pbomm...@chromium.org
Labels: Needs-Triage-M63

Comment 2 by gov...@chromium.org, Dec 15 2017

Cc: abdulsyed@chromium.org jsb...@chromium.org
Components: -Blink>Storage Blink>Storage>IndexedDB
Could be related to bug 785305?

Comment 3 by jsb...@chromium.org, Dec 15 2017

Owner: dmu...@chromium.org
Status: Assigned (was: Unconfirmed)
dmurph@ - can you take a look?

Comment 4 by pwnall@chromium.org, Dec 15 2017

Cc: pwnall@chromium.org
I'm fairly sure this is not related to bug 785305 -- the comments there say that we've had that issue since M50.

Also, in case you're wondering (I was), we've re-enabled large value wrapping in https://crrev.com/c/765057 which landed in M64 (first shipped in 64.0.3268.0).

The only other project that comes to mind is index tombstone deletion. However, it should only kick in on database close, and the CLs I've reviewed for it also landed in M64.
Taking another pass through the data, this does not appear to be as significant of an increase as originally reported. Instead of doubling, we appear to see about a 40-50% increase in errors in Chrome 63.
The issue also appears to be specific to Linux and Mac. ChromeOS and Windows appear stable. Here are the following breakdowns of timeout increase from Chrome 62 to Chrome 63 over the past 5 days.

ChromeOS - 3%
Linux - 107%
Mac - 82%
Windows - 6%

Comment 7 by dmu...@chromium.org, Dec 19 2017

ok so our open times show that, from the browser side, things have not changed much:
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=2e649c9eb5f59da2a6a1c41db26687a3

(and the 99-percentile time here is 180ms)

So it seems like something is happening on the renderer side before we get the open request, or in the scheduling between the browser and the renderer.

Comment 8 by dmu...@chromium.org, Dec 19 2017

I'm working on a patch to improve this performance by no re-queuing all IDB response events, which should help this a fair amount.

We're going to be refactoring this area soon (next year) as well which should help iron things out. I'll keep poking at this though.
Thanks Dan,

I figure any additional information I can provide here might help. Unfortunately we don't have anything in terms of a consistent repro case, we really just have reporting numbers. Looking at the past week (12/12 - 12/18) and we continue to see the same trend on Mac and Linux.

Overall increase:
ChromeOS - 2%
Linux - 111%
Mac - 145%
Windows - 8%

It's also worth noting while Linux and Mac seems to have increased significantly, the error percentage (error count relative to page load count) varies significantly for each platform, in both 62 and 63.

Below are the error percentage changes from 62 to 62, as well as the general usage per platform:

ChromeOS - 0.23% -> 0.23% 
20% of all loads

Linux - 0.078% -> 0.165%
1% of all loads

Mac - 0.022% -> 0.052%
17% of all loads

Windows - 0.058% -> 0.063%
61% of all loads

I did an investigation of the blink idb changes between 62 and 63, and I don't see anything that would effect open performance. Value wrapping may have been turned back on here? But that shouldn't matter at all for open.

Question: what is your timeout number? I think I remember reading 10 seconds somewhere?

This sounds like it could also be a scheduling issue, or a task starvation issue. I'm going to focus on making IDB schedule less tasks to hopefully alleviate. I'll keep poking at any differences between 62 and 63 as well.
Another question: is this till the database is opened fully? Does this account for the versionchange event? That could slow things down.
We set our 10 second timeout after calling IndexedDb.open, and the timeout is disposed when either onsuccess, onerror, onblocked or onupgradeneeded are called.
Cc: dmu...@chromium.org rjfioravanti@google.com
 Issue 777666  has been merged into this issue.
I'm merging the locking-up bug with this one, as the focus is on timeouts where we end up locking up.

I looked at umas again and it looks like there is an increase in the time for onerror's to be sent for opens (But not onblocked/onsuccess) for the backend.
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=1f4a90c2e966675ce1dd6313bd4af35d

I combed through all backend changes and the diff for m62 to m63, and sadly nothing is sticking out to me. Some sweeper changes are there, and the metadata code location refactoring is in there, so I'm going to take a closer look at metadata reading error handling.

But this still doesn't really give us info for a 'locked' state, where that time wouldn't be reported anyways.


Carrying over info from other bug:
User reported a locked-up IDB, and saw lock file error message in their log. Lock file issues should pop up here:
https://cs.chromium.org/chromium/src/content/browser/indexed_db/indexed_db_backing_store.cc?l=955

And we should be sending the error back here:
https://cs.chromium.org/chromium/src/content/browser/indexed_db/indexed_db_factory_impl.cc?type=cs&l=592

Investigating error code differences for mac - it looks like we have a higher percentage of ERANGE errors on mac as well, so I'll look at possible overflows around file offsets & lengths.
https://uma.googleplex.com/p/chrome/histograms/?endDate=20171022&dayCount=14&histograms=PlatformFile.UnknownErrors.Posix&fixupData=true&showMax=true&filters=milestone%2Ceq%2C63%2Cplatform%2Ceq%2CM%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Cc: cmumford@chromium.org
I think I found a potential cause:

There was an introduction of the idea of a maximum # of files open with leveldb. This is here:
https://chromium-review.googlesource.com/c/chromium/src/+/654070

+cmumford I'm guessing that this could cause leveldb to perform extra slowly if there are a lot of files open elsewhere?

oooo - especially for opens, if all of the files are 'open on access', I've seen these file reads from leveldb be pretty small, especially during reading the log file and doing a compaction / recovery.
Labels: M-64 M-63 Target-64 FoundIn-63
Because this is mac/posix only (and windows has 'unlimited' file descriptors), I'm more sure that it's the above change.

I'm going to add a flag to disable it, as well as reporting on how many files we open past our limit (and then close right away).  This was we can confirm this is the issue when it happens on a dev machine.

Plan is to:
* Add metrics and reporting
* Add off-switch, which will be turned on by default in the worst case scenario
* Improve eviction policy, hopefully around our knowledge of the leveldatabase file access patterns.
Project Member

Comment 19 by bugdroid1@chromium.org, Jan 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/67bb8ac2a1b5d4f0d9cacecf43000dd9f666510d

commit 67bb8ac2a1b5d4f0d9cacecf43000dd9f666510d
Author: Daniel Murphy <dmurph@chromium.org>
Date: Wed Jan 17 05:24:44 2018

[LevelDatabase] Add option to bypass file handle limits.

Bug:  795369 
Change-Id: I4a4102f7094166b7ab55250a1727d60a1f69165f
Reviewed-on: https://chromium-review.googlesource.com/861341
Reviewed-by: Victor Costan <pwnall@chromium.org>
Commit-Queue: Daniel Murphy <dmurph@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529622}
[modify] https://crrev.com/67bb8ac2a1b5d4f0d9cacecf43000dd9f666510d/third_party/leveldatabase/env_chromium.cc

Project Member

Comment 20 by bugdroid1@chromium.org, Feb 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2a9a8201b8c0b213aa41b6a390e9324113176f15

commit 2a9a8201b8c0b213aa41b6a390e9324113176f15
Author: Daniel Murphy <dmurph@chromium.org>
Date: Fri Feb 16 10:20:31 2018

[LevelDatabase] Switched file handle eviction to LRU (previously MRU)

During compaction each level is scanned at the same time sequentially.
This means there are files being read from each level at the same
time but each level is read in a sequential manner. Files are read
multiple times during scanning, first reading the end of the file then
contents inside.

Because of this behavior, most-recently-used eviction is the worst
because:
1. The multiple reads in a file (in a non-sequential manner) causes a
   ton of filesystem thrashing here.
2. Files are never re-read once they are finished being scanned in a
   level, so there is no reason to keep it the handle around.

Least-recently-used eviction more closely models the reading behavior
of a leveldb compaction, and this uses leveldb's sharded LRU cache.

File thrashing here is the suspected cause of the assigned bug, where
leveldb never opens. This is a speculative but educated guess fix.

Bug:  795369 
Change-Id: I7f4965a2a93b05ae558688e58d220b0fcf549430
Reviewed-on: https://chromium-review.googlesource.com/905575
Commit-Queue: Victor Costan <pwnall@chromium.org>
Reviewed-by: Victor Costan <pwnall@chromium.org>
Reviewed-by: Chris Mumford <cmumford@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537275}
[modify] https://crrev.com/2a9a8201b8c0b213aa41b6a390e9324113176f15/third_party/leveldatabase/env_chromium.cc
[modify] https://crrev.com/2a9a8201b8c0b213aa41b6a390e9324113176f15/third_party/leveldatabase/env_chromium.h

Status: Fixed (was: Assigned)
Marking as fixed, should be fixed in 66

Sign in to add a comment