New issue
Advanced search Search tips

Issue 763103 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

TopSitesBackend::UpdateTopSitesOnDBThread takes way too long

Project Member Reported by sky@chromium.org, Sep 7 2017

Issue description

On average it takes 934ms! YOW! I suspect part of the issue is lots of transactions.
 
Cc: -chengx@chromium.org brucedaw...@chromium.org
Owner: chengx@chromium.org
Status: Assigned (was: Untriaged)
I'd glad to take the ownership.
Project Member

Comment 2 by bugdroid1@chromium.org, Sep 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/efddaef85901ddb8070f3b0c5207784c65943a1a

commit efddaef85901ddb8070f3b0c5207784c65943a1a
Author: Xi Cheng <chengx@chromium.org>
Date: Tue Sep 12 00:58:50 2017

Shorten the size of the maintained top site list from 20 to 10

Currently, TopSites maintains a list of 20 top sites excluding the blacklist. Whenever there is
a change in these 20 sites, TopSites updates its related database on disk via a task called
TopSitesBackend::UpdateTopSitesOnDBThread. The task takes more than 900 ms on average and has a
very high jank rate. These evidences show that it's the most expensive operation on the DB thread.
In addition to updating DB, TopSites also notifies its observers about the change so that work is
triggered in observer classes.

However, only the top 8 sites are used by the observers of TopSites. In more details, apart from
JumpList which uses the top 5 sites, all its observers use only the top 8 sites. These observers
include InstantService, MostVistedSites, SpotlightTopSitesBridge and GlobalMenuBarX11.

These facts indicate TopSites is maintaining more sites than needed. This CL changes TopSites to
maintain only 10 top sites to trim out that waste. We prefer 10 over 8 because in that way, there
is a bit of a buffer if the user blacklists a url and won't end up with an empty thumbnail.

Bug:  763103 ,  763519 
Change-Id: Ia0355cff95c8df23335c564a492a62e0011bc6f8
Reviewed-on: https://chromium-review.googlesource.com/656399
Commit-Queue: Xi Cheng <chengx@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#501131}
[modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl.cc
[modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl.h
[modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl_unittest.cc

Project Member

Comment 3 by bugdroid1@chromium.org, Sep 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f2ad20eea27a01bc6c9f20744e70c00434e84339

commit f2ad20eea27a01bc6c9f20744e70c00434e84339
Author: Xi Cheng <chengx@chromium.org>
Date: Thu Sep 14 22:56:33 2017

Update all changed top sites in DB using a single DB transaction

Previously, when TopSites updates its DB on disk, it creates a DB transaction for
each changed top site (added, deleted, moved). This creates a lot of overhead for
DB update, which partially explains why this DB update task takes almost one second
to finish on average.

This CL fixes this by packing the update work for all changed top sites in a single
DB transaction.

This CL also retires some methods that will no longer be used, and moves some methods
into the "private" section in top_sites_database.h as they are now only used by
TopSitesDatabase and its tests. A follow-up CL will be used to reorder the methods in
top_sites_database.cc file.

Bug:  763103 
Change-Id: I690d657448fd07bdd21c44aaf3cf4404a74a5453
Reviewed-on: https://chromium-review.googlesource.com/661358
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Xi Cheng <chengx@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502076}
[modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_backend.cc
[modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database.cc
[modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database.h
[modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database_unittest.cc

Project Member

Comment 4 by bugdroid1@chromium.org, Sep 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5

commit 22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5
Author: Xi Cheng <chengx@chromium.org>
Date: Sat Sep 16 00:03:40 2017

Fix the order of declaration/definition in TopSitesDatabase

This is a follow-up CL of https://chromium-review.googlesource.com/661358.
This CL only reorders the declaration/definition in the class to make it align
with chromium code guideline. No functionality changes are involved at all.

Bug:  763103 
Change-Id: Ifd1c665b8c893c51a88f1bf03513e449fb5319b6
Reviewed-on: https://chromium-review.googlesource.com/669861
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Xi Cheng <chengx@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502450}
[modify] https://crrev.com/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5/components/history/core/browser/top_sites_database.cc
[modify] https://crrev.com/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5/components/history/core/browser/top_sites_database.h

Comment 5 by chengx@chromium.org, Sep 18 2017

Status: Fixed (was: Assigned)
Due to the removal of task profiler and jank dashboard, I am not able to estimate these fixes using "jank rate" which I believe is the fairest perf indicator. 

Luckily, I found a UMA histogram called History.FirstUpdateTime that can provide some timing data. With its help, we can roughly know the progress. 

In Canary (Windows):
1) the average (arithmetic mean) runtime of this async task has dropped by ~180 ms.  
2) the runtime at percentile 50 has dropped by ~230 ms.
3) the runtime at percentile 75 has dropped by ~460 ms. 

Possible future improvement will involve SQL optimization, which I don't have plans for now. So I am going to mark this bug as "fixed".

What were the History.FirstUpdateTime histogram values before the change? That is, does 180/230/460 ms represent a huge percentage or a modest percentage?

Comment 7 by chengx@chromium.org, Sep 18 2017

Here are the before/after values.
1) average runtime 1280ms -> 1100ms, drop by 14%
2) the runtime at percentile 50: 800ms -> 570ms, drop by 29%
3) the runtime at percentile 75: 1778ms -> 1317ms, drop by 26%

For janky tasks (especially with disk IO), I think avg runtime is skewed. That's why I provide the other two metrics here.

Also, this is Canary and the metrics are solely based on Sundays. So the metrics may vary in the near future.

Comment 8 by chengx@chromium.org, Sep 18 2017

One more comment. "History.FirstUpdateTime" is actually UMA_HISTOGRAM_TIMES for this async task (see the title of this bug). I think it was designed to log the runtime of this expensive async task for its first run during a browsing session. However, due to its incorrect impl., it actually logs the runtime for all task runs. 

We can see how hard it is to tell the accurate perf gain from the UMA histograms. I think in general, we want a single metric to tell the progress (e.g. 50% done or 80% done). We had "jank rate" for this purpose, which has been unfortunately retired.

Avg run time is not a good perf indicator at all. One quick example is from my jumplist work. When I filtered out a lot of fake notifications that jumplist receives, less jumplist runs are scheduled. These filtered task runs are generally taking much less time than others. As a result, the avg runtime went up. Does this mean we are going sth wrong? No, because we filtered out a lot of redundant runs with short duration, which is a clear win. On the contrary, the jank rate went down as it took into consideration the drop the number of task runs. Similarly, I've reduced the number of top sites from 20 to 10, which might also has increased the avg run time. 

Comment 9 by chengx@chromium.org, Sep 19 2017

Re comment 8: sorry I made a mistake. "History.FirstUpdateTime" does log the runtime of this async task for its first run during a browsing session. Therefore, it can be viewed as a sample of UMA_HISTOGRAM_TIMES.

Other statements are still valid though.
It seems the fixes have hit most Canary users. The improvement is (much) more than what is mentioned in comment 7.

UMA metric: History.FirstUpdateTime,
before/after date: Sep 5 --> Sep 26,
data: 7 day aggregation

[Windows]
1) average runtime 1260ms-> 900ms, drop by 28.6%
2) the runtime at percentile 50: 750ms-> 510ms, drop by 32%
3) the runtime at percentile 75: 1790ms -> 1180ms, drop by 34%

For other platforms, data is either missing or sparse.

Comment 11 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 12 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)

Sign in to add a comment