TopSitesBackend::UpdateTopSitesOnDBThread takes way too long |
||||
Issue descriptionOn average it takes 934ms! YOW! I suspect part of the issue is lots of transactions.
,
Sep 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/efddaef85901ddb8070f3b0c5207784c65943a1a commit efddaef85901ddb8070f3b0c5207784c65943a1a Author: Xi Cheng <chengx@chromium.org> Date: Tue Sep 12 00:58:50 2017 Shorten the size of the maintained top site list from 20 to 10 Currently, TopSites maintains a list of 20 top sites excluding the blacklist. Whenever there is a change in these 20 sites, TopSites updates its related database on disk via a task called TopSitesBackend::UpdateTopSitesOnDBThread. The task takes more than 900 ms on average and has a very high jank rate. These evidences show that it's the most expensive operation on the DB thread. In addition to updating DB, TopSites also notifies its observers about the change so that work is triggered in observer classes. However, only the top 8 sites are used by the observers of TopSites. In more details, apart from JumpList which uses the top 5 sites, all its observers use only the top 8 sites. These observers include InstantService, MostVistedSites, SpotlightTopSitesBridge and GlobalMenuBarX11. These facts indicate TopSites is maintaining more sites than needed. This CL changes TopSites to maintain only 10 top sites to trim out that waste. We prefer 10 over 8 because in that way, there is a bit of a buffer if the user blacklists a url and won't end up with an empty thumbnail. Bug: 763103 , 763519 Change-Id: Ia0355cff95c8df23335c564a492a62e0011bc6f8 Reviewed-on: https://chromium-review.googlesource.com/656399 Commit-Queue: Xi Cheng <chengx@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Cr-Commit-Position: refs/heads/master@{#501131} [modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl.cc [modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl.h [modify] https://crrev.com/efddaef85901ddb8070f3b0c5207784c65943a1a/components/history/core/browser/top_sites_impl_unittest.cc
,
Sep 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f2ad20eea27a01bc6c9f20744e70c00434e84339 commit f2ad20eea27a01bc6c9f20744e70c00434e84339 Author: Xi Cheng <chengx@chromium.org> Date: Thu Sep 14 22:56:33 2017 Update all changed top sites in DB using a single DB transaction Previously, when TopSites updates its DB on disk, it creates a DB transaction for each changed top site (added, deleted, moved). This creates a lot of overhead for DB update, which partially explains why this DB update task takes almost one second to finish on average. This CL fixes this by packing the update work for all changed top sites in a single DB transaction. This CL also retires some methods that will no longer be used, and moves some methods into the "private" section in top_sites_database.h as they are now only used by TopSitesDatabase and its tests. A follow-up CL will be used to reorder the methods in top_sites_database.cc file. Bug: 763103 Change-Id: I690d657448fd07bdd21c44aaf3cf4404a74a5453 Reviewed-on: https://chromium-review.googlesource.com/661358 Reviewed-by: Scott Violet <sky@chromium.org> Commit-Queue: Xi Cheng <chengx@chromium.org> Cr-Commit-Position: refs/heads/master@{#502076} [modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_backend.cc [modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database.cc [modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database.h [modify] https://crrev.com/f2ad20eea27a01bc6c9f20744e70c00434e84339/components/history/core/browser/top_sites_database_unittest.cc
,
Sep 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5 commit 22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5 Author: Xi Cheng <chengx@chromium.org> Date: Sat Sep 16 00:03:40 2017 Fix the order of declaration/definition in TopSitesDatabase This is a follow-up CL of https://chromium-review.googlesource.com/661358. This CL only reorders the declaration/definition in the class to make it align with chromium code guideline. No functionality changes are involved at all. Bug: 763103 Change-Id: Ifd1c665b8c893c51a88f1bf03513e449fb5319b6 Reviewed-on: https://chromium-review.googlesource.com/669861 Reviewed-by: Scott Violet <sky@chromium.org> Commit-Queue: Xi Cheng <chengx@chromium.org> Cr-Commit-Position: refs/heads/master@{#502450} [modify] https://crrev.com/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5/components/history/core/browser/top_sites_database.cc [modify] https://crrev.com/22a60b3e9692d34e27f4f8afc3f252a23e0ab9a5/components/history/core/browser/top_sites_database.h
,
Sep 18 2017
Due to the removal of task profiler and jank dashboard, I am not able to estimate these fixes using "jank rate" which I believe is the fairest perf indicator. Luckily, I found a UMA histogram called History.FirstUpdateTime that can provide some timing data. With its help, we can roughly know the progress. In Canary (Windows): 1) the average (arithmetic mean) runtime of this async task has dropped by ~180 ms. 2) the runtime at percentile 50 has dropped by ~230 ms. 3) the runtime at percentile 75 has dropped by ~460 ms. Possible future improvement will involve SQL optimization, which I don't have plans for now. So I am going to mark this bug as "fixed".
,
Sep 18 2017
What were the History.FirstUpdateTime histogram values before the change? That is, does 180/230/460 ms represent a huge percentage or a modest percentage?
,
Sep 18 2017
Here are the before/after values. 1) average runtime 1280ms -> 1100ms, drop by 14% 2) the runtime at percentile 50: 800ms -> 570ms, drop by 29% 3) the runtime at percentile 75: 1778ms -> 1317ms, drop by 26% For janky tasks (especially with disk IO), I think avg runtime is skewed. That's why I provide the other two metrics here. Also, this is Canary and the metrics are solely based on Sundays. So the metrics may vary in the near future.
,
Sep 18 2017
One more comment. "History.FirstUpdateTime" is actually UMA_HISTOGRAM_TIMES for this async task (see the title of this bug). I think it was designed to log the runtime of this expensive async task for its first run during a browsing session. However, due to its incorrect impl., it actually logs the runtime for all task runs. We can see how hard it is to tell the accurate perf gain from the UMA histograms. I think in general, we want a single metric to tell the progress (e.g. 50% done or 80% done). We had "jank rate" for this purpose, which has been unfortunately retired. Avg run time is not a good perf indicator at all. One quick example is from my jumplist work. When I filtered out a lot of fake notifications that jumplist receives, less jumplist runs are scheduled. These filtered task runs are generally taking much less time than others. As a result, the avg runtime went up. Does this mean we are going sth wrong? No, because we filtered out a lot of redundant runs with short duration, which is a clear win. On the contrary, the jank rate went down as it took into consideration the drop the number of task runs. Similarly, I've reduced the number of top sites from 20 to 10, which might also has increased the avg run time.
,
Sep 19 2017
Re comment 8: sorry I made a mistake. "History.FirstUpdateTime" does log the runtime of this async task for its first run during a browsing session. Therefore, it can be viewed as a sample of UMA_HISTOGRAM_TIMES. Other statements are still valid though.
,
Sep 29 2017
It seems the fixes have hit most Canary users. The improvement is (much) more than what is mentioned in comment 7. UMA metric: History.FirstUpdateTime, before/after date: Sep 5 --> Sep 26, data: 7 day aggregation [Windows] 1) average runtime 1260ms-> 900ms, drop by 28.6% 2) the runtime at percentile 50: 750ms-> 510ms, drop by 32% 3) the runtime at percentile 75: 1790ms -> 1180ms, drop by 34% For other platforms, data is either missing or sparse.
,
Jan 22 2018
,
Jan 23 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by chengx@chromium.org
, Sep 7 2017Owner: chengx@chromium.org
Status: Assigned (was: Untriaged)