
Issue 712166

Starred by 4 users

Issue metadata

Status: Archived
Owner: ----
Closed: Nov 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 768851




Excessive disk I/O when rewriting History file

Reported by lightlyf...@gmail.com, Apr 17 2017

Issue description

Chrome Version: 58.0.3004.3 (Developer Build)

Over the past couple of months, I've begun to notice Chrome causing significant disk I/O for somewhere around 3-6 minutes. The browser remains fully responsive during this time, but the disk LED is on pretty solid. I'm on a fairly old (okay, 11 year old) machine with a spinning HDD.

Naturally I was curious as to the cause of this continuous, sustained disk I/O (and why the browser remained responsive during it :P) and did a bit of digging. I looked in htop to find the source of the I/O, and straced the couple of processes I found. I was rewarded with:

Process 646 attached
Process 695 attached
[pid   646] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid   695] ftruncate64(215, 0)         = 0
[pid   695] fdatasync(215)              = 0
[pid   695] pread64(65, "SQLite format 3\0\20\0\1\1\0@  \0\0M\32\0\0 \2"..., 4096, 0) = 4096
[pid   695] pread64(65, "\5\0\0\0\2\17\364\0\0\0\2<\17\372\17\364\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 40960) = 4096
[pid   695] pread64(65, "\5\6\246\1\200\4&\0\0\0\20\357\17\373\17\366\6\241\17\361\17\354\17\347\6\234\17\342\17\334\17\326"..., 4096, 634880) = 4096
[pid   695] pread64(65, "\r\0\0\0\17\1r\0\1r\17\26\16*\r@\fT\vl\n\256\t\321\10T\7g\5\356\4\360"..., 4096, 655360) = 4096
[pid   695] pwrite64(215, "\0\0\0\0\0\0\0\0\0\0\0\0$\251\31\221\0\0 \10\0\0\2\0\0\0\20\0\0\0\0\0"..., 512, 0) = 512
[pid   695] pwrite64(215, "\0\0\0\241", 4, 512) = 4
[pid   695] pwrite64(215, "\r\0\0\0\17\1r\0\1r\17\26\16*\r@\fT\vl\n\256\t\321\10T\7g\5\356\4\360"..., 4096, 516) = 4096
[pid   695] pwrite64(215, "$\251\37\217", 4, 4612) = 4
[pid   695] pread64(215, "", 8, 5120)   = 0
[pid   695] fdatasync(215)              = 0
[pid   695] pwrite64(215, "\331\325\5\371 \241c\327\0\0\0\1", 12, 0) = 12
[pid   695] fdatasync(215)              = 0
[pid   695] pwrite64(65, "\r\0\0\0\17\1r\0\1r\17\26\16*\r@\fT\vl\n\256\t\321\10T\7g\5\356\4\360"..., 4096, 655360) = 4096
[pid   695] fdatasync(65)               = 0
[pid   695] ftruncate64(215, 0)         = 0
[pid   695] fdatasync(215)              = 0
[pid   695] pread64(65, "SQLite format 3\0\20\0\1\1\0@  \0\0M\32\0\0 \2"..., 4096, 0) = 4096
[pid   695] pread64(65, "\5\0\0\0\2\17\364\0\0\0\2<\17\372\17\364\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 40960) = 4096
(...)

The next obvious step was to chase down the file descriptors:

$ ls /proc/695/fd/65 -l
lrwx------ 1 i336 users 64 Apr 17 19:08 /proc/695/fd/65 -> /home/i336/.config/chromium/Default/History
$ ls /proc/695/fd/215 -l
lrwx------ 1 i336 users 64 Apr 17 19:08 /proc/695/fd/215 -> /home/i336/.config/chromium/Default/History-journal

Hmm.

I did some reading about SQLite3 a while ago, and among other things I learned how its atomicity mechanisms (described at https://sqlite.org/atomiccommit.html) work.

In short (to quote a random paragraph),

> SQLite does a "flush" or "fsync" operation at key points. 

This means it explicitly forces the OS to flush written data to disk, rather than trusting the OS's periodically-flushed write buffers. This is primarily to protect against data loss on unexpected power failure.

The full details of this process can be found in sections

- 3.5, "Creating A Rollback Journal File"
- 3.6, "Flushing The Rollback Journal File To Mass Storage"
- 3.10, "Flushing Changes To Mass Storage", and
- 3.11, "Deleting The Rollback Journal".

I was able to observe this process happening in real time by running "watch --i 0 ls -lh History-journal": the journal file size continuously jumped between 4.6K and 0 bytes. The same cycle is visible in the strace output above: write the journal (fd 215), fdatasync() it, write the database page (fd 65), fdatasync() it, then ftruncate64() the journal back to zero.

The top of the above-linked page is informative:

> Atomic commit means that either all database changes within a single transaction occur or none of them occur.

Now, I'm not sure where I read this, but I also learned that a "bare" UPDATE or INSERT executed outside an explicit transaction is automatically wrapped in its own one-statement transaction - journal cycle, syncs and all.
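
A quick way to see that per-statement cost in isolation - an untested sketch, using a throwaway scratch.db; strace -c just prints syscall counts:

$ sqlite3 scratch.db 'CREATE TABLE t(x);'
$ strace -c -e trace=fsync,fdatasync sqlite3 scratch.db \
    'INSERT INTO t VALUES(1); INSERT INTO t VALUES(2);'
$ strace -c -e trace=fsync,fdatasync sqlite3 scratch.db \
    'BEGIN; INSERT INTO t VALUES(3); INSERT INTO t VALUES(4); COMMIT;'

The first invocation commits twice, so it should show roughly double the syncs of the second, which commits once.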

Taking into account the amount of data my History file contains...

$ grep -o 'INSERT INTO "[^"]\+"' History.txt | uniq -c   
      4 INSERT INTO "meta"
  35854 INSERT INTO "urls"
  42735 INSERT INTO "visits"
  14980 INSERT INTO "visit_source"
   4827 INSERT INTO "keyword_search_terms"
  15367 INSERT INTO "downloads"
  15690 INSERT INTO "downloads_url_chains"
   1042 INSERT INTO "segments"
   2140 INSERT INTO "segment_usage"

...the counts from the strace log I captured while the I/O was happening suggest only one conclusion.

$ grep -o 'fdatasync\|ftruncate' chromium-hdd2.txt | sort | uniq -c
  31836 fdatasync
   5892 ftruncate

And that's that a significant number of the writes being made to this file are each being committed as an independent transaction.

I assume that the fdatasync()s are database commits and the ftruncate()s are journal resets, but I'm not sure exactly how to apportion them. Regardless, /ouch/.
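
Doing the arithmetic on those counts: the excerpt at the top shows one ftruncate64() per commit cycle and four or five fdatasync()s, and 31836 / 5892 ≈ 5.4 syncs per truncate, which is consistent with that. If the pattern holds across the whole log, this one History rewrite performed something on the order of 5,900 separate transactions.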

That being said, this only runs every few days, and my pathological worst-case is that it runs for a few minutes. So what?

Well, regardless of whether my old laptop can keep up, this is really quite inefficient. It would be nice to fix it.

Clearing whatever SQLite3 setting it is that enforces atomicity (I'm not sure which one) is probably not the way to go about it, though, as I'm sure you'll agree. Chrome is "just" a consumer desktop application, not a high-assurance system, but SQLite3 is already protecting users against unexpected power loss - which, across the install base, probably happens thousands of times a day :) - so killing that would be a huge setback.
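
(For what it's worth, the settings in question are presumably SQLite's PRAGMAs - shown here purely as illustration, on a hypothetical copy of the database, not as a suggestion:)

$ # "synchronous" is per-connection, so only Chrome itself could usefully set it:
$ #   PRAGMA synchronous = OFF;   -- skips the fsyncs entirely: fast but unsafe
$ # journal_mode=WAL, by contrast, persists in the database file and batches
$ # syncs far more gracefully (browser closed, copy of the file only):
$ sqlite3 History-copy 'PRAGMA journal_mode=WAL;'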

Rather, I think the (fairly obvious) solution would be to identify whatever code is issuing these writes and wrap the batches in explicit BEGINs and COMMITs, if it isn't doing so already. (Note that this doesn't sacrifice any atomicity; it just amortizes the journal-and-sync cycle across the whole batch.)
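
Something like this untested sketch shows the payoff of batching, again on scratch databases:

$ seq 10000 | sed 's/.*/INSERT INTO t VALUES(&);/' > inserts.sql
$ sqlite3 unbatched.db 'CREATE TABLE t(x);'
$ time sqlite3 unbatched.db < inserts.sql    # 10,000 implicit transactions
$ (echo 'BEGIN;'; cat inserts.sql; echo 'COMMIT;') > batched.sql
$ sqlite3 batched.db 'CREATE TABLE t(x);'
$ time sqlite3 batched.db < batched.sql      # one transaction, one journal cycle

On a spinning disk the difference should be dramatic, because the unbatched form pays the full journal-plus-fsync cycle for every single row.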

You may have noticed I said "History.txt", not "History", a couple of commands ago. I thought I should dump the SQL out to a text file before grepping it, figuring the export would take a bit: I'm typing this on a ThinkPad T43, with

- a single-core Pentium M (I'm using the unofficial 32-bit Chromium build I found on Launchpad)
- 2GB RAM
- a 5400RPM HDD (TIL! - I just Googled the model... I had no idea)

Well, it took all of 1.48 seconds for sqlite3 to .dump the entire database, which it wrote to disk at 18.1MB/s according to pipeviewer.
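
(For anyone wanting to reproduce the dump, it amounts to something like this, with pv being pipeviewer:)

$ sqlite3 History .dump | pv > History.txt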

While watching htop's I/O column, I noted that Chrome was slowly - painfully - iterating over my History file at around 100-300KB/s.

It just doesn't make sense.
 
Labels: Needs-Triage-M58
Components: Blink>Storage>WebSQL
Labels: TE-NeedsTriageHelp

Comment 3 by jsb...@chromium.org, Apr 24 2017

Components: -Blink>Storage>WebSQL UI>Browser>History
Adding a small note that this behavior appears to have been noticed by someone else in issue 765881, which I've tentatively suggested is a duplicate of this one.

I'm making a note of this because that user points out that the activity described in this and the other issue is causing poor system performance. I don't notice performance problems myself, but that's very likely because I know my computer is old and don't push it too hard when I see the disk spinning a lot. In all fairness, if there's a ton of disk I/O - and there is; I mention above that the HDD LED is _on solid_ - it's going to drag down everything else on the system in any case.
Blocking: 768851
May not be related, but there is a significant jump in I/O activity in R58.
Especially, the 99th percentile of the instantaneous I/O load doubled:

https://uma.googleplex.com/p/chrome/timeline_v2/?sid=e7743122c922d88bfd12220e0315afcb


Link to 768851 for reference.
Attachment: UMA.WriteShortSectors.pdf (220 KB)
Cc: asavery@chromium.org

Comment 7 by sheriffbot@chromium.org, Nov 12

Status: Archived (was: Unconfirmed)
Issue has not been modified or commented on in the last 365 days; please re-open or file a new bug if this is still an issue.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 8 by bugdroid1@chromium.org, Dec 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/portage-stable/+/900cef8c0cb027d281077e3aa8a4bb25b645fa7d

commit 900cef8c0cb027d281077e3aa8a4bb25b645fa7d
Author: Gwendal Grignou <gwendal@chromium.org>
Date: Sat Dec 15 06:40:33 2018

iotop: upgraded package to upstream

Upgraded sys-process/iotop to version 0.6 on amd64, arm, x86
Used cros_portage_upgrade --board=eve:kevin:x86-generic --upgrade --unstable-ok
sys-process/iotop

Note:
- Needs to be installed in /usr/local (to find python libraries)
- Needs 4.14 or I/O accounting support enabled.
(CONFIG_TASKSTATS, CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING)
Added.

BUG= chromium:712166 
TEST=Instal on eve [move from /usr to /usr/local]

Change-Id: Icabb1342d53fd80bf92143e35361369fe1846187
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/804495
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[add] https://crrev.com/900cef8c0cb027d281077e3aa8a4bb25b645fa7d/sys-process/iotop/metadata.xml
[add] https://crrev.com/900cef8c0cb027d281077e3aa8a4bb25b645fa7d/sys-process/iotop/files/iotop-0.6-setup.py3.patch
[add] https://crrev.com/900cef8c0cb027d281077e3aa8a4bb25b645fa7d/sys-process/iotop/iotop-0.6.ebuild
[add] https://crrev.com/900cef8c0cb027d281077e3aa8a4bb25b645fa7d/metadata/md5-cache/sys-process/iotop-0.6
[add] https://crrev.com/900cef8c0cb027d281077e3aa8a4bb25b645fa7d/sys-process/iotop/Manifest
