ReadFileToStringWithMaxSize should be optimized
Reported by
misto...@yandex-team.ru,
Mar 13 2018
|
||||
Issue description1. Currently ReadFileToStringWithMaxSize uses constant buffer size (64kb). Function can't know actual file size and has to read file sequentially till EOF. We can use GetFileSize function as a hint for buffer size, to speed up file reading (avoid reading to small buffer and appending that buffer to string in a loop). Also many fread syscalls may lead to frequent rescheduling if there are not enough idle threads (for example on browser startup) which leads to performance degradation. 2. Currently reading loop ends when fread returns 0. The latest fread syscall can be avoided using feof function which is just a flag check. Performance results of these optimizations: https://docs.google.com/spreadsheets/d/1P8x9_k8aEKlWs2kd8ltlMWKh4r9sJo4MpWUuIJZ9l7M/edit?usp=sharing
,
Mar 15 2018
All measurements in doc are in milliseconds (forgot to add it to doc). The gain is valuable when ReadFileToString appears for large files (larger than buffer size) while all CPU threads are busy. For 5% of measurements there are notable slowdowns up to 2x times from mean (see 13+ threads, 1mb in performance results - 95 percentile). I guess many users have 2-core CPUs and situation when all threads are busy and reading occurs can easily happen for them. And this fix may save about 20-50ms on file reading for 5% of them.
,
Mar 22 2018
It sounds like you have a fix ready, do you want to create and upload a CL? I can't really comment on the proposed implementation without seeing it. Since the benefits are greatest with larger buffers, maybe test with a 2 MB buffer, or even larger? It looks like the mean is not capturing the improvement well, maybe add 50th percentile and 95th percentile numbers? Although, I'm still not sure why the mean doesn't show more movement on the 13-thread 1 MB version. Note: if one advantage of this change is that it avoids resizing the buffer, with all of the extra allocating, copying, and freeing that this implies. On Windows this maps to reduced CPU time in the Chrome process (faulting in pages and faulting them out again is expensive) and also reduced CPU time in the system process (which is responsible for zeroing recently freed pages). It would be interesting to see if this affects CPU usage more than it affects elapsed time. I would expect the fastest way to read a file to be either to memory map it or to read it in one read call. Feel free to CC me on your CL when you upload it so that I can see what you are proposing.
,
Apr 4 2018
You can find CL here: https://chromium-review.googlesource.com/c/chromium/src/+/958861
,
Apr 5 2018
,
Apr 5 2018
The CL looks reasonable, just needs some cleanup. Marking as assigned.
,
May 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1ede155664606236fe8958b0a4c7e695bb378c39 commit 1ede155664606236fe8958b0a4c7e695bb378c39 Author: Mikhail Istomin <mistomin@yandex-team.ru> Date: Wed May 16 17:40:24 2018 Optimize ReadFileToStringWithMaxSize function GetFileSize can be used to estimate buffer size if file size is available. So use it as a hint for buffer size for file reading. Also avoid latest fread syscall in reading loop using feof. Bug: 821262 Change-Id: I7dc30483a2f95d4da2c0d3cb0049941d2634a281 Reviewed-on: https://chromium-review.googlesource.com/958861 Commit-Queue: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Cr-Commit-Position: refs/heads/master@{#559168} [modify] https://crrev.com/1ede155664606236fe8958b0a4c7e695bb378c39/base/files/file_util.cc [modify] https://crrev.com/1ede155664606236fe8958b0a4c7e695bb378c39/base/files/file_util_unittest.cc
,
May 17 2018
If this is done then you can mark it as fixed. It looks like it's working smoothly.
,
Jun 14 2018
Marking as fixed. Please reopen if there is more work to do, with a description of the work remaining. |
||||
►
Sign in to add a comment |
||||
Comment 1 by brucedaw...@chromium.org
, Mar 14 2018