New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 921383 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
hobby only
Closed: Jan 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows
Pri: 1
Type: Bug



Sign in to add a comment

Timeout in csv_reader_fuzzer

Project Member Reported by ClusterFuzz, Jan 13

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=6269331968884736

Fuzzer: libFuzzer_csv_reader_fuzzer
Fuzz target binary: csv_reader_fuzzer
Job Type: libfuzzer_chrome_asan_debug
Platform Id: linux

Crash Type: Timeout (exceeds 25 secs)
Crash Address: 
Crash State:
  csv_reader_fuzzer
  
Sanitizer: address (ASAN)

Regressed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=622020:622025

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6269331968884736

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for instructions to reproduce this bug locally.
 
Project Member

Comment 1 by ClusterFuzz, Jan 13

Cc: vabr@chromium.org
Labels: ClusterFuzz-Auto-CC
Automatically adding ccs based on OWNERS file / target commit history.

If this is incorrect, please add ClusterFuzz-Wrong label.
Project Member

Comment 2 by ClusterFuzz, Jan 13

Labels: Test-Predator-Auto-Owner
Owner: junyer@google.com
Status: Assigned (was: Untriaged)
Automatically assigning owner based on suspected regression changelist https://chromium.googlesource.com/external/github.com/google/re2/+/5fc41bc658e23019b90f13a9aedc6c9d7704384c (Simplify SparseArray<> significantly.).

If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label. If you aren't the correct owner for this issue, please unassign yourself as soon as possible so it can be re-triaged.
Components: UI>Browser>Passwords
components/password_manager/core/browser/import/csv_reader.cc which is run inside the fuzzed code does indeed use RE2 to parse.
I am looking whether I can reproduce the slow-down and get rid of it by a local revert of the CL pointed out in #2.
The stack trace does indict RE2, but I don't expect that my deleting a bunch of code from SparseArray<> is really to blame. Still, there might be something subtle. For example, max_size() becoming more complex might have increased the cost of the assertions too much.

The minimised testcase is 940 KB. The unminimized testcase is 1023 KB. Ouch! How big does a testcase really need to be?

BTW, if PasswordCSVReader::DeserializePasswords() took a base::StringPiece, then you wouldn't need to copy the input into a std::string.

I have not had a closer look at the test case yet, but if it represents an input which is unlikely to come from the intended use-case (importing passwords), then it would be possible to put limits on the counts of items inside the production code. I will confirm soon.

As for avoiding copying of strings, please see bug 918530 and the discussion linked from there -- TL;DR: we have plans.
Also note that this is debug build, so some optimizations might be missing (and it might be OK for the code to run inefficiently).
Running this on a virtual Linux machine, with the current tip-of-the-tree Chromium and the corresponding third_party/re2, executing the fuzzer on the minimized test case took consistently about 52 seconds. When I rolled third_party/re2 just before the commit mentioned in comment #2 here, the run time dropped to 43 seconds (consistent among multiple runs).

Note that this also reverted two more commits to third_party/re2: 9bbdf9b762242c81609d63bb76c306e61d9c5445 (Use PODArray<> in SparseArray<>) and 68752454406a2ca8de090e41927702288dfe101e (Avoid null PODArray<> issues in SparseSet and SparseArray<>).

It seems like something did have impact in your change after all. Or if you see some bad patterns of re2 usage in components/password_manager/core/browser/import/csv_reader.cc which could be improved, please share.
Actually, for me the speed downgrade happens after the 9bbdf9b762242c81609d63bb76c306e61d9c5445 (Use PODArray<> in SparseArray<>) commit.
The auto-generated hash links are broken, so full URL for the commit from #9 here: https://code.googlesource.com/re2/+/9bbdf9b762242c81609d63bb76c306e61d9c5445
Aye, that's what I meant about max_size() becoming more complex: it used to return max_size_, but now it calls size() on dense_ and *that* calls get_deleter() on a std::unique_ptr<> and all of this is happening very frequently due to assertions.

Having looked at the grammar specified in RFC 4180 and the notes in csv_reader.h, I'm not sure that it really makes sense to use regular expressions. I mean, a handwritten CSV parser should be clear, simple and performant without too much effort, but if you were going to depend on a third-party library, surely that could just be a CSV parser rather than a regular expression engine? :)

Cc: -vabr@chromium.org junyer@google.com
Owner: vabr@chromium.org
Thanks, both for the explanation about size(), and for the good suggestion in #12. I appreciate your extra effort in looking into this and I fully agree with what you wrote in #12. Let me take care of this, given that this is not an RE2 problem as much as a design issue in the CSV parser (if you don't want to get spammed by this further, feel free to remove yourself from the Cc).
Glad to help! And to help further if need be.

Project Member

Comment 15 by bugdroid1@chromium.org, Jan 17 (5 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ee4f3db677a1195e7c47903a8892d3267645beb7

commit ee4f3db677a1195e7c47903a8892d3267645beb7
Author: Vaclav Brozek <vabr@chromium.org>
Date: Thu Jan 17 19:05:26 2019

Remove RE2 from csv_reader.cc

The CSV reader has been using regular expression to extract fields
from comma-separated-value records. While allowing concise code in CSV
reader, regular expressions are too big a hammer for this task. After
some recent changes in RE2, the CSV reader became slower, as discussed
in the attached bug.

Because the regular expressions used here are compile-time constants,
and the equivalent parser is simple, this CL removes the RE2
dependency in favour of a custom parser.

This change caused 1-2 orders of magnitude of speedup in the
csv_reader_fuzzer, when running the fuzzer on my machine in the
(debug) configuration and with the input specified by the ClusterFuzz
report in the associated bug:
* Before the CL, running the fuzzer took about 52 seconds.
* After the CL, running took about a second.
In release build with DCHECKS this was 6.5 seconds (RE2) vs. half a
second (this CL). Without DCHECKS this was 6.3 vs. 0.3 seconds.

Note: The parser is currently written so that it allows both CR and
LF as end-of-line markers. (It still needs some work to allow CRLF.)
This was a blocker for avoiding a copy of the input in
CSVTable::ReadCSV, because the copy was made just to convert all
EOLs to LF. However, this improvement is not made here, to keep
CLs focused.

Bug:  921383 
Change-Id: I3965e117377cdd58483abaa5ab3434540f4232e8
Reviewed-on: https://chromium-review.googlesource.com/c/1409457
Commit-Queue: Vaclav Brozek <vabr@chromium.org>
Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org>
Cr-Commit-Position: refs/heads/master@{#623781}
[modify] https://crrev.com/ee4f3db677a1195e7c47903a8892d3267645beb7/components/password_manager/core/browser/import/csv_reader.cc
[modify] https://crrev.com/ee4f3db677a1195e7c47903a8892d3267645beb7/components/password_manager/core/browser/import/csv_reader_unittest.cc

Project Member

Comment 16 by bugdroid1@chromium.org, Jan 17 (5 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4d5ba645b78440b3c2109a029411d17db5be1e18

commit 4d5ba645b78440b3c2109a029411d17db5be1e18
Author: Vaclav Brozek <vabr@chromium.org>
Date: Thu Jan 17 20:10:26 2019

Refactor csv_reader_unittest.cc

The file was a long collection of two types of test: successful and
failing parsing of a CSV. Within each TEST macro, boilerplate was
repeated. Through copy-paste errors ASSERT_FALSE were used where
EXPECT_FALSE were more appropriate.

This CL modifies the file so that it only has two TEST instances, one
for all the positive and one for negative cases. The particular inputs
and expectations are wrapped in test-case structs, where the test
values are not lost among boilerplate. Copy-paste errors are limited
because there is not much to copy-paste anymore when adding a new
test.

Bug:  921383 
Change-Id: I124adaac97f1986549d4766209db6564a92126b4
Reviewed-on: https://chromium-review.googlesource.com/c/1415297
Commit-Queue: Vaclav Brozek <vabr@chromium.org>
Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org>
Cr-Commit-Position: refs/heads/master@{#623814}
[modify] https://crrev.com/4d5ba645b78440b3c2109a029411d17db5be1e18/components/password_manager/core/browser/import/csv_reader_unittest.cc

Comment 17 by vabr@chromium.org, Jan 17 (5 days ago)

Status: Fixed (was: Assigned)
r623781 above should have fixed this.
Project Member

Comment 18 by ClusterFuzz, Jan 18 (5 days ago)

Labels: OS-Windows
Project Member

Comment 19 by bugdroid1@chromium.org, Jan 18 (4 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/013a33008e5b1a805c43528be30acd04811de6e5

commit 013a33008e5b1a805c43528be30acd04811de6e5
Author: Vaclav Brozek <vabr@chromium.org>
Date: Fri Jan 18 08:53:35 2019

Fix typo in csv_reader.cc: maks -> marks

Tricium caught a typo in https://crrev.com/c/1409457, but vabr@ was
too late to address that before landing, so doing this follow-up.

Bug:  921383 
Change-Id: I924056cb201455d0789c995f56ea2ce37d347c8d
Reviewed-on: https://chromium-review.googlesource.com/c/1418057
Auto-Submit: Vaclav Brozek <vabr@chromium.org>
Commit-Queue: Jan Wilken Dörrie <jdoerrie@chromium.org>
Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624042}
[modify] https://crrev.com/013a33008e5b1a805c43528be30acd04811de6e5/components/password_manager/core/browser/import/csv_reader.cc

Project Member

Comment 20 by ClusterFuzz, Jan 18 (4 days ago)

ClusterFuzz has detected this issue as fixed in range 623772:623787.

Detailed report: https://clusterfuzz.com/testcase?key=6269331968884736

Fuzzer: libFuzzer_csv_reader_fuzzer
Fuzz target binary: csv_reader_fuzzer
Job Type: libfuzzer_chrome_asan_debug
Platform Id: linux

Crash Type: Timeout (exceeds 25 secs)
Crash Address: 
Crash State:
  csv_reader_fuzzer
  
Sanitizer: address (ASAN)

Regressed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=622020:622025
Fixed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=623772:623787

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6269331968884736

See https://github.com/google/clusterfuzz-tools for instructions to reproduce this bug locally.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Project Member

Comment 21 by ClusterFuzz, Jan 18 (4 days ago)

Labels: ClusterFuzz-Verified
Status: Verified (was: Fixed)
ClusterFuzz testcase 6269331968884736 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.

Sign in to add a comment