Timeout in csv_reader_fuzzer |
||||||
Issue descriptionDetailed report: https://clusterfuzz.com/testcase?key=6269331968884736 Fuzzer: libFuzzer_csv_reader_fuzzer Fuzz target binary: csv_reader_fuzzer Job Type: libfuzzer_chrome_asan_debug Platform Id: linux Crash Type: Timeout (exceeds 25 secs) Crash Address: Crash State: csv_reader_fuzzer Sanitizer: address (ASAN) Regressed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=622020:622025 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6269331968884736 Issue filed automatically. See https://github.com/google/clusterfuzz-tools for instructions to reproduce this bug locally.
,
Jan 13
Automatically assigning owner based on suspected regression changelist https://chromium.googlesource.com/external/github.com/google/re2/+/5fc41bc658e23019b90f13a9aedc6c9d7704384c (Simplify SparseArray<> significantly.). If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label. If you aren't the correct owner for this issue, please unassign yourself as soon as possible so it can be re-triaged.
,
Jan 13
components/password_manager/core/browser/import/csv_reader.cc which is run inside the fuzzed code does indeed use RE2 to parse. I am looking whether I can reproduce the slow-down and get rid of it by a local revert of the CL pointed out in #2.
,
Jan 13
The stack trace does indict RE2, but I don't expect that my deleting a bunch of code from SparseArray<> is really to blame. Still, there might be something subtle. For example, max_size() becoming more complex might have increased the cost of the assertions too much.
,
Jan 13
The minimised testcase is 940 KB. The unminimized testcase is 1023 KB. Ouch! How big does a testcase really need to be? BTW, if PasswordCSVReader::DeserializePasswords() took a base::StringPiece, then you wouldn't need to copy the input into a std::string.
,
Jan 13
I have not had a closer look at the test case yet, but if it represents an input which is unlikely to come from the intended use-case (importing passwords), then it would be possible to put limits on the counts of items inside the production code. I will confirm soon. As for avoiding copying of strings, please see bug 918530 and the discussion linked from there -- TL;DR: we have plans.
,
Jan 13
Also note that this is debug build, so some optimizations might be missing (and it might be OK for the code to run inefficiently).
,
Jan 13
Running this on a virtual Linux machine, with the current tip-of-the-tree Chromium and the corresponding third_party/re2, executing the fuzzer on the minimized test case took consistently about 52 seconds. When I rolled third_party/re2 just before the commit mentioned in comment #2 here, the run time dropped to 43 seconds (consistent among multiple runs). Note that this also reverted two more commits to third_party/re2: 9bbdf9b762242c81609d63bb76c306e61d9c5445 (Use PODArray<> in SparseArray<>) and 68752454406a2ca8de090e41927702288dfe101e (Avoid null PODArray<> issues in SparseSet and SparseArray<>). It seems like something did have impact in your change after all. Or if you see some bad patterns of re2 usage in components/password_manager/core/browser/import/csv_reader.cc which could be improved, please share.
,
Jan 13
Actually, for me the speed downgrade happens after the 9bbdf9b762242c81609d63bb76c306e61d9c5445 (Use PODArray<> in SparseArray<>) commit.
,
Jan 13
The auto-generated hash links are broken, so full URL for the commit from #9 here: https://code.googlesource.com/re2/+/9bbdf9b762242c81609d63bb76c306e61d9c5445
,
Jan 14
Aye, that's what I meant about max_size() becoming more complex: it used to return max_size_, but now it calls size() on dense_ and *that* calls get_deleter() on a std::unique_ptr<> and all of this is happening very frequently due to assertions.
,
Jan 14
Having looked at the grammar specified in RFC 4180 and the notes in csv_reader.h, I'm not sure that it really makes sense to use regular expressions. I mean, a handwritten CSV parser should be clear, simple and performant without too much effort, but if you were going to depend on a third-party library, surely that could just be a CSV parser rather than a regular expression engine? :)
,
Jan 14
Thanks, both for the explanation about size(), and for the good suggestion in #12. I appreciate your extra effort in looking into this and I fully agree with what you wrote in #12. Let me take care of this, given that this is not an RE2 problem as much as a design issue in the CSV parser (if you don't want to get spammed by this further, feel free to remove yourself from the Cc).
,
Jan 14
Glad to help! And to help further if need be.
,
Jan 17
(5 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ee4f3db677a1195e7c47903a8892d3267645beb7 commit ee4f3db677a1195e7c47903a8892d3267645beb7 Author: Vaclav Brozek <vabr@chromium.org> Date: Thu Jan 17 19:05:26 2019 Remove RE2 from csv_reader.cc The CSV reader has been using regular expression to extract fields from comma-separated-value records. While allowing concise code in CSV reader, regular expressions are too big a hammer for this task. After some recent changes in RE2, the CSV reader became slower, as discussed in the attached bug. Because the regular expressions used here are compile-time constants, and the equivalent parser is simple, this CL removes the RE2 dependency in favour of a custom parser. This change caused 1-2 orders of magnitude of speedup in the csv_reader_fuzzer, when running the fuzzer on my machine in the (debug) configuration and with the input specified by the ClusterFuzz report in the associated bug: * Before the CL, running the fuzzer took about 52 seconds. * After the CL, running took about a second. In release build with DCHECKS this was 6.5 seconds (RE2) vs. half a second (this CL). Without DCHECKS this was 6.3 vs. 0.3 seconds. Note: The parser is currently written so that it allows both CR and LF as end-of-line markers. (It still needs some work to allow CRLF.) This was a blocker for avoiding a copy of the input in CSVTable::ReadCSV, because the copy was made just to convert all EOLs to LF. However, this improvement is not made here, to keep CLs focused. Bug: 921383 Change-Id: I3965e117377cdd58483abaa5ab3434540f4232e8 Reviewed-on: https://chromium-review.googlesource.com/c/1409457 Commit-Queue: Vaclav Brozek <vabr@chromium.org> Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org> Cr-Commit-Position: refs/heads/master@{#623781} [modify] https://crrev.com/ee4f3db677a1195e7c47903a8892d3267645beb7/components/password_manager/core/browser/import/csv_reader.cc [modify] https://crrev.com/ee4f3db677a1195e7c47903a8892d3267645beb7/components/password_manager/core/browser/import/csv_reader_unittest.cc
,
Jan 17
(5 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4d5ba645b78440b3c2109a029411d17db5be1e18 commit 4d5ba645b78440b3c2109a029411d17db5be1e18 Author: Vaclav Brozek <vabr@chromium.org> Date: Thu Jan 17 20:10:26 2019 Refactor csv_reader_unittest.cc The file was a long collection of two types of test: successful and failing parsing of a CSV. Within each TEST macro, boilerplate was repeated. Through copy-paste errors ASSERT_FALSE were used where EXPECT_FALSE were more appropriate. This CL modifies the file so that it only has two TEST instances, one for all the positive and one for negative cases. The particular inputs and expectations are wrapped in test-case structs, where the test values are not lost among boilerplate. Copy-paste errors are limited because there is not much to copy-paste anymore when adding a new test. Bug: 921383 Change-Id: I124adaac97f1986549d4766209db6564a92126b4 Reviewed-on: https://chromium-review.googlesource.com/c/1415297 Commit-Queue: Vaclav Brozek <vabr@chromium.org> Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org> Cr-Commit-Position: refs/heads/master@{#623814} [modify] https://crrev.com/4d5ba645b78440b3c2109a029411d17db5be1e18/components/password_manager/core/browser/import/csv_reader_unittest.cc
,
Jan 17
(5 days ago)
,
Jan 18
(5 days ago)
,
Jan 18
(4 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/013a33008e5b1a805c43528be30acd04811de6e5 commit 013a33008e5b1a805c43528be30acd04811de6e5 Author: Vaclav Brozek <vabr@chromium.org> Date: Fri Jan 18 08:53:35 2019 Fix typo in csv_reader.cc: maks -> marks Tricium caught a typo in https://crrev.com/c/1409457, but vabr@ was too late to address that before landing, so doing this follow-up. Bug: 921383 Change-Id: I924056cb201455d0789c995f56ea2ce37d347c8d Reviewed-on: https://chromium-review.googlesource.com/c/1418057 Auto-Submit: Vaclav Brozek <vabr@chromium.org> Commit-Queue: Jan Wilken Dörrie <jdoerrie@chromium.org> Reviewed-by: Jan Wilken Dörrie <jdoerrie@chromium.org> Cr-Commit-Position: refs/heads/master@{#624042} [modify] https://crrev.com/013a33008e5b1a805c43528be30acd04811de6e5/components/password_manager/core/browser/import/csv_reader.cc
,
Jan 18
(4 days ago)
ClusterFuzz has detected this issue as fixed in range 623772:623787. Detailed report: https://clusterfuzz.com/testcase?key=6269331968884736 Fuzzer: libFuzzer_csv_reader_fuzzer Fuzz target binary: csv_reader_fuzzer Job Type: libfuzzer_chrome_asan_debug Platform Id: linux Crash Type: Timeout (exceeds 25 secs) Crash Address: Crash State: csv_reader_fuzzer Sanitizer: address (ASAN) Regressed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=622020:622025 Fixed: https://clusterfuzz.com/revisions?job=libfuzzer_chrome_asan_debug&range=623772:623787 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=6269331968884736 See https://github.com/google/clusterfuzz-tools for instructions to reproduce this bug locally. If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
,
Jan 18
(4 days ago)
ClusterFuzz testcase 6269331968884736 is verified as fixed, so closing issue as verified. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by ClusterFuzz
, Jan 13Labels: ClusterFuzz-Auto-CC