
Issue 752142


Issue metadata

Status: Archived
Owner:
Closed: Sep 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Add seed_corpus for ICU fuzz targets

Project Member Reported by mmoroz@chromium.org, Aug 3 2017

Issue description

We have 7 fuzz targets for ICU: https://cs.chromium.org/chromium/src/third_party/icu/fuzzers/BUILD.gn


They've discovered quite a few bugs so far and will most likely discover more: https://bugs.chromium.org/p/chromium/issues/list?can=1&q=description%3Aicu_uregex_open_fuzzer%2Clibfuzzer_icu_uregex_open_fuzzer%2Cafl_icu_uregex_open_fuzzer%2Cicu_unicode_string_codepage_create_fuzzer%2Clibfuzzer_icu_unicode_string_codepage_create_fuzzer%2Cafl_icu_unicode_string_codepage_create_fuzzer%2Cicu_number_format_fuzzer%2Clibfuzzer_icu_number_format_fuzzer%2Cafl_icu_number_format_fuzzer%2Cicu_break_iterator_utf32_fuzzer%2Clibfuzzer_icu_break_iterator_utf32_fuzzer%2Cafl_icu_break_iterator_utf32_fuzzer%2Cicu_ucasemap_fuzzer%2Clibfuzzer_icu_ucasemap_fuzzer%2Cafl_icu_ucasemap_fuzzer%2Cicu_converter_fuzzer%2Clibfuzzer_icu_converter_fuzzer%2Cafl_icu_converter_fuzzer&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids

We used to have testdata in the repository and used that data as a seed corpus for the fuzz targets, but 10 months ago the tests were deleted: https://codereview.chromium.org/2435373002


May I ask you to create a directory named "corpus" or "seed_corpus" under "icu/fuzzers", and put interesting inputs inside of it? 

I've noticed that many of the testdata files were in some meta format (hex-encoded text with comments, e.g. https://github.com/Maluuba/icu/blob/master/source/test/testdata/test3.ucm), so it would be really great to convert those hex strings into raw data and use the resulting files as the seed corpus.
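
A minimal sketch of that conversion, assuming a hypothetical meta format in which each line carries `\xNN` escapes and `#` starts a comment. The real testdata files may use other conventions, and the helper names (`decode_hex_line`, `write_seed_corpus`) and the `seed_NNNN` file naming are illustrative only:

```python
import re
from pathlib import Path

# Matches one \xNN escape (assumed encoding of a single raw byte).
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_line(line: str) -> bytes:
    """Return the raw bytes spelled out by the hex escapes on one line.

    Everything after the first '#' is treated as a comment and ignored.
    """
    payload = line.split("#", 1)[0]
    return bytes(int(h, 16) for h in HEX_ESCAPE.findall(payload))


def write_seed_corpus(text: str, out_dir: Path) -> int:
    """Write one seed file per line that decodes to at least one byte.

    Returns the number of files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in text.splitlines():
        data = decode_hex_line(line)
        if data:  # skip blank and comment-only lines
            (out_dir / f"seed_{count:04d}").write_bytes(data)
            count += 1
    return count
```

Each decoded line becomes one raw file, so the output directory could then be used directly as "corpus" or "seed_corpus" under "icu/fuzzers".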

Thanks!
 

Comment 1 by js...@chromium.org, Aug 3 2017

> We used to have testdata in the repository and used that data as a seed corpus for the fuzz targets, but 10 months ago the tests were deleted: https://codereview.chromium.org/2435373002


Sorry, I didn't know that the fuzzers were using test/testdata.

If necessary, I can add them back (either the entire source/test or just source/test/testdata). With Gerrit, having them back is no longer an issue (I deleted them because I kept running into Rietveld issues when updating ICU).

Would that work for the fuzzers, or do you still need seed_corpus/*?

>  https://github.com/Maluuba/icu/blob/master/source/test/testdata/test3.ucm

This is not a good example, I'm afraid. It's an encoding converter definition file. 

Restoring the testdata would be fine as well. We just need to have it in the repo and specify a valid path to the directory in the seed_corpus attribute (i.e. the stuff we temporarily removed in https://chromium-review.googlesource.com/c/600488)

Would it be possible to filter out bad examples and upload only "good" ones?


Status: Archived (was: Assigned)
