
Issue 914971


Issue metadata

Status: Assigned
Owner:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows
Pri: 1
Type: Bug




Add automatic syncing of corpora

Project Member Reported by metzman@chromium.org, Dec 13

Issue description

ClusterFuzz is running more and more jobs in which the same target has substantially different code paths from job to job. For example, a target with Windows-specific and Linux-specific code can take different code paths for a given input depending on whether it is built for windows_libfuzzer_chrome_asan or libfuzzer_chrome_asan (Linux).

This breaks assumptions ClusterFuzz makes about corpus pruning, since pruning is currently done by libfuzzer_chrome_asan, which eliminates test cases it considers uninteresting even if those test cases are interesting to other jobs.

I think this problem is relevant to targets with Windows-specific code and to V8 targets in non-x64 builds (x86 for now). For non-V8 targets, code paths shouldn't depend on architecture.

A workaround to this problem is to have separate corpus buckets with separate pruning jobs. This is what we currently do for Windows libFuzzer builds. 

The problem with this workaround is that it doesn't permit jobs to share any test cases, so they don't share progress in finding obscure code paths that are common to both jobs. Instead, we should have some way of sharing test cases between jobs while still allowing each job to prune the ones it does not consider interesting.

I can think of two ways of doing this. Both depend on each job that prunes its own corpus bucket declaring the other corpus buckets it wants to get test cases from.

1. Download the corpus from the other job's corpus bucket during pruning.
2. Schedule a recurring Google Cloud Storage Transfer job (https://cloud.google.com/storage-transfer/docs/create-manage-transfer-program) to transfer new files from the other job's corpus bucket.
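
Both options need the per-job declaration of source buckets mentioned above. A minimal sketch of what that declaration could look like, assuming a plain dictionary keyed by job name; CORPUS_SYNC_SOURCES and sources_for are hypothetical names for illustration and are not existing ClusterFuzz configuration or API:

```python
# Hypothetical declaration: for each job that prunes its own corpus bucket,
# the other jobs whose corpus buckets it wants to pull test cases from.
# The job names come from this issue; the structure itself is an assumption.
CORPUS_SYNC_SOURCES = {
    "libfuzzer_chrome_asan": ["windows_libfuzzer_chrome_asan"],
    "windows_libfuzzer_chrome_asan": ["libfuzzer_chrome_asan"],
}


def sources_for(job, config=CORPUS_SYNC_SOURCES):
    """Return the jobs whose corpora `job` should import test cases from."""
    sources = config.get(job, [])
    if job in sources:
        # Syncing a bucket with itself would be a no-op at best.
        raise ValueError(f"{job} lists itself as a corpus source")
    return list(sources)
```

A job that is not listed simply has no sources, so existing jobs would keep their current behaviour until they opt in.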

Solution 1 is extremely simple to implement.
Its drawbacks are the time wasted downloading corpora during pruning, the cost of maintaining this feature, and the fact that progress is shared only during pruning instead of throughout the day.
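
A rough sketch of the import step in solution 1, assuming the other job's corpus has already been downloaded to a local directory (the download itself is omitted). import_new_testcases is a hypothetical helper, not ClusterFuzz code, and naming the copies by SHA-1 of their contents is an assumption made here to skip duplicates:

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-1 hex digest of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def import_new_testcases(source_dir, corpus_dir):
    """Copy test cases from source_dir that corpus_dir does not already have.

    Files are compared by content, not by name. Returns the digests of the
    files that were copied, in the order they were copied.
    """
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    known = {file_digest(p) for p in corpus.iterdir() if p.is_file()}

    copied = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file():
            continue
        digest = file_digest(src)
        if digest in known:
            continue  # This job already has an identical test case.
        shutil.copyfile(src, corpus / digest)
        known.add(digest)
        copied.append(digest)
    return copied
```

The job's normal pruning would then run over the merged directory, so imported test cases that are not interesting to this job are dropped again.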

We are leaning towards solution 2 since it helps solve a lot of the problems we have with large-scale pollination.
