[📍] Experiment to understand story dependence
Issue description

Background

Currently, a perf bisect runs only a single story from a benchmark. When Pinpoint fails to reproduce a regression, we often find that re-running the bisect with the full benchmark causes the regression to reproduce. We also note that in many no-repro cases, the numbers reported by Pinpoint are very different from what we see on the perf dashboard (i.e. precise but not accurate). We know that this is not just due to hardware differences, because we parallelize bisects across many devices.

Proposal

Run an experiment to answer:
* What proportion of no-repros would be reproducible by running more of the benchmark?
* How much more of the benchmark would we need to run? Just the preceding story? 10 stories? The entire benchmark? All the preceding benchmarks?

Experimental design

1. Select a random sample of N=100 completed no-repro Pinpoint bisects from the past week.
2. For each bisect, check whether the reference build shifted, the bisect range is off, or there is another explanation for the no-repro. If so, skip the bisect and move on to the next one.
3. Get the bisect's story index. We can pass `--story-shard-begin-index` and `--story-shard-end-index` to Telemetry to filter for a specific contiguous set of stories.
4. Re-run the bisect, including the preceding 10 stories.
5a. If the bisect repros, try again with the preceding 5 stories, then the preceding 1 story.
5b. If the bisect doesn't repro, try again with the preceding 50 stories, then the full benchmark (if hardware capacity allows).
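As a rough illustration of steps 3 to 5 of the experimental design, here is a minimal Python sketch of how the shard flags and the follow-up runs might be computed. The helper names (`shard_range`, `shard_flags`, `follow_up_counts`) are hypothetical and not part of Telemetry or Pinpoint. The sketch also assumes that `--story-shard-end-index` is exclusive, which should be checked against Telemetry before relying on it.

```python
def shard_range(story_index, preceding):
    """Return (begin, end) covering the target story plus `preceding` stories before it.

    Assumption: the end index is exclusive, so the target story itself
    is included by using story_index + 1.
    """
    if story_index < 0 or preceding < 0:
        raise ValueError("story_index and preceding must be non-negative")
    begin = max(0, story_index - preceding)  # clamp at the first story
    end = story_index + 1
    return begin, end


def shard_flags(story_index, preceding):
    """Build the two Telemetry flags named in step 3 for a given window."""
    begin, end = shard_range(story_index, preceding)
    return [
        "--story-shard-begin-index=%d" % begin,
        "--story-shard-end-index=%d" % end,
    ]


def follow_up_counts(reproduced_with_10):
    """Preceding-story counts to try after the initial 10-story run.

    None stands for "the full benchmark" (hardware capacity permitting).
    """
    if reproduced_with_10:
        return [5, 1]    # narrow down: how few stories still repro?
    return [50, None]    # widen: does more of the benchmark repro?
```

For example, a bisect whose story sits at index 25 would first be re-run with `shard_flags(25, 10)`, and the outcome of that run selects the next window sizes via `follow_up_counts`.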
Sep 21
Is this related to cache state or something? Should we clear the cache between each story?
Sep 21
We already clear lots of caches and state between story runs (at least in most benchmarks). It looks like, in some cases, some things may not be getting cleared and are still leaking from one run to the next. But we have no idea what they are, and that's what Dave's work will help us find out.
Sep 21
Makes sense. SGTM!
Comment 1 by perezju@chromium.org, Sep 21