[📍] Better handling of failures in performance bisects
Issue description
https://bugs.chromium.org/p/chromium/issues/detail?id=881063#c7
In the above bug, the compile stopped failing at commit af3c256. Therefore, Pinpoint reported it as a "difference". This is confusing to users. Under the hood, Pinpoint has so far treated functional and performance bisects the same. We want to change the way performance bisects display failures:
1. If the perf values immediately before and after the failing range are not significantly different, ignore the failure completely.
2. If the perf values immediately before and after the failing range are significantly different, say "there was a regression in this range."
3. If there are no perf values before or after the failing range, say "there could be a regression in this range, but we don't know."
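The three rules above could be sketched roughly as follows. This is only an illustration, not Pinpoint's code: `FailureVerdict`, `classify_failing_range`, and `naive_is_different` are hypothetical names, the comparison is a crude placeholder for whatever significance test Pinpoint actually uses, and the sketch assumes rule 3 applies when values are missing on either side of the failing range.

```python
from enum import Enum
from statistics import mean
from typing import Callable, Sequence


class FailureVerdict(Enum):
    """Hypothetical labels for the three proposed outcomes."""
    IGNORE = "ignore the failure completely"
    REGRESSION = "there was a regression in this range"
    UNKNOWN = "there could be a regression in this range, but we don't know"


def naive_is_different(a: Sequence[float], b: Sequence[float],
                       rel_threshold: float = 0.10) -> bool:
    """Placeholder comparison: relative difference of the means.

    NOT a real significance test; it only stands in for one here.
    """
    mean_a, mean_b = mean(a), mean(b)
    scale = max(abs(mean_a), abs(mean_b))
    if scale == 0:
        return False
    return abs(mean_b - mean_a) / scale > rel_threshold


def classify_failing_range(
    before: Sequence[float],
    after: Sequence[float],
    is_different: Callable[[Sequence[float], Sequence[float]], bool],
) -> FailureVerdict:
    """Apply the three rules to the perf values around a failing range."""
    # Rule 3: nothing to compare on at least one side (assumption: either side).
    if not before or not after:
        return FailureVerdict.UNKNOWN
    # Rule 2: the values on both sides differ, so something changed in between.
    if is_different(before, after):
        return FailureVerdict.REGRESSION
    # Rule 1: values match on both sides, so the failure can be ignored.
    return FailureVerdict.IGNORE
```

For example, `classify_failing_range([80, 81, 79], [120, 121, 119], naive_is_different)` falls under rule 2, while an empty `before` list falls under rule 3.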
Sep 11
+1. It was always difficult to pull out where exactly things went wrong in the legacy bisect script; we tried to piece it together after the fact on the dashboard from the bisect's raw output. I'd say the range output is definitely a step in the right direction. When we did that for recipe bisect, it helped clear up confusion about what happened (though there was still confusion about why it wasn't able to get further into the range).
Sep 12
The model is a little different from legacy bisect, since legacy bisect was designed to run linearly and find exactly one regression (no more, no less), and therefore every kind of error became a top-level error.
At the top level, Pinpoint has only 2 possible results: completed and failed. "Failed" means Pinpoint died. The bug comment will say "(sad cat face) Pinpoint stopped with an error".
If completed, the comment will say "Found {1 or more} changes" or "Couldn't reproduce a change" if there are none. It will list the changes, which can take one of 3 formats:
1.
   metric_name changed from 80 units -> 120 units (+40 units)
   Commit name by commit author
   https://commit.url
2.
   metric_name changed from 80 units -> 120 units (+40 units)
   The Build failed from r123400 to r123500, so Pinpoint was unable to narrow it down.
3. In this case, Pinpoint will say "Pinpoint was unsure about {0 or more} changes."
   There may or may not have been a change.
   The Test failed from r123400 to r123500, so Pinpoint was unable to narrow it down.
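As a rough sketch of how these comment formats fit together, something like the following could render them. None of this is Pinpoint's real code: the `Change` class, its fields, and both functions are hypothetical, and the sketch leaves out the "Pinpoint was unsure about ..." summary line because the comment above does not say how it combines with the "Found ..." header.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Change:
    """One entry in the result list. All field names are hypothetical."""
    metric: str = "metric_name"
    before: Optional[float] = None      # value before the change, if measured
    after: Optional[float] = None       # value after the change, if measured
    units: str = "units"
    commit_line: Optional[str] = None   # e.g. "Commit name by commit author"
    commit_url: Optional[str] = None
    failed_step: Optional[str] = None   # "Build" or "Test" when a range failed
    failed_range: Optional[Tuple[str, str]] = None  # (first, last) revisions


def format_change(change: Change) -> str:
    """Render one change in format 1, 2, or 3 from the list above."""
    lines: List[str] = []
    if change.before is not None and change.after is not None:
        delta = change.after - change.before
        lines.append(
            f"{change.metric} changed from {change.before:g} {change.units} -> "
            f"{change.after:g} {change.units} ({delta:+g} {change.units})")
    else:
        # Format 3: no values on either side of the failing range.
        lines.append("There may or may not have been a change.")
    if change.failed_range is not None:
        # Formats 2 and 3: the bisect could not get inside the failing range.
        first, last = change.failed_range
        lines.append(f"The {change.failed_step} failed from {first} to {last}, "
                     "so Pinpoint was unable to narrow it down.")
    else:
        # Format 1: narrowed down to a single commit.
        lines.extend(l for l in (change.commit_line, change.commit_url) if l)
    return "\n".join(lines)


def format_comment(completed: bool, changes: List[Change]) -> str:
    """Render the top-level result: failed, nothing found, or a change list."""
    if not completed:
        return "Pinpoint stopped with an error"
    if not changes:
        return "Couldn't reproduce a change"
    # Mirrors the "Found {1 or more} changes" template; no pluralization logic.
    header = f"Found {len(changes)} changes"
    return "\n\n".join([header] + [format_change(c) for c in changes])
```

Keeping the top-level status separate from the per-change formatting mirrors the point made above: a failing build or test range is reported inside a completed job rather than turning the whole job into an error.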
Dec 10
Issue 909626 has been merged into this issue.
Comment 1 by perezju@chromium.org, Sep 11