
Issue 675995


Issue metadata

Status: WontFix
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug

Blocked on:
issue 552391




Incorrect revision range given for chromium.perf build

Project Member Reported by charliea@chromium.org, Dec 20 2016

Issue description

Filing this bug after a rough time identifying the culprit responsible for https://bugs.chromium.org/p/chromium/issues/detail?id=675034.

In that bug, we eventually tracked down the CL responsible for smoothness.sync_scroll.key_mobile_sites_smooth starting to fail.

However, we were misled for a long time by an incorrect revision range given on the builders.

For example, look at Android Nexus 5 (https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%283%29?numbuilds=200). The build where the benchmark failures began is clear (see smoothness_failure_start.png).

Looking at that build, it becomes obvious that we're in luck: there's only one CL in it! (see smoothness_false_lead.png).

The CL's description and touched files seemed innocuous, so I decided to launch a perf try job to ensure that the revert would work before just reverting it. I did that in https://codereview.chromium.org/2580053002/. The result: the revert had no effect. (see revert_no_effect.png)

At this point, I was bamboozled. sullivan@ suggested widening the suspected revision range and performing a bisect to identify the culprit. I did this here (https://chromeperf.appspot.com/buildbucket_job_status/8992819288306958080), which ultimately identified https://codereview.chromium.org/2572893002 as the responsible CL. That CL had commit position 438961.

Let's map that back to the bot status page: see missing_cls.png for how each of these builds maps back to a given commit position range.

Once we do this, it becomes immediately clear that CLs with commit positions 438962...439068 aren't represented in ANY build revision ranges. Or, put another way, build #4113 (https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%283%29/builds/4113, and shown earlier in smoothness_false_lead.png) misrepresented its revision range by saying that 439069 was the only new commit in that build. Instead, there were actually 108 (!) new commits in that range.

I have no idea what might have caused this.
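For reference, here is a minimal sketch of the coverage check described above: given the commit-position range each build reports in its blamelist, list the positions that no build claims. This is not Chromium infra code; the build numbers and ranges are hypothetical placeholders based on the numbers in this description.

# Hypothetical blamelist ranges keyed by build number: (first, last)
# commit position each build claims. Placeholder values only.
builds = {
    4112: (438950, 438961),
    4113: (439069, 439069),  # build 4113 claims a single "new" commit
    4114: (439070, 439075),
}

def uncovered_positions(builds, start, end):
    """Return commit positions in [start, end] that no build's range covers."""
    covered = set()
    for first, last in builds.values():
        covered.update(range(first, last + 1))
    return [pos for pos in range(start, end + 1) if pos not in covered]

# With these placeholder ranges, positions 438962..439068 come back as
# belonging to no build's blamelist.
print(uncovered_positions(builds, 438950, 439075))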
 
smoothness_failure_start.png (208 KB)
smoothness_false_lead.png (92.1 KB)
revert_no_effect.png (198 KB)
missing_cls.png (110 KB)
benhenry@, could you help us get this fixed to make future regression investigations smoother?
Cc: -charliea@google.com
Cc: stip@chromium.org
+stip: is this related to bug 552391?

Comment 4 by aga...@chromium.org, Dec 20 2016

The biggest thing that I note is that Build 4113 was triggered from Android Compile, while all builds around it were triggered from Android Builder. Maybe there was some sort of configuration change (a recipe change?) which then got reverted, which would explain why no builds were triggered for the missing revisions.

Comment 5 by benhenry@google.com, Dec 20 2016

Cc: estaab@chromium.org andyb...@chromium.org
Labels: -Pri-3 Pri-2
I think Annie's correct. It seems this is a problem with Gitiles poller blamelists not being 100% correct. If you look at build 4112, there are duplicate commits in the blamelist, and as Charlie said, 4113 has one commit in the blamelist even though the build shows the correct git hash range (a way to cross-check this against git is sketched at the end of this comment).

So, I guess I don't understand enough. Aaron - does your comment mean that 4113 wasn't triggered by Gitiles Poller?

Erik/Andy - this is either platform or crossover, but it seems others have similar issues with gitiles poller. Could we find someone to look into this?
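This says nothing about the poller's internals, but one way to verify a single build's blamelist, assuming a local chromium/src checkout, is to compare the reported blamelist against `git rev-list` over the build's git hash range. The repo path, revision hashes, and blamelist below are hypothetical:

import subprocess

def commits_in_range(repo_path, old_rev, new_rev):
    """Commit hashes in (old_rev, new_rev], per `git rev-list`."""
    out = subprocess.check_output(
        ['git', '-C', repo_path, 'rev-list', '%s..%s' % (old_rev, new_rev)],
        text=True)
    return out.split()

def check_blamelist(repo_path, old_rev, new_rev, reported_blamelist):
    """Return (commits git sees but the blamelist omits,
    entries duplicated in the blamelist)."""
    actual = set(commits_in_range(repo_path, old_rev, new_rev))
    reported = list(reported_blamelist)
    missing = actual - set(reported)
    duplicates = {h for h in reported if reported.count(h) > 1}
    return missing, duplicates

# Example call (hypothetical revisions and blamelist):
# missing, dupes = check_blamelist('/path/to/chromium/src',
#                                  'OLD_REV_SHA', 'NEW_REV_SHA',
#                                  build_4113_reported_blamelist)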

Comment 6 by benhenry@google.com, Dec 20 2016

Blockedon: 552391

Comment 7 by estaab@chromium.org, Dec 20 2016

Owner: aga...@chromium.org
Status: Assigned (was: Untriaged)
I'd like to investigate Aaron's theory about a recipes change more before we dig deeper into gitiles polling. Aaron, do you suggest just looking through build revisions in that time range?

Comment 8 by aga...@chromium.org, Dec 20 2016

Owner: ----
Status: Available (was: Assigned)
(I don't have cycles to own this right now, sorry, just posted the comment because benhenry pinged me directly.)

I don't have a firm grasp on how triggering works these days, but I believe that it can be done entirely from inside recipes. There might be a recipe change which changed how triggering worked for a short time on the 17th, and then was reverted. Similarly, there might be a change to those json config files in chromium/src, which I *think* control exactly which bots trigger which others?

Really that piece of the investigation should be assigned to someone who understands recipe-based triggering and knows all the right places to look. But yes, examining the log during that time period (and slightly before it, due to lag) is where I'd start.
benhenry@, any suggestion of who might be able to investigate this further and push it to completion? Even if there's no bandwidth for it right now, it seems like an important enough part of our infrastructure that we don't want to drop this.
Labels: -Pri-2 Pri-1
Owner: estaab@chromium.org
Status: Assigned (was: Available)
Yeah, this seems more like a P1, actually. Erik - can we get someone assigned to this?
/bump to estaab@
Cc: -andyb...@chromium.org
Owner: iannucci@chromium.org
Robbie, do you think you can investigate this? If we can understand the root cause and the likelihood of it happening again, we can better determine the priority of this.

Has this happened again since December?

Comment 14 by stip@chromium.org, Feb 10 2017

Cc: -stip@chromium.org
I haven't seen it happen since December but I definitely think it's still sometimes a problem unless we've done something to fix it since.
This is going to be super difficult to track down, IMO. The behavior is random, and it's unclear how it happens. We could archive this bug.
Cc: iannucci@chromium.org
Owner: martiniss@chromium.org
Sorry this didn't get any attention :(

Martiniss has been working on perf-related things recently and might have some luck investigating.

I would recommend that we just ensure this doesn't happen in Milo, however, which is slated to replace the buildbot UI sometime in Q2. This bug looks like it's purely an artifact of the way that buildbot handles git polling, revision ranges, and changelists.
Got it. If you think this is the case, I'm fine with just closing this as WontFix with that justification. It doesn't seem to happen enough that it's a huge problem.
Status: WontFix (was: Assigned)
I think blamelists have changed since this bug was filed and since we moved to the LUCI UI. WontFix-ing.
