
Issue 700890 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 662771
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Trybots sometimes not updating on Gerrit issue

Project Member Reported by machenb...@chromium.org, Mar 13 2017

Issue description

I see this once in a while. This time I captured it and logged a bunch of debugging data.

CL where this was observed:
https://chromium-review.googlesource.com/c/452484

Screenshot showing the problem (the trybots are all from patchset 1, but are shown for patchset 2, which had no tryjobs at the time):
https://screenshot.googleplex.com/cto8P9XvBJD.png

Screenshot after duplicating the tab (looked good immediately):
https://screenshot.googleplex.com/EAQCo3R9nBU.png

The attached screenshots show the network activity on the first tab with the problem. All times in CET.
- First screenshot shows there is no buildbucket polling going on
- Second screenshot (3 minutes later) shows buildbucket polling started again out of the blue, but returns 401s.

The 401 response was a buildbucket error claiming that an authorization token was required, even though the request header did specify the token. I was looking at the request, but unfortunately the data got lost before I was able to copy it.
 
Attachments:
- First - 2017-03-13 13:55:38.png (226 KB)
- Second - 2017-03-13 13:58:34.png (269 KB)

So, after more discussion with Michael, we think there are two bugs:



### First, the most important one

If tryjobs for a given patchset have not been fetched yet, the UI should say that and not show potentially stale results.

The buildbucket polling seems to be ongoing, but with weird delays between calls (no call for 3 minutes, then 2 calls within 1 minute). For whatever reason, results from the prior patchset are still shown.

Also note that the timestamp listed on the buildbucket results says 1:41 PM, while the timestamp for the change ("Updated" in the top-left corner) shows 1:46 PM (which is when patchset #2 was uploaded).
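
To make the first point concrete, here is a minimal sketch of the behavior I'd expect, written in Python only for brevity. `TryjobStore`, `record_fetch`, and `view` are hypothetical names, not anything from the actual Gerrit plugin. The idea is just that results are keyed by the patchset they were fetched for, so a patchset without a completed fetch is reported as "not fetched yet" instead of inheriting another patchset's results.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TryjobStore:
    """Fetched tryjob results, keyed by patchset number (hypothetical model)."""

    results: Dict[int, List[str]] = field(default_factory=dict)

    def record_fetch(self, patchset: int, builders: List[str]) -> None:
        # Store results only under the patchset they were fetched for.
        self.results[patchset] = list(builders)

    def view(self, patchset: int) -> str:
        # A patchset with no completed fetch is reported as such; it is
        # never backfilled with whatever another patchset last returned.
        fetched: Optional[List[str]] = self.results.get(patchset)
        if fetched is None:
            return f"patchset {patchset}: tryjobs not fetched yet"
        if not fetched:
            return f"patchset {patchset}: no tryjobs"
        return f"patchset {patchset}: " + ", ".join(fetched)
```

With something like this, the situation in the first screenshot would have shown "tryjobs not fetched yet" for patchset 2 rather than the tryjobs from patchset 1.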



Now, I am very certain I have witnessed this before, under conditions similar to Michael's case here:

0. Send patchset #1 for review.
1. Get an email from the reviewer, open the reviewer's comment on patchset #1.
2. Fix it in my terminal.
3. Run git cl upload, which takes some time.
4. Expecting patchset #2 to be uploaded very soon, go to the browser and navigate to the change's main view (e.g. https://chromium-review.googlesource.com/c/452484).

As 3 is still running while I'm doing 4, there are potential races, even more likely with behind the scenes data stores replicating git data asynchronously.



### Second

Buildbucket replied with 401 and complained about auth. Maybe the auth token was close to expiry when first fetched from the Gerrit backend, so buildbucket no longer accepted it on later calls? Or maybe this is a rare buildbucket bug?
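
If the expiry theory is right, a guard along these lines would avoid it. This is only a sketch in Python under that assumption; `Token`, `call_with_fresh_token`, `get_token`, and the 60-second margin are all made up for illustration and are not the plugin's real API. It refreshes a token that is about to expire before calling, and retries once with a new token if the call still comes back 401.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Token:
    value: str
    expires_at: float  # seconds since the epoch


def call_with_fresh_token(
    token: Token,
    get_token: Callable[[], Token],   # hypothetical: asks the backend for a new token
    call: Callable[[str], int],       # hypothetical: does the request, returns HTTP status
    now: Callable[[], float],
    margin: float = 60.0,             # assumed safety margin, in seconds
) -> Tuple[int, Token]:
    # Refresh up front if the token is expired or about to expire.
    if token.expires_at - now() < margin:
        token = get_token()
    status = call(token.value)
    # If the server still rejects the token, refresh and retry exactly once.
    if status == 401:
        token = get_token()
        status = call(token.value)
    return status, token
```

The caller keeps the returned token for the next poll, so a refreshed token is not thrown away.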

+nodir@ we have timestamp of bad calls (~1:55 PM UTC+1) -> can you check buildbucket logs and provide any more insight into what was actually wrong with the request auth?
Ran into it again, on the same CL. Just as in Andrii's theory, I uploaded a third patchset, was impatient, clicked the issue link in the top-left corner to update the tab, manually selected the first patchset again (the one that should have tryjobs), and saw no tryjobs. Again there was no polling, and after a bit I saw 401s from buildbucket. Here are the headers (bearer token removed) and the response:
https://paste.googleplex.com/6207088699113472

Comment 3 by no...@chromium.org, Mar 14 2017

Cc: andyb...@chromium.org
buildbucket logs: https://screenshot.googleplex.com/wFE4Azxk5E3

This looks like an issue in Gerrit-Buildbucket integration
Owner: aga...@chromium.org
Status: Assigned (was: Untriaged)
Aaron, once you are back, can you decide what to do with this?

Comment 5 by aga...@chromium.org, Mar 23 2017

Owner: andyb...@chromium.org
I have no idea what to do with this. I can't reproduce it myself, and I've tried. And I have no earthly clue why refreshing the page at the wrong time would cause 401s. I'm well and truly lost.

Andy, you did all the buildbucket plugin auth stuff, can you take a look?
I have a reliable repro of at least one scenario:
- Take any CL that has trybots on the latest patchset, navigate to it
-> You can verify in the developer console's network tab that buildbucket is polled regularly and returns 200s.
- Wait one hour (my guess: until the auth token expires)
-> buildbucket starts returning 401s and polling gets more and more delayed
- Optionally upload a new patchset (that contains changes) and trigger some tryjobs on the command line, e.g. "git cl try -m tryserver.v8 -b boom", to distinguish them from the existing tryjobs
- On the CL, navigate by clicking on the CL number in the top left corner (don't press refresh)
-> Now the latest patchset can be seen, but still the old tryjobs
-> Buildbucket polling still returns 401s.
-> If you refresh, everything gets back to normal

I think there are several other ways to navigate and see this bug besides clicking on the CL number, e.g. clicking on My->Changes and selecting a different change, but that's a guess...
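
Since a full refresh fixes everything but in-app navigation does not, my guess is that navigation leaves some polling state behind. A rough Python sketch of what I would expect instead is below. `TryjobPoller` and its fields are hypothetical, and the doubling backoff is only my reading of "polling gets more and more delayed", not something I verified in the plugin code.

```python
from typing import List, Optional


class TryjobPoller:
    """Hypothetical model of the polling state kept by the page."""

    def __init__(self, base_delay: float = 10.0, max_delay: float = 300.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.delay = base_delay
        self.shown: List[str] = []
        self.token: Optional[str] = None

    def on_response(self, status: int, builders: List[str]) -> None:
        if status == 200:
            # A good response replaces what is shown and resets the backoff.
            self.shown = list(builders)
            self.delay = self.base_delay
        else:
            # Failures (e.g. 401) back off; stale results stay on screen.
            self.delay = min(self.delay * 2, self.max_delay)

    def on_navigate(self) -> None:
        # In-app navigation resets the same state a full page refresh would:
        # no stale tryjobs, no cached token, no accumulated backoff.
        self.shown = []
        self.token = None
        self.delay = self.base_delay
```

In this model, the repro above corresponds to `on_navigate` not being called (or not clearing everything) when clicking the CL number.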
Mergedinto: 662771
Status: Duplicate (was: Assigned)
