
Issue 850105


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Milo suggests build not started on transient failure to retrieve logdog stream

Project Member Reported by jbudorick@chromium.org, Jun 6 2018

Issue description

A developer brought a build (https://chromium-swarm.appspot.com/task?id=3deb1b07326cad10&refresh=10&show_raw=1&wide_logs=true) to my attention yesterday, shortly after it finished. The Gerrit plugin reported the build as failed, but the build page reported only that it was unable to load the logdog stream and that the build may not have started. Clicking through to the swarming task showed the raw output (https://chromium-swarm.appspot.com/task?id=3deb1b07326cad10&refresh=10&show_raw=1&wide_logs=true), which includes the steps and shows which of them failed.

The problem appears to have been transient and has since resolved, but it can confuse developers who look at their trybot failures shortly after they happen.
 

Comment 1 by no...@chromium.org, Jun 6 2018

Blockedon: 850113
Components: -Infra>Platform>Milo Infra>Platform>Milo>LUCI
Status: Available (was: Untriaged)
Perhaps it is time to stop loading steps from logdog and instead load them from the new buildbucket APIs. Filed bug 850113.
How did this user get to that page? I thought we'd gotten rid of all references to the chromium-swarm page (which loads a different codepath), but that doesn't seem to be the case.

This is important because if the user had loaded the https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/63783 page instead, it would have loaded the correct build info from buildbucket and merely said "couldn't load steps from logdog".
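
To make the expected behavior concrete, here is a minimal Python sketch of that kind of graceful degradation: the build status comes from one source, the steps from another, and a failure to load the steps is reported without touching the status. All names here (`StepsUnavailable`, `BuildPage`, `render_build_page`, and the two loader callables) are hypothetical and are not Milo's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class StepsUnavailable(Exception):
    """Raised by the (hypothetical) step loader when the log stream cannot be read."""


@dataclass
class BuildPage:
    status: str                        # authoritative build status, e.g. "FAILURE"
    steps: List[str]                   # step names, empty if they could not be loaded
    steps_error: Optional[str] = None  # message shown in place of the steps


def render_build_page(
    load_status: Callable[[], str],
    load_steps: Callable[[], List[str]],
) -> BuildPage:
    # The status is loaded first and independently, so a log failure
    # can never make a finished build look like it has not started.
    status = load_status()
    try:
        return BuildPage(status=status, steps=load_steps())
    except StepsUnavailable:
        # Degrade only the steps section; keep the status as reported.
        return BuildPage(
            status=status,
            steps=[],
            steps_error="couldn't load steps from logdog",
        )
```

With a step loader that raises `StepsUnavailable`, the page in this sketch still carries the status returned by `load_status`, plus the error message instead of a step list.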

850113 won't help because the /task/<id> codepath doesn't touch buildbucket.
https://chromium-review.googlesource.com/c/chromium/src/+/1087927/1 -> https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/63783 -> "Source: Task ..."

... though the user in question didn't independently go to that page; I suggested it.

I believe that what you suggest in #2 didn't in fact happen; the user did load that page, and it was unable to load anything.
Ah, I see. In that case I have no idea what's going on. The behavior I'd expect to see is this:
https://screenshot.googleplex.com/HWGrFKwniAQ

What you're describing doesn't match what I'd expect. A screenshot would have been helpful.

Comment 5 by no...@chromium.org, Jun 7 2018

Blockedon: -850113
For swarming tasks that do have a proper buildbucket build ID, should we redirect to the canonical build page? Then we would avoid such problems, and the build status would be accurate.
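
A rough Python sketch of what such a redirect decision could look like, under two assumptions that are not confirmed anywhere in this thread: that the swarming task carries a `buildbucket_build_id:<id>` tag, and that the canonical build page can be addressed as `/b/<id>` on ci.chromium.org. The function name `canonical_build_url` is made up for illustration.

```python
from typing import Iterable, Optional

# Assumption: swarming tasks created by buildbucket carry a tag with this key.
BUILD_ID_TAG = "buildbucket_build_id"
# Assumption: the canonical build page is reachable at <host>/b/<build id>.
MILO_HOST = "https://ci.chromium.org"


def canonical_build_url(tags: Iterable[str]) -> Optional[str]:
    """Return the canonical build page URL for a task, or None.

    `tags` are swarming task tags in "key:value" form. None means the
    task has no usable buildbucket build ID, so the /task/<id> page
    should be rendered as it is today.
    """
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and key == BUILD_ID_TAG and value.isdigit():
            return f"{MILO_HOST}/b/{value}"
    return None
```

A handler for the `/task/<id>` page could call this with the task's tags and issue a redirect only when the result is not `None`, leaving tasks without a build ID on the existing codepath.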
