Milo reports spurious purple steps
Issue description
E.g.: https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux64%20ASAN/27574
On the overview page the build appears green: http://shortn/_ApSke0lKoX
The underlying swarming task is completed: https://chromium-swarm.appspot.com/task?id=4048d95948831e10&refresh=10&show_raw=1&wide_logs=true
Also the purple step's shards are all completed. And when going to the step's stdout (which is huge) everything looks like a successful step.
Also the duration of the step seems to never end. Now it's at 16h: http://shortn/_lOzcsyjf0c
On our lkgr status page, all those builds appear as unfinished, see: https://storage.cloud.google.com/chromium-v8/lkgr-status/v8-lkgr-status.html
Screenshot: http://shortn/_nh2re3T9oG
,
Oct 2
,
Oct 3
Sorry, the whole foundation had an onsite, which made the trooper (me) not very responsive. Looking today.
,
Oct 3
Very interesting, indeed. Thanks for filing in such detail, Michael! The builder view page is sourced purely from buildbucket, so green implies that buildbucket says the build finished with SUCCESS. OTOH, the build view uses logdog step data to display steps and their status, so a step may indeed be purple even though the build succeeded. But this makes the lkgr view weird if lkgr takes only the build status into account. So maybe lkgr takes individual build steps into account, and hence noticed the purpleness?
,
Oct 4
Hm, logdog output for a step is OK (from swarming task page):
substep: <
step: <
name: "[trigger] Test262"
status: SUCCESS
stdout_stream: <
name: "steps/s__trigger__Test262/0/stdout"
>
started: <
seconds: 1538404589
nanos: 89256289
>
ended: <
seconds: 1538404591
nanos: 587541228
>
text: "Run on OS: 'Ubuntu-14.04'"
other_links: <
label: "json.output"
logdog_stream: <
name: "steps/s__trigger__Test262/0/logs/json.output/0"
>
>
other_links: <
label: "shard #0"
url: "https://chromium-swarm.appspot.com/user/task/4048e0a888c4ae10"
>
other_links: <
label: "shard #1"
url: "https://chromium-swarm.appspot.com/user/task/4048e0a9ef85da10"
>
other_links: <
label: "shard #2"
url: "https://chromium-swarm.appspot.com/user/task/4048e0ab4c7d3210"
>
other_links: <
label: "shard #3"
url: "https://chromium-swarm.appspot.com/user/task/4048e0aca4043c10"
>
other_links: <
label: "shard #4"
url: "https://chromium-swarm.appspot.com/user/task/4048e0ae31846410"
>
>
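For reference, the started and ended fields in the dump above give this step's real duration. A minimal sketch of the arithmetic, assuming the seconds/nanos pairs are Unix-epoch timestamps; the helper name duration_ns is made up for illustration:

```python
# Timestamps copied from the "[trigger] Test262" step dump above.
STARTED = (1538404589, 89256289)    # (seconds, nanos)
ENDED = (1538404591, 587541228)     # (seconds, nanos)

NANOS_PER_SECOND = 10**9


def duration_ns(started, ended):
    """Return ended - started in whole nanoseconds (hypothetical helper)."""
    start_total = started[0] * NANOS_PER_SECOND + started[1]
    end_total = ended[0] * NANOS_PER_SECOND + ended[1]
    return end_total - start_total


elapsed_ns = duration_ns(STARTED, ENDED)
elapsed_s = elapsed_ns / NANOS_PER_SECOND
print(f"step ran for {elapsed_s:.3f} s")
```

A step that carries both timestamps has a fixed duration of a few seconds, which is what a finalized step should look like.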
,
Oct 4
Verified that buildbucket is indeed reporting success for this build:
"build": {
"status": "COMPLETED",
"result": "SUCCESS",
}
So, I think it's logdog which hasn't finalized the state for this step, which is thus interpreted by Milo as a purple failure.
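The mismatch described here can be sketched as a cross-check between the two data sources. This is only an illustration, not Milo's actual logic: the build fields mirror the buildbucket snippet above, while the flat step dictionaries, the use of None for a missing status, the second step name, and the function name find_unfinalized_steps are all assumptions.

```python
from typing import Dict, List, Optional


def find_unfinalized_steps(
    build: Dict[str, str],
    steps: List[Dict[str, Optional[str]]],
) -> List[str]:
    """Return names of steps without a final status in a completed build.

    Hypothetical helper: a real client would read step data from the
    logdog annotation stream rather than from plain dictionaries.
    """
    if build.get("status") != "COMPLETED":
        # The build is still running, so open steps are expected.
        return []
    return [str(step["name"]) for step in steps if step.get("status") is None]


# Build state as reported by buildbucket in the snippet above.
build = {"status": "COMPLETED", "result": "SUCCESS"}

# Simplified step records; the second entry is invented for illustration.
steps = [
    {"name": "[trigger] Test262", "status": "SUCCESS"},
    {"name": "hypothetical unfinalized step", "status": None},
]

suspicious = find_unfinalized_steps(build, steps)
print(suspicious)
```

Any step returned by such a check is one where the build-level status and the step-level data disagree, which is the situation observed in this bug.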
,
Oct 4
Thanks to Ryan, $ cit logdog cat logdog://logs.chromium.org/v8/buildbucket/cr-buildbucket.appspot.com/8933860088081465920/+/annotations indeed shows that the logdog server isn't aware of the rest of the log.
,
Oct 4
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/config/+/17e0185edfa074051e77ae442a3d7481bb5aec8e
commit 17e0185edfa074051e77ae442a3d7481bb5aec8e
Author: Ryan Tseng <hinoka@google.com>
Date: Thu Oct 04 00:44:59 2018
,
Oct 4
Michael, according to our investigation, the log stream is unfortunately lost for good. AFAIU, lkgr is doing OK now, so I'm removing this from the trooper queue and assigning it to Ryan to deploy a fix to logdog.
Comment 1 Deleted