LogDog has a delay between a step starting and log link appearing |
|||||||||
Issue descriptionhttps://screenshot.googleplex.com/JRihJtXXvx8.png https://screenshot.googleplex.com/f6n1YkefxAW.png To "reproduce": open currently running build in https://luci-scheduler-dev.appspot.com/jobs/infra/infra-continuous-trusty-64 I've also observed state where Milo doesn't display any steps at all (for several tens of seconds) in a new build.
,
Mar 15 2017
This appears to be a recipe engine issue. Here is the raw output from logdog:
substep: <
step: <
name: "infra python tests"
status: SUCCESS
stdout_stream: <
name: "steps/infra_python_tests/0/stdout"
>
started: <
seconds: 1489614314
nanos: 700000000
>
ended: <
seconds: 1489614526
nanos: 549999872
>
>
>
substep: <
step: <
name: "go third parties"
>
>
The status, stdout_stream, and started timestamps aren't annotated even though the substep exists.
,
Mar 16 2017
Dan, is this an annotee issue? I'm not sure exactly how the logdog output is generated..
,
Mar 20 2017
,
Mar 20 2017
,
Mar 20 2017
From what I can tell, you're just experiencing a bit of latency. There are three latency points here: 1) Annotee doesn't push a new protobuf on every small change. It pushes structural changes (e.g., new step) immediately and small changes (e.g., modification to existing step) on a timeout of ~30 seconds. 2) LogDog has a few second ingest lag. 3) LogDog and Milo may cache a running build log for a few seconds. I think this is WAI?
,
Mar 20 2017
What's the estimated total latency? Is it fair to say, that if a step runs for less than e.g 30 sec, there's high chance we'll see its logs only upon its completion?
,
Mar 20 2017
RE #7, ATM yes. We could update things to mark adding "stdout" and "stderr" streams as structural changes, thus pushing the annotation sooner, if that latency is too high. 30-40 seconds is probably about right given the current config.
,
Mar 20 2017
Adding a link to 'stdout' log (even if it's empty) will help. It's more similar to what Buildbot does. So even if the actual latency is still high, the perception will be closer to Buildbot's case. Also there'll be a page to open and start smashing "refresh" on (or just rely on logdog viewer's polling).
,
Mar 21 2017
The issue with #9 is that the stdout log doesn't exist until data is emitted. If data doesn't actually get emitted, we shouldn't actually reference the log. If we need that link to appear immediately, we should either: 1) Delay generating the step in the first place until data is produced on its STDOUT. I don't like this option. 2) Have the addition of STDOUT / STDERR links be considered "structural", such that a new annotation protobuf gets pushed immediately. If such a link were generated, the viewer would automatically "refresh" and pick up the log when/if it generated data. However, if the log never actually got data (i.e., nothing was sent to STDOUT/STDERR), the viewer wouldn't know this, and would just sit there saying "waiting" indefinitely. Or we could just wait 30 second *shrug*
,
Mar 21 2017
I dunno. From my "user" perspective, the existing behavior looks like a bug. Maybe other users won't think so.
,
Mar 21 2017
They have to refresh anyway for other things; this doesn't seem too out of line with that. We're just used to BuildBot supplying "stdio" by default. I'll think on this. Making "stdout" and "stderr" structural is not a big change.
,
Mar 21 2017
Actually another option would be to make steps *not* a structural change. This would reduce annotation data throughput, but introduce a general 30 second lag for all annotation pushes. We could mitigate this by reducing the timeout to 10 or 15 seconds or whatever.
,
May 15 2017
Whats the status of this bug... I think it's fixed, but I'm not sure?
,
May 15 2017
This is not fixed, there's still a delay (~30s?) between a step starting and log links appearing.
,
May 16 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 21 2018
,
Oct 19
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by estaab@chromium.org
, Feb 22 2017Owner: hinoka@chromium.org
Status: Assigned (was: Untriaged)