New issue
Advanced search Search tips

Issue 767026 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Nov 22
Cc:
Components:
EstimatedDays: ----
NextAction: 2017-10-15
OS: ----
Pri: 1
Type: Bug-Regression

Blocking:
issue 731095



Sign in to add a comment

buildbot stdout stream for swarming collection doesn't contain task stdout

Project Member Reported by perezju@chromium.org, Sep 20 2017

Issue description

As far as I can see the "stdout" logdog link for test runs are missing their actual output.

For example:
https://luci-milo.appspot.com/buildbot/chromium.perf/Mac%20Retina%20Perf/1160

system_health.common_desktop with one test failure points to stdout on logdog:
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FMac_Retina_Perf%2F1160%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_desktop_on_Intel_GPU_on_Mac_on_Mac-10.11%2F0%2Fstdout

WARNING:root:task_output_dir '/var/folders/p4/py6nlvcx4fs5s3fg51_pzqmc0000gm/T/tmpXdbzjO' already exists!
WARNING:root:task_output_dir existing content: []
Waiting for results from the following shards: 0
Waiting for results from the following shards: 0
Waiting for results from the following shards: 0
18951 2017-09-19 18:39:37.819 E: Request to https://chromium-swarm.appspot.com/api/swarming/v1/task/38b383a78cb58910/result failed with HTTP status code 502: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>
Waiting for results from the following shards: 0

The link to the swarming task _does_ contain the output, e.g.:
https://chromium-swarm.appspot.com/task?id=38b383a78cb58910

So this works as a workaround, but it's currently breaking all my scripts that expect to find the output from a logdog link.
 
Summary: buildbot stdout stream for swarming collection doesn't contain task stdout (was: Swarming: task_output_dir already exists!)
This doesn't appear to be related to task_output_dir.

It looks as though task stdout disappeared from the stdout stream late on 2017-09-07 pacific time. I don't yet have a suspected cause. The timestamp for https://chromium.googlesource.com/chromium/tools/build/+/60ab2bba2366227984e6b434eaf95e6987a2bd7c is close, but nothing in that CL appears to be related despite the changes to the swarming recipe module.
Cc: tansell@chromium.org
This appears to be due to https://chromium.googlesource.com/infra/luci/client-py.git/+/5e8001d9a710121ce7a68efd0804430a34b4f9e4, which rolled into chromium in https://chromium-review.googlesource.com/c/chromium/src/+/656401. It looks like the default value was supposed to be keeping the previous behavior; that may not be working as intended given that the stdout disappeared from the console.
The first build in which this appeared on that Mac Retina bot is https://luci-milo.appspot.com/buildbot/chromium.perf/Mac%20Retina%20Perf/1104. That build's blamelist contains the DEPS roll mentioned in #2.
Cc: -tansell@chromium.org jbudorick@chromium.org
Labels: -Infra-Troopers
Owner: tansell@chromium.org
Status: Assigned (was: Untriaged)
Assigning to tansell and removing from trooper queue given that this appears to be an unintentional side-effect of that CL. Feel free to re-add troopers if necessary.
I'm just about to get onto a plane flight and won't have time to look at this until Monday. 

If it is really my CL you should be able to tell by a change in the swarming output;

-      '+---------------------------------------------------------------------+',
-      '| Shard 0  https://localhost:1/user/task/10100                        |',
-      '+---------------------------------------------------------------------+',
+      '+------------------------------------------------------+',
+      '| Shard 0  https://localhost:1/user/task/10100         |',
+      '+------------------------------------------------------+',
       'Foo',
-      '+---------------------------------------------------------------------+',
-      '| End of shard 0  Pending: 6.0s  Duration: 1.0s  Bot: swarm6  Exit: 0 |',
-      '+---------------------------------------------------------------------+',
+      '+------------------------------------------------------+',
+      '| End of shard 0                                       |',
+      '|  Pending: 6.0s  Duration: 1.0s  Bot: swarm6  Exit: 0 |',
+      '+------------------------------------------------------+',
+      'Total duration: 1.0s',


or

+      '+------------------------------------------------------+',
+      '| Shard 0  https://localhost:1/user/task/10100         |',
+      '|  Pending: 6.0s  Duration: 1.0s  Bot: swarm6  Exit: 0 |',
+      '+------------------------------------------------------+',
Yep, that change in the swarming output is the point of this bug. The perf bots that Juan is asking about are seeing something like

+------------------------------------------------------------------------+
| Shard 0  https://chromium-swarm.appspot.com/user/task/38784e7b10cbae10 |
|  Pending: 9476.3s  Duration: 3543.0s  Bot: build30-b4  Exit: 1         |
+------------------------------------------------------------------------+

w/o stdout, despite not passing anything to --task-output-stdout (which should mean that the stdout gets sent to the console and the JSON, if I'm reading your patch correctly).
The default should 'all' -- Can you compare what happens when you run it locally with "--task-output-stdout=all" ?

I'm guessing there is an error in my logic at;

+  def convert_value(self, opt, value):
+    if value not in self.choices:
+      raise optparse.OptionValueError("%s must be one of %s not %r" % (
+          self.get_opt_string(), self.choices, value))
+    stdout_to = []
+    if value == 'all':
+      stdout_to = ['json', 'console']
+    elif value != 'none':
+      stdout_to = [value]
+    return stdout_to

Components: -Infra Infra>Client>Chrome
Tansell, any updates? As far as I can see this is still happening.

Comment 10 by kbr@chromium.org, Oct 13 2017

Issue 774173 has been merged into this issue.

Comment 11 by kbr@chromium.org, Oct 13 2017

Blocking: 731095
Cc: estaab@chromium.org
Components: Infra>Platform>LogDog Infra>Platform>Swarming
Labels: -Type-Bug Type-Bug-Regression
This is a significant regression and is preventing external contributors from seeing the results of their tryjobs.

NextAction: 2017-10-15
It's Saturday in Sydney so let's see if it's something tansell can look at on Monday.
The NextAction date has arrived: 2017-10-15
Components: -Infra>Platform>Swarming Infra>Platform>Swarming>Admin
Owner: ----
Status: Available (was: Assigned)
Status: WontFix (was: Available)

Sign in to add a comment