
Issue 645278

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Build status page shows green for a failed test

Project Member Reported by laszio@chromium.org, Sep 8 2016

Issue description

Test #28 is shown as green on the build status page:

https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/9092

but it seems to have failed:
https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/9092/steps/HWTest%20%5BAFDO_record%5D/logs/stdio

...
  Suite timed out. Started on 09-07-2016 [18:11:52], timed out on 09-07-2016 [19:36:20]
  Suite job                [ PASSED ]
  telemetry_AFDOGenerate   [ FAILED ]
  telemetry_AFDOGenerate     ABORT: None
...
 
Cc: llozano@chromium.org cmt...@chromium.org yunlian@chromium.org
Labels: Build-Toolchain
Labels: -Pri-2 Pri-1
This is bad and I think it should be a P1. 
It is quite distracting. 

Sorry to raise this to P1, but AFDO collection has been broken since August 30 because of autotest/lab issues.

Comment 3 by dshi@chromium.org, Sep 9 2016

Cc: akes...@chromium.org
Components: Infra>Client>ChromeOS
Labels: -Hardware-Lab

Comment 4 by autumn@chromium.org, Sep 13 2016

Cc: jrbarnette@chromium.org
Owner: nxia@chromium.org
Ningning, can you do an initial sanity check on this?

+ Richard, who can advise but is swamped at the moment.
I see stuff like this in the logs:
    Will return from run_suite with status: SUITE_TIMEOUT

nxia@ - could you check what chromite does with that return
status, and report whether it looks like it actually treats
that return value as a failure?
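For reference, here is a minimal Python sketch of the check being asked about: extract the status from a log line like the one above and decide whether it should fail the stage. Only `SUITE_TIMEOUT` comes from the log; the other `SuiteStatus` values, `parse_status`, `stage_failed`, and the choice of which statuses count as failures are hypothetical and are not chromite's actual code.

```python
from enum import Enum


class SuiteStatus(Enum):
    """Hypothetical run_suite outcomes; only SUITE_TIMEOUT appears in the log."""
    OK = "OK"
    WARNING = "WARNING"
    SUITE_TIMEOUT = "SUITE_TIMEOUT"
    ERROR = "ERROR"


# Assumed policy: statuses that should turn the build step red.
FAILING_STATUSES = frozenset({SuiteStatus.SUITE_TIMEOUT, SuiteStatus.ERROR})

PREFIX = "Will return from run_suite with status:"


def parse_status(line: str) -> SuiteStatus:
    """Extract the status name from a 'Will return from run_suite ...' line."""
    text = line.strip()
    if not text.startswith(PREFIX):
        raise ValueError(f"not a run_suite status line: {line!r}")
    # Look the name up in the enum; an unknown name raises KeyError.
    return SuiteStatus[text[len(PREFIX):].strip()]


def stage_failed(status: SuiteStatus) -> bool:
    """Return True if this status should be reported as a failed stage."""
    return status in FAILING_STATUSES
```

With this policy, `stage_failed(parse_status("Will return from run_suite with status: SUITE_TIMEOUT"))` is `True`, so a timed-out suite could not be reported as green.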

Comment 6 by nxia@chromium.org, Sep 14 2016

I'll check and update.

Comment 7 by nxia@chromium.org, Sep 14 2016

The json dump result shows telemetry_AFDOGenerate was aborted. 

Why did the exception get past this logic?

https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/commands.py?type=cs&q=%22wait_cmd+has+lab+failures%22&sq=package:%5Echromeos_(internal%7Cpublic)$&l=1159


"
19:37:39: ERROR: wait_cmd has lab failures: cwd=None.
Exception will be raised in the next json_dump run.
"
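The log message suggests that the wait-phase error is deferred, and that the following json_dump invocation is expected to raise it. If that second run exits successfully, nothing is raised and the failure is lost. Below is a minimal Python sketch of a pattern that avoids this by remembering the wait-phase failure and raising it after the dump, whatever the dump returns. `LabFailure`, `run_hwtest`, and the two callables returning exit codes are hypothetical stand-ins, not the real chromite interfaces.

```python
from typing import Callable


class LabFailure(Exception):
    """Hypothetical exception for lab/infrastructure failures."""


def run_hwtest(wait_cmd: Callable[[], int], json_dump_cmd: Callable[[], int]) -> None:
    """Run the wait phase, then the JSON dump, without losing a wait failure.

    wait_cmd and json_dump_cmd are hypothetical callables returning exit codes.
    """
    wait_rc = wait_cmd()
    deferred = None
    if wait_rc != 0:
        # Remember the failure instead of raising now, so the dump still runs.
        deferred = LabFailure(f"wait_cmd has lab failures: rc={wait_rc}")
    dump_rc = json_dump_cmd()
    if deferred is not None:
        # Raise the remembered failure even if the dump itself exited with 0.
        raise deferred
    if dump_rc != 0:
        raise LabFailure(f"json_dump failed: rc={dump_rc}")
```

The design choice is that the decision to fail is made from state the caller already holds, rather than relying on a later command to detect the same problem again.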


Comment 8 by nxia@chromium.org, Sep 14 2016

Cc: shuqianz@chromium.org
AFE seems to be down now, but I'm curious what the created_on value for that test is in the AFE. I think this may be one of those time-related bugs: because of differences in local time, run_suite.py decides that the test has timed out using its own clock rather than the server's (existing job's) time.
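If the suspicion above is right, one way to avoid it is to compare only timezone-aware timestamps, so that a server clock and a client clock in different zones land on the same UTC timeline. A minimal Python sketch follows; `suite_timed_out` is a hypothetical helper, not run_suite.py's real code, and the time zone offsets in the usage note are assumptions.

```python
from datetime import datetime, timedelta


def suite_timed_out(created_on: datetime, now: datetime, timeout: timedelta) -> bool:
    """Return True if the suite has been running longer than `timeout`.

    Both timestamps must be timezone-aware; subtracting aware datetimes
    with different offsets compares them on the same UTC timeline.
    """
    if created_on.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return now - created_on > timeout
```

For example, take the start time from the log above (18:11:52) with a hypothetical -07:00 offset, and a client clock reading 02:36:20 UTC the next day. The aware comparison gives an elapsed time of 1:24:28, while subtracting the same wall-clock values as naive datetimes gives 8:24:28, which would look like a timeout for any limit shorter than that.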
I spoke with Fang about this interesting comment in the code:

Note the timeout will have no sense when using -m option.

Long story short, this bug has existed since 2014, and something changed to cause this logic to be exercised.

Suite timeout is only checked when run_suite.py is tracking a running job. The HWTestStage in question runs run_suite.py three times:

1. Create the suite job and exit.
2. Track the created suite job.  This checks for suite timeout and outputs the correct error status.
3. Fetch the completed suite job and dump JSON. Since it is not tracking a running job, it doesn't check for suite timeout and thus exits with successful status.
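To make the gap concrete, here is a minimal Python model of the three invocations. `Mode`, `SuiteJob`, `phase_status`, and the `check_in_dump` switch are hypothetical names, not run_suite.py's real code. `check_in_dump=False` reproduces the behavior described in step 3; the default shows one possible fix, which is to apply the same timeout check when fetching a completed job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Mode(Enum):
    """The three run_suite.py invocations described above (names hypothetical)."""
    CREATE = 1  # create the suite job and exit
    TRACK = 2   # poll the running suite job until it finishes
    DUMP = 3    # fetch the completed suite job and dump JSON


@dataclass
class SuiteJob:
    created_on: datetime                    # timezone-aware server timestamp
    finished_on: Optional[datetime] = None  # None while the job is still running


def phase_status(mode: Mode, job: SuiteJob, timeout: timedelta,
                 now: datetime, check_in_dump: bool = True) -> str:
    """Return the status a single run_suite.py invocation would report.

    check_in_dump=False models the current behavior: the timeout is only
    checked while tracking a running job.
    """
    if mode is Mode.CREATE:
        return "OK"
    if mode is Mode.DUMP and not check_in_dump:
        # Timeout never examined, so a timed-out suite exits successfully.
        return "OK"
    # Use the job's recorded end time if it finished, otherwise the current time.
    end = job.finished_on if job.finished_on is not None else now
    if end - job.created_on > timeout:
        return "SUITE_TIMEOUT"
    return "OK"
```

With a one-hour limit and a job that ran for two hours, `Mode.TRACK` reports `SUITE_TIMEOUT`, `Mode.DUMP` with `check_in_dump=False` reports `OK` (the green result seen here), and `Mode.DUMP` with the default reports `SUITE_TIMEOUT` again.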
Cc: ayatane@chromium.org

Comment 12 by nxia@chromium.org, Sep 21 2016

Issue 642198 has been merged into this issue.

Comment 13 by nxia@chromium.org, Sep 21 2016

Any suggestions on who should fix this bug, and what should be fixed, akeshet@?

Comment 14 by nxia@chromium.org, Sep 21 2016

Status: Available (was: Untriaged)

Comment 15 by nxia@chromium.org, Jan 31 2017

Labels: Hotlist-Fixit
Owner: ----
Owner: llozano@chromium.org
Status: Archived (was: Available)
Is this still relevant or still happening? This is a rather old issue; reopen it if it is still relevant.
I have not seen anything like this in a long time. 

Components: -Infra>Client>ChromeOS
