Build status page shows green for a failed test
Issue description
While test #28 is shown as green:
https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/9092
it seems to have failed:
https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/9092/steps/HWTest%20%5BAFDO_record%5D/logs/stdio
...
Suite timed out. Started on 09-07-2016 [18:11:52], timed out on 09-07-2016 [19:36:20]
Suite job [ PASSED ]
telemetry_AFDOGenerate [ FAILED ]
telemetry_AFDOGenerate ABORT: None
...
,
Sep 9 2016
This is bad and I think it should be a P1. It is quite distracting. Sorry to raise to P1 but AFDO collection has been broken since August 30 because of autotest/lab issues.
,
Sep 9 2016
,
Sep 13 2016
Ningning, can you do an initial sanity check on this? +Richard, who can advise but is swamped at the moment.
,
Sep 13 2016
I see stuff like this in the logs:
Will return from run_suite with status: SUITE_TIMEOUT
nxia@ - could you check what chromite does with that return
status, and report whether it looks like it actually treats
that return value as a failure?
,
Sep 14 2016
I'll check and update.
,
Sep 14 2016
The JSON dump result shows that telemetry_AFDOGenerate was aborted. Why did this logic let the exception pass? https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/commands.py?type=cs&q=%22wait_cmd+has+lab+failures%22&sq=package:%5Echromeos_(internal%7Cpublic)$&l=1159
"19:37:39: ERROR: wait_cmd has lab failures: cwd=None. Exception will be raised in the next json_dump run."
,
Sep 14 2016
,
Sep 16 2016
AFE seems to be down now, but I'm curious what the created_on value for that test in the AFE is. I think this may be one of those time-related bugs (in other words, due to differences in local time, run_suite.py calculates that the test timed out using its own time rather than the server's time for the existing job).
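To make the suspected failure mode concrete, here is a minimal sketch of how a timeout check can go wrong when a server-side creation time and a client-side "now" are compared without time zone information. Everything in it is hypothetical: the zone offsets, the 90-minute timeout, and the `timed_out` helper are illustrative assumptions, not values or code taken from run_suite.py or the AFE.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values: the thread does not state the zones or the timeout.
SERVER_TZ = timezone(timedelta(hours=-7))  # assumed server local zone
CLIENT_TZ = timezone.utc                   # assumed client zone
SUITE_TIMEOUT = timedelta(minutes=90)      # assumed suite timeout


def timed_out(created_on, now, timeout):
    """Return True when more than `timeout` has elapsed since `created_on`."""
    return now - created_on > timeout


# The job is created at 11:00 server time; the check runs 30 minutes later.
created_aware = datetime(2016, 9, 7, 11, 0, tzinfo=SERVER_TZ)
now_aware = created_aware + timedelta(minutes=30)

# Buggy variant: both values lose their zone, and "now" is read in the
# client's zone, so the comparison mixes two different wall clocks.
created_naive = created_aware.replace(tzinfo=None)
now_naive = now_aware.astimezone(CLIENT_TZ).replace(tzinfo=None)

naive_result = timed_out(created_naive, now_naive, SUITE_TIMEOUT)
aware_result = timed_out(created_aware, now_aware, SUITE_TIMEOUT)

print("naive comparison says timed out:", naive_result)
print("zone-aware comparison says timed out:", aware_result)
```

Under these assumptions the naive comparison sees 7.5 hours elapsed and reports a timeout, while the zone-aware comparison sees the real 30 minutes. Checking the created_on value would confirm or rule out this kind of skew.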
,
Sep 16 2016
I spoke with Fang about this interesting comment in the code: "Note the timeout will have no sense when using -m option."
Long story short, this bug has existed since 2014, and something changed to cause this logic to be exercised. Suite timeout is only checked when run_suite.py is tracking a running job. The HWTestStage in question runs run_suite.py three times:
1. Create the suite job and exit.
2. Track the created suite job. This checks for suite timeout and outputs the correct error status.
3. Fetch the completed suite job and dump JSON. Since it is not tracking a running job, it doesn't check for suite timeout and thus exits with a successful status.
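The effect of the three run_suite.py invocations described above can be sketched with a small model. This is not chromite's actual logic: the function names are hypothetical, and the numeric value used for SUITE_TIMEOUT is a placeholder rather than the real run_suite.py return code. It only illustrates why deferring to the last step's status hides a timeout that the tracking step reported.

```python
OK = 0
SUITE_TIMEOUT = 4  # placeholder value, not taken from run_suite.py


def stage_status_last_only(step_codes):
    """Model of the buggy behavior: only the final step's status counts."""
    return step_codes[-1]


def stage_status_any_failure(step_codes):
    """Model of a fix: the first non-OK status from any step wins."""
    for code in step_codes:
        if code != OK:
            return code
    return OK


# Step statuses in order: create the job, track the job, dump JSON.
codes = [OK, SUITE_TIMEOUT, OK]

last_only = stage_status_last_only(codes)
any_failure = stage_status_any_failure(codes)

print("last step only:", last_only)
print("any failing step:", any_failure)
```

In this model the stage is reported green when only the JSON dump step is consulted, and red when the tracking step's status is carried forward. An alternative fix, also only a suggestion, would be for the JSON dump step itself to check for suite timeout.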
,
Sep 16 2016
,
Sep 21 2016
Issue 642198 has been merged into this issue.
,
Sep 21 2016
Any suggestions about who should fix this bug and what to fix, akeshet@?
,
Sep 21 2016
,
Jan 31 2017
,
Jun 29 2017
Is this still relevant / happening? Rather old issue. Reopen if still relevant.
,
Jun 30 2017
I have not seen anything like this in a long time.
,
Jan 4 2018
Comment 1 by laszio@chromium.org, Sep 8 2016
Labels: Build-Toolchain