Issue 644291

Issue metadata

Status: Archived
Owner: ----
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



veyron_rialto-chrome-pfq timeout

Project Member Reported by achuith@chromium.org, Sep 6 2016

Issue description

Here's the bot:
https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq

First failing build:
https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq/builds/550

Most recent failing build:
https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq/builds/556


Looks like the build is taking too long?

Log snippet:
@@@STEP_FAILURE@@@
02:14:19: ERROR: Timeout occurred- waited 4701 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.
cros_sdk: Signaled to shutdown: caught 15 signal.

@@@STEP_FAILURE@@@
02:14:19: ERROR: Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 525, in Run
    self.PerformStage()
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/build_stages.py", line 331, in PerformStage
    extra_env=self._portage_extra_env)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 482, in Build
    enter_chroot=True)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 124, in RunBuildScript
    return runcmd(cmd, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/cros_build_lib.py", line 594, in RunCommand
    (cmd_result.output, cmd_result.error) = proc.communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 751, in communicate
    self.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1291, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 478, in _eintr_retry_call
    return func(*args)
  File "/b/cbuild/internal_master/chromite/lib/timeout_util.py", line 131, in kill_us
    raise SystemExit(error_message)
SystemExit: Timeout occurred- waited 4575 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.



 
Components: Infra>Client>ChromeOS
Cc: bpastene@chromium.org
A successful run takes more than 3 hours:

https://00e9e64bac8930c4b86005811899dc66c8907d4cc29753134e-apidata.googleusercontent.com/download/storage/v1/b/chromeos-image-archive/o/veyron_rialto-chrome-pfq%2FR55-8765.0.0-rc1%2Ftimeline-stages.html?qk=AD5uMEub_Lm_1wsSGKa6wwRIs1s_cVidHolHaox84scgLEMEMoGjj5kqyWc0CJHegsDNkootaK8sxwn2yhGUdzCGsBxkOWobs4i7FGEN36HRI2L5tGfn4Xxw4LKB-0RsMYv8pb12pUBNYeevxBY11YNiAAIpxFgqnrDBMYhTejNfTZbXwoVWtEMa8cuPmYhVzWFPlH2Ps6P-8y-2icHNZSemz9OVQPuzDiBxUYPmQ533k-aJvDZv3unxq72Gi2rusHoP5KL8REY8W3tlqtIWhkvWH-WoF6fs0rT5VNcbUeFeUXrzfHwh8AXKoINHlx3uMvEfnGT2pABeVlNiSlUPf4COPzp4HqlT7MeO0P0gHSu05i7GNC_f63Fa34ba8b1acG_6Z6W_Cmfyl1rrrRZdxTjocLyj5qSeyvAY8TQ9NLOCSJt-51mEID48djnYDQjeUqjPEQMqC6hVmKCjf6Ca8L9GCFjcD6-8o_yM1pPQNFScI7EBBP1WcoTxY1gXjR8glqzPFLvJGzIzl9Gqyk6OjoiXzfOcKlgwnnVgdjY97Pr_Qgi6HYTmiAdv1wC4QIZwrfiTI0Sf5PcieG431p2LiG1KXZ3n4tM6po6l6zRer4KzQolCMfSy9zudtwBgF0umMLgWWljT9LTeGlX8QKhivLsq9vo8_d6XQ2HZM1hYmcdYu1XQp9A8dpNiapvcmGYi_yuwnzZx4ZTaZucAJ7gAydTRCxxiJYk-ERq-HuJl_5eHsZmu0m-tbFsyMHoQb4z8GlOweoGEpfFCeJ4k-Q_-E7WEiB9ENl03BTnqYZlv5cE1LlGl7wjko0qUqBvM3bz-CQD44WM0H5NZXf5W62G0mkDvx-kb42RoYSdbj44mI7xpxI5BrS0ZA1Y


The failed run is killed after an hour or so:

https://00e9e64bacb522463dd90e9287be75c5d961bd638b41a30bc4-apidata.googleusercontent.com/download/storage/v1/b/chromeos-image-archive/o/veyron_rialto-chrome-pfq%2FR55-8775.0.0-rc1%2Ftimeline-stages.html?qk=AD5uMEv_xPgbVdBmOiEXJ3huws-_Luvj8J74DvbNovOhx-iSzCoQlCkH5hwzKMxRLLValnxJBKCzvn-Dla_iwRSQ_qZ51e0P95omr91FwL88kAyBF5rFYvH6glzQpq9l6UT--6JKDFEhPdNfDuUFgwARJSxnfhqRUGV8NKoJq8kFDoHpEEPUGCM3UxFBfrp13mxUXRzZd476aq496sWk8CQFYWNBozwBSJW_XDo4r0t_3r0R1oGa2g_odYFGNwx4p1c894H8Z_4J5VbhikTbHUdohxAuYOXHZnKGohODou7V8yRXHMsN8_8VPD7YrVxIXDKyujcJdvjibRzieBR9ux9PcbeOvWYvK4vMJJSrIOj8yk5L50bTeheRsuivx0sVEPGTgvVcmPIAn51pHMgmBiMKucF2PcVfdKMhViuT7ihit5R4do_taqMH4lsqb6DORt7u4MjMkLcazCzAFrrTvato33imGG-lMOOjsUDfzBZMPqT-JI3-qWJWsBwVNKN5Exghj12I0tOt9iv1tVTjUiIxCdtixOkRt_R6844uEHTsXJhuqEfvvfk93IQTPva0hC9m4vnHYFrTmB_3LlcZeMsClkjrLmeTaYcUm8Gd890czelgiFQI3uWw1oYbkr71GpVbl1SGYP9dfJYMDzxgjagcZ0T_skaAWt3eKErxBOhfQqTM_LWHMnW_O3i3gbFf4L65ljWSQJucNmLAHTNIdt1FybjG9DVDECsligIZyiLTkXWhQosPmNVv7FE-fxXNtMCgHavr46vLbOSNzrcBMJddMK7Lgf1WlgKaGwtBoe4OFqYKjwqYJeAXhuSngHh43loJw2GZUDwtoHKQ2P6jGMzQZhq2NoVJS91wG1XfCuqXodWc5-Gpbpw

Did we change the time limit recently? Where is it set?
Somewhere in the depths of cbuildbot: https://cs.chromium.org/chromium/src/third_party/chromite/scripts/cbuildbot.py?rcl=0&l=1280

As a trooper, I'm not sure how much I can help here since cbuildbot.py seems to implement its own timeout logic.
Looks like I have more data:

Successful run:

17:08:41: INFO: Updating slave build timeout to 15986 seconds enforced by the master

(from https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq/builds/549/steps/steps/logs/stdio)

Timing-out run:

00:55:57: INFO: Updating slave build timeout to 4701 seconds enforced by the master

(from https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq/builds/556/steps/steps/logs/stdio)
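
The two log lines above can be compared mechanically. This is a minimal sketch and not part of chromite: the helper name `extract_slave_timeout` and the regular expression are assumptions, based only on the message format quoted in this issue.

```python
import re

# Pattern assumed from the two log lines quoted in this issue.
_TIMEOUT_RE = re.compile(
    r"Updating slave build timeout to (\d+) seconds enforced by the master")


def extract_slave_timeout(line):
    """Return the timeout in seconds from a log line, or None if absent."""
    match = _TIMEOUT_RE.search(line)
    return int(match.group(1)) if match else None


good = extract_slave_timeout(
    "17:08:41: INFO: Updating slave build timeout to 15986 seconds "
    "enforced by the master")
bad = extract_slave_timeout(
    "00:55:57: INFO: Updating slave build timeout to 4701 seconds "
    "enforced by the master")

# The timing-out run received 11285 fewer seconds than the successful one.
print(good - bad)
```

The difference of 11285 seconds (a little over 3 hours) is how much less time the failing slave was given.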
Cc: lhchavez@chromium.org
+Luis in case he has some ideas on where the timeouts are set.
While I was working out an idea, the builder went back to green:

https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_rialto-chrome-pfq/builds/558

Apparently, on the failing runs veyron_rialto did not start until about 2 hours after the master began waiting for slaves. For example, here the master started around 9:30 and began waiting for slaves around 10:30:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3342/steps/steps/logs/stdio

but the message "ERROR: No status found for build config veyron_rialto-chrome-pfq" disappeared only around 12:40.

The master sets the time limit to 16200 seconds (as can be seen in this file), but rialto eventually starts only when around 4000 seconds are left, which is insufficient for this builder.

I am not sure why it didn't start in time, though (maybe there was a long queue of build requests for this machine?). Anyway, the start-up slowness has now disappeared, so we can probably close this bug until it happens again.
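
The arithmetic behind this explanation can be sketched as follows. It assumes the slave's budget is simply the master's 16200-second limit minus the slave's start delay; the real chromite logic may differ. The function names are hypothetical, the 3-hour build time is only the lower bound mentioned earlier in this issue, and the delays are inferred from the two logged timeouts (16200 - 15986 and 16200 - 4701).

```python
MASTER_TIMEOUT_S = 16200       # master's time limit, as stated in this issue
TYPICAL_BUILD_S = 3 * 60 * 60  # lower bound: a successful run takes over 3 hours


def remaining_budget(start_delay_s, master_timeout_s=MASTER_TIMEOUT_S):
    """Seconds left for the slave once it finally starts (assumed model)."""
    return max(0, master_timeout_s - start_delay_s)


def can_finish(start_delay_s, build_s=TYPICAL_BUILD_S):
    """True if the remaining budget covers the assumed build duration."""
    return remaining_budget(start_delay_s) >= build_s


# Start delays inferred from the logged slave timeouts (an assumption).
print(remaining_budget(214), can_finish(214))      # 15986 True
print(remaining_budget(11499), can_finish(11499))  # 4701 False
```

Under this model, a start delay of roughly 3 hours and 10 minutes (9:30 to 12:40) leaves far less than the 3 hours the builder needs, which matches the observed timeouts.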
Status: Fixed (was: Untriaged)
Issue 644466 has been merged into this issue.
Labels: VerifyIn-55

Comment 12 by dchan@chromium.org, Oct 10 2016

Labels: -VerifyIn-55

Comment 13 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 14 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 15 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 16 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 17 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 19 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)