
Issue 695268

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 611139




cyan-chrome-pfq timing out due to long build_packages step

Project Member Reported by sha...@chromium.org, Feb 23 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/cyan-chrome-pfq/808

We see that we failed during HWTest:

@@@STEP_FAILURE@@@
16:47:38: ERROR: Timeout occurred- waited 16000 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.

@@@STEP_FAILURE@@@
16:47:38: ERROR: Timeout occurred- waited 15764 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.
16:47:38: INFO: Running cidb query on pid 14005, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x3c74410>
16:47:39: INFO: Running cidb query on pid 14005, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x3762290>

But the HWTest stage only ran for ~15 mins. We spent nearly 2 hours running build_packages, probably because we're building chrome from source.
 

Comment 1 by sha...@chromium.org, Feb 23 2017

Ah, this might be related to issue 695172. Now that the chrome PFQ builder is green, maybe we won't have to build from source? Let's wait and see.
Cc: derat@chromium.org jrbarnette@chromium.org
Labels: -Pri-3 Pri-1
Owner: afakhry@chromium.org
Status: Assigned (was: Untriaged)
I'm not sure that this is due to issue 695172, nor that it is necessarily related to the BuildPackages step per se. BuildPackages does take a very long time, and its duration may have gone up across the board at some point (we should investigate that); however, from a quick glance it takes just as long on other builders.

It does appear that we may be taking longer in general, and cyan runs the most tests so tends to take the longest.

We need to consider a short-term fix (e.g. increase the master timeout), and also investigate why the builders are taking so long in general.

+derat@ and jrbarnette@ who have been investigating PFQ cycle times.

->afakhry@ (current gardener)

Comment 3 by derat@chromium.org, Feb 28 2017

It looks like this build spent about 114 minutes on BuildPackages and another 80 minutes on SimpleChromeArtifacts. As far as I know, we'll always need to run both of those stages serially on the PFQ builders, so even if we're able to cut down the time spent in HWTest (HWTest [sanity] took almost 100 minutes), it feels like we're still going to be awfully close to the current limit (which looks like it's about 267 minutes).

Will we be able to use GOMA to make either or both of these stages complete in less time?
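
To make the budget in the comment above concrete, here is a rough Python sketch that uses only the figures quoted in this thread (114, 80 and roughly 100 minutes, plus the 16000-second wait from the error log). The dictionary layout, the helper name, and the assumption that all three stages run strictly back to back are illustrative and are not taken from cbuildbot.

```python
# Stage durations in minutes, as quoted in this thread.
# Assumption: the stages run strictly one after another.
STAGE_MINUTES = {
    "BuildPackages": 114,
    "SimpleChromeArtifacts": 80,
    "HWTest [sanity]": 100,
}

# From "waited 16000 seconds" in the error log (about 267 minutes).
TIMEOUT_SECONDS = 16000


def remaining_budget_minutes(stage_minutes, timeout_seconds):
    """Return the minutes left under the timeout after all listed stages."""
    return timeout_seconds / 60 - sum(stage_minutes.values())


# The two stages that are said to always run serially on PFQ builders.
serial_build = (STAGE_MINUTES["BuildPackages"]
                + STAGE_MINUTES["SimpleChromeArtifacts"])

print(f"limit: {TIMEOUT_SECONDS / 60:.0f} min")
print(f"build stages alone: {serial_build} min")
left = remaining_budget_minutes(STAGE_MINUTES, TIMEOUT_SECONDS)
print(f"left after all three stages: {left:.0f} min")
```

A negative result in the last line means the three stages alone would exceed the limit if they really ran serially.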
Cc: steve...@chromium.org xixuan@chromium.org
Most recent failure is due to the same timeout: https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-chrome-pfq/builds/826

Steven, there used to be a ReportStage step in the build that showed a timeline graph. Was that removed?
It's there ('Build stages timeline') in other builds, just not in build #826, for reasons I don't understand.

There's a single master deadline that is shared by all slaves. Yep, you can increase it, the knob is somewhere in chromeos_config I believe.
I can see a bunch of timeout overrides for cyan-chrome-pfq in chromite/cbuildbot/config_dump.json (https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/config_dump.json?type=cs&q=cyan-chrome-pfq+p:chromeos_public&l=6481-6486), but I'm not sure whether these are the ones I should be changing?
Looking at the history of cyan builders:
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-chrome-pfq?numbuilds=200

We appear to have been pushing the edge of the master timeout at least as far back as September:
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-chrome-pfq/builds/215

BuildPackages appears to always take just under 2 hours, although it appears to have slowly crept up from ~1:40 to ~2:00.

HWTest shows the most variation, with times ranging from 0:20 to 1:15.


The config_dump.json file is generated, and what you really want to change is the master timeout, not the slaves.

akeshet@, jrbarnette@, can you please help us track down where the master sets the slave timeouts?

Apologies, I wasn't very clear in that last comment:

1) We appear to have been closer than comfortable to the timeout for a while now, but have recently crept up to the point where the slaves (cyan in particular, which runs the most tests) are taking too long. There is no obvious "smoking gun" here - build times and test times both appear to have crept up.

2) This needs a better fix (i.e. we should do less) but for now we should figure out how to increase the master timeout.

BTW, the specific failing error is:

@@@STEP_FAILURE@@@
06:22:28: ERROR: Timeout occurred- waited 15595 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.
06:22:28: INFO: Running cidb query on pid 30746, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x5370d50>


15595 seconds is roughly 4.3 hours, a little under the 4.5 hours (16200 seconds) that I recall being the master timeout, but I cannot for the life of me recall where that is set.
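
For reference, the wait times quoted in the error logs in this thread can be converted to hours, minutes and seconds with a small Python helper. This is only a convenience sketch; `format_duration` is a hypothetical name and not part of chromite.

```python
def format_duration(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Wait times taken from the "Timeout occurred" log lines in this thread.
for waited in (16000, 15764, 15595):
    print(waited, "->", format_duration(waited))

# 4.5 hours expressed in seconds, for comparison.
print("4.5 hours =", int(4.5 * 3600), "seconds")
```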

That error message is set here:
https://cs.corp.google.com/chromeos_public/chromite/scripts/cbuildbot.py?q=p:chromeos_public+%22timeout+deadline+set+by+the+master%22&l=1308

I am struggling to follow that code. It appears to get the timeout from CIDB, but I can't figure out where that gets set.

akeshet@ - git blames you for that message :)


akeshet@ suggested that we need to change the master-chromium-pfq config here:
https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/chromeos_config.py?q=chromeos_config.py&l=2821

I think we need to set build_timeout to something > 16200 (4.5 hours).

The default is here btw:
https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/config_dump.json?q=_dump+file:%5Echromite/+package:%5Echromeos_public$&l=23

I'm not sure where that value comes from exactly; I thought that file was generated?

I would suggest upping the timeout to 6 hours - I would rather investigate why individual stages are spinning for a ridiculous time than run into this problem again.
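
As a rough illustration of the proposed change, the Python sketch below expresses the old and new timeouts in seconds. The `master_pfq_config` dictionary and the `with_timeout_hours` helper are hypothetical stand-ins, not the real structure in chromeos_config.py; only the `build_timeout` name, the master-chromium-pfq config name, and the 4.5-hour and 6-hour values come from this thread.

```python
HOUR = 60 * 60  # seconds per hour

# Hypothetical stand-in for the master PFQ config entry.
master_pfq_config = {
    "name": "master-chromium-pfq",
    "build_timeout": int(4.5 * HOUR),  # 16200 seconds, the value discussed above
}


def with_timeout_hours(config, hours):
    """Return a copy of config with build_timeout set to the given hours."""
    updated = dict(config)
    updated["build_timeout"] = int(hours * HOUR)
    return updated


new_config = with_timeout_hours(master_pfq_config, 6)
print(new_config["build_timeout"])
```

Keeping the value as an expression in hours rather than a bare number of seconds makes it easier to compare against the stage timings discussed here.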

This is already being tracked as issue 611139; we should continue the discussion there.

Blockedon: 611139

Comment 16 by bugdroid1@chromium.org, Mar 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/6735830d9db646296d3d76412245ec07980a9810

commit 6735830d9db646296d3d76412245ec07980a9810
Author: Ahmed Fakhry <afakhry@google.com>
Date: Sat Mar 04 00:50:34 2017

[chromeos_config]: Increase the master PFQ timeout to 6 hours.

Recently some builders started to timeout due to increased
BuildPackages and HWTest times. This CL increase this timeout to
6 hours from 4.5 hours.

BUG=chromium:611139, chromium:695268
TEST=none

Change-Id: I27bfd348c1fe9c61d2f5a24084777656620f82ef
Reviewed-on: https://chromium-review.googlesource.com/448109
Trybot-Ready: Ahmed Fakhry <afakhry@chromium.org>
Tested-by: Ahmed Fakhry <afakhry@chromium.org>
Reviewed-by: Ahmed Fakhry <afakhry@chromium.org>

[modify] https://crrev.com/6735830d9db646296d3d76412245ec07980a9810/cbuildbot/config_dump.json
[modify] https://crrev.com/6735830d9db646296d3d76412245ec07980a9810/cbuildbot/chromeos_config_unittest.py
[modify] https://crrev.com/6735830d9db646296d3d76412245ec07980a9810/cbuildbot/chromeos_config.py

Status: Fixed (was: Assigned)
We have increased the master PFQ builder timeout to 6 hours. cyan-chrome-pfq hasn't been timing out since.

Comment 18 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 20 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
