
Issue 724541

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




PreCQ time out, not explained by metrics.

Project Member Reported by dgarr...@chromium.org, May 19 2017

Issue description

This CL timed out in the PreCQ on 2017/5/18 at 9:40 PM.

https://chromium-review.googlesource.com/c/505955


Looking at the PreCQ graphs, I don't see any particular connection to the failure.

https://viceroy.corp.google.com/chromeos/pre-cq?duration=8d&utc_end=1495192685

 

Comment 1 by pho...@chromium.org, May 19 2017

Cc: akes...@chromium.org shuqianz@chromium.org
Labels: -Pri-2 Pri-1
This change is also being affected: https://chromium-review.googlesource.com/c/509934/

Comment 2 by aut...@google.com, May 23 2017

Cc: chingcodes@chromium.org xixuan@chromium.org
+ xixuan (current deputy) is also seeing this
+ chris who has another example too
Cc: nxia@chromium.org
Can we check the buildbucket id for one of these builds to see if buildbucket results confirm that the build never started?

Comment 4 by nxia@chromium.org, May 23 2017

You can find the buildbucket id in the clActionTable

mysql> select * from clActionTable where change_number=505955 and patch_number=3;


| 12609885 |  1528888 |        505955 |            3 | external      | trybot_launching          | rambi-pre-cq                  | 2017-05-19 03:10:14 | 8979201249746998544 |


https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/33945


Basically it's because it failed at InitialCheckout, so it didn't reach the PreCQSync stage to insert any useful information.



The build ran for 11 minutes, then was killed by the launcher as timed out starting.

Hum... two issues happening. That means it didn't start for 80 minutes (there is currently a 90 minute launch timeout, right?).

A) 80 minutes is a long wait for a builder.

The delay needs further investigation, since I don't THINK the metrics show all builders in use at that time.

B) The launcher doesn't notice the build started until PreCQ Sync stage runs.

How hard would it be to change the launcher to use buildbucket to learn if the builders have started? And if the build finished/passed?

If the launcher can use buildbucket, we can delete the PreCQ Sync/Completion code on the PreCQ builders. More robust and flexible, but not urgent.
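A minimal sketch of what option B) could look like, not the actual launcher code. It assumes buildbucket reports a `status` of `SCHEDULED`, `STARTED`, or `COMPLETED` plus a `result` for completed builds; the `fetch_build` callable, `classify_build`, and `wait_for_start` are hypothetical names introduced only for illustration.

```python
import time
from typing import Callable, Dict

# Assumed buildbucket-style status values; treat these as illustrative.
SCHEDULED = "SCHEDULED"
STARTED = "STARTED"
COMPLETED = "COMPLETED"


def classify_build(build: Dict[str, str]) -> str:
    """Map a buildbucket-style build record to a launcher decision."""
    status = build.get("status")
    if status == SCHEDULED:
        return "waiting"
    if status == STARTED:
        return "running"
    if status == COMPLETED:
        # A build that dies before any cbuildbot stage runs still lands here.
        return "passed" if build.get("result") == "SUCCESS" else "failed"
    return "unknown"


def wait_for_start(
    fetch_build: Callable[[str], Dict[str, str]],  # hypothetical lookup by id
    buildbucket_id: str,
    launch_timeout_s: float,
    poll_interval_s: float = 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the build leaves the scheduled state or the timeout expires."""
    deadline = clock() + launch_timeout_s
    while True:
        state = classify_build(fetch_build(buildbucket_id))
        if state != "waiting":
            return state
        if clock() >= deadline:
            return "launch_timeout"
        sleep(poll_interval_s)
```

Under these assumptions, a build that fails during InitialCheckout would be reported as "failed" rather than "launch_timeout", because the decision no longer depends on the PreCQ Sync stage having run.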

Comment 6 by nxia@chromium.org, May 24 2017

A) rambi-pre-cq started in time, but it failed at the InitialCheckout stage and reported nothing back, so it was treated as a timeout.

It was triggered at 10:14 and the build started at 10:18.

B) cbuildbot is the place to check the pre-cq runs and to report the status. If the build fails at the InitialCheckout step, no cbuildbot code can run.
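As a rough cross-check of the timestamps in this thread, the sketch below adds the 90 minute launch timeout to the `trybot_launching` time from the clActionTable row. It assumes that timestamp is in UTC and that the reported "9:40 PM" is Pacific Daylight Time (UTC-7); neither is confirmed in the thread.

```python
from datetime import datetime, timedelta, timezone

LAUNCH_TIMEOUT = timedelta(minutes=90)  # launch timeout mentioned in the thread
PDT = timezone(timedelta(hours=-7), "PDT")  # assumption: reporter's local time

# trybot_launching timestamp from clActionTable, assumed to be UTC.
launched_utc = datetime(2017, 5, 19, 3, 10, 14, tzinfo=timezone.utc)

# When the launcher would give up if it never saw the build start.
deadline_pdt = (launched_utc + LAUNCH_TIMEOUT).astimezone(PDT)
print(deadline_pdt.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Under those assumptions the deadline falls at 21:40:14 on 2017-05-18, which lines up with the reported timeout at 9:40 PM.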

A) You're right. I thought the sync timed out, but here was the real error:

error: insufficient permission for adding an object to repository database /b/cbuild/repository/.repo/projects/src/third_party/chromiumos-overlay.git/objects

I *think* the updated launcher code will recover in this case now by forcing a full sync. I'm not 100% certain.

B) Yes, I agree that's how things work today; I was proposing a change. I'm just not sure how expensive that change would be to implement.

Comment 8 by aut...@google.com, May 30 2017

Is there an immediate change needed here? Or are we proposing to wait until the new launcher code is live to see?
Status: WontFix (was: Untriaged)
crbug.com/726065 will add additional metrics that would help diagnose this kind of problem.

Let's close this, and see if we still have unexplained timeouts after we have the better metrics.
Owner: dgarr...@chromium.org
Status: Assigned (was: WontFix)
Re c#9:

If you're closing this as WontFix, can you ensure that the follow up job links back to this job, and has an explicit action item to look for these timeouts.  Otherwise this just falls through the cracks again.
Status: WontFix (was: Assigned)
Moving to a new bug for the new graph work.

 crbug.com/734839 
