linux_layout_tests_slimming_paint_v2 is under bot capacity
Issue description
I'm seeing this error somewhat frequently (more than once a day):

linux_layout_tests_slimming_paint_v2 on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)

It looks like these failures are causing patches to be flakily rejected by the CQ. Recent examples:
https://codereview.chromium.org/2758683002/
https://codereview.chromium.org/2758203003/
https://codereview.chromium.org/2760063003/

Builder: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_layout_tests_slimming_paint_v2

Background: linux_layout_tests_slimming_paint_v2 is a special CQ bot that runs WebKit layout tests with --enable-slimming-paint-v2 and is triggered for paint-related changes.

(I removed RVG because this can all be public.)
,
Mar 21 2017
,
Mar 21 2017
Removing from the trooper queue.
,
Mar 21 2017
,
Mar 21 2017
,
Mar 21 2017
The green bubble in the CL is build 8984556824149165552: https://cr-buildbucket.appspot.com/_ah/api/buildbucket/v1/builds/8984556824149165552
It was created at 17:25 PDT, so it corresponds to the gray bubble at 17:25 PDT in https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2758683002/20001
It is one build.

From the buildbucket log:
1 2017-03-21 00:25:34 UTC Build 8984556824149165552 was created by user:5071639625-1lppvbtck1morgivc6sq4dul7klu27sd@developer.gserviceaccount.com
2 2017-03-21 00:27:20 UTC Build 8984556824149165552 was leased by user:446450136466-u4o8pcvmvt31t2os2d8of23egso76scf@developer.gserviceaccount.com
3 2017-03-21 02:27:41 UTC Build 8984556824149165552 was started. URL: http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_layout_tests_slimming_paint_v2/builds/3361
4 2017-03-21 02:48:42 UTC Build 8984556824149165552 was completed. Status: COMPLETED. Result: SUCCESS

I see nothing wrong on the buildbucket side. CQ team, what does "JOB_TIMED_OUT, no build URL" mean?

> However, buildbucket only shows a single build for that CL that happened later

I don't think it happened later.
,
Mar 21 2017
The answer is in nodir's quoted log right above: the build was scheduled by CQ at 00:25 UTC, but it only actually started (and got a URL; I think that's where "no build URL" comes from!) at 02:27 UTC. **That's 2 hours later**. CQ doesn't wait that long, hence the error. So capacity is clearly lacking here.
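As a quick check on that timeline, here is a small Python sketch that computes how long build 8984556824149165552 waited for a bot versus how long it actually ran, using only the four timestamps from the buildbucket log quoted above. It does not call any buildbucket API; the dictionary and variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps copied from the buildbucket log quoted above (all UTC).
FMT = "%Y-%m-%d %H:%M:%S"
events = {
    "created": "2017-03-21 00:25:34",
    "leased": "2017-03-21 00:27:20",
    "started": "2017-03-21 02:27:41",
    "completed": "2017-03-21 02:48:42",
}


def parse(ts: str) -> datetime:
    """Parse a log timestamp as a timezone-aware UTC datetime."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


t = {name: parse(ts) for name, ts in events.items()}

# Time between CQ scheduling the build and the build starting (and getting a URL).
pending = t["started"] - t["created"]
# Time the build spent running once a bot picked it up.
running = t["completed"] - t["started"]

print("pending:", pending)  # pending: 2:02:07
print("running:", running)  # running: 0:21:01
```

The ~21-minute run time is in line with the ~20 min cycle time mentioned later in the thread, so almost all of the two-hour delay was spent waiting for the builder's single bot.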
,
Mar 21 2017
No wonder: there is only one bot. I am removing the CQ and Buildbucket components because this is not a bug in either of them. I don't know who owns this builder, though qyearsley set it up. Whoever owns or cares about the builder should consider adding more bots. Please cc people who are familiar with the builder.
,
Mar 21 2017
Thanks for looking into it. @qyearsley, we originally used one bot and manually added the spv2 trybot. This was working so well that we started auto-triggering the trybot on changes in certain directories. May we have one or two additional builders?
,
Mar 21 2017
Please estimate the capacity properly and request machines from Infra>Labs.

To estimate the capacity, estimate the peak number of CLs per hour (N) that would trigger this builder (e.g. for all of Chromium it's ~60) and check the average runtime (T) in hours. The required capacity is then at least N*T. We strive for 75% peak load on every pool, so make it 1.3*N*T (1/0.75 ≈ 1.33).

Here's the estimate if this were a full Chromium builder (internal only, sorry): http://vi/chrome_infra/Buildbot/per_pool?running_builders=&estimated_builders=linux_layout_tests_slimming_paint_v2&duration=1d&refresh=-1&pool=master.tryserver.chromium.linux%3Alinux_layout_tests_slimming_paint_v2&cq_project=chromium&est_master=master.tryserver.chromium.linux

The above console shows a ~20 min cycle time (0.33 h), and http://shortn/_oRY2SoiHqv suggests that you trigger at most 4-5 builds per hour. At this rate, 3 bots should handle the load: ceil(1.3*0.33*5) = 3. This estimate is based purely on historical usage, so if you know your usage will be different, please factor that in. Thanks!
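The rule of thumb above fits in a tiny Python helper. This is only a sketch of the back-of-the-envelope formula in this comment, not an Infra tool; the function name is hypothetical, the 1.3 default comes from the text, and the second call uses a purely hypothetical higher trigger rate to show how the answer scales.

```python
import math


def bots_needed(peak_cls_per_hour: float, avg_runtime_hours: float,
                headroom: float = 1.3) -> int:
    """Estimate pool size as ceil(headroom * N * T).

    N * T is the average number of builds in flight at peak (Little's law:
    arrival rate times time in system). The 1.3 headroom (~1/0.75) aims to
    keep the pool at roughly 75% load at peak.
    """
    return math.ceil(headroom * peak_cls_per_hour * avg_runtime_hours)


# Historical numbers from this comment: ~5 builds/hour, ~0.33 h per build.
print(bots_needed(5, 0.33))   # 3
# Hypothetical: if the trigger rate doubled to 10 CLs/hour.
print(bots_needed(10, 0.33))  # 5
```

Rounding up with ceil matters here: 1.3 * 5 * 0.33 is about 2.15, and a pool of 2 bots would run above the 75% target at peak.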
,
Mar 21 2017
That viceroy link is pretty fancy. I don't expect our usage to go any higher in the near term, and I agree with your analysis; 3 bots sounds great.
,
Mar 21 2017
,
Mar 23 2017
Now that we understand the bug, do you mind if I raise the priority? I have a stable blocker patch that will likely miss the branch today because the spv2 bot is under capacity :( (https://codereview.chromium.org/2767343003)
,
Mar 31 2017
Pinging this -- what do we need to do next to "request machines from Infra>Labs"? I would like changes under cc/ to run this bot as well, which will require this capacity increase. Background: I just spent several days tracking down a regression (http://crbug.com/707281) that would have been caught if changes under cc/... ran this bot on the CQ. Thanks for any insight on next steps.
,
Mar 31 2017
-Infra>Platform because it is not a platform issue +Infra to triage
,
Mar 31 2017
This was in Infra back in c#8 already. Should we put it in Infra > Labs?
,
Apr 3 2017
Ping -- how do we request machines from Infra > Labs?
,
Apr 3 2017
Yes, adding Infra>Labs adds the ticket to the Labs rotation. They can provide you with more machines.
,
Apr 3 2017
Assigning to the person I am now working with on this.
,
Apr 3 2017
Chatted with wkorman@ offline. He'll need 2 more GCE instances that look like slave775-c4.
,
Apr 4 2017
slave1404-c4 slave1405-c4
,
Apr 11 2017
Did the new slaves go away or fail to get added? I still see a single slave (slave775-c4): https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_layout_tests_slimming_paint_v2

Reopening because Nico just complained about their patch being held up by the spv2 trybot.
,
Apr 11 2017
It looks like Labs gave us two slaves, but I see nothing to indicate that they were actually added to the builder. Though I suspect that 3 builders is still a bit low.
,
Apr 11 2017
In comment 10 we did a back-of-the-envelope capacity estimate, and 3 seemed reasonable. This bot only runs for a subset of Blink changes.
,
Apr 11 2017
We are also discussing running it on cc/... changes, if that affects things: https://groups.google.com/a/chromium.org/d/msg/graphics-dev/wuY9tp3CkLI/ujEL24BcBQAJ
,
Apr 11 2017
,
Apr 14 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/931ed421b527d703565f3e09c50b6a013efe36d0

commit 931ed421b527d703565f3e09c50b6a013efe36d0
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Apr 14 00:03:54 2017

Add two more bots to the slimming paint builder on tryserver.chromium.linux.

R=qyearsley@chromium.org
BUG= 703478

Change-Id: If57756b3adc0f5444f41fa70f2281a577cc10ee8
Reviewed-on: https://chromium-review.googlesource.com/474968
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/931ed421b527d703565f3e09c50b6a013efe36d0/masters/master.tryserver.chromium.linux/slaves.cfg
,
Apr 16 2017
,
Apr 19 2017
Hey Dirk, I haven't seen any grey spv2 trybots recently, and the bot seems a lot faster overall. Thank you for your help! It makes our team's day-to-day development faster.
,
Apr 19 2017
,
Apr 19 2017
Amazing how having more than one of something might make a difference :).
Comment 1 by sergeybe...@chromium.org, Mar 21 2017
Components: -Infra Infra>CQ Infra>Platform>Buildbucket