
Issue 673723

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 0
Type: Bug




Swarming Ubuntu-12.04 on linux_chromium_rel_ng is expiring

Reported by nedngu...@google.com, Dec 13 2016

Issue description

 Issue 673720  has been merged into this issue.

Comment 2 by mar...@chromium.org, Dec 13 2016

Cc: benhenry@chromium.org smut@chromium.org
Components: -Infra Infra>Platform>Swarming
Labels: -Pri-1 Infra-Troopers OS-Linux Pri-0
Owner: iannucci@chromium.org
Status: Assigned (was: Untriaged)
Summary: Swarming Ubuntu-12.04 on linux_chromium_rel_ng is expiring (was: Swarming infra failures cause flaky tests on linux_chromium_rel_ng)
Example: https://chromium-swarm.appspot.com/task?id=330e6b1650139c10&refresh=10&show_raw=1

Requests are coming in for 12.04, but most of the fleet was changed to 14.04.
Do these need to be updated to Ubuntu-14.04? This was tracked as issue 664294.

There's a large number of bots in both fleets, so I'm quite surprised that this is a problem.

Ubuntu-14.04: 825 bots, 116 busy.
https://chromium-swarm.appspot.com/botlist?c=id&c=os&c=task&c=status&f=os%3AUbuntu-14.04&l=100&s=id%3Aasc

Ubuntu-12.04: 809 bots, 672 busy.
https://chromium-swarm.appspot.com/botlist?c=id&c=os&c=task&c=status&f=os%3AUbuntu-12.04&l=100&s=id%3Aasc


(bots may look instantaneously idle while fetching another task, which happens more often when running quick tasks)
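To put the two snapshots side by side, here is a small sketch that turns the quoted counts into a busy fraction per fleet. It only uses the numbers above; since they are instantaneous botlist readings (and bots can look idle while fetching their next task), the idle count overstates real spare capacity.

```python
# Snapshot counts quoted above (botlist, Dec 13 2016). Instantaneous values,
# so treat the results as approximate.
fleets = {
    "Ubuntu-14.04": {"bots": 825, "busy": 116},
    "Ubuntu-12.04": {"bots": 809, "busy": 672},
}


def utilization(bots: int, busy: int) -> float:
    """Return the fraction of bots that were busy at snapshot time."""
    return busy / bots


for os_name, counts in fleets.items():
    u = utilization(counts["bots"], counts["busy"])
    idle = counts["bots"] - counts["busy"]
    print(f"{os_name}: {u:.1%} busy, {idle} idle")
```

With these counts the 12.04 fleet is roughly 83% busy versus roughly 14% for 14.04, so the two fleets are similar in size but very unevenly loaded.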

Comment 3 by mar...@chromium.org, Dec 13 2016

The tasks seem to take a long time; at a casual look, I'm seeing many tasks with >10 min runtime, which is far too high.

Comment 4 by mar...@chromium.org, Dec 13 2016

Cc: tandrii@chromium.org
+Andrii FYI

Comment 5 by benhenry@google.com, Dec 13 2016

It's unclear to me what we need to do to fix this: does Ned need to update his recipe to request a less specific Ubuntu version, or to update to 14.04? Do we need to expire tasks more quickly? Ned, are you completely blocked, and should this be a P0?

To #5: I filed this bug as the Chromium sheriff for the 12/13 & 12/14 shift. This does not block my work.

Comment 7 by s...@google.com, Dec 13 2016

Whatever you decide, let me know when to update MP config. It's still 80% Precise, 20% Trusty.

Comment 8 by iannu...@google.com, Dec 16 2016

Cc: -smut@chromium.org s...@google.com
I'm completely at a loss for what to do here... is there a relevant entry in the playbook that I'm missing?

Comment 9 by iannu...@google.com, Dec 16 2016

Owner: s...@google.com
It sounds like smut maybe is the one who knows what to do? Tentatively assigning, if that's actually the case.

Comment 10 by s...@google.com, Dec 16 2016

I'm not sure what I should do. All I see from the Machine Provider graphs is that the 12.04 VMs are available. Right now 737 of them are busy so it seems like they're running tasks.

Are we suggesting that 800 12.04 VMs is not enough? This is actually up from ~600 that we had before Machine Provider was supplying them.

Comment 11 by s...@google.com, Dec 16 2016

Here's a recent expiry:
https://chromium-swarm.appspot.com/task?id=33226c4962815110&refresh=10&show_raw=1

Created   12/16/2016, 1:47:34 PM (PST)
Abandoned 12/16/2016, 2:48:14 PM (PST)

At the time, some of the MP VMs were being refreshed (refreshing VMs is staggered across several hours), but there wasn't a significant drop in the number of available VMs, and most of the refreshed VMs connected and started running tasks right away.

e.g.:
https://chromium-swarm.appspot.com/bot?id=gce-precise-c4413e93-0b6h&selected=1&show_all_events=true&sort_stats=total%3Adesc

Leased at 1:56:01 PM (PST), connected at 1:56:44 PM (PST), requested task at 1:57:48 PM (PST).
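Working the intervals out from the timestamps above: a sketch that assumes all times are PST on 12/16/2016 (the bot event times are quoted without a date), so naive datetimes are enough. It shows the expired task waited just over an hour, while the refreshed VM went from lease to requesting a task in under two minutes.

```python
from datetime import datetime

# Timestamp format as shown on the Swarming pages quoted above.
# Assumption: every time is PST on 12/16/2016, so no timezone handling is needed.
FMT = "%m/%d/%Y, %I:%M:%S %p"


def parse(ts: str) -> datetime:
    """Parse a timestamp string such as '12/16/2016, 1:47:34 PM'."""
    return datetime.strptime(ts, FMT)


# Expired task 33226c4962815110: time between creation and abandonment.
pending = parse("12/16/2016, 2:48:14 PM") - parse("12/16/2016, 1:47:34 PM")

# Refreshed bot gce-precise-c4413e93-0b6h: time from lease to first task request.
lease_to_request = parse("12/16/2016, 1:57:48 PM") - parse("12/16/2016, 1:56:01 PM")

print(f"task pending for {pending}")  # task pending for 1:00:40
print(f"bot requested a task {lease_to_request} after lease")  # ... 0:01:47 ...
```

The VM refresh gap is small compared with the hour the task sat pending, which is consistent with the observation that the refresh did not noticeably reduce capacity.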

Comment 12 by s...@google.com, Dec 17 2016

Ubuntu-12.04 tasks are expiring at this very moment even though there are currently 797 gce-precise VMs being supplied by Machine Provider, 720 of which are running tasks and the rest of which seem connected so they are probably instantaneously idle and about to reap a task as maruel said in #2.

Since all MP VMs are there and tasks are still expiring I can only conclude we need to provide even more than 800 Precise VMs.

Is it possible there was a massive surge of Ubuntu-12.04 tasks caused by the long pending queues on tryserver.chromium.linux? Maybe that's why we suddenly don't have enough?

Cc: stip@chromium.org
I think there was some kind of unusual spike in the number of tasks. CCing Mike; I hope he knows more.

If so, I don't think adding more VMs is a good move.
We need to figure out what increased the load, since I suspect it did increase. We'd save a lot of money by migrating to 14.04 ASAP, so that we wouldn't have two parallel fleets.

Comment 15 by s...@google.com, Dec 20 2016

Cc: -s...@google.com

Comment 16 by sheriffbot@chromium.org, Dec 27 2016

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Status: WontFix (was: Assigned)
Seems like this no longer happens.

Comment 18 by s...@google.com, Jun 23 2017

Owner: smut@chromium.org
