
Issue 704962


Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




CFI trybot has less memory than the CFI buildbots

Project Member Reported by krasin@chromium.org, Mar 24 2017

Issue description

TL;DR: The CFI trybot has only 118 GB of RAM, while the buildbots have 208 GB. As a result, some tests failed only on the buildbots and not on the trybots, which led to a multi-day debugging session. At 118 GB of RAM, the trybots are also much slower. Please update the CFI trybot to have 208 GB of RAM.

Longer version:

22 Mar 2017, 10:00 UTC: https://codereview.chromium.org/2743663005/ was submitted and broke the CFI and LTO buildbots.
22 Mar 2017, 16:00 UTC: The CL was reverted to fix the bots: https://codereview.chromium.org/2766933004/
22 Mar 2017, 18:00 - 19:00 UTC: ssid@ and krasin@ tried to reproduce the failure locally and could not (all affected buildbot configurations were tried). The trybots were green as well:
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_cfi_rel_ng/builds/172

23 Mar 2017, 12:30 UTC: the CL is relanded with more checks: https://codereview.chromium.org/2766303002/
24 Mar 2017, 11:40 UTC: the fix is submitted. ssid@ correctly identified the signed integer overflow by just looking at the code. The overflow is only triggered if a machine has more than ~200 GB RAM.
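
To illustrate how an overflow can depend on the amount of installed RAM, here is a minimal sketch. It is not the actual Chromium code, which is not quoted in this report. The computation (RAM in KiB multiplied by a factor of 10 and stored in a 32-bit signed integer), the names `wrap_int32` and `scaled_memory_kib`, and the reading of "GB" as GiB are all assumptions chosen only to show the mechanism.

```python
GIB = 2**30

def wrap_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def scaled_memory_kib(total_bytes: int, factor: int = 10) -> int:
    """Hypothetical computation: RAM in KiB times a small factor, kept in an int32.

    The factor and the unit are illustrative assumptions, not taken from the CL.
    """
    return wrap_int32((total_bytes // 1024) * factor)

trybot = scaled_memory_kib(118 * GIB)    # trybot-sized machine
buildbot = scaled_memory_kib(208 * GIB)  # buildbot-sized machine

print("118 GiB:", trybot)    # stays positive
print("208 GiB:", buildbot)  # wraps around to a negative value
```

With these assumed numbers the result exceeds the int32 range only above roughly 204.8 GiB, so a 118 GB machine would never hit the problem while a 208 GB machine would.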

Please give slaves 904 and 905 208 GB of RAM, just like the LTO / CFI buildbots, to avoid similar "reproduced only on the buildbot" issues:
https://cs.chromium.org/chromium/build/masters/master.tryserver.chromium.linux/slaves.cfg?type=cs&q=cfi+file:try+package:%5Echromium$&l=89

Thank you!

 
Components: -Infra Infra>Labs
Over to labs: could you upgrade slave904-c4 and slave905-c4 to 208 GB of RAM, similar to slave20-c1? Thanks!
Owner: friedman@chromium.org
Status: Assigned (was: Untriaged)
Status: Fixed (was: Assigned)
Respawned with 208 GB RAM.

Comment 4 by krasin@chromium.org, Mar 28 2017

Hm... https://build.chromium.org/p/tryserver.chromium.linux/buildslaves/slave904-c4 still shows

architecture: amd64
bitness (userland): 64-bit
memory total: 118.05 GB
os family: Debian
os version: 14.04
processor count: 32
processor type: Intel(R) Xeon(R) CPU @ 2.30GHz
product name: Google Compute Engine
uptime: 0:03 hours

Can you please take a second look?

Comment 5 by krasin@chromium.org, Mar 28 2017

Status: Assigned (was: Fixed)
Same thing with slave905-c4:

architecture: amd64
bitness (userland): 64-bit
memory total: 118.05 GB
os family: Debian
os version: 14.04
processor count: 32
processor type: Intel(R) Xeon(R) CPU @ 2.30GHz
product name: Google Compute Engine
uptime: 0:03 hours

Comment 6 by krasin@chromium.org, Mar 28 2017

So, either the slaves are still at 118 GB of RAM, or their info file was not updated. It is impossible to say without access to the bot.
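
For reference, a check like the one done by eye above can be sketched as a small script. It assumes the info text has the `memory total: <number> GB` line shown in the previous comments; the function names, the 208 GB target default, and the 1 GB tolerance are hypothetical. It only inspects the reported info text, so it cannot tell a stale info file from a machine that really has less RAM.

```python
import re

def parse_memory_total_gb(info_text: str) -> float:
    """Return the 'memory total' value in GB from a buildslave info block."""
    match = re.search(r"^memory total:\s*([\d.]+)\s*GB\s*$", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'memory total' line found")
    return float(match.group(1))

def has_expected_memory(info_text: str, expected_gb: float = 208.0,
                        tolerance_gb: float = 1.0) -> bool:
    """Check the reported memory against the expected size (tolerance is an assumption)."""
    return abs(parse_memory_total_gb(info_text) - expected_gb) <= tolerance_gb

# Excerpt of the info block reported for slave904-c4 in this thread.
SAMPLE = """architecture: amd64
bitness (userland): 64-bit
memory total: 118.05 GB
processor count: 32
"""

print(parse_memory_total_gb(SAMPLE))
print(has_expected_memory(SAMPLE))
```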
Project Member

Comment 7 by bugdroid1@chromium.org, Mar 28 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/6b583627c081ec79721f6fdc7d73672b69f4c083

commit 6b583627c081ec79721f6fdc7d73672b69f4c083
Author: Elliott Friedman <friedman@google.com>
Date: Tue Mar 28 21:04:34 2017

Sorry about that... there was an error in the config file.  Also, due to that error, these 2 machines are/were Trusty.  I have fixed the error and they're now 208GB RAM, but I left them as Trusty since that's the direction we'd like you to go in and you never complained before.

I've respawned these and they should be 208GB now.  If you really want Precise, I'll respawn again.  Just let me know.

Thanks
Project Member

Comment 9 by bugdroid1@chromium.org, Mar 28 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/eb2effe0181ef9fef63799bed8803a011cd40678

commit eb2effe0181ef9fef63799bed8803a011cd40678
Author: Elliott Friedman <friedman@google.com>
Date: Tue Mar 28 21:36:49 2017

Status: Fixed (was: Assigned)
Thank you very much! The trybots being on Precise was another difference between them and CFI buildbots. Thank you for spotting it and fixing it.
