CFI trybot has a smaller memory size than CFI buildbot
Issue description

TL;DR: The CFI trybot has only 118 GB RAM, while the buildbots have 208 GB RAM. As a result, some tests failed only on the buildbots and not on the trybots, which led to a multi-day debugging session. At 118 GB RAM, the trybots are also much slower. Please update the CFI trybot to have 208 GB RAM.

Longer version:

22 Mar 2017, 10:00 UTC: https://codereview.chromium.org/2743663005/ was submitted and broke the CFI and LTO buildbots.
22 Mar 2017, 16:00 UTC: The CL was reverted to fix the bots: https://codereview.chromium.org/2766933004/
22 Mar 2017, 18:00 - 19:00 UTC: ssid@ and krasin@ tried to reproduce it locally and failed (all affected buildbot configurations were tried). The trybots were green as well: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_cfi_rel_ng/builds/172
23 Mar 2017, 12:30 UTC: The CL was relanded with more checks: https://codereview.chromium.org/2766303002/
24 Mar 2017, 11:40 UTC: The fix was submitted. ssid@ correctly identified the signed integer overflow by just looking at the code. The overflow is only triggered if a machine has more than ~200 GB RAM.

Please give slaves 904 and 905 208 GB RAM, just like the LTO / CFI buildbots, to avoid similar "reproduced only on the buildbot" issues: https://cs.chromium.org/chromium/build/masters/master.tryserver.chromium.linux/slaves.cfg?type=cs&q=cfi+file:try+package:%5Echromium$&l=89

Thank you!
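The expression that actually overflowed is not quoted in this thread. As a purely hypothetical illustration of how a signed 32-bit overflow can depend on the amount of RAM, the sketch below multiplies a memory size in MB by an invented scale factor (`HYPOTHETICAL_SCALE`, chosen only so that the threshold lands near 200 GB) and wraps the result to 32 bits. The function name and the constant are assumptions, not Chromium code.

```python
import ctypes

# Invented for illustration only; not taken from the CL discussed above.
HYPOTHETICAL_SCALE = 10240


def scaled_int32(ram_mb: int) -> int:
    """Multiply ram_mb by the scale factor and wrap to a signed 32-bit int.

    ctypes.c_int32 does no overflow checking, so out-of-range values wrap
    the same way a two's-complement 32-bit integer would.
    """
    return ctypes.c_int32(ram_mb * HYPOTHETICAL_SCALE).value


if __name__ == "__main__":
    for ram_gb in (118, 208):
        print(ram_gb, "GB ->", scaled_int32(ram_gb * 1024))
```

Under these assumptions the 118 GB machine stays in range while the 208 GB machine wraps to a negative value, which is the kind of behaviour that would show up only on the larger buildbots.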
,
Mar 28 2017
,
Mar 28 2017
Respawned with 208 GB RAM.
,
Mar 28 2017
Hm... https://build.chromium.org/p/tryserver.chromium.linux/buildslaves/slave904-c4 still shows:

architecture: amd64
bitness (userland): 64-bit
memory total: 118.05 GB
os family: Debian
os version: 14.04
processor count: 32
processor type: Intel(R) Xeon(R) CPU @ 2.30GHz
product name: Google Compute Engine
uptime: 0:03 hours

Can you please take a second look?
,
Mar 28 2017
Same thing with slave905-c4:

architecture: amd64
bitness (userland): 64-bit
memory total: 118.05 GB
os family: Debian
os version: 14.04
processor count: 32
processor type: Intel(R) Xeon(R) CPU @ 2.30GHz
product name: Google Compute Engine
uptime: 0:03 hours
,
Mar 28 2017
So, the problem is that either the slaves are still at 118 GB RAM, or their info file didn't get updated. It's impossible to say without access to the bot.
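For anyone who does have shell access to the bot, the kernel-reported total can be checked directly instead of relying on the buildslave info page. A minimal sketch, assuming a Linux host where `/proc/meminfo` contains a `MemTotal: <n> kB` line; the function names are made up for this example:

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports kB in units of 1024 bytes.
    return int(match.group(1)) / (1024 * 1024)


def read_mem_total_gib(path: str = "/proc/meminfo") -> float:
    """Read the given meminfo file (Linux only) and return MemTotal in GiB."""
    with open(path) as f:
        return mem_total_gib(f.read())
```

Running `read_mem_total_gib()` on the slave and comparing the result with the "memory total" field would distinguish a stale info file from a machine that really was respawned with the old size.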
,
Mar 28 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/infra_internal/+/6b583627c081ec79721f6fdc7d73672b69f4c083

commit 6b583627c081ec79721f6fdc7d73672b69f4c083
Author: Elliott Friedman <friedman@google.com>
Date: Tue Mar 28 21:04:34 2017
,
Mar 28 2017
Sorry about that... there was an error in the config file. Also, due to that error, these 2 machines are/were Trusty. I have fixed the error and they're now 208GB RAM, but I left them as Trusty since that's the direction we'd like you to go in and you never complained before. I've respawned these and they should be 208GB now. If you really want Precise, I'll respawn again. Just let me know. Thanks
,
Mar 28 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/infra_internal/+/eb2effe0181ef9fef63799bed8803a011cd40678

commit eb2effe0181ef9fef63799bed8803a011cd40678
Author: Elliott Friedman <friedman@google.com>
Date: Tue Mar 28 21:36:49 2017
,
Mar 29 2017
Thank you very much! The trybots being on Precise was another difference between them and CFI buildbots. Thank you for spotting it and fixing it.
Comment 1 by sergeybe...@chromium.org
, Mar 28 2017