PFQ: Build slaves are progressively slowing and timing out |
||||||||
Issue descriptionFrom a recent PFQ lumpy build: https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/9140 InitSDK: +25 mins BuildPackages: +37 mins Looking through the logs of BuildPackages it is not clear why either stage took longer. The Load numbers looked fairly similar, but something was clearly slowing the builder down. This has happened before. It is very frustrating. I am tempted to increase the master timeout since failing because we killed a slave is generally worse than delaying a PFQ build by an extra half an hour. Our margins are pretty thin right now.
,
Sep 22 2016
Something strange is afoot with the buildslaves. Starting with this build: https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3395 The following slaves got significantly slower (by about an hour): x86-alex, tricky, falco, amd64-generic, peppy, cyan, lumpy Starting with the following build, the remaining builders also slowed down by about an hour. This can be visualized nicely but looking at the 'Slaves timeline' link from the Report section of the master-pfq (kudos to whoever added that!): 3394 (normal): https://00e9e64baca863998ad3db9b7fd5839586cf9215a92cee211f-apidata.googleusercontent.com/download/storage/v1/b/chromeos-image-archive/o/master-chromium-pfq%2FR55-8822.0.0-rc2%2Ftimeline-slaves.html?qk=AD5uMEtRX1uw9qPeZIBIUKH7k0DZu7Hiji7K__t5lJu-WXnJGWZHEluga1hkZ8ckt4GHZFki4tPyYT0kaB-JRsRZmFsou44oENv9SvYHmatF4PI6FrAuWcDeZ5lteYRm2aNytUGBfdNl9auUdiaZqleRKzn4XlUX_lyozb2-j4MELLpPmVZl2RoV3Gh6haJfh5JGa4PVer88s9JXEW7NvuODIZhLAoo3wSgx4vY8Vk69xz-_3k6mPKSMXAKmrbTV9IPyAxh8flNsUXucv5a9jcQQYsUh41bIxCGSlLNXkOpNpSfkmCxXAuU002SX_i-FhMgyU5PH7t8CFjHmTwHL7weWgEIVN0Q72Avek4-p1Z41qZtqifpYzSWw0vASSSUTT9rimXBPyLrFHnjH8ods5Yuk84G9w91EjQhL-melXcpZU9HjHjDX9t1t2MOWGmFym3fcpCEy4bDNGWHwgplNgluKu9dU64JUzv1UQMflrd72dJhza9aEsmefrRK9w_QlQIlwZBsv4tDtn6H9SNzSwkJrMbNzZoQUQaN7S-0AKMu6loZXsH6FIL7uSmB_OkOLgUSKoeKCEHtyTssWcG9LFp1yWakM3c7Sv0JTb-ID3oQtVKYnBQRleKoq163bOD2YTvGMv9Y1qg6CS_QyVyAUuh4K-fyztRZD_aRzm7_fdiHMYBoKOQHjOTHzEboQyyWDwuzSoBWxM-925jfr6opEnP8bHSKs32P3Hbt3B4RLl_cuE1NkP5fGlK1nM20qRRUAuETTaLKKNOFHhTjdW7NreYuIp9Q5FomXU5Lc3vGe6JORToNYFa6v68cTrC1lpLknTGIbjY4PWJwHkXbgVnlSDmbfMrP4xpkb8Q 3394 (some slow): https://00e9e64baca7f54b8a2052d9b8f59ce337968f57894e5d7247-apidata.googleusercontent.com/download/storage/v1/b/chromeos-image-archive/o/master-chromium-pfq%2FR55-8823.0.0-rc1%2Ftimeline-slaves.html?qk=AD5uMEsWy2uLIbzTgAVEqtd-xlwmTJYIXT0Gp5psKOpAKnRDXNoJ2FM2tiVmskSkkJsVfuZy_YHEXtdCaY7GBPba8napXGATtjUi_xFpNu3m3A54gxri4ba4xD2dBAJyW0oQnXRvpCDKHREiiRDc2_zvcppIiVSHDULCSuGL67HCM9afMgw4vFcVwTpJHTKYItaHDu53vvqe3OhmyDUd8V5d39pSnDVeTr6XzPxKBBQt1xKAxySZglO3X3kd4YjhGTE6XZEfIvuD0vgGdMxiPtdyWy2mY2CIuuY7ecNuRiosaWw2M14SjIAUVXGlnx45S2fojl_za5rd98-5K9ZguBJYztb2KLSrecTb8_iYIm3uhBJyd03kEzQaTQR651840l3FBtLA5ojiGKU5LoiR3bXw2N6YJIp4fWVq7QfZmoDHDe3s65YMRMBv3RuY1RxAbP1UUFbkh6WkM9AglJ_V2Nuh_pn9DnvglJJryDh5EN9TasbKK0oqhz2fINWTJ5ippJj212WhLwdQ4r-yB8tPGNkNEsLrexlI_7ISxCl2QKOaAq5Xr2XXfh9w1HjQKpoc8Jt3FbAsQQA5SSnrpYQO9IgoJzDb-Frs9fb3cwfrDXPzAXswZepBvgu4LGTEMXG6ABMlCTO-iMJbj741i_5SDhZDz_dW-Y-2MDOXesQwTHq7Eo4PuJjKpyikwvEJmLnVrxUiK1MK4PcfRYluwWRyhEwom5QdlftrIapkebnCrfoZ08eWAOo8M8TtG3twfnV1_zUbbwri2_2gr2QJk_9148f6g_nkdepOD-usccsJZem_osvPBVQwf5cY-lNmgedDWow8KPnQxaYmK70Kz-xeIrtNYmVLtQk9zA 3395 (all slow): https://00e9e64bac40f12647aa504ce911d8ea9323b8d04e1e098f51-apidata.googleusercontent.com/download/storage/v1/b/chromeos-image-archive/o/master-chromium-pfq%2FR55-8824.0.0-rc2%2Ftimeline-slaves.html?qk=AD5uMEv2EVy75QiohfVfqjN0H0V554nyyIyctw-I6RcM2h3KB96qsmXUtqaa3xkpOPhT7GshShDdxATaVGeyx5t0JKnS87BGl8pcvXDt2Er-NpZYrsKgFB-3oPFguwWIlPACjjHFaLfoN5z28nLX73lgEhxQi1NdsxI_zBaMdQMhgkmEAqt3X1vx0JHqXLWcWHichPIIXophIku3oSdJWCtWkW-6YZpdOv43A7v3zESd811PVVmjn8Elfx-BB3k1c9uKYcS2afI5d5jLb5WKLvhtvNrmRjK1akmsR0ZH0AfE6ajw2TdWt0srd1KXtCCoSEvW3Rprw5XGhHRYRjQkgjpPlVNnCHGfaVy9zlXVgmm1hdm6P7KqsIxUmXseMOrWEt6BNrtUdk-Z4TLqxtdiZbjIbJDGNd9qEXyyL5IBJq-ogVbzZPzvIPDuZiOv4fJDz4Z9EpnXFAdrO3SwgetYUcBwzpXwMgzKEcEhXWDCqLh057DkHe7yXc4jBslE2TkN9UXM9_KOQS3A4Nqv5ixzRvfN68KrLwbsqDYUpmReibh1z7pVv8lgyCULHkvRuYnfjKESRVLBDzhuFAKfb02au5_Qn07I0kDVlWzLgMPCm16IN5grzcAomwq_WC-SWBStg9TS0asDrbwpCnxKqM-Q5mUgG38xuwIyza3sACt-W1uwCLoiRbAksqdIGwHJ-kc89oTRQ55QKtWFioYwEkpkNAdMC5fpOVauiZJDC7VNtctSYj3XNhhhW2PI0B_2G_c9ihtiYj3rGukgY7tJzabtqartsI6WLK1JECVsQ14LqmI-tmN3YwTMqPKWoQEH3cmhkm4c0E-2SDoORbADSrURZP-oRVmj-9pGcw This behavior makes me suspect that something is slowing down the builders and this is not a cros or chrome change. (The fact that it also affects the InitSDK stage contributes to that suspiction).
,
Sep 22 2016
Ugh, my links to the pretty graphs expire, so here are the builders, you'll need to scroll down and click on the 'Slaves timeline' link in the Report section. https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3394 https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3395 https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3396 And even slower on the next build, not a good trend... https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/3397
,
Sep 22 2016
,
Sep 22 2016
,
Sep 22 2016
Assigning to sbasi@ since it seems unlikely that this is due to a chrome or chromeos change, but also +cros sheriffs in case this is also affecting other cros builds.
,
Sep 22 2016
Building the Chrome package seems unusually slow (memory says that should be ! 45m, but that should be confirmed): Completed chromeos-base/chromeos-chrome-55.0.2867.0_rc-r1 (in 83m35.2s) Since it's the actual Chrome compile time, that could caused by a change in the Chrome code, or some bit of infrastructure Chrome uses to build (Goma? I'm ignorant here).
,
Sep 22 2016
Hey Steven, any idea if something changed on the Chrome build side? That link in comment #1 took 83m35.2s to build Chrome. Dgarrett is thinking this could be a change on their side: "That's compile time, so I'd look for something different in their build system. Using Goma, not parallel.... or something." In the peppy build for 3395 https://uberchromegw.corp.google.com/i/chromeos/builders/peppy-chrome-pfq/builds/2546/steps/BuildPackages/logs/stdio : 84m3.0s Cyan: 81m42.6
,
Sep 22 2016
+dpranke,dgarrett
,
Sep 22 2016
Oh... I checked a couple of recent lumpy-release build as a comparison: Completed chromeos-base/chromeos-chrome-55.0.2866.0_rc-r1 (in 77m29.6s) Completed chromeos-base/chromeos-chrome-55.0.2862.3_rc-r1 (in 78m16.5s) That means 80 minutes isn't surprising.
,
Sep 22 2016
My initial assumption was that it might be a chrome change, but that should have affected all boards at the same time. Also I don't see how that would affect InitSDK?
,
Sep 22 2016
+davidriley, phobbs Our new stats tools might be able to help narrow down when/where/what.
,
Sep 22 2016
/google/data/ro/projects/speckle/mysql -S /cloudsql/cosmic-strategy-646:cidb-replica-gviz -u root mysql> select builder_name, b.start_time, bs.status, bs.name, timediff(bs.finish_time, bs.start_time) from buildTable b join buildStageTable bs where builder_name like 'lumpy-chrome-pfq' and b.id = bs.build_id and bs.status != 'skipped' and bs.name = 'BuildPackages' and bs.last_updated > date_sub(current_date(), interval 14 day); +------------------+---------------------+--------+---------------+-----------------------------------------+ | builder_name | start_time | status | name | timediff(bs.finish_time, bs.start_time) | +------------------+---------------------+--------+---------------+-----------------------------------------+ | lumpy-chrome-pfq | 2016-09-07 23:00:04 | pass | BuildPackages | 01:37:28 | | lumpy-chrome-pfq | 2016-09-08 03:42:35 | pass | BuildPackages | 01:37:55 | | lumpy-chrome-pfq | 2016-09-08 17:08:03 | fail | BuildPackages | 00:02:32 | | lumpy-chrome-pfq | 2016-09-08 20:59:04 | fail | BuildPackages | 00:02:31 | | lumpy-chrome-pfq | 2016-09-09 00:50:35 | pass | BuildPackages | 01:35:51 | | lumpy-chrome-pfq | 2016-09-09 05:20:00 | pass | BuildPackages | 01:35:47 | | lumpy-chrome-pfq | 2016-09-09 09:52:02 | pass | BuildPackages | 01:35:28 | | lumpy-chrome-pfq | 2016-09-09 17:15:30 | pass | BuildPackages | 01:35:43 | | lumpy-chrome-pfq | 2016-09-09 21:54:32 | pass | BuildPackages | 01:35:29 | | lumpy-chrome-pfq | 2016-09-10 02:17:04 | pass | BuildPackages | 01:26:20 | | lumpy-chrome-pfq | 2016-09-10 05:56:27 | pass | BuildPackages | 01:36:00 | | lumpy-chrome-pfq | 2016-09-11 00:09:00 | pass | BuildPackages | 01:35:59 | | lumpy-chrome-pfq | 2016-09-11 04:54:33 | pass | BuildPackages | 01:36:09 | | lumpy-chrome-pfq | 2016-09-12 03:24:05 | pass | BuildPackages | 01:36:06 | | lumpy-chrome-pfq | 2016-09-12 07:50:05 | pass | BuildPackages | 01:36:11 | | lumpy-chrome-pfq | 2016-09-13 03:25:06 | pass | BuildPackages | 01:39:05 | | lumpy-chrome-pfq | 2016-09-13 08:10:40 | pass | BuildPackages | 01:39:14 | | lumpy-chrome-pfq | 2016-09-13 22:20:07 | pass | BuildPackages | 01:39:50 | | lumpy-chrome-pfq | 2016-09-14 02:35:36 | pass | BuildPackages | 01:38:03 | | lumpy-chrome-pfq | 2016-09-14 06:57:04 | pass | BuildPackages | 01:40:42 | | lumpy-chrome-pfq | 2016-09-14 16:36:06 | pass | BuildPackages | 01:38:57 | | lumpy-chrome-pfq | 2016-09-14 23:30:42 | pass | BuildPackages | 01:44:08 | | lumpy-chrome-pfq | 2016-09-15 04:16:40 | pass | BuildPackages | 01:44:38 | | lumpy-chrome-pfq | 2016-09-15 17:29:35 | pass | BuildPackages | 01:43:58 | | lumpy-chrome-pfq | 2016-09-16 00:11:12 | pass | BuildPackages | 01:39:21 | | lumpy-chrome-pfq | 2016-09-16 04:21:36 | pass | BuildPackages | 01:39:57 | | lumpy-chrome-pfq | 2016-09-16 21:44:09 | pass | BuildPackages | 01:39:11 | | lumpy-chrome-pfq | 2016-09-17 02:11:09 | pass | BuildPackages | 01:37:59 | | lumpy-chrome-pfq | 2016-09-17 06:08:06 | pass | BuildPackages | 01:39:56 | | lumpy-chrome-pfq | 2016-09-18 00:08:59 | pass | BuildPackages | 01:39:54 | | lumpy-chrome-pfq | 2016-09-18 04:54:02 | pass | BuildPackages | 01:39:54 | | lumpy-chrome-pfq | 2016-09-19 00:08:29 | pass | BuildPackages | 01:39:40 | | lumpy-chrome-pfq | 2016-09-19 04:48:32 | pass | BuildPackages | 01:38:59 | | lumpy-chrome-pfq | 2016-09-20 00:13:09 | pass | BuildPackages | 01:39:13 | | lumpy-chrome-pfq | 2016-09-20 04:10:04 | pass | BuildPackages | 01:40:35 | | lumpy-chrome-pfq | 2016-09-20 16:05:31 | pass | BuildPackages | 01:40:16 | | lumpy-chrome-pfq | 2016-09-21 03:27:03 | pass | BuildPackages | 01:42:05 | | lumpy-chrome-pfq | 2016-09-21 08:07:08 | pass | BuildPackages | 01:40:03 | | lumpy-chrome-pfq | 2016-09-21 17:26:05 | pass | BuildPackages | 02:17:07 | | lumpy-chrome-pfq | 2016-09-22 00:09:39 | pass | BuildPackages | 02:17:01 | | lumpy-chrome-pfq | 2016-09-22 04:56:04 | pass | BuildPackages | 02:17:29 | +------------------+---------------------+--------+---------------+-----------------------------------------+
,
Sep 22 2016
Via Monarch: http://shortn/_aVDA81yD0s
,
Sep 22 2016
this is due to this change: https://chromium-review.googlesource.com/386970 it's causing a lot of rebuilds due to metadata now being in there that wasn't before. it should go away for the sdk once the next sdk bot run finishes with that CL and publishes an updated sdk. it should go away for the board once we get updated binpkgs for target boards (so i guess CQ run finishes). you can tell when it's the problem because you'll see this in the package line: ABI_X86="(64*) -32 (-x32)" that * means the USE flag is new and portage needs to rebuild it. unfortunately, this change is something we can't avoid. it had to land at some point in order to stay in sync with Gentoo and to better support multilib. the longer we waited the worse it got :(. if we had pulled it in much sooner, it wouldn't have been nearly as painful.
,
Sep 22 2016
looks like the chromiumos-sdk bot has recently finished a build w/the CL: https://uberchromegw.corp.google.com/i/chromiumos/builders/chromiumos-sdk/builds/7467 so hopefully the InitSDK times should go back down for all new builds. BuildPackages times will go down once the CQ finishes and publishes new prebuilts.
,
Oct 6 2016
Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable? If a fix is in active development, please set the status to Started. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 6 2016
This was expected failure related to the toolchain update. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by steve...@chromium.org
, Sep 22 2016Labels: -Pri-1 Pri-0
Owner: steve...@chromium.org