
Issue 620249

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 0
Type: Bug-Regression




Most Windows perf bots haven't run in two days

Project Member Reported by petrcermak@chromium.org, Jun 15 2016

Issue description

https://build.chromium.org/p/chromium.perf/console

This seems to affect almost all Windows perf bots with the exception of "Win 7 Perf" and "Win 7 Low-End Perf" (35/42 bots seem to be down).
 
The last build on the 35 relevant bots was on 13th June around 5pm.

Here's a screenshot of the perf console at the time of writing.
console.png
Cc: sullivan@chromium.org
Killed the build https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Nexus9%20Perf%20%282%29/builds/2518, which had been running for over 6 hours.

https://build.chromium.org/p/chromium.perf/buildslaves/build15-b1 is down and I can't SSH into it. Labs needs to reboot it, because it's not rebootable from vm-power.

Let's see if this helps. Will handle the rest after lunch.
Owner: serg...@chromium.org
Status: Started (was: Untriaged)
Looks like the "provision_devices" step takes a long time on Android Nexus9 Perf (2). For the build that ran over 6 hours, it actually took 5.5 hours just to get the devices. Judging by earlier successful builds, this step takes anywhere from a few minutes to a couple of hours. I'm not sure what is normal here, so I'll leave it running for now.
Since bots are cycling so slowly, I assume they are running for all revisions, but the builds do not show up on the console page since we only show the last 40 revisions. I tried loading the console page with 100 revisions (https://build.chromium.org/p/chromium.perf/console?limit=100), but the master takes so long to reply that the request will probably time out. Refreshing several times probably made it worse. Let's just hope I didn't bring down the master simply by issuing a couple of requests :-).
s/running for all revisions/running on all builders/
I've just realized that I can check which builders are running by looking at https://build.chromium.org/p/chromium.perf/builders, but since master is still busy processing my previous requests, I can't open that page. I'm considering restarting the master if it doesn't reply in the next couple of minutes.
Cc: machenb...@chromium.org
+machenbach suggested that because I issued the limit=100 request, the master probably started loading builds from on-disk pickles, which I've confirmed by looking at the log:

2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4191 from on-disk pickle
2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4190 from on-disk pickle
2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4189 from on-disk pickle
2016-06-15 04:31:13-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4188 from on-disk pickle
...

So I'll wait another 5 minutes to see if loading from pickle stops, and if not I'll restart the master. The reason for waiting is that restarting the master interrupts long-running builds, which would then have to start over.
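To judge whether the pickle loading is winding down, the log lines quoted above could be tallied per builder. This is only a rough sketch: the regular expression is inferred from the four sample lines in this comment, and `count_pickle_loads` is a hypothetical helper, not part of any master tooling.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines above, e.g.:
# 2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4191 from on-disk pickle
PICKLE_LINE = re.compile(
    r"Loading builder (?P<builder>.+)'s build (?P<build>\d+) from on-disk pickle"
)


def count_pickle_loads(lines):
    """Return a Counter mapping builder name to number of builds loaded from pickle."""
    counts = Counter()
    for line in lines:
        match = PICKLE_LINE.search(line)
        if match:
            counts[match.group("builder")] += 1
    return counts


# The four log lines quoted in the comment above.
sample = [
    "2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4191 from on-disk pickle",
    "2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4190 from on-disk pickle",
    "2016-06-15 04:31:12-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4189 from on-disk pickle",
    "2016-06-15 04:31:13-0700 [-] Loading builder Win 7 Low-End Perf (1)'s build 4188 from on-disk pickle",
]

for builder, loaded in count_pickle_loads(sample).items():
    print(builder, loaded)
```

Running the same tally on the log a few minutes apart would show whether the counts are still growing.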
Project Member

Comment 10 by bugdroid1@chromium.org, Jun 15 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/c81e6c0be2ecfd58a68778a7ff91c88bc1c0ca54

commit c81e6c0be2ecfd58a68778a7ff91c88bc1c0ca54
Author: sergiyb <sergiyb@google.com>
Date: Wed Jun 15 11:41:28 2016

 Issue 620284  has been merged into this issue.
After the restart it became much easier to see which builders are actually scheduling jobs for new revisions: https://screenshot.googleplex.com/NJa6fRszWOR.png. One missing on Android is the builder which uses bot build15-b1, which needs to be restarted by the Labs team. I'll investigate missing builds on most Win builders.
After studying some logs, I can see that new revisions are detected and recorded in the database, e.g.:

added change Change(revision=u'f72bda81d0c3e6f256826e9ecdf6d745e4ad76c8', who=u'guidou@chromium.org', branch=u'master', comments=u"Revert of Make default media device ID salts random by default (patchset #13 id:260001 of https://codereview.chromium.org/1987643002/ )\n\nReason for revert:\nReverting because it is breaking the WebRTC MacTester bot.\nhttps://build.chromium.org/p/chromium.webrtc/builders/Mac%20Tester/builds/55969/\n\nOriginal issue's description:\n> This results in different hashed device IDs on each session on embedders that don't have a specialized implementation of device ID salts such as WebView, Blimp and Content Shell\n...skip...\n incognito mode, Chrome's implementation of salts has been updated to defer to the new default on incognito mode.\n>\n> BUG= 315022 \n>\n> Committed: https://crrev.com/4db1329e005388540eb07429ac97827ca9bc422b\n> Cr-Commit-Position: refs/heads/master@{#399883}\n\nTBR=jam@chromium.org\n# Skipping CQ checks because original CL landed less than 1 days ago.\nNOPRESUBMIT=true\nNOTREECHECKS=true\nNOTRY=true\nBUG=315022\n\nReview-Url: https://codereview.chromium.org/2065383003\nCr-Commit-Position: refs/heads/master@{#399887}\n", when=1465995438, category=None, project=u'src', repository=u'https://chromium.googlesource.com/chromium/src') to database

After this, buildsets should be created, which eventually results in builds and indeed we see some buildsets created:

added buildset 7906 to database (build requests: {'Android One Perf (2)': 20213, 'Android One Perf (3)': 20214, 'Android Nexus5 Perf (1)': 20203, 'Android Galaxy S5 Perf (1)': 20200, 'Android Nexus5 Perf (3)': 20205, 'Android Nexus5 Perf (2)': 20204, 'Android Nexus7v2 Perf (1)': 20209, 'Android One Perf (1)': 20212, 'Android Nexus6 Perf (2)': 20207, 'Android Nexus6 Perf (3)': 20208, 'Android Nexus7v2 Perf (2)': 20210, 'Android Galaxy S5 Perf (3)': 20202, 'Android Nexus7v2 Perf (3)': 20211, 'Android Nexus6 Perf (1)': 20206, 'Android Galaxy S5 Perf (2)': 20201})

However, there are no buildsets for many Win builders, e.g. there is no mention of adding a buildset for "Win 7 x64 Perf (1)" in the logs: https://pantheon.corp.google.com/logs?project=chrome-infra-logs&minLogLevel=0&expandAll=false&resource=compute.googleapis.com%2Fresource_type%2Fmaster%2Fresource_id%2Fmaster.chromium.perf&advancedFilter=metadata.serviceName%3D%22compute.googleapis.com%22%0Ametadata.labels.%22compute.googleapis.com%2Fresource_type%22%3D%22master%22%0Ametadata.labels.%22compute.googleapis.com%2Fresource_id%22%3D%22master.chromium.perf%22%0A%22added%20change%22%20OR%20(%22added%20buildset%22%20AND%20%22Win%207%20x64%20Perf%20(1)%22)%20OR%20(%22added%20buildset%22%20AND%20%22Android%20Galaxy%20S5%20Perf%20(1)%22)&logName=
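The same check can be done offline on exported log lines. The sketch below assumes the "added buildset" format shown in the earlier log excerpt, where the build requests are printed as a Python dict literal; `builders_in_buildset` is a hypothetical helper, and the sample line is an abbreviated copy of the real one.

```python
import ast
import re

# Pattern inferred from the "added buildset" log line quoted earlier in this thread.
BUILDSET_LINE = re.compile(
    r"added buildset (?P<bsid>\d+) to database "
    r"\(build requests: (?P<requests>\{.*\})\)"
)


def builders_in_buildset(line):
    """Return (buildset id, set of builder names), or None if the line does not match."""
    match = BUILDSET_LINE.search(line)
    if match is None:
        return None
    # The requests are logged as a dict literal, so literal_eval parses them safely.
    requests = ast.literal_eval(match.group("requests"))
    return int(match.group("bsid")), set(requests)


# Abbreviated copy of the log line above (only three of the build requests kept).
sample = (
    "added buildset 7906 to database (build requests: "
    "{'Android One Perf (2)': 20213, 'Android One Perf (3)': 20214, "
    "'Android Galaxy S5 Perf (1)': 20200})"
)

bsid, builders = builders_in_buildset(sample)
print(bsid, "Win 7 x64 Perf (1)" in builders)
```

A builder that never appears in any buildset is either not attached to the change-driven scheduler or is triggered some other way.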
The only way a buildset would not be created is when the change is deemed unimportant by the function passed as the fileIsImportant parameter to the Scheduler constructor. However, according to https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.perf/master.cfg#205, we don't pass it, so the scheduler is essentially considering every change important. Now I am at a complete loss as to why we don't schedule builds on Windows builders.
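As a simplified model of that reasoning (not Buildbot's actual code), the decision can be sketched as follows. `Change`, `is_important`, and `ignore_docs` are illustrative names assumed here; only the behaviour "no callback means every change is important" comes from the comment above.

```python
from collections import namedtuple

# Minimal stand-in for a change; only the list of touched files is modeled.
Change = namedtuple("Change", ["files"])


def is_important(change, file_is_important=None):
    """Return True if the change should lead to a buildset in this toy model.

    With no callback configured, every change counts as important.
    """
    if file_is_important is None:
        return True
    return bool(file_is_important(change))


def ignore_docs(change):
    # Hypothetical filter: important only if some non-Markdown file is touched.
    return any(not name.endswith(".md") for name in change.files)


docs_change = Change(files=["README.md"])
print(is_important(docs_change))               # no callback passed
print(is_important(docs_change, ignore_docs))  # callback passed
```

With no callback, as in the perf master's config, the filter cannot be what suppresses the Windows buildsets.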
Mystery solved. The builder "Win 7 x64 Perf (1)" is a tester, so it is only triggered after the build on the builder "Win x64 Builder" finishes. I'll wait until this happens, and if things work after this, I'll mark this bug as fixed.
And the testers don't get triggered, because builders fail, see  issue 620249 .
Status: WontFix (was: Started)
I consider this WontFix (works as intended) since I see no reason why waterfall is broken - testers are just waiting for builders to succeed. Regarding  issue 620249 , it's P1 and will be handled accordingly during MTV working hours. EMEA support is limited to P0.
In comments #16 and #17, I meant to reference  issue 619949 .
