"telemetry_gpu_unittests (with patch)" is flaky
Issue description: "telemetry_gpu_unittests (with patch)" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add the Infra-Troopers label and change the issue status to Untriaged. When done, please remove the issue from the Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 8 recent flakes. The list of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR0ZWxlbWV0cnlfZ3B1X3VuaXR0ZXN0cyAod2l0aCBwYXRjaCkM. This flaky test/step was previously tracked in issue 637200.
,
Nov 7 2016
Will look a bit later today (swamped now)
,
Nov 8 2016
It seems the tests fail due to lack of capacity on Swarming: looking at a few failing shards shows them as "Expired". The largest pool used for this builder on Swarming does indeed seem to be overloaded: http://vi/chrome_infra/Jobs/pools?duration=1d&job_regexp=tryserver.chromium.win.%2A&pool=cores%3A8%7Ccpu%3Ax86%7Ccpu%3Ax86-64%7Cgpu%3Anone%7Cmachine_type%3An1-highcpu-8%7Cos%3AWindows%7Cos%3AWindows-7-SP1%7Cpool%3AChrome&refresh=-1&service_name=chromium-swarm
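For context, the triage described above amounts to tallying shard states for a task run. The sketch below illustrates this with a simplified stand-in result shape (the dict fields here are invented for illustration, not the real Swarming API response):

```python
# Count how many shards of a Swarming task run expired vs. completed.
# An "EXPIRED" shard was never picked up by a bot before its deadline,
# which points at a capacity problem rather than a test failure.
from collections import Counter

# Hypothetical, simplified shard results for one task run.
shards = [
    {"shard": 0, "state": "COMPLETED"},
    {"shard": 1, "state": "EXPIRED"},   # no free bot before the deadline
    {"shard": 2, "state": "EXPIRED"},
]

counts = Counter(s["state"] for s in shards)
print(counts["EXPIRED"], "of", len(shards), "shards expired")
```

If most failing shards report "EXPIRED" rather than test failures, the flakiness is an infrastructure (pool capacity) issue, which matches the diagnosis in this thread.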
,
Nov 8 2016
There was a definite influx of expired tasks late Nov 3 - early Nov 4, which is when the flakes were reported: http://vi/chrome_infra/Buildbot/per_builder?builder=win_chromium_rel_ng&duration=7d&job_regexp=tryserver.chromium.win.%2A&master=master.tryserver.chromium.win&refresh=-1&service_name=chromium-swarm&utc_end=1478564683#_VG_JypEDXSe Not sure what the trigger was, but we are low on capacity, so for the time being this is probably unavoidable until we get more machines.
,
Nov 8 2016
+vhang@ and phajdan.jr@ - FYI re capacity of the Windows Swarming pool - see #c3. The pool is running at essentially full capacity at peaks; we should be aiming for ~75% peak load on average.
,
Nov 11 2016
Sergey, can you tell me how many bots are in each of the oversubscribed pools? Let's fight the fire by adding 10-20% more to each pool and see if that helps. I think we're taking too long to analyze the exact amount we need, when we could quickly pad the pools now and analyze the numbers in detail later. Your thoughts?
,
Dec 9 2016
Apparently, this dropped off my radar for too long - sorry. I'll try to get back to this once other immediate fires are dealt with.
,
Dec 13 2016
Detected 3 new flakes for test/step "telemetry_gpu_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR0ZWxlbWV0cnlfZ3B1X3VuaXR0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
,
Jan 6 2017
Looking at this again - the Swarming pool for this step hasn't changed, and it's still a single pool of 319 bots: https://goto.google.com/rwtytb I expect this pool to become overloaded again once we are fully back from the holidays...
,
Feb 22 2017
Checked it again today - and sure enough, the pool is running at capacity again. vhang: any chance we can increase this pool? Here's the current list of bots: http://shortn/_V3CTyDxZ0A Some samples: vm1-m4, vm10-m4, vm103-m4, etc.
,
Mar 20 2017
Updated pool link: https://goto.google.com/rwtytb It is still close to full capacity, and is still expiring tasks: http://shortn/_llrRfJuioc Assigning to vhang@ - please check if it is possible to add more capacity to the pool. Thanks!
,
Mar 20 2017
How many more Win7 VMs would you like?
,
Mar 20 2017
Ideally, I'd ask for another 100 bots. Is that feasible? Or as many as you can if that's too much. I can't estimate the actual expected load easily, so my reasoning is: it's maxing out now at 324 bots in the pool, we want peaks to be at ~75% capacity, so let's add 30% on top (which comes out to ~100), so that if the current peak is a true peak, it'll end up at 75%. In reality, the current true peak is likely higher, but we'll only know for sure once we add enough capacity. Thanks!
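The sizing reasoning above can be written out as a small calculation (function and variable names here are invented for illustration; the thread's actual figures are 324 bots and a 75% utilization target):

```python
# Back-of-the-envelope pool sizing: if the pool currently maxes out,
# the observed peak load is roughly equal to the current bot count.
# To land that same peak at the target utilization, we need
#   current_bots / (current_bots + added) == target
# and solve for `added`.
def bots_to_add(current_bots: int, target_peak_utilization: float) -> int:
    needed_total = current_bots / target_peak_utilization
    return round(needed_total - current_bots)

print(bots_to_add(324, 0.75))  # 108, i.e. roughly the ~100 bots requested
```

As the comment notes, this is a lower bound: if demand is currently clipped by the pool maxing out, the true peak is higher than 324 concurrent tasks, and only adding capacity reveals it.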
,
Mar 24 2017
Assigning to johnw to handle this. We had some server shuffling in the golo and will have to wait for b/35753978 to be complete before we can free up the servers.
,
Mar 24 2017
Sorry about this -- it looks like there has been a longstanding TODO in the recipe code that's preventing Swarming from de-duplicating these runs. I just filed P1 Issue 705104 about fixing this. This should ultimately reduce the load on the Swarming pool, but I'd be surprised if it was just this one target that's causing all of the problems.
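To illustrate why a recipe-side TODO can defeat de-duplication: Swarming-style dedup keys on a hash of the task's properties, so any field that varies per build makes every task unique. The sketch below is an illustrative model, not Swarming's actual implementation (the field names are invented):

```python
# Illustrative model of property-hash de-duplication: two tasks whose
# canonicalized properties hash identically can share one result. If a
# recipe injects a per-build value (e.g. a build number) into the
# properties, the hashes always differ and nothing ever dedupes.
import hashlib
import json

def properties_hash(props: dict) -> str:
    # Canonical JSON (sorted keys) so logically equal dicts hash equally.
    canonical = json.dumps(props, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"command": ["run_tests"], "inputs": "abc123", "os": "Windows-7-SP1"}
same = dict(base)                    # identical properties -> dedupable
varied = dict(base, buildnumber=42)  # per-build field -> never dedupes

print(properties_hash(base) == properties_hash(same))    # True
print(properties_hash(base) == properties_hash(varied))  # False
```

Fixing the recipe so identical test runs carry identical properties would let repeat runs hit the dedup cache instead of consuming a bot, reducing load on the pool.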
,
May 7 2017
Detected 3 new flakes for test/step "telemetry_gpu_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR0ZWxlbWV0cnlfZ3B1X3VuaXR0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
,
May 8 2017
Detected 4 new flakes for test/step "telemetry_gpu_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR0ZWxlbWV0cnlfZ3B1X3VuaXR0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
,
May 9 2017
I believe we are still blocked on b/35753978 for capacity. Based on comment #17, has the number of required slaves changed?
,
Jan 3 2018
Detected 3 new flakes for test/step "telemetry_gpu_unittests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR0ZWxlbWV0cnlfZ3B1X3VuaXR0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
Comment 1 by xlai@chromium.org, Nov 7 2016