Replacing build8-b1 bot in Mac Retina Perf with a new machine.
Issue description
https://luci-milo.appspot.com/buildbot/chromium.perf/Mac%20Retina%20Perf/ has lots of expired jobs with "shard #0 expired, not enough capacity" from builds 676 to 682. Maybe related to issue 675986?
,
May 22 2017
What happens if all shards for a specific config are taken offline? Ideally, the dashboard/recipe could correct for this, right? Otherwise, can we put a piece of metadata in the dashboard to let people know about bots going offline? What is acceptable? What information needs to be shared, and with whom? Do we want everyone who could possibly care that a device is offline to be able to find that out when they wonder about it? If so, then we either need to prevent them from wondering about problems or add metadata in the places where people also find failures.
,
May 22 2017
I chatted with Stephen today. Part of our sharding benchmark per story will include dealing with rebalancing when a bot is offline. We will send the design out later. For this particular bug, we currently can only wait for the lab to put in a replacement device, it seems like.
,
May 22 2017
Assign this bug to Vince since the lab is working on a replacement.
,
May 22 2017
I think the policy should be to turn down all of one config when one bot is bad, because of device affinity. In practice, even though there are something like 4 machines running tests for that config, there is virtually one "machine" running tests for that config. We are like some sort of RAID without anything explicit other than sharding. That said, if we pulled down all devices for one config, where would failures manifest?
,
May 22 2017
I'm not exactly sure what you mean in #5. If we pulled all the devices, the entire bot would fail. We could stop triggering jobs on the bot, like we did with the Zenbook bots.
,
May 22 2017
We have a spare MacBook, but it's not the exact same unit. The one that was pulled offline is an A1398 EMC 2910. The spare is an A1398 EMC 2673. If you want the exact same unit, I may have to contact our vendor to see if they still have some in stock.
,
May 23 2017
Corp techstop may have a used/old unit. Will be working with them to get a replacement.
,
May 24 2017
We have 4 bisectors, why not replace with one of those? https://uberchromegw.corp.google.com/i/tryserver.chromium.perf/builders/mac_retina_perf_bisect
,
May 25 2017
Replacement will be shipped soon. t/26829260 Assigning to johnw to help set it up.
,
May 25 2017
,
May 25 2017
Talking with martiniss@ offline, it sounds like we might not be able to get a replacement for build8-b1 for another week. Is there any way that we could reshard the benchmarks that were running on build8-b1 onto the other perfbots? Having two weeks of no coverage for these benchmarks is pretty scary - especially for system_health.common_desktop BattOr benchmarks, which provide most of the coverage for power regressions on Mac, given that we don't have BattOrs attached to any configurations besides the Mac Retina Perf.
,
May 25 2017
,
May 25 2017
To #12: Resharding the benchmarks is a complex operation, and we risk making other benchmarks time out at the 10h limit. We should prepare more hardware in the future so that these swap operations happen quickly, but for now we can rely on other bot configurations to mitigate the risk of lost coverage.
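To make the time-out risk concrete, here is a minimal sketch of a greedy reshard check. It is not how the perf waterfall actually shards; the bot names, per-benchmark runtimes, and the `reshard` helper are all hypothetical, and only the 10h limit comes from the comment above.

```python
import heapq

TIME_LIMIT_HOURS = 10.0  # the 10h limit mentioned in the comment above


def reshard(orphaned, bot_loads, limit=TIME_LIMIT_HOURS):
    """Greedily move orphaned benchmarks onto the least-loaded remaining bots.

    orphaned:  dict of benchmark name -> estimated runtime in hours (hypothetical)
    bot_loads: dict of bot name -> current total runtime in hours (hypothetical)
    Returns (assignment, new_loads, bots_over_limit).
    """
    heap = [(load, bot) for bot, load in bot_loads.items()]
    heapq.heapify(heap)
    assignment = {}
    # Place the longest benchmarks first so the loads stay as even as possible.
    for name, hours in sorted(orphaned.items(), key=lambda kv: -kv[1]):
        load, bot = heapq.heappop(heap)
        assignment[name] = bot
        heapq.heappush(heap, (load + hours, bot))
    new_loads = {bot: load for load, bot in heap}
    over = sorted(bot for bot, load in new_loads.items() if load > limit)
    return assignment, new_loads, over


# Hypothetical numbers: two surviving bots that are already close to the limit.
assignment, new_loads, over = reshard(
    {"benchmark_a": 3.0, "benchmark_b": 2.0},
    {"bot-1": 8.0, "bot-2": 7.5},
)
print(assignment, new_loads, over)
```

With bots already running near the limit, even a small amount of moved work pushes at least one of them past 10 hours, which is the concern raised above.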
,
May 26 2017
The NextAction date has arrived: 2017-05-26
,
May 26 2017
Annie, Ned, and I talked offline and decided that we're going to pull one of the Mac Retina Perf bisectors off and add it to the main waterfall. I talked with martiniss@ and he said that he didn't think this would be too difficult and would be willing to help make it happen.
,
May 26 2017
Note that this is an exceptional case because "Mac Retina Perf" is the only Mac config that has a BattOr, and the failing bot runs the BattOr benchmarks. Output of running "./tools/perf/generate_perf_data", showing which benchmarks are currently affected:

Device "build8-b1" is blacklisted. These benchmarks were not scheduled:
* battor.steady_state
* battor.steady_state.reference
* blink_perf.css
* blink_perf.css.reference
* blink_perf.events
* blink_perf.events.reference
* blink_perf.shadow_dom
* blink_perf.shadow_dom.reference
* kraken
* kraken.reference
* media.mse_cases
* media.mse_cases.reference
* media.tough_video_cases_tbmv2
* media.tough_video_cases_tbmv2.reference
* memory.long_running_idle_gmail_tbmv2
* memory.long_running_idle_gmail_tbmv2.reference
* page_cycler_v2_site_isolation.basic_oopif
* page_cycler_v2_site_isolation.basic_oopif.reference
* performance_browser_tests
* rasterize_and_record_micro.partial_invalidation
* rasterize_and_record_micro.partial_invalidation.reference
* smoothness.desktop_tough_pinch_zoom_cases
* smoothness.desktop_tough_pinch_zoom_cases.reference
* smoothness.gpu_rasterization.tough_path_rendering_cases
* smoothness.gpu_rasterization.tough_path_rendering_cases.reference
* smoothness.maps
* smoothness.maps.reference
* smoothness.tough_animation_cases
* smoothness.tough_animation_cases.reference
* smoothness.tough_texture_upload_cases
* smoothness.tough_texture_upload_cases.reference
* smoothness.tough_webgl_ad_cases
* smoothness.tough_webgl_ad_cases.reference
* start_with_ext.warm.blank_page
* start_with_ext.warm.blank_page.reference
* startup.large_profile.cold.blank_page
* startup.large_profile.cold.blank_page.reference
* system_health.common_desktop
* system_health.common_desktop.reference
* system_health.webview_startup
* system_health.webview_startup.reference
* v8.infinite_scroll-classic_tbmv2
* v8.infinite_scroll-classic_tbmv2.reference
* v8.infinite_scroll_tbmv2
* v8.infinite_scroll_tbmv2.reference
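For readers unfamiliar with device affinity, the effect reported above can be sketched roughly as follows. This is an illustration only, not the code in perf_data_generator.py: the map layout (device name to list of benchmarks), the `unscheduled_benchmarks` helper, and the second device are assumptions.

```python
def unscheduled_benchmarks(sharding_map, blacklisted):
    """Return the benchmarks pinned to blacklisted devices, sorted by name.

    sharding_map: dict of device name -> list of benchmark names (assumed layout)
    blacklisted:  set of device names that are currently offline
    """
    missing = []
    for device, benchmarks in sharding_map.items():
        if device in blacklisted:
            # With device affinity, nothing else picks up these benchmarks.
            missing.extend(benchmarks)
    return sorted(missing)


# Hypothetical map: only the build8-b1 entries are taken from the list above.
sharding_map = {
    "build8-b1": ["kraken", "battor.steady_state"],
    "hypothetical-bot-2": ["hypothetical_benchmark"],
}
affected = unscheduled_benchmarks(sharding_map, {"build8-b1"})
for name in affected:
    print("* " + name)
```

Because each benchmark is pinned to exactly one device, a single blacklisted device removes coverage for everything assigned to it, even while the other machines in the config stay healthy.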
,
May 26 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/95c9cb5e460bdf1a02b7e7f886e982b1df33065e

commit 95c9cb5e460bdf1a02b7e7f886e982b1df33065e
Author: Stephen Martinis <martiniss@chromium.org>
Date: Fri May 26 22:09:17 2017

//tools/perf: Replace broken Mac Retina bot

Replaces the broken Mac retina bot with a bot taken from the bisect pool.

Bug: 724998
Change-Id: I91443e3427535f3a2e4b6be35ae8f8e00e9a8df4
Reviewed-on: https://chromium-review.googlesource.com/517293
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#475157}

[modify] https://crrev.com/95c9cb5e460bdf1a02b7e7f886e982b1df33065e/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/95c9cb5e460bdf1a02b7e7f886e982b1df33065e/tools/perf/core/benchmark_sharding_map.json
[modify] https://crrev.com/95c9cb5e460bdf1a02b7e7f886e982b1df33065e/tools/perf/core/perf_data_generator.py
,
May 26 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/11a37bdbdae285dcbe9fde2c1ac840ed14fe4ef9

commit 11a37bdbdae285dcbe9fde2c1ac840ed14fe4ef9
Author: Stephen Martinis <martiniss@google.com>
Date: Fri May 26 22:39:51 2017
,
May 26 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/57a109ea1ae2719fc5fcc9d7306542cbf4c3904c

commit 57a109ea1ae2719fc5fcc9d7306542cbf4c3904c
Author: Stephen Martinis <martiniss@chromium.org>
Date: Fri May 26 22:44:57 2017

Remove old retina bot from tryserver.chromium.perf

It's being used on the main waterfall now.

TBR=dtu
Bug: 724998
Change-Id: I4783c663140bef6040b53cefc00d3d2c29b6ce91
Reviewed-on: https://chromium-review.googlesource.com/517278
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Stephen Martinis <martiniss@chromium.org>

[modify] https://crrev.com/57a109ea1ae2719fc5fcc9d7306542cbf4c3904c/masters/master.tryserver.chromium.perf/slaves.cfg
,
May 26 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager/+/3da14c21b4f5baa44768881a51691e015940f457

commit 3da14c21b4f5baa44768881a51691e015940f457
Author: Stephen Martinis <martiniss@google.com>
Date: Fri May 26 22:54:34 2017
,
Jun 14 2017
Update regarding the broken build8-b1: the replacement that came in wasn't the same spec, so we could not put it back into service. Vince is looking into finding a suitable replacement. Thanks.
Comment 1 by nedngu...@google.com
, May 22 2017