
Issue 920665

Starred by 1 user

Issue metadata

Status: Started
Owner:
OOO until 2019-01-24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 871453
issue 886985
issue 923401

Blocking:
issue 887241




Update GPU driver docs to use Swarming fallback, remove trigger_multiple_dimensions.py

Project Member Reported by ynovikov@chromium.org, Jan 10

Issue description

The improvements in issue 886985 weren't enough; we ran out of capacity during the issue 887241 upgrade:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28NVIDIA%29/3669
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win-angle-rel/25

The shard-splitting logs are:
Running Swarming with args:
['query', '-S', 'chromium-swarm.appspot.com', '--limit', '0', '--json', 'c:\\b\\swarming\\w\\ir\\tmp\\t\\base_trigger_dimensionswnj_lc.json', 'bots/count?dimensions=gpu%3A10de%3A1cb3-23.21.13.8816&dimensions=os%3AWindows-10&dimensions=pool%3AChrome-GPU&is_dead=FALSE&quarantined=FALSE']
Bot config 0: {'available': 19, 'total': 21}
Running Swarming with args:
['query', '-S', 'chromium-swarm.appspot.com', '--limit', '0', '--json', 'c:\\b\\swarming\\w\\ir\\tmp\\t\\base_trigger_dimensionszn44e9.json', 'bots/count?dimensions=gpu%3A10de%3A1cb3-24.21.14.1195&dimensions=os%3AWindows-10&dimensions=pool%3AChrome-GPU&is_dead=FALSE&quarantined=FALSE']
Bot config 1: {'available': 19, 'total': 110}
Total bots: 131
Total bots after filtering: 131
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available
Chose bot config 0 because bots were available
Chose bot config 1 because bots were available

I guess what happens is:
1. All tasks are triggered.
2. Only after that do the tasks run.
Thus, bots can be available while all the tasks are being triggered, but once the first task runs, they are no longer available.
In other words, the logic in trigger_multiple_dimensions.py only works for the first task: the "available" status seen when triggering the second task can differ from the "available" status when that task actually needs to run.

I suggest we ignore the "available" status and just split according to "total".
Other options are possible, such as waiting for the previous tasks to start running before triggering a task, or keeping a history of what was scheduled for previous tasks.
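
To make the "split according to total" suggestion concrete, here is a minimal sketch of one way it could work. It is not code from trigger_multiple_dimensions.py; the function name `split_by_total` and the largest-remainder rounding are assumptions chosen for illustration. The totals (21 and 110) and the shard count (20) are taken from the log above.

```python
def split_by_total(totals, shards):
    """Hypothetical helper: divide `shards` across bot configs in
    proportion to each config's total bot count, ignoring the
    'available' snapshot entirely."""
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no bots in any configuration")

    # Exact (fractional) share of shards for each config.
    exact = [t * shards / grand_total for t in totals]
    # Start from the rounded-down share.
    counts = [int(x) for x in exact]

    # Hand out the shards lost to rounding, largest remainder first.
    leftover = shards - sum(counts)
    by_remainder = sorted(
        range(len(totals)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Totals from the log: config 0 has 21 bots, config 1 has 110.
print(split_by_total([21, 110], 20))  # [3, 17]
```

With these numbers, config 0 would receive 3 of the 20 shards instead of the 10 it received in the run above.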
Ken, could you take care of this?
 
Blockedon: 871453
Cc: bradhall@chromium.org actodd@google.com
Components: Infra>Client>Chrome
Labels: Hotlist-PixelWrangler
Owner: ----
Status: Available (was: Assigned)
The script already takes into consideration the number of available bots per configuration. When it triggers a shard on a bot configuration, it decrements the number of available bots for that configuration:

https://cs.chromium.org/chromium/src/testing/trigger_scripts/trigger_multiple_dimensions.py?sq=package:chromium&dr&g=0&l=95
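
As a rough illustration of that decrementing behavior, the sketch below picks the configuration with the most available bots for each shard and then decrements its count. This is an assumption inferred from the log in the issue description, not a copy of the script; `choose_configs` is a hypothetical name, and the tie-breaking rule (lowest index wins) is a guess that happens to match the logged output.

```python
def choose_configs(available, shards):
    """Hypothetical sketch: assign each shard to the bot config that
    currently has the most available bots, decrementing as we go.
    Does not model the case where availability is exhausted."""
    remaining = list(available)  # work on a copy of the snapshot
    chosen = []
    for _ in range(shards):
        # max() returns the first index on ties, so config 0 wins ties.
        best = max(range(len(remaining)), key=lambda i: remaining[i])
        chosen.append(best)
        remaining[best] -= 1  # one fewer bot assumed free in that config
    return chosen


# 'available' values from the log: 19 for config 0, 19 for config 1.
print(choose_configs([19, 19], 20))
```

With the logged values this alternates 0, 1, 0, 1, ... for all 20 shards, which matches the "Chose bot config" lines. Note that the counts being decremented come from a single snapshot taken at trigger time.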

If the last part of the upgrade was in progress, and the final machines were all taken offline right after jobs were triggered on them, then yes, those jobs would expire.

To improve this behavior, we should probably rewrite our documentation to use the Swarming fallback path that bradhall@ finished in Issue 871453, and remove the trigger_multiple_dimensions trigger script. I would appreciate help from the current pixel wrangler or anyone else who can help document the new steps.
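
For anyone picking this up, here is a toy model of why a fallback path helps. It is only a sketch under simplifying assumptions: it is not Swarming's API, `count_expired` is a hypothetical name, and it ignores queueing and capacity limits. It compares shards pinned to one configuration with shards that may fall back to any configuration that still has bots online.

```python
def count_expired(assignments, live_bots, allow_fallback):
    """Toy model (not Swarming's real behavior in detail).

    assignments: preferred config index for each shard.
    live_bots: dict of config index -> bots online at run time.
    Returns how many shards have no bot that could run them."""
    expired = 0
    for preferred in assignments:
        if live_bots.get(preferred, 0) > 0:
            continue  # the preferred config still has bots
        if allow_fallback and any(n > 0 for n in live_bots.values()):
            continue  # some other config can pick the shard up
        expired += 1
    return expired


# The 20 shards were split 10/10; suppose all config 0 bots went
# offline for the upgrade right after triggering.
shards = [0, 1] * 10
at_run_time = {0: 0, 1: 110}
print(count_expired(shards, at_run_time, allow_fallback=False))  # 10
print(count_expired(shards, at_run_time, allow_fallback=True))   # 0
```

In this model the pinned shards expire no matter how the split was chosen at trigger time, which is the argument for moving the docs to the fallback path.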

Summary: Update GPU driver docs to use Swarming fallback, remove trigger_multiple_dimensions.py (was: Improve trigger_multiple_dimensions.py logic)
Owner: kbr@chromium.org
Status: Assigned (was: Available)
Taking this. Looks like https://chromium-review.googlesource.com/c/chromium/src/+/1376653 is a good example CL for how to use this.

Project Member

Comment 4 by bugdroid1@chromium.org, Jan 18 (4 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bdf880cb0da9330d92d7ff21171ffbf1af63605d

commit bdf880cb0da9330d92d7ff21171ffbf1af63605d
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Jan 18 16:44:51 2019

Update docs for process of upgrading GPU drivers.

Use the new optional Swarming dimensions instead of the
multi-dimension trigger script, which has certain pitfalls.

Bug: 920665
Tbr: jmadill@chromium.org
Tbr: bradhall@chromium.org
No-Try: True
Change-Id: I076145a456d88fc9f2df0b64bd074857efda57e9
Reviewed-on: https://chromium-review.googlesource.com/c/1421741
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624161}
[modify] https://crrev.com/bdf880cb0da9330d92d7ff21171ffbf1af63605d/docs/gpu/gpu_testing_bot_details.md

Comment 5 by kbr@chromium.org, Jan 18 (4 days ago)

Blockedon: 923401

Comment 6 by kbr@chromium.org, Jan 18 (4 days ago)

Status: Started (was: Assigned)

Comment 7 by kbr@chromium.org, Jan 18 (4 days ago)

Cc: martiniss@chromium.org
