
Issue 821522

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug


Driver upgrade required for Windows AMD GPU swarming pool (Mar 2018)

Project Member Reported by jmad...@chromium.org, Mar 13 2018

Issue description

These bots are currently using fairly old drivers, see this example job:

https://chromium-swarm.appspot.com/task?id=3c35333e9277d110&refresh=10&show_raw=1

The driver version is listed as 21.19.137.1, which corresponds to a 2016 driver that AMD released as 16.12.1 (determined from my email history) around 9-16-2016 (determined from logs). This driver has bugs in its Vulkan implementation (one is visible in the job linked above).

The current newest driver for these R7 240 cards seems to be released as 18.3.2 dated 01-32-2018 and Chrome reports this as 23.20.15017.3010. Note that if a newer driver comes out after this issue is filed we should use the newest.

Could we please upgrade these machines to the newest drivers by letting the machines cleanly finish their last job, upgrading them, and returning them to the pool? Windows-only for now.

The tests should continue to run as expected since they are not targeted for a specific driver version.

Cc'ing current wranglers; we will need to be on top of landing new failure expectations.
 

Comment 1 by pschm...@google.com, Mar 13 2018

I don't want to ask this cuz I know what the answer is going to be.

But I'll ask it anyway.

Want them updated to Win10 like the NVIDIA & Intel GPU swarming slaves?

Comment 2 by kbr@chromium.org, Mar 13 2018

Thanks for asking Peter. Can we leave these on Win7 for the moment? We are in the middle of converting things to LUCI and would rather do the Win7 -> Win10 migration after that's done. Thanks.

Owner: pschmidt@chromium.org
Status: Assigned (was: Available)
Status: Started (was: Assigned)
The latest is 18.3.3 but the installer fails.  From the install log file:

CUiErrorHandler::GetJSONErrorDetails::[19-03-2018 03:45:08] Error message for 182 is Error 182 - AMD Installer cannot properly identify the AMD graphics hardware

Investigating this.
From my experience, such errors can happen during remote installation.
The 17.x drivers install.

The 18.x drivers do not. I cannot see what in the driver inf file is causing the failure. I'm wondering whether the installer is getting confused by the disabled onboard Matrox.
The installation log doesn't show any indication that's the problem but at this point I'm out of ideas.  I need another set of eyes to look at it.


Comment 7 by kbr@chromium.org, Mar 21 2018

Emailed AMD CC'ing you (Peter) and John asking for help.

Figured out the inf issue.

Verifying that this is the version you want before proceeding with the rest.

https://chromium-swarm.appspot.com/bot?id=build55-m4&sort_stats=total%3Adesc
Version is listed as 23.20.15033.1003.

Looks great, can you please proceed with the upgrade by letting each bot finish its current job, removing it from the pool, then doing an upgrade and putting it back?

Thanks a ton!
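
The procedure requested above (let each bot finish its job, remove it from the pool, upgrade, put it back) could be sketched roughly as follows. This is only an illustration of the ordering: `is_idle`, `remove_from_pool`, `upgrade_driver`, and `add_to_pool` are hypothetical callables supplied by the caller, not real Swarming or AMD installer APIs.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    bots: Iterable[str],
    is_idle: Callable[[str], bool],           # hypothetical: True once the bot's job has finished
    remove_from_pool: Callable[[str], None],  # hypothetical: take the bot out of the pool
    upgrade_driver: Callable[[str], None],    # hypothetical: install the new driver
    add_to_pool: Callable[[str], None],       # hypothetical: return the bot to the pool
) -> List[str]:
    """Upgrade every idle bot in order; return the bots that were still busy."""
    still_busy: List[str] = []
    for bot in bots:
        if not is_idle(bot):
            # Never interrupt a running job; the caller can retry these later.
            still_busy.append(bot)
            continue
        remove_from_pool(bot)
        upgrade_driver(bot)
        add_to_pool(bot)
    return still_busy
```

Calling the function again with the returned list picks up bots that were busy on the first pass.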
Done except for build55-m4 which is being looked at by hwops.

$ swarming.py bots -d pool "Chrome-GPU" -d os "Windows" -d gpu "1002:6613-23.20.15033.1003" -S https://chromium-swarm.appspot.com  --bare
build55-m4
build56-m4
build57-m4
build58-m4
build59-m4
build60-m4
build61-m4
build62-m4
build63-m4
build64-m4
build65-m4
build66-m4
build67-m4
build68-m4
build69-m4
build70-m4
build71-m4
build72-m4
build90-m1
build109-b1
build109-m4
build110-m4
build111-m4
build112-m4
build113-m4
build114-m4
build115-m1
build115-m4
build116-m4
build117-m4
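
For checking results like the list above, a small sketch of parsing the `gpu` dimension value and comparing driver versions. It assumes the value has the form `vendor:device-driver`, as the string in the query above suggests, and that driver versions compare numerically field by field. The names `GpuDimension`, `parse_gpu_dimension`, and `is_older` are illustrative only.

```python
from typing import NamedTuple, Tuple


class GpuDimension(NamedTuple):
    vendor_id: str                    # e.g. "1002"
    device_id: str                    # e.g. "6613"
    driver_version: Tuple[int, ...]   # e.g. (23, 20, 15033, 1003)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '23.20.15033.1003' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def parse_gpu_dimension(value: str) -> GpuDimension:
    """Split an assumed 'vendor:device-driver' string into its three parts."""
    ids, _, version = value.partition("-")
    vendor_id, _, device_id = ids.partition(":")
    if not (vendor_id and device_id and version):
        raise ValueError(f"unexpected gpu dimension: {value!r}")
    return GpuDimension(vendor_id, device_id, parse_version(version))


def is_older(version: str, target: str) -> bool:
    """True if `version` sorts strictly before `target`, field by field."""
    return parse_version(version) < parse_version(target)


gpu = parse_gpu_dimension("1002:6613-23.20.15033.1003")
print(gpu.driver_version)
print(is_older("21.19.137.1", "23.20.15033.1003"))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "137" would sort after "15033".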
Blocking: angleproject:2423
Blocking: angleproject:2424
Thanks Peter! The upgrade looks to have gone well and the bots are running now. There were two regressions that I've filed issues for (issue angleproject:2424 and issue angleproject:2423). I think we can deal with those on our end, and the bots should clear up and be green sometime today. Other than the build55-m4 thing I think you can close this out.
Status: Fixed (was: Started)
I meant build54-m4 (not build55-m4)

Anyway build54-m4 is now up.

Closing this out.
Blocking: 826376