Driver upgrade required for Windows AMD GPU swarming pool (Mar 2018)
Issue description

These bots are currently using fairly old drivers; see this example job: https://chromium-swarm.appspot.com/task?id=3c35333e9277d110&refresh=10&show_raw=1

The driver version is listed as 21.19.137.1, which corresponds to a 2016 driver that AMD released as 16.12.1 (determined from my email history) around 9-16-2016 (determined from logs). This driver has bugs in its Vulkan implementation (one of them is visible in the job linked above).

The newest driver for these R7 240 cards appears to be 18.3.2, dated 01-32-2018, which Chrome reports as 23.20.15017.3010. If a newer driver comes out after this issue is filed, we should use the newest one.

Could we please upgrade these machines to the newest drivers by letting each machine cleanly finish its last job, upgrading it, and putting it back in the pool? Windows only for now. The tests should continue to run as expected, since they do not target a specific driver version. Cc'ing the current wranglers; we will need to stay on top of landing new failure expectations.
Mar 13 2018
Thanks for asking, Peter. Can we leave these on Win7 for the moment? We are in the middle of converting things to LUCI and would rather do the Win7 -> Win10 migration after that's done. Thanks.
Mar 15 2018
Mar 19 2018
The latest is 18.3.3, but the installer fails. From the install log file:

CUiErrorHandler::GetJSONErrorDetails::[19-03-2018 03:45:08] Error message for 182 is Error 182 - AMD Installer cannot properly identify the AMD graphics hardware

Investigating this.
Mar 19 2018
In my experience, such errors can happen during remote installation.
Mar 21 2018
The 17.x drivers install; the 18.x drivers do not. I cannot see what in the driver inf file is causing the failure. I'm wondering whether the installer is getting confused by the disabled onboard Matrox graphics. The installation log doesn't show any indication that this is the problem, but at this point I'm out of ideas. I need another set of eyes on it.
Mar 21 2018
Emailed AMD, CC'ing you (Peter) and John, asking for help.
Mar 21 2018
Figured out the inf issue. Verifying that this is the version you want before proceeding with the rest: https://chromium-swarm.appspot.com/bot?id=build55-m4&sort_stats=total%3Adesc
Mar 21 2018
The version is listed as 23.20.15033.1003. Looks great. Can you please proceed with the upgrade by letting each bot finish its current job, removing it from the pool, upgrading it, and putting it back? Thanks a ton!
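The driver versions quoted in this thread (21.19.137.1 before the upgrade, 23.20.15017.3010 for 18.3.2, and 23.20.15033.1003 after) are dotted Windows driver version strings. A minimal Python sketch, assuming the four fields should be compared numerically from left to right, shows why comparing them as integer tuples rather than plain strings is the safer way to confirm that a bot really moved to a newer driver. `parse_driver_version` and `is_newer` are hypothetical helper names, not part of any Chromium tool.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Split a dotted driver version such as '23.20.15033.1003' into integers."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, baseline: str) -> bool:
    """Return True if candidate is a strictly higher version than baseline.

    Tuples compare element by element, so 15033 vs 15017 is decided
    numerically rather than character by character.
    """
    return parse_driver_version(candidate) > parse_driver_version(baseline)


OLD = "21.19.137.1"                    # driver on the bots when the issue was filed
REPORTED_18_3_2 = "23.20.15017.3010"   # what Chrome reported for 18.3.2
INSTALLED = "23.20.15033.1003"         # version listed after the upgrade

print(is_newer(INSTALLED, OLD))              # True
print(is_newer(INSTALLED, REPORTED_18_3_2))  # True
# Plain string comparison is misleading once field widths differ:
print("9.0.0.0" > "21.0.0.0")                # True ('9' sorts after '2')
print(is_newer("9.0.0.0", "21.0.0.0"))       # False
```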
Mar 22 2018
Done except for build55-m4, which is being looked at by hwops.

$ swarming.py bots -d pool "Chrome-GPU" -d os "Windows" -d gpu "1002:6613-23.20.15033.1003" -S https://chromium-swarm.appspot.com --bare
build55-m4 build56-m4 build57-m4 build58-m4 build59-m4 build60-m4 build61-m4 build62-m4 build63-m4 build64-m4 build65-m4 build66-m4 build67-m4 build68-m4 build69-m4 build70-m4 build71-m4 build72-m4 build90-m1 build109-b1 build109-m4 build110-m4 build111-m4 build112-m4 build113-m4 build114-m4 build115-m1 build115-m4 build116-m4 build117-m4
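The `--bare` output above is just a list of bot names, so spotting a bot that has not yet picked up the new `gpu` dimension comes down to a set difference against the bots expected in the pool. Below is a minimal Python sketch; the expected set is an assumption for illustration (an abridged copy of this output plus build54-m4, the bot mentioned in a later comment), and in practice it would come from wherever the pool is defined. `missing_bots` is a hypothetical helper, not part of `swarming.py`.

```python
def missing_bots(bare_output: str, expected: set[str]) -> list[str]:
    """Return expected bots absent from `swarming.py bots --bare` output.

    str.split() with no argument accepts names separated by spaces or newlines.
    """
    reported = set(bare_output.split())
    return sorted(expected - reported)


# Abridged list of bots reported with gpu dimension 1002:6613-23.20.15033.1003.
bare = "build55-m4 build56-m4 build57-m4 build109-b1 build115-m1"

# Hypothetical expected pool membership, for illustration only.
expected = {"build54-m4", "build55-m4", "build56-m4", "build57-m4",
            "build109-b1", "build115-m1"}

print(missing_bots(bare, expected))  # ['build54-m4']
```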
Mar 22 2018
Mar 22 2018
Mar 22 2018
Thanks, Peter! The upgrade looks to have gone well, and the bots are running now. There were two regressions, which I've filed issues for (issue angleproject:2424 and issue angleproject:2423). I think we can deal with those on our end; the bots should clear up and be green sometime today. Other than the build55-m4 problem, I think you can close this out.
Mar 26 2018
I meant build54-m4 (not build55-m4). Anyway, build54-m4 is now up. Closing this out.
Mar 27 2018
Comment 1 by pschm...@google.com, Mar 13 2018