ERROR:gl_surface_glx.cc(425) glxQueryVersion failed - Flaky on Mojo_Linux_Perf bot
Issue description

When running perf tests on 'Mojo Linux Perf' bot we seem to get the following error randomly:

Standard output:
********************************************************************************
Fontconfig warning: "/etc/fonts/fonts.conf", line 146: blank doesn't take any effect anymore. please remove it from your fonts.conf
Xlib: extension "RANDR" missing on display ":99".
DevTools listening on ws://127.0.0.1:54425/devtools/browser/f92d6650-9f7a-4c5d-adb7-cc3acfe38bbf
[30927:30927:0315/135041.026294:ERROR:gl_surface_glx.cc(425)] glxQueryVersion failed
[30927:30927:0315/135041.026314:ERROR:gl_initializer_x11.cc(157)] GLSurfaceGLX::InitializeOneOff failed.
[30927:30927:0315/135041.027539:ERROR:viz_main_impl.cc(199)] Exiting GPU process due to errors during initialization
[30874:30874:0315/135041.058841:ERROR:gpu_process_transport_factory.cc(1008)] Lost UI shared context.
[1:10:0315/135041.063613:ERROR:implementation_base.cc(188)] ContextResult::kFatalFailure: TransferBuffer::Initialize() failed
[30874:30889:0315/135051.283423:ERROR:service_manager_context.cc(258)] Attempting to run unsupported native service: /b/s/w/ir/out/Release/chrome_renderer.service
[30874:30889:0315/135051.337816:ERROR:service_manager_context.cc(258)] Attempting to run unsupported native service: /b/s/w/ir/out/Release/chrome_renderer.service
********************************************************************************

Link to the full log:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FMojo_Linux_Perf%2F4880%2F%2B%2Frecipes%2Fsteps%2Floading.desktop.network_service_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout

---
My thoughts:
There are 2 differences between 'Mojo Linux Perf' and normal 'Linux Perf':
1. 'Mojo Linux Perf' runs tests with '--enable-features=NetworkService', however that doesn't seem to be related to gl.
2. 'Mojo Linux Perf' has a different version of GPU driver:
* According to https://crbug.com/717744#c21 the driver on this bot should be either 390.25 or 384.111, where the normal bots have 384.60.
* According to the full log above this bot has an interesting driver:
```
driver_vendor : SwiftShader
driver_version : 4.0.0
gl_extensions : GL_OES_compressed_ETC1_RGB8_texture GL...
gl_renderer : Google SwiftShader
gl_reset_notification_strategy: 0
gl_vendor : Google Inc.
gl_version : OpenGL ES 2.0 SwiftShader 4.0.0.0
gl_ws_extensions : EGL_KHR_create_context EGL_...
gl_ws_vendor : Google Inc.
gl_ws_version : 1.4 SwiftShader 4.0.0.0
```
Where the normal bots would have:
```
driver_vendor : Nvidia
driver_version : 384.69
//...
```
So my question is could this be a GPU driver issue?
Thanks!

---
Mojo Linux Perf: https://ci.chromium.org/buildbot/chromium.perf.fyi/Mojo%20Linux%20Perf/
,
Mar 15 2018
Peter: can we make sure the driver of this bot (1) is the same as Linux Perf bot (2)?
(1): https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc
(2): an example bot is https://chromium-swarm.appspot.com/bot?id=build148-m1&sort_stats=total%3Adesc
,
Mar 15 2018
,
Mar 15 2018
(1) has an NVIDIA card; (2) does not. It uses the onboard Matrox card (there is no NVIDIA card installed). Sounds like you want (2) to be NVIDIA?
,
Mar 15 2018
Hmm, we would want all our Linux configs to be the same as the GPU team's. An example of their Linux bot is:
https://chromium-swarm.appspot.com/bot?id=build76-m4&sort_stats=total%3Adesc
+Eyaich, Kbr@ to check this
,
Mar 16 2018
Does crbug.com/779618 take care of this?
,
Mar 16 2018
+johnw as he is doing 779618
,
Mar 16 2018
Sorry but I'm a little bit confused: 'Mojo Linux Perf' should be using build113-b4, which should have an NVIDIA Quadro P400, e.g.
https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc
Is it possible to change the config to match this slave, as suggested in https://crbug.com/717744#c3, so we can compare numbers?
(3) https://build.chromium.org/deprecated/chromium.perf/buildslaves/slave69-c1
Thanks!
,
Mar 16 2018
pschmidt@ Gentle ping. Thanks!
,
Mar 16 2018
Peter is OOO today. It looks like you are comparing swarmed testers against machines that trigger the jobs. build113-b4 looks like it has the current NVIDIA driver, 384.111. What specific version do you require? Thanks.
,
Mar 16 2018
Hi John, thanks for the response!
To clarify, I'm comparing the config between
a) 'chromium.perf.fyi/Mojo Linux Perf':
https://ci.chromium.org/buildbot/chromium.perf.fyi/Mojo%20Linux%20Perf/
b) 'chromium.perf/Linux Perf':
https://build.chromium.org/deprecated/chromium.perf/builders/Linux%20Perf
More specifically, I can see a) has only one slave, slave146-c1, which to my knowledge is pinned to build113-b4.
Also, b) has one slave, slave69-c1, but I'm not sure how to find the machine id it corresponds to.
--- My problem:
As described in #c0 the log of a) is suggesting that it's using a 'SwiftShader' driver:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FMojo_Linux_Perf%2F4880%2F%2B%2Frecipes%2Fsteps%2Floading.desktop.network_service_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout
However the log of b) is suggesting that it's using a 'Nvidia' driver:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FLinux_Perf%2F2508%2F%2B%2Frecipes%2Fsteps%2Floading.desktop_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout
Or am I interpreting the log incorrectly?
Thanks!
,
Mar 16 2018
FYI here is an example task from build113-b4. It has a Quadro P400 but the Raw Output on the right is showing `driver_vendor : SwiftShader`:
https://chromium-swarm.appspot.com/task?id=3c4932c16bce4510&refresh=10&show_raw=1
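For checking many task logs at once, the `driver_vendor : ...` lines in the raw output can be scanned mechanically. The following is only a minimal sketch: the helper names `parse_gpu_info` and `uses_software_gl` are hypothetical, the `key : value` layout is assumed from the excerpt quoted in the issue description, and the list of software renderer names (including `llvmpipe`) is an assumption rather than anything the bots define.

```python
import re

# Renderer names treated as software fallbacks (an assumption for this sketch).
SOFTWARE_RENDERERS = ("swiftshader", "llvmpipe")


def parse_gpu_info(log_text):
    """Collect 'key : value' pairs such as 'driver_vendor : SwiftShader'."""
    info = {}
    for line in log_text.splitlines():
        match = re.match(r"\s*(driver_\w+|gl_\w+)\s*:\s*(.*\S)\s*$", line)
        if match:
            info[match.group(1)] = match.group(2)
    return info


def uses_software_gl(info):
    """True if the reported vendor or renderer looks like a software rasterizer."""
    text = " ".join(
        (info.get("driver_vendor", ""), info.get("gl_renderer", ""))
    ).lower()
    return any(name in text for name in SOFTWARE_RENDERERS)


# Sample lines copied from the log excerpt in the issue description.
sample = """\
driver_vendor : SwiftShader
driver_version : 4.0.0
gl_renderer : Google SwiftShader
"""
info = parse_gpu_info(sample)
print(info["driver_vendor"], uses_software_gl(info))
```

A log from a bot that reports `driver_vendor : Nvidia` would make `uses_software_gl` return `False`, which is the quick signal that the hardware driver was actually picked up.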
,
Mar 17 2018
build113-b4 definitely has the kernel driver loaded.
NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
+cc kbr in case he might be able to help
,
Mar 17 2018
https://chromium-swarm.appspot.com/task?id=3c4932c16bce4510&refresh=10&show_raw=1 has the command line:
Command: /b/s/w/ir/.swarming_module_cache/vpython/73deba/bin/python ../../testing/scripts/run_telemetry_benchmark_as_googletest.py ../../tools/perf/run_benchmark loading.desktop.network_service -v --upload-results --output-format=chartjson --browser=release --xvfb --isolated-script-test-output=/b/s/w/ioduBpor/output.json --isolated-script-test-perf-output=/b/s/w/ioduBpor/perftest-output.json
Note --xvfb. The chromium.gpu and chromium.gpu.fyi bots deliberately do not specify this flag.
,
Mar 17 2018
This is because of https://cs.chromium.org/chromium/src/testing/buildbot/chromium.perf.fyi.json?rcl=2ac1c7c72c3c369a7e99a76a22b3322687179439&l=458
Back then the benchmark wouldn't run on the bot without --xvfb. With the reconfiguration, maybe we can drop the flag?
P.S.: I am not sure when we need --xvfb on Linux; I would appreciate it if someone who understands this could help explain :-P
,
Mar 17 2018
Some of the Perf bots are running inside VMs and on those you would need --xvfb. I advocate for the Perf team to stop running any tests inside VMs because they're not realistic end-user configurations, and to run everything on bare metal hardware.
,
Mar 17 2018
Ah, thanks for the explanation Ken!
,
Mar 19 2018
Thanks for the investigation! I will drop the '--xvfb' flag and see if it works. BTW: How do we know if it's a VM or a real machine?
,
Mar 20 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/89d534b8328ca3a2b21d02bb747cc7451d6e7ee3

commit 89d534b8328ca3a2b21d02bb747cc7451d6e7ee3
Author: Chong Zhang <chongz@chromium.org>
Date: Tue Mar 20 01:25:41 2018

Remove --xvfb flag on Mojo Linux Perf bot

The flag is making the bot to load kernel driver and causes flakiness.
We want to match chromium.gpu and chromium.gpu.fyi where they don't specify this flag.

Background:
The flag was added in Ife4228a86fa055416ec20a8049085bf4c2c33ce0 to fix a DISPLAY issue. The bot seems to be reconfigured since then and we want to drop the flag.

Note: We should only need --xvfb inside VMs.

Bug: 822479
Change-Id: I4709dbf3ca7e75e0697b5d9bede3af5eab320a04
Reviewed-on: https://chromium-review.googlesource.com/969616
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Chong Zhang <chongz@chromium.org>
Cr-Commit-Position: refs/heads/master@{#544250}

[modify] https://crrev.com/89d534b8328ca3a2b21d02bb747cc7451d6e7ee3/testing/buildbot/chromium.perf.fyi.json
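To see whether other entries in a buildbot JSON config still pass `--xvfb`, a small recursive search is enough. This is a sketch under assumptions: the function name `find_flag` is hypothetical, and the config fragment below is made up for illustration (builder and benchmark names are placeholders); the layout of the real `chromium.perf.fyi.json` may differ, which is why the search walks the whole tree instead of relying on fixed keys.

```python
import json


def find_flag(node, flag, path=()):
    """Yield the key path of every dict whose 'args' list contains `flag`."""
    if isinstance(node, dict):
        args = node.get("args")
        if isinstance(args, list) and flag in args:
            yield path
        for key, value in node.items():
            yield from find_flag(value, flag, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_flag(value, flag, path + (index,))


# Hypothetical config fragment; the real file layout may differ.
config = json.loads("""
{
  "Example Builder": {
    "isolated_scripts": [
      {"name": "example.benchmark", "args": ["-v", "--browser=release", "--xvfb"]},
      {"name": "other.benchmark", "args": ["-v", "--browser=release"]}
    ]
  }
}
""")
hits = list(find_flag(config, "--xvfb"))
print(hits)
```

Each hit is the path of keys and list indexes leading to the entry that still carries the flag, so it can be looked up directly in the file.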
,
Mar 20 2018
I wasn't aware that any of the perf bots were VMs. We have VMs that trigger the jobs, but all the bots in swarming that are running the script should be bare metal AFAIK. I might be mistaken; how do you identify them?
,
Mar 20 2018
Got a green build with the correct driver version 384.111:
https://chromium-swarm.appspot.com/task?id=3c5ca0d064095a10&refresh=10&show_raw=1
Seems that '--xvfb' is the cause, thanks all for the help! Closing as fixed since the original issue has been resolved.
However I'm still curious about how to identify VMs. Can I assume 'Bot Dimensions -> inside_docker: 0' tells us something?
,
Mar 20 2018
It used to be the case that all the Perf bots which weren't explicitly named "GPU" were VMs and not physical hardware. I do see now that for example https://ci.chromium.org/buildbot/chromium.perf/Win%2010%20Perf/ is a VM but that it now triggers its jobs on physical hardware with an Intel GPU. https://ci.chromium.org/buildbot/chromium.perf/Win%207%20Perf/ however is still triggering its jobs on the built-in Matrox GPU on the labs bots, which is not a useful configuration to test in my opinion. There may be other similar misconfigurations of the Linux Perf bots.
chongz@: assuming that a given bot is in the Swarming pool, like:
https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc
then it's easy to see whether it's physical hardware or not; just look for the "gpu" dimension and see whether it has anything reasonable in it like an NVIDIA, AMD or Intel GPU. If it reports "none" or the built-in Matrox GPU (vendor 102b) then it's probably a VM. You can also sometimes tell by the machine name.
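The "gpu" dimension check described above can be written down as a rough heuristic. This is only a sketch: the function name is hypothetical, the `vendor:device-driver` value format is an assumption about how the dimension is reported, and the sample values are illustrative. The PCI vendor IDs used are 10de (NVIDIA), 1002 (AMD), 8086 (Intel) and 102b (Matrox, as mentioned in the comment).

```python
# PCI vendor IDs for GPUs that suggest real graphics hardware.
REAL_GPU_VENDORS = {"10de": "NVIDIA", "1002": "AMD", "8086": "Intel"}


def looks_like_physical_gpu_bot(gpu_dimensions):
    """Heuristic from the thread: a real GPU vendor suggests physical hardware.

    "none" or the built-in Matrox GPU (vendor 102b) means "probably a VM",
    so those values never make this function return True.
    """
    for value in gpu_dimensions:
        # Assumed format: "vendor", "vendor:device" or "vendor:device-driver".
        vendor = value.lower().split(":")[0].split("-")[0]
        if vendor in REAL_GPU_VENDORS:
            return True
    return False


# Illustrative dimension values, not copied from a real bot page.
print(looks_like_physical_gpu_bot(["10de", "10de:1cb3", "10de:1cb3-384.111"]))
print(looks_like_physical_gpu_bot(["none"]))
print(looks_like_physical_gpu_bot(["102b", "102b:0534"]))
```

As the comment says, this only indicates "probably": a bot reporting Matrox could still be a physical lab machine without a discrete GPU, so the machine name is worth checking too.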
,
Mar 20 2018
kbr@ That's really helpful information, thanks for the detailed explanation!
Comment 1 by kbr@chromium.org, Mar 15 2018
Components: -Internals>GPU>Internals Infra>Client>Perf