
Issue 822479

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug
Proj-Servicification

Blocking:
issue 822484




ERROR:gl_surface_glx.cc(425) glxQueryVersion failed - Flaky on Mojo_Linux_Perf bot

Project Member Reported by chongz@chromium.org, Mar 15 2018

Issue description

When running perf tests on 'Mojo Linux Perf' bot we seem to get the following error randomly:
Standard output:
********************************************************************************
	Fontconfig warning: "/etc/fonts/fonts.conf", line 146: blank doesn't take any effect anymore. please remove it from your fonts.conf
	Xlib:  extension "RANDR" missing on display ":99".
	
	DevTools listening on ws://127.0.0.1:54425/devtools/browser/f92d6650-9f7a-4c5d-adb7-cc3acfe38bbf
	[30927:30927:0315/135041.026294:ERROR:gl_surface_glx.cc(425)] glxQueryVersion failed
	[30927:30927:0315/135041.026314:ERROR:gl_initializer_x11.cc(157)] GLSurfaceGLX::InitializeOneOff failed.
	[30927:30927:0315/135041.027539:ERROR:viz_main_impl.cc(199)] Exiting GPU process due to errors during initialization
	[30874:30874:0315/135041.058841:ERROR:gpu_process_transport_factory.cc(1008)] Lost UI shared context.
	[1:10:0315/135041.063613:ERROR:implementation_base.cc(188)] ContextResult::kFatalFailure: TransferBuffer::Initialize() failed
	[30874:30889:0315/135051.283423:ERROR:service_manager_context.cc(258)] Attempting to run unsupported native service: /b/s/w/ir/out/Release/chrome_renderer.service
	[30874:30889:0315/135051.337816:ERROR:service_manager_context.cc(258)] Attempting to run unsupported native service: /b/s/w/ir/out/Release/chrome_renderer.service
********************************************************************************
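For anyone triaging similar output, the error lines above can be pulled out mechanically. The following is a minimal sketch, assuming the `[pid:tid:MMDD/HHMMSS.micros:LEVEL:file(line)] message` prefix seen in this log; the function name `error_lines` is hypothetical and not part of any Chromium tooling.

```python
import re

# Prefix shape inferred from the log above (an assumption, not a spec):
# [pid:tid:MMDD/HHMMSS.micros:LEVEL:file(line)] message
LOG_LINE = re.compile(
    r"\[(?P<pid>\d+):(?P<tid>\d+):(?P<stamp>\d{4}/\d{6}\.\d+):"
    r"(?P<level>[A-Z]+):(?P<file>[\w.]+)\((?P<line>\d+)\)\]\s*(?P<msg>.*)"
)


def error_lines(stdout_text):
    """Return (source_file, message) pairs for every ERROR-level log line."""
    found = []
    for raw in stdout_text.splitlines():
        match = LOG_LINE.search(raw)
        if match and match.group("level") == "ERROR":
            found.append((match.group("file"), match.group("msg")))
    return found
```

Filtering the result for `gl_surface_glx.cc` would show whether a given run hit the `glxQueryVersion failed` path.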

Link to the full log:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FMojo_Linux_Perf%2F4880%2F%2B%2Frecipes%2Fsteps%2Floading.desktop.network_service_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout

--- My thoughts:
There are 2 differences between 'Mojo Linux Perf' and normal 'Linux Perf':
1. 'Mojo Linux Perf' runs tests with '--enable-features=NetworkService'; however, that doesn't seem to be related to GL.
2. 'Mojo Linux Perf' has a different version of GPU driver:
  * According to https://crbug.com/717744#c21 the driver on this bot should be either 390.25 or 384.111, where the normal bots have 384.60.
  * According to the full log above this bot has an interesting driver:
    ```
        driver_vendor       : SwiftShader
        driver_version      : 4.0.0
        gl_extensions       : GL_OES_compressed_ETC1_RGB8_texture GL...
        gl_renderer         : Google SwiftShader
        gl_reset_notification_strategy: 0
        gl_vendor           : Google Inc.
        gl_version          : OpenGL ES 2.0 SwiftShader 4.0.0.0
        gl_ws_extensions    : EGL_KHR_create_context EGL_...
        gl_ws_vendor        : Google Inc.
        gl_ws_version       : 1.4 SwiftShader 4.0.0.0
    ```
    Where the normal bots would have:
    ```
        driver_vendor       : Nvidia
        driver_version      : 384.69
        //...
    ```
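The comparison between the two driver blocks above could be automated roughly as follows. This is only a sketch, assuming the `key : value` layout shown in the log excerpt; the helper names `parse_driver_info` and `uses_software_gl` are hypothetical.

```python
def parse_driver_info(text):
    """Parse 'key : value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def uses_software_gl(info):
    """True if the reported driver or renderer mentions SwiftShader."""
    fields = (info.get("driver_vendor", ""), info.get("gl_renderer", ""))
    return any("swiftshader" in field.lower() for field in fields)
```

A run reporting `driver_vendor : SwiftShader` would be flagged, while one reporting `driver_vendor : Nvidia` would not.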

So my question is could this be a GPU driver issue?
Thanks!

---
Mojo Linux Perf:
https://ci.chromium.org/buildbot/chromium.perf.fyi/Mojo%20Linux%20Perf/

 

Comment 1 by kbr@chromium.org, Mar 15 2018

Cc: -kbr@chromium.org -piman@chromium.org martiniss@chromium.org
Components: -Internals>GPU>Internals Infra>Client>Perf
It sounds like this may be a misconfiguration of this Perf bot. I'm not sure who administers this machine but folks in Infra>Client>Perf will probably know.

Components: -Infra>Client>Perf Infra>Labs Speed>Benchmarks>Waterfall
Owner: pschmidt@chromium.org
Peter: can we make sure the driver of this bot (1) is the same as Linux Perf bot (2)?

(1): https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc

(2): an example bot is https://chromium-swarm.appspot.com/bot?id=build148-m1&sort_stats=total%3Adesc

Comment 3 by chongz@chromium.org, Mar 15 2018

Blocking: 822484

Comment 4 by pschm...@google.com, Mar 15 2018

(1) has a nvidia card

(2) does not. It uses the onboard matrox card (There is no nvidia card installed)

Sounds like you want (2) to be nvidia?
Cc: kbr@chromium.org eyaich@chromium.org
Hmm, we would want all our Linux configs to be the same as the GPU team's. An example of their Linux bot is: https://chromium-swarm.appspot.com/bot?id=build76-m4&sort_stats=total%3Adesc

+Eyaich, Kbr@ to check this

Comment 6 by pschm...@google.com, Mar 16 2018

Does crbug.com/779618 take care of this?

Comment 7 by pschm...@google.com, Mar 16 2018

Cc: -kbr@chromium.org jo...@chromium.org
+johnw as he is doing 779618

Comment 8 by chongz@chromium.org, Mar 16 2018

Sorry but I'm a little bit confused:
'Mojo Linux Perf' should be using build113-b4, which should have a NVIDIA Quadro P400.
e.g.
https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc

Is it possible to change the config to match this slave: 
(3) https://build.chromium.org/deprecated/chromium.perf/buildslaves/slave69-c1
As suggested in https://crbug.com/717744#c3 so we can compare numbers?

Thanks!


Comment 9 by chongz@chromium.org, Mar 16 2018

Components: -Internals>Network Internals>Services>Network
pschmidt@ Gentle ping. Thanks!

Comment 10 by jo...@google.com, Mar 16 2018

Peter is OOO today. 

It looks like you are comparing swarmed-testers against machines that trigger the jobs.

build113-b4 looks like it has the current NVIDIA driver, 384.111. What specific version do you require?

Thanks.




Hi John, thanks for the response!

To clarify, I'm comparing the config between
  a) 'chromium.perf.fyi/Mojo Linux Perf':
     https://ci.chromium.org/buildbot/chromium.perf.fyi/Mojo%20Linux%20Perf/
  b) 'chromium.perf/Linux Perf':
     https://build.chromium.org/deprecated/chromium.perf/builders/Linux%20Perf

More specifically, I can see a) has only one slave slave146-c1, which to my knowledge is pinned to build113-b4.

Also, b) has one slave slave69-c1, however I'm not sure how to find the machine id it corresponds to.

--- My problem:
As described in #c0 the log of a) is suggesting that it's using a 'SwiftShader' driver:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FMojo_Linux_Perf%2F4880%2F%2B%2Frecipes%2Fsteps%2Floading.desktop.network_service_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout

However the log of b) is suggesting that it's using a 'Nvidia' driver:
https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FLinux_Perf%2F2508%2F%2B%2Frecipes%2Fsteps%2Floading.desktop_on_NVIDIA_GPU_on_Linux%2F0%2Fstdout


Or am I interpreting the log incorrectly?

Thanks!
FYI here is an example task from build113-b4. It has a Quadro P400 but the Raw Output on the right is showing `driver_vendor       : SwiftShader`:
https://chromium-swarm.appspot.com/task?id=3c4932c16bce4510&refresh=10&show_raw=1

Comment 13 by jo...@google.com, Mar 17 2018

Cc: kbr@chromium.org
build113-b4 definitely has the kernel driver loaded.

NVIDIA UNIX x86_64 Kernel Module  384.111  Tue Dec 19 23:51:45 PST 2017

+cc kbr in case he might be able to help

Comment 14 by kbr@chromium.org, Mar 17 2018

https://chromium-swarm.appspot.com/task?id=3c4932c16bce4510&refresh=10&show_raw=1 has the command line:

Command: /b/s/w/ir/.swarming_module_cache/vpython/73deba/bin/python ../../testing/scripts/run_telemetry_benchmark_as_googletest.py ../../tools/perf/run_benchmark loading.desktop.network_service -v --upload-results --output-format=chartjson --browser=release --xvfb --isolated-script-test-output=/b/s/w/ioduBpor/output.json --isolated-script-test-perf-output=/b/s/w/ioduBpor/perftest-output.json


Note --xvfb. The chromium.gpu and chromium.gpu.fyi bots deliberately do not specify this flag.

This is because of https://cs.chromium.org/chromium/src/testing/buildbot/chromium.perf.fyi.json?rcl=2ac1c7c72c3c369a7e99a76a22b3322687179439&l=458

Back then the benchmark didn't run on the bot without --xvfb. With the reconfiguration, maybe we can drop the flag?

P.S.: I am not sure when we need --xvfb on Linux; I would appreciate it if someone who understands this could help explain :-P

Comment 16 by kbr@chromium.org, Mar 17 2018

Some of the Perf bots are running inside VMs and on those you would need --xvfb. I advocate for the Perf team to stop running any tests inside VMs because they're not realistic end-user configurations, and to run everything on bare metal hardware.

Ah, thanks for the explanation Ken!
Thanks for the investigations! I will drop the '--xvfb' flag and see if it works.

BTW: How do we know if it's a VM or real machine?

Comment 19 by bugdroid1@chromium.org, Mar 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/89d534b8328ca3a2b21d02bb747cc7451d6e7ee3

commit 89d534b8328ca3a2b21d02bb747cc7451d6e7ee3
Author: Chong Zhang <chongz@chromium.org>
Date: Tue Mar 20 01:25:41 2018

Remove --xvfb flag on Mojo Linux Perf bot

The flag is making the bot to load kernel driver and causes flakiness.

We want to match chromium.gpu and chromium.gpu.fyi where they don't
specify this flag.

Background:
The flag was added in Ife4228a86fa055416ec20a8049085bf4c2c33ce0 to fix
a DISPLAY issue. The bot seems to be reconfigured since then and we
want to drop the flag.

Note:
We should only need --xvfb inside VMs.

Bug:  822479 
Change-Id: I4709dbf3ca7e75e0697b5d9bede3af5eab320a04
Reviewed-on: https://chromium-review.googlesource.com/969616
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Chong Zhang <chongz@chromium.org>
Cr-Commit-Position: refs/heads/master@{#544250}
[modify] https://crrev.com/89d534b8328ca3a2b21d02bb747cc7451d6e7ee3/testing/buildbot/chromium.perf.fyi.json
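The kind of change described in this commit could be sketched as below. The layout assumed here (builder name, then test type, then a list of test dicts each with an `args` list) is a guess at the shape of the buildbot JSON and is not taken from this thread; `drop_flag` is a hypothetical helper, not the script actually used.

```python
def drop_flag(config, builder, flag="--xvfb"):
    """Remove `flag` from each test's args under `builder`; return how many were removed.

    Assumed layout: config[builder][test_type] is a list of dicts with an "args" list.
    """
    removed = 0
    for tests in config.get(builder, {}).values():
        if not isinstance(tests, list):
            continue
        for test in tests:
            if not isinstance(test, dict) or "args" not in test:
                continue
            kept = [arg for arg in test["args"] if arg != flag]
            removed += len(test["args"]) - len(kept)
            test["args"] = kept
    return removed
```

Other builders in the same file are left untouched, which matters if some of them still run in environments without a display.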

I wasn't aware that any of the perf bots were VMs. We have VMs that trigger the jobs, but all the bots in Swarming that are running the script should be bare metal AFAIK.

I might be mistaken, how do you identify them?
Status: Fixed (was: Untriaged)
Got a green build with the correct driver version 384.111:
https://chromium-swarm.appspot.com/task?id=3c5ca0d064095a10&refresh=10&show_raw=1

Seems that '--xvfb' is the cause, thanks all for the help!

Closing as fixed since the original issue has been resolved. However I'm still curious about how to identify VMs - Can I assume 'Bot Dimensions -> inside_docker: 0' tells something?

Comment 22 by kbr@chromium.org, Mar 20 2018

It used to be the case that all the Perf bots which weren't explicitly named "GPU" were VMs and not physical hardware. I do see now that for example https://ci.chromium.org/buildbot/chromium.perf/Win%2010%20Perf/ is a VM but that it now triggers its jobs on physical hardware with an Intel GPU.

https://ci.chromium.org/buildbot/chromium.perf/Win%207%20Perf/ however is still triggering its jobs on the built-in Matrox GPU on the labs bots, which is not a useful configuration to test in my opinion. There may be other similar misconfigurations of the Linux Perf bots.

chongz@ assuming that a given bot is in the Swarming pool, like:
https://chromium-swarm.appspot.com/bot?id=build113-b4&sort_stats=total%3Adesc

then it's easy to see whether it's physical hardware or not; just look for the "gpu" dimension and see whether it has anything reasonable in it like an NVIDIA, AMD or Intel GPU. If it reports "none" or the built-in Matrox GPU (vendor 102b) then it's probably a VM. You can also sometimes tell by the machine name.
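The heuristic above can be written down as a small check. This is a sketch only: it assumes the "gpu" dimension is a list of strings such as `10de`, `10de:1cb3`, or `none` (PCI vendor ID first, then optional device ID), which is an assumption about the Swarming format and not something stated in this thread. The vendor IDs used are the standard PCI IDs for NVIDIA (10de), AMD (1002), Intel (8086), and Matrox (102b).

```python
# Standard PCI vendor IDs for the GPU vendors named in the comment above.
PHYSICAL_GPU_VENDORS = {"10de": "NVIDIA", "1002": "AMD", "8086": "Intel"}


def classify_bot(gpu_dimension):
    """Return 'physical' if any gpu value names a known vendor, else 'probably VM'.

    Values such as 'none' or Matrox (102b) fall through to 'probably VM'.
    """
    vendors = {value.split(":", 1)[0].lower() for value in gpu_dimension}
    if any(vendor in PHYSICAL_GPU_VENDORS for vendor in vendors):
        return "physical"
    return "probably VM"
```

As the comment notes, this is only a strong hint, since a bot's machine name can sometimes tell a different story.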

kbr@ That's really helpful information, thanks for the detailed explanation!
