Linux Chrome startup is flaky on ChromeDriver waterfall without --disable-gpu
Issue description

Chrome Version: 65.0.*
OS: Linux only

On the ChromeDriver waterfall tests, we encountered numerous instances of the Chrome browser becoming unresponsive soon after startup (e.g., see [1]). The ChromeDriver log always shows GPU-related error messages before this happens (e.g., [2] between timestamps 21.051 and 50.906), so we speculatively added the --disable-gpu flag to the tests, and this flag indeed stopped the failures from occurring.

So far we haven't been able to reproduce this issue on any machines other than the ChromeDriver waterfall. This has prevented us from bisecting the issue, as the ChromeDriver waterfall isn't configured for bisecting. We're not sure whether this is due to GPU differences, VM configuration, or something else.

Waterfall history indicates that this issue likely started occurring in the commit range https://crrev.com/520710..520747

[1] https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.chromedriver%2FLinux%2F32499%2F%2B%2Frecipes%2Fsteps%2Fpython_tests_v522596_%2F0%2Fstdout
[2] http://chromedriver-data.storage.googleapis.com/server_logs/chromedriver_log_OhVe3r
Dec 12 2017
[SEVERE]: Timed out receiving message from renderer

Also to note, this message appears all over a run_py_tests.py run, but it isn't fatal and doesn't affect test results. Also, the message is generally coupled with a time that is much less than what is in the above logs.
Dec 14 2017
Possibly related to: https://chromium.googlesource.com/chromium/src/+/e36bfd5b9a11989786d4a40afe4d2f21a941b979 ? Mo, what do you think? Is it expected that the GPU process will work on the machines on the ChromeDriver waterfall? Most of Chrome's testing machines are VMs, and GPU functionality doesn't work there. But the browser most probably shouldn't fail in this way in that case, especially since ChromeDriver is likely used in many web companies' continuous integration systems, which run on VMs.
Dec 14 2017
Not sure if the GPU process ever really worked on the ChromeDriver waterfall, but at least it didn't cause any issues before. I think Chrome should handle the case when GPU isn't available, without forcing the users to add --disable-gpu switch.
Dec 14 2017
Is this the bot to look at? https://luci-milo.appspot.com/buildbot/chromium.chromedriver/Linux/?limit=200 It looks like run_all_tests.py started passing again recently; is this issue still happening? We want to see whether the warning about losing the UI shared context was happening before Mo's patch landed.
Dec 14 2017
Ah, cool, luci-milo offers more history than the old buildbot view. https://luci-milo.appspot.com/buildbot/chromium.chromedriver/Linux/?limit=400 This should show the history before the failures started.
Dec 14 2017
It seems the bot has turned green lately. Is that because --disable-gpu is explicitly passed in? Looking at the log, I don't think there is enough info to tell whether my CL is the culprit, or if it is, why. If the ChromeDriver folks still want this issue figured out, then I need some help setting up an environment to reproduce it locally. If you think there is no need to do anything further, please close the bug.
Dec 14 2017
Re comments 5 and 7, we have explicitly added --disable-gpu to the tests on waterfall as a workaround to this issue.
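For reference, a minimal sketch (not taken from the waterfall configuration itself) of how the workaround is expressed at the WebDriver protocol level: ChromeDriver reads extra Chrome switches from the "args" list under the "goog:chromeOptions" capability in the New Session payload. The payload below is illustrative; the actual test harness code differs.

```python
import json

# Illustrative W3C WebDriver "New Session" payload carrying the
# --disable-gpu workaround. ChromeDriver appends every entry in
# "args" to Chrome's command line at launch.
new_session_payload = {
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome",
            "goog:chromeOptions": {
                "args": ["--disable-gpu"],
            },
        }
    }
}

# Serialize as a client would before POSTing it to /session.
print(json.dumps(new_session_payload))
```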
Dec 14 2017
So if GPU acceleration is not desired under ChromeDriver, we can just add logic to automatically insert that switch in Chrome. How do we reliably detect that it's ChromeDriver?
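The proposal above can be sketched as a small guard that runs before Chrome is launched. This is purely hypothetical pseudocode for the idea in this comment, not real Chrome or ChromeDriver code; in particular, the `launched_by_chromedriver` predicate is the unsolved part the comment is asking about.

```python
def maybe_disable_gpu(args, launched_by_chromedriver):
    # Hypothetical sketch: append --disable-gpu to Chrome's command
    # line when Chrome is being driven by ChromeDriver, unless the
    # caller already passed the switch explicitly.
    if launched_by_chromedriver and "--disable-gpu" not in args:
        args = args + ["--disable-gpu"]
    return args
```

As the comment notes, the hard part is making `launched_by_chromedriver` reliable; there is no obvious signal distinguishing a ChromeDriver launch from any other automation launch, and (per the later comment) some ChromeDriver users actively depend on GPU support, so unconditional insertion would be wrong.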
Dec 14 2017
Many Chromedriver tests rely on gpu support. For example, we have an entire framework for video playback performance that would be useless if GPU acceleration were turned off. Additionally, Chromedriver is the default way to benchmark Chrome against other browsers in an automated way. With GPU acceleration turned off, we would start looking pretty bad.
Dec 14 2017
Then could someone please help me set up a repro environment, so I can reproduce and debug the flaky crash.
Dec 15 2017
The problem is that ChromeDriver devs haven't been able to get a local repro.

+vhang@/chrome-labs:
1. Does chrome-labs maintain the VM that runs the chromedriver waterfall tests? It looks like they are all run by https://build.chromium.org/deprecated/chromium.chromedriver/buildslaves/slave108-c1
2. If we wanted one, how hard would it be to get another VM checked into the chromedriver Linux pool (https://luci-milo.appspot.com/buildbot/chromium.chromedriver/Linux/)? (Then we could take slave108-c1 off of the continuous builds temporarily and get the flakiness reproing.)

It seems like if there were another slave available, we could just add it to https://cs.chromium.org/chromium/build/masters/master.chromium.chromedriver/slaves.cfg
Dec 18 2017
Since buildbot is deprecated anyway, I figured I would try to repro by running in swarming. I wrote this CL: https://chromium-review.googlesource.com/c/chromium/src/+/831052 (it should help with issue 793370 anyway).

It seems to fail on GCE VMs:
(a) https://chromium-swarm.appspot.com/task?id=3a8198059326da10&refresh=10&show_raw=1
(b) https://chromium-swarm.appspot.com/task?id=3a83deb272e92310&refresh=10&show_raw=1

And it fails on physical machines:
https://chromium-swarm.appspot.com/task?id=3a83de8622938d10&refresh=10&show_raw=1

But it works on chrome labs VMs:
https://chromium-swarm.appspot.com/task?id=3a771e9b42cf0e10&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=3a83e76f3dbeaf10&refresh=10&show_raw=1

Also, unassigning this until we figure out how to repro it.
Dec 19
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Jan 2
GPU Triage: crouleau@, is this bug still applicable?
Jan 2
John can triage. Maybe just archive this?
Comment 1 by crouleau@chromium.org, Dec 12 2017