Issue 796437

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug-Regression

Blocked on:
issue 704024
issue 796386

Blocking:
issue 744658
issue 797444



New CHECK in GPU info collection affecting Linux bots on GPU FYI waterfall

Project Member Reported by kbr@chromium.org, Dec 20 2017

Issue description

Comment 1 by kbr@chromium.org, Dec 20 2017

Cc: senorblanco@chromium.org
+senorblanco, current pixel wrangler

Comment 2 by kbr@chromium.org, Dec 20 2017

Blockedon: 796386
Cc: zmo@chromium.org jamescook@chromium.org
This has to wait for https://chromium-review.googlesource.com/835420, the fix for Issue 796386, to land.

Comment 3 by kbr@chromium.org, Dec 20 2017

Status: Started (was: Assigned)
Increasing the watchdog timeout from 5s to 20s on the Linux Debug bots in:
https://chromium-review.googlesource.com/835475

Comment 4 by bugdroid1@chromium.org, Dec 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/35434ad196a8b643bef9c26dd500b9a787a226b3

commit 35434ad196a8b643bef9c26dd500b9a787a226b3
Author: Kenneth Russell <kbr@chromium.org>
Date: Wed Dec 20 03:32:50 2017

Increase DevTools' GPU info watchdog to 20s on Linux Debug.

These bots are crashing with the 5s timeout.

BUG= 796437 
TBR=zmo@chromium.org, pfeldman@chromium.org

Change-Id: I2c2b9b491af93a7dcce86a27f6d5af68bb2ee81f
Reviewed-on: https://chromium-review.googlesource.com/835475
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525249}
[modify] https://crrev.com/35434ad196a8b643bef9c26dd500b9a787a226b3/content/browser/devtools/protocol/system_info_handler.cc

Comment 5

Ken, the bot is better now, but it flakily fails to start the browser.
Is it the same cause, or something else?

Comment 6 by kbr@chromium.org, Dec 20 2017

Something's definitely going wrong and it's probably the same root cause.

It looks like something is generally going wrong with browser startup and/or GPU process launching. For example these WebGL conformance tests:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/41100

failed because the browser failed to start properly three times (DevTools failed to connect), and this error is in the logs:

[8897:8936:1220/142520.877302:ERROR:browser_gpu_channel_host_factory.cc(120)] Failed to launch GPU process.

We're still trying to figure out what's going on. It's probably a race condition due to the Debug builds and these bots being so slow.

Comment 7 by kbr@chromium.org, Dec 20 2017

Components: Tests>Telemetry
It's almost certain that there's some sort of race condition in this area -- probably competing timeouts. But the key is:

INFO:root:Browser started (pid=19253).
INFO:root:OS: linux trusty
ERROR:root:Failed with WebSocketTimeoutException while starting the browser backend.

So Telemetry thinks the browser started, but the first connection via DevTools failed. It's not clear what command exactly failed. Maybe the SystemInfo collection?
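
The suspected failure mode here, a client-side timeout that expires before a slow but healthy browser can answer its first request, can be illustrated with a small sketch. This is not Telemetry code: `slow_browser_reply`, `first_call`, and `FirstCallTimeout` are hypothetical names, and the sleep stands in for whatever work a Debug browser does before replying.

```python
import concurrent.futures
import time


class FirstCallTimeout(Exception):
    """Hypothetical stand-in for a WebSocket timeout on the first DevTools call."""


def slow_browser_reply(work_seconds):
    # Stand-in for the work a freshly started browser does before it can answer.
    time.sleep(work_seconds)
    return "system-info"


def first_call(work_seconds, client_timeout):
    # The client waits at most client_timeout seconds, even if the browser
    # would have answered correctly a little later.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_browser_reply, work_seconds)
        try:
            return future.result(timeout=client_timeout)
        except concurrent.futures.TimeoutError:
            raise FirstCallTimeout(
                "no reply within %.2fs" % client_timeout) from None


if __name__ == "__main__":
    # Fast browser: the reply arrives well inside the client timeout.
    print(first_call(work_seconds=0.01, client_timeout=1.0))
    # Slow browser: the same call fails although nothing is actually broken.
    try:
        first_call(work_seconds=0.3, client_timeout=0.05)
    except FirstCallTimeout as e:
        print("timed out:", e)
```

In this sketch the browser side never fails; only the ratio between the work time and the client timeout decides whether the first call succeeds, which matches the pattern of "Browser started" followed by a timeout.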

Comment 8 by kbr@chromium.org, Dec 20 2017

Blockedon: 704024
Fixing this is going to remove code added to work around Issue 704024.

Comment 9 by bugdroid1@chromium.org, Dec 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/928fc9d37730186db0551c401314f41db4111806

commit 928fc9d37730186db0551c401314f41db4111806
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Dec 21 00:43:56 2017

Increase a couple of Telemetry's internal timeouts to 60s.

The first WebSocket connection to the browser, and fetching of
SystemInfo, depends on a lot of work being done in the target browser,
and in Linux Debug builds on the bots, these timeouts (of 10s in some
cases) were being hit.

Explicitly indicate which are the likely first calls to the newly-
started browser, and increase their timeouts.

It seemed difficult to figure out the browser configuration
(Release/Debug) at this level, so do this in all cases.

Make GetSystemInfo's timeout an optional argument, stop specifying it
at the call site, and use the increased timeout on all platforms.

BUG= chromium:796437 
TBR=nednguyen@google.com, perezju@chromium.org

Change-Id: I34426a5a30dbc2231fc740476cbdcf343a2d1ffb
Reviewed-on: https://chromium-review.googlesource.com/838184
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/928fc9d37730186db0551c401314f41db4111806/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py
[modify] https://crrev.com/928fc9d37730186db0551c401314f41db4111806/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py
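
The shape of the change described in this commit message, a timeout that becomes an optional argument with a generous default so call sites stop specifying it, might look roughly like the sketch below. It is an illustration only, not the Catapult source: `SketchDevToolsClient`, `_send`, and `DEFAULT_FIRST_CALL_TIMEOUT` are hypothetical names, and the 60s value is taken from the commit message.

```python
# Hypothetical default; the commit message only says the timeouts were raised to 60s.
DEFAULT_FIRST_CALL_TIMEOUT = 60


class SketchDevToolsClient:
    """Illustrative stand-in, not the real Telemetry DevTools client backend."""

    def __init__(self):
        self.requests = []  # records (method, timeout) pairs for inspection

    def _send(self, method, timeout):
        # A real client would issue a DevTools request here and wait up to
        # `timeout` seconds for the reply; this sketch only records the call.
        self.requests.append((method, timeout))
        return {"method": method, "timeout": timeout}

    def GetSystemInfo(self, timeout=DEFAULT_FIRST_CALL_TIMEOUT):
        # Callers normally omit `timeout`, so every platform gets the long default.
        return self._send("SystemInfo.getInfo", timeout)


if __name__ == "__main__":
    client = SketchDevToolsClient()
    client.GetSystemInfo()           # uses the 60s default
    client.GetSystemInfo(timeout=5)  # an explicit override is still possible
    print(client.requests)
```

Keeping the default in one place means the longer timeout applies on every platform without the caller needing to know whether the browser is a Release or Debug build, which the commit message notes was hard to determine at this level.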

Comment 10 by kbr@chromium.org, Dec 21 2017

Cc: kbr@chromium.org
Issue 796706 has been merged into this issue.

Comment 11 by bugdroid1@chromium.org, Dec 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/be456d029607d8460a955e0f4af93db838d766e5

commit be456d029607d8460a955e0f4af93db838d766e5
Author: Zhenyao Mo <zmo@chromium.org>
Date: Thu Dec 21 03:44:52 2017

Add some DLOG to tell why GpuProcessHost::Get() returns nullptr

BUG= 796437 
TEST=bots (logs are not polluted by this)
R=kbr@chromium.org

Change-Id: I9a87b15801f236837d9d89e8a59deab473b23ae3
Reviewed-on: https://chromium-review.googlesource.com/837598
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Zhenyao Mo <zmo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525587}
[modify] https://crrev.com/be456d029607d8460a955e0f4af93db838d766e5/content/browser/gpu/gpu_process_host.cc

Comment 12 by kbr@chromium.org, Dec 21 2017

Catapult change rolled in here:
https://chromium-review.googlesource.com/838369

Let's see whether it's cleared up the bots.

Comment 13 by kbr@chromium.org, Dec 22 2017

Cc: perezju@chromium.org nedngu...@google.com
That change *definitely* fixed the flakiness issues on this bot. It's been rock-solid green all day.

perezju@ requested a follow-up to restore one of the two timeouts in that file to the original value. Doing that in https://chromium-review.googlesource.com/841505 and CQ'ing that TBR'd so that we can hopefully have some runs overnight with that change auto-rolled in.

Comment 14 by bugdroid1@chromium.org, Dec 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/cd1fd5940f684f7f327cdf301e5a554a31b31861

commit cd1fd5940f684f7f327cdf301e5a554a31b31861
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Dec 22 04:01:42 2017

Restore original timeout in create-and-connect method.

The increased timeout in an earlier commit is definitely needed in
GetSystemInfo, but it is probably not necessary in
_CreateAndConnectBrowserInspectorWebsocketIfNeeded. Restore original
10s timeout there.

BUG= chromium:796437 
TBR=nednguyen@google.com, perezju@chromium.org

Change-Id: Ib01d60630282a1c1c68a63d80552588838bd9e47
Reviewed-on: https://chromium-review.googlesource.com/841505
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/cd1fd5940f684f7f327cdf301e5a554a31b31861/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py

Comment 15 by kbr@chromium.org, Dec 22 2017

Blocking: 797444
Status: Verified (was: Started)
cd1fd5940f684f7f327cdf301e5a554a31b31861 rolled forward into Chromium in this Catapult roll:

https://chromium-review.googlesource.com/842082

No evidence of flakes on either of these bots:

https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/?limit=200
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/?limit=200

Finally fixed.

(Also filed Issue 797444 about the auto-rolls not updating bugs any more.)
