New CHECK in GPU info collection affecting Linux bots on GPU FYI waterfall
Issue description
The CHECK added in this CL:
https://chromium-review.googlesource.com/832933
has turned the Linux Debug bots red:
https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/?limit=200
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/?limit=200
First failing build on each:
https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/87268
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/41072
We need to try increasing the timeout from 5s to something longer on the Debug bots.
Dec 20 2017
This has to wait for https://chromium-review.googlesource.com/835420, the fix for Issue 796386, to land.
Dec 20 2017
Increasing the watchdog timeout from 5s to 20s on the Linux Debug bots in: https://chromium-review.googlesource.com/835475
Dec 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/35434ad196a8b643bef9c26dd500b9a787a226b3

commit 35434ad196a8b643bef9c26dd500b9a787a226b3
Author: Kenneth Russell <kbr@chromium.org>
Date: Wed Dec 20 03:32:50 2017

Increase DevTools' GPU info watchdog to 20s on Linux Debug.

These bots are crashing with the 5s timeout.

BUG= 796437
TBR=zmo@chromium.org, pfeldman@chromium.org
Change-Id: I2c2b9b491af93a7dcce86a27f6d5af68bb2ee81f
Reviewed-on: https://chromium-review.googlesource.com/835475
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525249}

[modify] https://crrev.com/35434ad196a8b643bef9c26dd500b9a787a226b3/content/browser/devtools/protocol/system_info_handler.cc
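For readers who want the idea without opening the CL, here is a minimal Python sketch of a watchdog whose deadline depends on the build configuration. It is not the Chromium code (that lives in C++ in system_info_handler.cc); the function names, the platform string, and the worker-thread approach are all assumptions for illustration. Only the 5s and 20s values come from the commit message.

```python
import threading

# Values taken from the commit message: 5s by default, 20s on Linux Debug.
DEFAULT_TIMEOUT_S = 5.0
LINUX_DEBUG_TIMEOUT_S = 20.0


def watchdog_timeout(platform: str, is_debug: bool) -> float:
    """Pick the watchdog deadline for a (hypothetical) build configuration."""
    if platform == "linux" and is_debug:
        return LINUX_DEBUG_TIMEOUT_S
    return DEFAULT_TIMEOUT_S


def run_with_watchdog(work, timeout_s: float):
    """Run work() on a worker thread; raise TimeoutError if it misses the deadline."""
    outcome = {}

    def target():
        try:
            outcome["value"] = work()
        except Exception as exc:  # surface worker errors to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"work did not finish within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

In this sketch a missed deadline raises an exception, whereas the CHECK discussed in this bug crashes the process.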
Dec 20 2017
Ken, the bot is better now, but it flakily fails to start the browser. Is it the same reason, or something else?
Dec 20 2017
Something's definitely going wrong and it's probably the same root cause. It looks like browser startup and/or GPU process launching is generally failing. For example, these WebGL conformance tests:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/41100
failed because the browser failed to start properly three times (DevTools failed to connect), and this error is in the logs:
[8897:8936:1220/142520.877302:ERROR:browser_gpu_channel_host_factory.cc(120)] Failed to launch GPU process.
We're still trying to figure out what's going on. It's probably a race condition due to the Debug builds and these bots being so slow.
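When scanning bot logs for errors like the one above, it can help to split each line into its fields. The sketch below is only an illustration: it assumes the bracketed prefix is laid out as pid:tid:MMDD/HHMMSS.microseconds:SEVERITY:file(line), which matches the line quoted in this comment, and the names `LogEntry` and `parse_log_line` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [pid:tid:MMDD/HHMMSS.usec:SEVERITY:file(line)] message
LOG_LINE = re.compile(
    r"\[(?P<pid>\d+):(?P<tid>\d+):(?P<stamp>\d{4}/\d{6}\.\d+):"
    r"(?P<severity>[A-Z]+):(?P<source>[^(\]]+)\((?P<line>\d+)\)\]\s*"
    r"(?P<message>.*)"
)


class LogEntry(NamedTuple):
    pid: int
    tid: int
    stamp: str
    severity: str
    source: str
    line: int
    message: str


def parse_log_line(text: str) -> Optional[LogEntry]:
    """Return the parsed fields, or None if the text has no matching prefix."""
    m = LOG_LINE.search(text)
    if m is None:
        return None
    return LogEntry(
        pid=int(m["pid"]),
        tid=int(m["tid"]),
        stamp=m["stamp"],
        severity=m["severity"],
        source=m["source"],
        line=int(m["line"]),
        message=m["message"],
    )
```

Filtering parsed entries on `severity == "ERROR"` and `source` would then pick out GPU process launch failures from a full log.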
Dec 20 2017
It's almost certain that there's some sort of race condition in this area -- probably competing timeouts. But the key is:
INFO:root:Browser started (pid=19253).
INFO:root:OS: linux trusty
ERROR:root:Failed with WebSocketTimeoutException while starting the browser backend.
So Telemetry thinks the browser started, but the first connection via DevTools failed. It's not clear what command exactly failed. Maybe the SystemInfo collection?
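The distinction drawn above (process launched, but the first DevTools connection timed out) can be checked mechanically across many bot logs. This is a rough sketch that matches only the literal substrings quoted in this comment; the function name and the returned labels are invented for illustration and are not part of Telemetry.

```python
def diagnose_startup(log_lines):
    """Classify a Telemetry log by which startup stage it reached."""
    started = False
    for line in log_lines:
        if "Browser started (pid=" in line:
            started = True
        elif "WebSocketTimeoutException" in line:
            # A timeout after the "started" message points at the DevTools
            # connection rather than at launching the browser process.
            return "devtools-connect-timeout" if started else "timeout-before-start"
    return "started-ok" if started else "not-started"
```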
Dec 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/928fc9d37730186db0551c401314f41db4111806

commit 928fc9d37730186db0551c401314f41db4111806
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Dec 21 00:43:56 2017

Increase a couple of Telemetry's internal timeouts to 60s.

The first WebSocket connection to the browser, and fetching of SystemInfo, depends on a lot of work being done in the target browser, and in Linux Debug builds on the bots, these timeouts (of 10s in some cases) were being hit.

Explicitly indicate which are the likely first calls to the newly-started browser, and increase their timeouts. It seemed difficult to figure out the browser configuration (Release/Debug) at this level, so do this in all cases.

Make GetSystemInfo's timeout an optional argument, stop specifying it at the call site, and use the increased timeout on all platforms.

BUG= chromium:796437
TBR=nednguyen@google.com, perezju@chromium.org
Change-Id: I34426a5a30dbc2231fc740476cbdcf343a2d1ffb
Reviewed-on: https://chromium-review.googlesource.com/838184
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/928fc9d37730186db0551c401314f41db4111806/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py
[modify] https://crrev.com/928fc9d37730186db0551c401314f41db4111806/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py
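The "optional argument" pattern from the commit message looks roughly like the sketch below. This is not Telemetry's code: `FakeDevToolsClient`, `get_system_info`, and the simulated response delay are hypothetical stand-ins, and only the 60s and 10s values come from the commit message.

```python
# Default used for the likely first calls to a newly started browser (from the
# commit message). The name is an assumption for this sketch.
FIRST_CALL_TIMEOUT_S = 60


class FakeDevToolsClient:
    """Hypothetical stand-in that simulates a browser answering after a delay."""

    def __init__(self, respond_after_s):
        self._respond_after_s = respond_after_s

    def get_system_info(self, timeout=FIRST_CALL_TIMEOUT_S):
        # Callers normally omit `timeout`, so the longer default applies
        # everywhere without each call site having to know about it.
        if self._respond_after_s > timeout:
            raise TimeoutError(
                f"no SystemInfo response within {timeout}s "
                f"(browser needed {self._respond_after_s}s)"
            )
        return {"gpu": "stub"}


slow_debug_browser = FakeDevToolsClient(respond_after_s=25)
info = slow_debug_browser.get_system_info()  # succeeds under the 60s default
```

With the old 10s value passed explicitly, the same simulated browser would time out, which is the failure mode the bots were hitting.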
Dec 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/be456d029607d8460a955e0f4af93db838d766e5

commit be456d029607d8460a955e0f4af93db838d766e5
Author: Zhenyao Mo <zmo@chromium.org>
Date: Thu Dec 21 03:44:52 2017

Add some DLOG to tell why GpuProcessHost::Get() returns nullptr

BUG= 796437
TEST=bots (logs are not polluted by this)
R=kbr@chromium.org
Change-Id: I9a87b15801f236837d9d89e8a59deab473b23ae3
Reviewed-on: https://chromium-review.googlesource.com/837598
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Zhenyao Mo <zmo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525587}

[modify] https://crrev.com/be456d029607d8460a955e0f4af93db838d766e5/content/browser/gpu/gpu_process_host.cc
Dec 21 2017
Catapult change rolled in here:
https://chromium-review.googlesource.com/838369
Let's see whether it's cleared up the bots.
Dec 22 2017
That change *definitely* fixed the flakiness issues on this bot. It's been rock-solid green all day. perezju@ requested a follow-up to restore one of the two timeouts in that file to the original value. Doing that in https://chromium-review.googlesource.com/841505 and CQ'ing that TBR'd so that we can hopefully have some runs overnight with that change auto-rolled in.
Dec 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/cd1fd5940f684f7f327cdf301e5a554a31b31861

commit cd1fd5940f684f7f327cdf301e5a554a31b31861
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Dec 22 04:01:42 2017

Restore original timeout in create-and-connect method.

The increased timeout in an earlier commit is definitely needed in GetSystemInfo, but it is probably not necessary in _CreateAndConnectBrowserInspectorWebsocketIfNeeded. Restore original 10s timeout there.

BUG= chromium:796437
TBR=nednguyen@google.com, perezju@chromium.org
Change-Id: Ib01d60630282a1c1c68a63d80552588838bd9e47
Reviewed-on: https://chromium-review.googlesource.com/841505
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>

[modify] https://crrev.com/cd1fd5940f684f7f327cdf301e5a554a31b31861/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py
Dec 22 2017
cd1fd5940f684f7f327cdf301e5a554a31b31861 rolled forward into Chromium in this Catapult roll:
https://chromium-review.googlesource.com/842082
No evidence of flakes on either of these bots:
https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/?limit=200
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Linux%20Debug%20%28NVIDIA%29/?limit=200
Finally fixed.
(Also filed Issue 797444 about the auto-rolls not updating bugs any more.)
Comment 1 by kbr@chromium.org, Dec 20 2017