Win10 Debug (NVIDIA) can't start Chrome's GPU process (apparently)
Issue description
On Win10 Debug (NVIDIA) - vm91-m1 - starting with this build:
https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Debug%20(NVIDIA)/719
there are crashes in GPU process startup. It seems there's something wrong with the state of this bot.
Some fail with:
[6436:8188:0302/144606.167:FATAL:compositor_util.cc(59)] Check failed: manager->IsGpuFeatureInfoAvailable().
Others seem to say this instead:
[1552:1144:0302/144640.268:ERROR:gpu_process_host.cc(465)] !GpuDataManagerImpl::GpuAccessAllowed()
[1552:1144:0302/144640.462:ERROR:gpu_process_host.cc(465)] !GpuDataManagerImpl::GpuAccessAllowed()
[1552:1144:0302/144640.462:ERROR:browser_gpu_channel_host_factory.cc(119)] Failed to launch GPU process.
Perhaps the machine needs a reboot? Perhaps something bad has happened to the system or driver state.
+zmo because I think you added the IsGpuFeatureInfoAvailable DCHECK recently. Apparently it is not always true, in some weird case.
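When comparing many failing shards, it can help to group the log lines quoted in the description by where they were emitted. The following is a minimal sketch, not part of Chromium's tooling. It assumes the bracketed prefix has the shape `[pid:tid:MMDD/HHMMSS.mmm:SEVERITY:file(line)]`, which is inferred from the lines above; `parse_log_line` and `count_by_site` are hypothetical helper names.

```python
import re

# Assumed prefix shape: [pid:tid:MMDD/HHMMSS.mmm:SEVERITY:file(line)] message
LOG_LINE = re.compile(
    r"\[(?P<pid>\d+):(?P<tid>\d+):(?P<date>\d{4})/(?P<time>\d{6}\.\d{3}):"
    r"(?P<severity>[A-Z]+):(?P<file>[^()\]]+)\((?P<line>\d+)\)\]\s*(?P<message>.*)"
)


def parse_log_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "tid", "line"):
        fields[key] = int(fields[key])
    return fields


def count_by_site(lines):
    """Count lines per (severity, file, line) so repeated failures stand out."""
    counts = {}
    for text in lines:
        fields = parse_log_line(text)
        if fields is None:
            continue  # skip anything that is not a prefixed log line
        site = (fields["severity"], fields["file"], fields["line"])
        counts[site] = counts.get(site, 0) + 1
    return counts
```

Feeding the four lines from the description into `count_by_site` separates the single FATAL at compositor_util.cc(59) from the repeated ERROR at gpu_process_host.cc(465), which makes it easier to see whether different shards fail at the same site.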
,
Mar 5 2018
vm91-m1 just launches the tests to chromium-swarm. The shards that are failing are on a variety of hosts, which suggests the hosts themselves are not the cause?
,
Mar 5 2018
Let me build on my Windows bot and see if I can reproduce.
,
Mar 5 2018
Issue 818727 has been merged into this issue.
,
Mar 5 2018
Peter's right; this VM only triggers jobs on Swarming. All of the Swarming bots that are running the tasks from Win10 Debug (NVIDIA) are failing to launch the GPU process.
Currently comparing this run:
https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Debug%20%28NVIDIA%29/718
and this working shard:
https://chromium-swarm.appspot.com/task?id=3c0193806f7e2a10&refresh=10&show_raw=1
to this failing run:
https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Debug%20%28NVIDIA%29/719
and this failing shard:
https://chromium-swarm.appspot.com/task?id=3c01b4c777793410&refresh=10&show_raw=1
Haven't yet found any suspicious blamelists in the jobs around this one.
,
Mar 5 2018
,
Mar 5 2018
,
Mar 5 2018
Note that the FYI version of this bot is similarly failing: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Win10%20FYI%20Debug%20%28NVIDIA%29/?limit=200
,
Mar 5 2018
It may not mean anything, but the passing swarming slave in #5 is an older r210 which is also reporting the onboard Matrox (which is disabled in the BIOS). The failing slave is a newer r230 which has the onboard Matrox fully disabled by the BIOS (i.e., the OS does not see it at all).
,
Mar 5 2018
,
Mar 5 2018
On my local machine, I get:
FATAL:process_map.cc(60): Check failed: it_and_inserted.second
Not sure whether it's related to this bug or whether my local build got corrupted.
,
Mar 5 2018
,
Mar 5 2018
Thanks Mo for doing the build – can you continue to dig to figure out what's going on?
,
Mar 5 2018
To be specific, not just GPU process, renderer process also failed to launch. I am manually bisecting right now.
,
Mar 5 2018
,
Mar 5 2018
,
Mar 5 2018
a few people in windows chat are also working on this, it's to do with exported functions it seems?
"2410:4690 @ 1057360531 - LdrpAllocateTls - INFO: TlsVector 1DD6C1F8 Index 14 : 520 bytes copied from 68266000 to 1E027248
9378:8f04 @ 1057360531 - LdrpNameToOrdinal - WARNING: Procedure "?GetSystemScaleFactor@ScreenWin@win@display@@SAMXZ" could not be located in DLL at base 0x4C940000.
2410:4690 @ 1057360531 - LdrpAllocateTls - INFO: TlsVector 1DD6C1F8 Index 15 : 2 bytes copied from 72572000 to 1DCD7FC0
9378:8f04 @ 1057360531 - LdrpReportError - ERROR: Locating export "?GetSystemScaleFactor@ScreenWin@win@display@@SAMXZ" for DLL "C:\src\c\src\out\default\ui_base.dll" failed with status: 0xc0000139."
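To read the loader output above: the missing export is an MSVC-decorated C++ name, and status 0xc0000139 is the NTSTATUS code STATUS_ENTRYPOINT_NOT_FOUND. The sketch below recovers the qualified name from the simple `?name@scope@...@@<type codes>` shape only. It is not a full demangler, it leaves the trailing type encoding untouched, and `qualified_name` is a hypothetical helper name.

```python
# NTSTATUS value seen in the loader log above.
NTSTATUS_NAMES = {0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND"}


def qualified_name(decorated):
    """Split a simple MSVC-decorated name into ('ns::Class::func', type_codes).

    Handles only plain identifiers; operators, templates and other special
    encodings are out of scope for this sketch.
    """
    if not decorated.startswith("?"):
        raise ValueError("not a decorated C++ name")
    name_part, sep, type_codes = decorated[1:].partition("@@")
    if not sep:
        raise ValueError("missing '@@' terminator")
    # Scopes are stored innermost first, so reverse them for C++ order.
    parts = name_part.split("@")
    return "::".join(reversed(parts)), type_codes


name, codes = qualified_name("?GetSystemScaleFactor@ScreenWin@win@display@@SAMXZ")
print(name)                        # display::win::ScreenWin::GetSystemScaleFactor
print(NTSTATUS_NAMES[0xC0000139])  # STATUS_ENTRYPOINT_NOT_FOUND
```

So ui_base.dll imports a symbol in the `display` namespace, and the DLL that the loader found at base 0x4C940000 does not export it.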
,
Mar 5 2018
Re #17: Using a non-component build resolved my problem.
,
Mar 5 2018
where is the chat happening, i.e. what's the room name? I'm also finding that win7 isn't starting, I'm getting an error about API-MS-WIN-POWER-BASE-L1-1-0.DLL
,
Mar 5 2018
Is this the regression range from comment 5?
https://chromium.googlesource.com/chromium/src/+log/c830643ee44dc236f77e6b561b45ba697c71d12f..5a8814f02d665603fea6dc6372005eef196275f6
If so, only a few CLs...?
,
Mar 5 2018
https://chromium.googlesource.com/chromium/src/+/dc478ef126031896134faa8fc988d5f6cc5d87b7 is believed to be the source of #17, which may or may not be the source of the original bug.
,
Mar 5 2018
if #20 is correct, then https://chromium-review.googlesource.com/653581 is the only CL that vaguely touches the area in #17
,
Mar 5 2018
I just filed crbug.com/818852. Blamelists are skipping CLs.
,
Mar 5 2018
AIUI, the issue with the CL cited in #21 is that we have a name collision between chrome's Display.dll (used in component builds) and the system's Display.dll in System32. The new mitigation policy forces us to load the system32 version, which lacks the APIs we use.
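The collision described in this comment can be illustrated with a toy model of DLL resolution. This is only a sketch under stated assumptions: it models two directories and a single "prefer System32" switch, whereas the real Windows loader search order has more steps. The directory contents, the `SomeSystemExport` symbol, and the helper names `resolve_dll` and `can_import` are all hypothetical; only the output directory and the decorated symbol come from the log quoted earlier in the thread.

```python
SYSTEM32 = r"C:\Windows\System32"
APP_DIR = r"C:\src\c\src\out\default"  # output directory seen in the loader log

SCALE_FACTOR = "?GetSystemScaleFactor@ScreenWin@win@display@@SAMXZ"

# Hypothetical contents: directory -> {dll name: set of exported symbols}.
FILES = {
    APP_DIR: {"display.dll": {SCALE_FACTOR}},
    SYSTEM32: {"display.dll": {"SomeSystemExport"}},  # placeholder export
}


def resolve_dll(name, prefer_system32, files=FILES):
    """Return the directory the DLL is loaded from in this simplified model."""
    order = [SYSTEM32, APP_DIR] if prefer_system32 else [APP_DIR, SYSTEM32]
    for directory in order:
        if name.lower() in files.get(directory, {}):
            return directory
    return None


def can_import(name, symbol, prefer_system32, files=FILES):
    """True if the DLL that wins the search actually exports the symbol."""
    directory = resolve_dll(name, prefer_system32, files)
    return directory is not None and symbol in files[directory][name.lower()]


# Without the mitigation the component DLL wins and the import resolves.
print(can_import("Display.dll", SCALE_FACTOR, prefer_system32=False))
# With it, the same-named System32 DLL wins and the export is missing.
print(can_import("Display.dll", SCALE_FACTOR, prefer_system32=True))
```

In this model the failure only appears when both directories contain a DLL with the same name, which matches the observation elsewhere in the thread that a non-component build did not hit the problem.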
,
Mar 5 2018
In the skipped blamelist, there is a Win sandbox related CL: crrev.com/540634
Looks like a very likely candidate.
,
Mar 5 2018
,
Mar 5 2018
,
Mar 5 2018
,
Mar 5 2018
Can we increase trybot coverage to build/test this configuration? I think it was just a component build that broke? We can add extra trybots in presubmit for sandbox changes. Is 'win_optional_gpu_tests_rel' the right bot?
,
Mar 6 2018
It's a DCHECK failure and win_optional_gpu_tests_rel doesn't have dcheck_always_on any more.
,
Mar 6 2018
Actually win_optional_gpu_tests_rel and the other trybots do set dcheck_always_on; it's just the waterfall bots which do not set it any more. This can be seen in https://cs.chromium.org/chromium/src/tools/mb/mb_config.pyl .
We don't have capacity on the GPU bots to add a new CQ bot for the component build. Right now Chromium's testing strategy for the component build is that it's tested on the waterfall Debug bots. (See mb_config.pyl and how the various waterfall bots and trybots are configured.)
If we want to switch win7_chromium_rel_ng to test the component build rather than the statically linked build, that's fine with me, though probably not fine with others. Not sure what tests win_chromium_dbg_ng runs?
Basically, if one of the non-GPU Win Chromium trybot configurations can be switched to test the component build, I would hope that at least one of the regular, non-GPU test harnesses would have caught this regression.
,
Mar 6 2018
For the record, this was a debug, component-build, run-time-gpu issue. A CQ bot that builds this config, and runs any test that fires up a Chromium with GPU child process would be required. Oh, and it would need to be a bot that is kept up to date with latest Win10 (as new security mitigations are usually in the latest update). Fix close to landing: https://chromium-review.googlesource.com/c/chromium/src/+/950128
,
Mar 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/fc897c242fac290704e0f3e9297ce97012e849f0

commit fc897c242fac290704e0f3e9297ce97012e849f0
Author: Penny MacNeil <pennymac@chromium.org>
Date: Tue Mar 06 01:32:20 2018

[Windows Sandbox] Disable PreferSys32 mitigation for now.

Breaking debug component build, gpu process. There is a component DLL called display.dll that is clashing with the same name in system32. Temporary disable for the moment.

(Note: also removing the other IMAGE_LOAD mitigations from the browser process (post-startup) for now. Only set in child processes.)

BUG= 818388
TEST=sbox_integration_tests.exe, ProcessMitigationsTest.*
Change-Id: I1198e2222493a792df675d4b7675220d0da3459f
Reviewed-on: https://chromium-review.googlesource.com/950128
Reviewed-by: Will Harris <wfh@chromium.org>
Commit-Queue: Penny MacNeil <pennymac@chromium.org>
Cr-Commit-Position: refs/heads/master@{#541021}

[modify] https://crrev.com/fc897c242fac290704e0f3e9297ce97012e849f0/content/app/sandbox_helper_win.cc
[modify] https://crrev.com/fc897c242fac290704e0f3e9297ce97012e849f0/services/service_manager/sandbox/win/sandbox_win.cc
,
Mar 6 2018
,
Mar 6 2018
Comment 1 by yhirano@chromium.org
, Mar 5 2018