"SitePerProcessBrowserTest.*" tests are flaky |
|||||||||||||||||||||||||
Issue description"SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNySwsSBUZsYWtlIkBTaXRlUGVyUHJvY2Vzc0Jyb3dzZXJUZXN0LktpbGxpbmdSZW5kZXJlckNsZWFyc0Rlc2NlbmRhbnRQcm94aWVzDA. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs This flaky test/step was previously tracked in issue 693365 .
,
Jan 2 2018
,
Jan 2 2018
Sadrul and Navid, it looks like this is a hit testing crash via RenderWidgetInputHandler::GetWidgetRoutingIdAtPoint. Could some of your recent work have caused this? It sounds like a real bug rather than a test that should be disabled.
[ RUN ] SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies
DevTools listening on ws://127.0.0.1:57030/devtools/browser/a3527897-738b-4956-a9f4-e63e9dc0730c
[3120:4936:0102/093348.602:INFO:dxva_video_decode_accelerator_win.cc(1315)] mf.dll is required for hardware video decoding
[3120:4936:0102/093348.603:INFO:media_foundation_video_encode_accelerator_win.cc(376)] mf.dll is required for encoding
[4904:1380:0102/093349.122:WARNING:render_frame_host_impl.cc(2680)] OnDidStopLoading was called twice.
[4904:1380:0102/093349.413:WARNING:render_frame_host_impl.cc(2680)] OnDidStopLoading was called twice.
[5760:4764:0102/093349.421:FATAL:PaintLayer.cpp(1894)] Check failed: GetLayoutObject().GetDocument().Lifecycle().GetState() >= DocumentLifecycle::kCompositingClean.
Backtrace:
base::debug::StackTrace::StackTrace [0x00007FF662675F34+36]
logging::LogMessage::~LogMessage [0x00007FF6625ECBEE+94]
blink::PaintLayer::HitTestLayer [0x00007FF66432F21F+127]
blink::PaintLayer::HitTest [0x00007FF66432F06B+459]
blink::LayoutView::HitTestNoLifecycleUpdate [0x00007FF6641F3CC1+273]
blink::LayoutView::HitTest [0x00007FF6641F3A55+53]
blink::EventHandler::HitTestResultAtPoint [0x00007FF6640612DA+602]
blink::WebFrameWidgetImpl::HitTestResultForRootFramePos [0x00007FF664F441E0+176]
blink::WebFrameWidgetImpl::CoreHitTestResultAt [0x00007FF664F427E2+226]
blink::WebFrameWidgetImpl::HitTestResultAt [0x00007FF664F426C8+40]
content::RenderWidgetInputHandler::GetWidgetRoutingIdAtPoint [0x00007FF6647C7934+132]
content::InputTargetClientImpl::FrameSinkIdAt [0x00007FF6647341BC+44]
viz::mojom::InputTargetClientStubDispatch::AcceptWithResponder [0x00007FF660F4C750+256]
viz::mojom::InputTargetClientStub<mojo::RawPtrImplRefTraits<viz::mojom::InputTargetClient> >::AcceptWithResponder [0x00007FF664734287+55]
mojo::InterfaceEndpointClient::HandleValidatedMessage [0x00007FF6623A96C9+697]
mojo::FilterChain::Accept [0x00007FF6623B1B64+148]
mojo::InterfaceEndpointClient::HandleIncomingMessage [0x00007FF6623AA607+119]
mojo::internal::MultiplexRouter::ProcessIncomingMessage [0x00007FF6623A5E96+726]
mojo::internal::MultiplexRouter::Accept [0x00007FF6623A5939+345]
mojo::FilterChain::Accept [0x00007FF6623B1B64+148]
mojo::Connector::ReadSingleMessage [0x00007FF6623AEBB6+438]
mojo::Connector::ReadAllAvailableMessages [0x00007FF6623AF3CD+125]
mojo::Connector::OnHandleReadyInternal [0x00007FF6623AF238+136]
mojo::SimpleWatcher::OnHandleReady [0x00007FF662B43E7F+255]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x00007FF6621F74BE+542]
blink::scheduler::TaskQueueManager::DoWork [0x00007FF6621F6D28+952]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
blink::scheduler::internal::ThreadControllerImpl::DoWork [0x00007FF6621FBFC0+240]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
base::internal::IncomingTaskQueue::RunTask [0x00007FF6626B409B+123]
base::MessageLoop::RunTask [0x00007FF662622279+665]
base::MessageLoop::DeferOrRunPendingTask [0x00007FF6626226CC+188]
base::MessageLoop::DoWork [0x00007FF66262293D+557]
base::MessagePumpDefault::Run [0x00007FF6626B658F+175]
base::MessageLoop::Run [0x00007FF662621AD9+201]
base::RunLoop::Run [0x00007FF66260B4D9+249]
content::RendererMain [0x00007FF664721B70+624]
content::ContentMainRunnerImpl::Run [0x00007FF661077072+418]
service_manager::Main [0x00007FF6636A60E1+1201]
content::ContentMain [0x00007FF66107684E+62]
content::LaunchTests [0x00007FF6623004E1+369]
main [0x00007FF6649922E0+72]
__scrt_common_main_seh [0x00007FF66549A779+285] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:283)
BaseThreadInitThunk [0x00007FFE086C8364+20]
RtlUserThreadStart [0x00007FFE08D75E91+33]
Received fatal exception EXCEPTION_BREAKPOINT
Backtrace:
base::debug::BreakDebugger [0x00007FF662696B3D+13]
logging::LogMessage::~LogMessage [0x00007FF6625ECF5E+974]
blink::PaintLayer::HitTestLayer [0x00007FF66432F21F+127]
blink::PaintLayer::HitTest [0x00007FF66432F06B+459]
blink::LayoutView::HitTestNoLifecycleUpdate [0x00007FF6641F3CC1+273]
blink::LayoutView::HitTest [0x00007FF6641F3A55+53]
blink::EventHandler::HitTestResultAtPoint [0x00007FF6640612DA+602]
blink::WebFrameWidgetImpl::HitTestResultForRootFramePos [0x00007FF664F441E0+176]
blink::WebFrameWidgetImpl::CoreHitTestResultAt [0x00007FF664F427E2+226]
blink::WebFrameWidgetImpl::HitTestResultAt [0x00007FF664F426C8+40]
content::RenderWidgetInputHandler::GetWidgetRoutingIdAtPoint [0x00007FF6647C7934+132]
content::InputTargetClientImpl::FrameSinkIdAt [0x00007FF6647341BC+44]
viz::mojom::InputTargetClientStubDispatch::AcceptWithResponder [0x00007FF660F4C750+256]
viz::mojom::InputTargetClientStub<mojo::RawPtrImplRefTraits<viz::mojom::InputTargetClient> >::AcceptWithResponder [0x00007FF664734287+55]
mojo::InterfaceEndpointClient::HandleValidatedMessage [0x00007FF6623A96C9+697]
mojo::FilterChain::Accept [0x00007FF6623B1B64+148]
mojo::InterfaceEndpointClient::HandleIncomingMessage [0x00007FF6623AA607+119]
mojo::internal::MultiplexRouter::ProcessIncomingMessage [0x00007FF6623A5E96+726]
mojo::internal::MultiplexRouter::Accept [0x00007FF6623A5939+345]
mojo::FilterChain::Accept [0x00007FF6623B1B64+148]
mojo::Connector::ReadSingleMessage [0x00007FF6623AEBB6+438]
mojo::Connector::ReadAllAvailableMessages [0x00007FF6623AF3CD+125]
mojo::Connector::OnHandleReadyInternal [0x00007FF6623AF238+136]
mojo::SimpleWatcher::OnHandleReady [0x00007FF662B43E7F+255]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x00007FF6621F74BE+542]
blink::scheduler::TaskQueueManager::DoWork [0x00007FF6621F6D28+952]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
blink::scheduler::internal::ThreadControllerImpl::DoWork [0x00007FF6621FBFC0+240]
base::debug::TaskAnnotator::RunTask [0x00007FF662690AC0+288]
base::internal::IncomingTaskQueue::RunTask [0x00007FF6626B409B+123]
base::MessageLoop::RunTask [0x00007FF662622279+665]
base::MessageLoop::DeferOrRunPendingTask [0x00007FF6626226CC+188]
base::MessageLoop::DoWork [0x00007FF66262293D+557]
base::MessagePumpDefault::Run [0x00007FF6626B658F+175]
base::MessageLoop::Run [0x00007FF662621AD9+201]
base::RunLoop::Run [0x00007FF66260B4D9+249]
content::RendererMain [0x00007FF664721B70+624]
content::ContentMainRunnerImpl::Run [0x00007FF661077072+418]
service_manager::Main [0x00007FF6636A60E1+1201]
content::ContentMain [0x00007FF66107684E+62]
content::LaunchTests [0x00007FF6623004E1+369]
main [0x00007FF6649922E0+72]
__scrt_common_main_seh [0x00007FF66549A779+285] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:283)
BaseThreadInitThunk [0x00007FFE086C8364+20]
RtlUserThreadStart [0x00007FFE08D75E91+33]
../../content/browser/site_per_process_browsertest.cc(3565): error: Expected equality of these values:
site_c_url
Which is: http://baz.com:57031/title1.html
node4->current_url()
Which is:
../../content/browser/site_per_process_browsertest.cc(3581): error: Expected equality of these values:
" Site A ------------ proxies for B C\n" " |--Site B ------- proxies for A C\n" " | +--Site C -- proxies for A B\n" " +--Site A ------- proxies for B C\n" "Where A = http://a.com/\n" " B = http://bar.com/\n" " C = http://baz.com/"
Which is: " Site A ------------ proxies for B C\n |--Site B ------- proxies for A C\n | +--Site C -- proxies for A B\n +--Site A ------- proxies for B C\nWhere A = http://a.com/\n B = http://bar.com/\n C = http://baz.com/"
DepictFrameTree(root)
Which is: " Site A ------------ proxies for B C\n |--Site B ------- proxies for A C\n | +--Site C -- proxies for A B\n +--Site A ------- proxies for B C\nWhere A = http://a.com/\n B = http://bar.com/\n C = http://baz.com/ (no process)"
With diff:
@@ -5,3 +5,3 @@
Where A = http://a.com/
B = http://bar.com/
- C = http://baz.com/
+ C = http://baz.com/ (no process)
[ FAILED ] SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies, where TypeParam = and GetParam() = (3008 ms)
,
Jan 3 2018
As this is assigned to an owner, I am taking it out of the sheriff queue.
,
Jan 3 2018
,
Jan 3 2018
,
Jan 3 2018
,
Jan 3 2018
Detected 9 new flakes for test/step "SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNySwsSBUZsYWtlIkBTaXRlUGVyUHJvY2Vzc0Jyb3dzZXJUZXN0LktpbGxpbmdSZW5kZXJlckNsZWFyc0Rlc2NlbmRhbnRQcm94aWVzDA. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
,
Jan 4 2018
Chris, I was wondering whether you can help with this issue. It seems that we do call UpdateLifecycleToPrePaintClean https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/layout/LayoutView.cpp?gsn=HitTest&l=120 before calling the hittest. But the state of the layout seems to remain stale. When I follow the UpdateLifecycleToPrePaintClean path down I can see some comments about OOPIF and the cases that we may not update the layout. In this case we do have an OOPIF in the test page. But I'm not sure which one of those checks get hit as I can not get this test to fail on my local machine. Do you have any insight about this?
,
Jan 4 2018
,
Jan 5 2018
Detected 11 new flakes for test/step "SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNySwsSBUZsYWtlIkBTaXRlUGVyUHJvY2Vzc0Jyb3dzZXJUZXN0LktpbGxpbmdSZW5kZXJlckNsZWFyc0Rlc2NlbmRhbnRQcm94aWVzDA. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
,
Jan 5 2018
That is indeed odd. It should be advancing the lifecycly for that frame, not sure why it does not. The next step is to reproduce this locally and debug..
,
Jan 5 2018
Anyone has any suggestion on how to reproduce this locally? I'm just building a debug build and run the test a few times hoping to crash. But all of the runs passed. Am I missing a particular config maybe? I assume the test automatically enables the OOPIF. So I don't need to pass any extra parameters.
,
Jan 6 2018
Detected 4 new flakes for test/step "SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNySwsSBUZsYWtlIkBTaXRlUGVyUHJvY2Vzc0Jyb3dzZXJUZXN0LktpbGxpbmdSZW5kZXJlckNsZWFyc0Rlc2NlbmRhbnRQcm94aWVzDA. This message was posted automatically by the chromium-try-flakes app.
,
Jan 8 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/24a68053f3548ec8c97442b95e3301c45100e79d commit 24a68053f3548ec8c97442b95e3301c45100e79d Author: Alice Boxhall <aboxhall@chromium.org> Date: Mon Jan 08 02:42:42 2018 Disable flaky test SitePerProcessBrowserTest.KillingRendererClearsDescendantProxies on Win Disabled test until we can figure out the root cause. Bug: 798476 Change-Id: I76bbd8226483a88984880dc014e65dfe96064f80 TBR=nzolghadr@chromium.org Change-Id: I76bbd8226483a88984880dc014e65dfe96064f80 Reviewed-on: https://chromium-review.googlesource.com/852595 Commit-Queue: Alice Boxhall <aboxhall@chromium.org> Reviewed-by: Alice Boxhall <aboxhall@chromium.org> Cr-Commit-Position: refs/heads/master@{#527557} [modify] https://crrev.com/24a68053f3548ec8c97442b95e3301c45100e79d/content/browser/site_per_process_browsertest.cc
,
Jan 8 2018
,
Jan 9 2018
The test automatically enables OOPIFs and doesn't require any special flags to run, but I haven't been able to repro locally either. :( I've tried local Linux builds, debug and release, with --enable-pixel-output-in-tests, with flags that are passed to the bots, in a group with other tests -- no luck. I noticed that the flakes all seem to be on the Win bots, so I also tried it on my Win box - again, no luck, but you might want to try the same, just in case my Win checkout isn't fully up to date. Other than that, my only suggestion would be to put some debugging info in, upload to gerrit, and run win7_chromium_rel_ng or win10_chromium_x64_rel_ng, to see if you can get to any failures and collect any new info. I know that's a really inefficient way to debug though. :(
,
Jan 9 2018
Should've mentioned that I've tried all of #17 with --gtest-repeat=50, and still no failures came up.
,
Jan 9 2018
Yup. Dave and I were talking yesterday and he suggested the try bot way too.
,
Jan 11 2018
Any progress? Looks like this is affecting multiple tests: see issue 797860 and issue 801146 for the following tests. SitePerProcess/TaskManagerOOPIFBrowserTest.KillSubframe/0 SitePerProcess/TaskManagerOOPIFBrowserTest.OrderingOfDependentRows/0 Feel free to dupe those into here to have a single place to investigate, as long as they all get fixed and re-enabled. Do we have evidence this crash is happening in the wild as well?
,
Jan 11 2018
Marking as a blocking bug for the meta bug for OOPIF Viz hit testing ( issue 796651 ).
,
Jan 11 2018
I haven't seen any report in the wild yet. I'm sending CLs to Gerrit and wait for the try bot to hopefully fail and see what the output is and which DCHECK failed. Every trial takes an hour or two until I get the try bot result back. So it is not a very fast progress.
,
Jan 11 2018
I enabled the test and tried to get the bots to fail. After a few runs still all the windows bots pass. https://chromium-review.googlesource.com/c/chromium/src/+/861943
,
Jan 12 2018
creis@ any idea how to proceed?
,
Jan 12 2018
Try changing the DCHECK that is failing into a CHECK, in case the bots you are testing against don't use DCHECK mode.
,
Jan 12 2018
Also, do you have access tp a windows machine you can build on?
,
Jan 12 2018
chrishtr@: Good point-- the try bots might not hit the DCHECKs. If that doesn't help, you could try landing a CL to re-enable it, possibly with some log statements that indicate what was happening at the time of the crash. Then if it flakes again, we have more info.
,
Jan 13 2018
#25-27: I thought the bots all run with dcheck_always_on, so that may not help? After all, looking at #3, the failure itself is a DCHECK, right? Another way might be to get access to a trybot VM where this it failing somewhat consistently and debug there. Might be quite involved, but mentioning it just in case. See https://sites.google.com/a/google.com/chrome-infrastructure/golo/remote-access. I'd try the suggestion from #27 first though.
,
Jan 16 2018
Is there any chance that the lifecycle is not being updated because of the early-return here [1]? I am not really sure what 'IsActive()' means ... is it possible that the browser sends a targeting request while the page is hidden (with dirty layout)? Note that the targeting requests use a different message-pipe, so the IPCs sent for visibility and targeting request will not necessarily be handled in the same order in the renderer that they were sent in the browser. [1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/frame/LocalFrameView.cpp?type=cs&sq=package:chromium&l=3132
,
Jan 16 2018
Another option is that the test is done, and it is closing the renderer/page, but there were in-flight targeting requests that get handled after that. Is that possible?
,
Jan 16 2018
,
Jan 16 2018
Seeing the same flake on ContentSettingBubbleModelMixedScriptOopifTest.MixedContentInCrossSiteIframe on Win7, apparently.
,
Jan 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9238b731beea3586e3c4557b96e942b2d3c8ce25 commit 9238b731beea3586e3c4557b96e942b2d3c8ce25 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Tue Jan 16 19:08:06 2018 blink: Replace a DCHECK with a DCHECK_GE. Use DCHECK_GE, so that when the dcheck hits, we get a little bit more information about the failure. BUG= 798476 Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_layout_tests_slimming_paint_v2 Change-Id: I99620ad9d6ec30577b30349cfb036249e48ec2b4 Reviewed-on: https://chromium-review.googlesource.com/867479 Commit-Queue: Chris Harrelson <chrishtr@chromium.org> Reviewed-by: Chris Harrelson <chrishtr@chromium.org> Cr-Commit-Position: refs/heads/master@{#529482} [modify] https://crrev.com/9238b731beea3586e3c4557b96e942b2d3c8ce25/third_party/WebKit/Source/core/paint/PaintLayer.cpp
,
Jan 18 2018
Some results from the DCHECK -> DCHECK_GE change: Check failed: GetLayoutObject().GetDocument().Lifecycle().GetState() >= DocumentLifecycle::kCompositingClean (10 vs. 13) (complete logs: https://logs.chromium.org/v/?s=chromium%2Fbb%2Ftryserver.chromium.win%2Fwin10_chromium_x64_rel_ng%2F64016%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests__with_patch__on_Windows-10-14393%2F0%2Flogs%2FSitePerProcessBrowserTest.SubframeTouchEventRouting%2F0) It looks like the state is kLayoutClean, but we expect it to be kCompositingClean. I don't know much about the lifecycle in blink. So it doesn't give me any clues. But maybe someone more familiar with it can speculate what's happening?
,
Jan 18 2018
Sadrul, were are you seeing these failures? I have enabled the test and also changed the DCHECK to CHECK and I see no failure caused by this test in the try bots. https://chromium-review.googlesource.com/c/chromium/src/+/861943
,
Jan 18 2018
Just in random code-reviews when I look at the try failures. (on win7_ or win10_ bots, usually)
,
Jan 18 2018
But this particular test has been disabled for some time. How did you see that?
,
Jan 18 2018
That last failure is in a different test (see the logs). The same crash is affecting a lot of tests, even browser_tets (see, for example https://bugs.chromium.org/p/chromium/issues/detail?id=801146#c7)
,
Jan 18 2018
@sadrul: can you reproduce the failure locally on any test? That would be very helpful.
,
Jan 18 2018
I have unfortunately been unable to repro locally. I have attempted to run a lot of tests in parallel, slow down the process using nice (nice -n 15 ...) to see if that exposes any potential race, but no luck so far. :(
,
Jan 19 2018
,
Jan 19 2018
,
Jan 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d35abd616a8f88e34f37a28be800694ae741db51 commit d35abd616a8f88e34f37a28be800694ae741db51 Author: Sorin Jianu <sorin@chromium.org> Date: Fri Jan 19 01:14:17 2018 Disable flaky test SitePerProcessBrowserTest.SubframeTouchEventRouting. TBR: kenrb@chromium.org Bug: 798476 , 802828 Change-Id: I62f25924477f4d281483063bfb0a56b93252db90 Reviewed-on: https://chromium-review.googlesource.com/875294 Reviewed-by: Sorin Jianu <sorin@chromium.org> Commit-Queue: Sorin Jianu <sorin@chromium.org> Cr-Commit-Position: refs/heads/master@{#530374} [modify] https://crrev.com/d35abd616a8f88e34f37a28be800694ae741db51/content/browser/site_per_process_browsertest.cc
,
Jan 19 2018
,
Jan 19 2018
Issue 802827 has been merged into this issue.
,
Jan 19 2018
,
Jan 19 2018
Issue 803627 has been merged into this issue.
,
Jan 19 2018
,
Jan 19 2018
Issue 803641 has been merged into this issue.
,
Jan 19 2018
I'm still seeing flakes related to this on https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/, looks like it affects all of them (many were individually disabled). Coalesced all bug reports to make sure they're all re-enabled when this is fixed (and also they should all be disabled for now if you're actively working on this as it affects CQ). Thanks
,
Jan 19 2018
Thanks. This potentially affects all browser tests that pass input events to out-of-process iframes.
,
Jan 19 2018
Comment 34 mentions that the OOPIF is layout clean but not compositing clean. That could result from the OOPIF being render throttled when we send it a hit test request, but it is hard to believe that because the OOPIF should be visible in most of these tests, and the tests know not to attempt to route events before we have received a compositor frame from the OOPIF. Still, given the lack of progress on this, it might be worth temporarily disabling OOPIF throttling to see if that resolves the issue. Alternatively, we could temporarily modify LayoutView::HitTest to stop hit testing if the call to UpdateLifecycleToPrePaintClean() didn't go as far as it needs to. WebFrameWidgetImpl could return the current frame as the hit test result in that case, and I would expect tests to mostly behave normally. We would still need to figure out the underlying problem, but it would save the sheriffs a lot of work and make it so that we still get to keep all of our test coverage.
,
Jan 19 2018
> That could result from the OOPIF being render throttled when we send it a hit > test request, but it is hard to believe that because the OOPIF should be > visible in most of these tests, See comments #29 and #30 above. The visibility and targeting use two separate message-pipes I believe. So it is possible that these messages are processed out-of-sync in the renderer.
,
Jan 19 2018
An alternate is to bind the hit testing mojo channel to be an associated channel to the frame input handler channel. This is where the visibility (and all input) messages are sent.
,
Jan 19 2018
What I mean is that before the OOPIF submits a compositor frame, the browser won't know to do asynchronous hit testing and the InputTargetClient interface won't be used. So it it must have received layout at some point before this crash could occur. I don't know how these frames could go from an unthrottled state to a throttled state, even though throttling might be consistent with this DCHECK. Does anybody have any idea why we are seeing this behavior on tests that don't even use input? Is it just other sharded tests driving the mouse on the bots and it happening to pass over these tests' windows?
,
Jan 19 2018
If the window happens to show up right under the cursor, I think we send a mouse-move in those cases too.
,
Jan 19 2018
Using associated-pipe could help. But it'd be nice for performance reasons if targeting could remain a separate pipe, and not be delayed by other messages.
,
Jan 19 2018
I've managed occasional repros on a local Windows machine, and have some limited debugger output. It seems that we are sending a hit test query to a frame where LocalFrameView::IsAttached() is returning false. I'm trying to look at lifetimes to understand how this can be happening, because I think detachment from the frame tree is supposed to cause the RenderFrame to be synchronously destroyed, in which case it shouldn't be possible for these messages to arrive.
,
Jan 19 2018
It doesn't look like this has to do with destruction, rather this appears to be queries getting sent too early. I can observe InputTargetImpl::FrameSinkIdAt() getting called on a frame for which the LocalFrameView has not yet had LocalFrameView::BeginLifecycleUpdates() called -- and that prevents compositing. In that case we hit the DCHECK in PaintLayer.cpp. It's possible that we hit this problem on OOPIFs but not the main frame because OOPIFs are lifecycle throttled before BeginLifecycleUpdates() is called, and the main frame is not. I'm not exactly clear on how we get to the state of having a surface with a compositor frame for this renderer and then we send a message to the renderer to learn that it isn't pumping frames. This would be possible in a case where the OOPIF had been rendering one Document and then loaded a new Document (because the LocalFrameView gets cycled), but I don't know why that would be happening in these tests. The initial empty Document, before the real one, shouldn't render.
,
Jan 19 2018
,
Jan 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8547da34bf6cdaaece12d13caa13e461fe7fa54a commit 8547da34bf6cdaaece12d13caa13e461fe7fa54a Author: Ken Buchanan <kenrb@chromium.org> Date: Fri Jan 19 23:56:55 2018 Prevent hit testing frames before lifecycle updates This is a speculative fix to a problem that is causing flakiness on site isolation browser tests. It looks like the cause is that hit test queries are being sent to renderers before the LocalFrameView has begun lifecycle updates. These fail on out-of-process iframes when the Document has had layout but has not composited because BeginLifecycleUpdates() has not yet been called. This triggers a DCHECK on the hit testing path in PaintLayer. This change prevents hit testing in that case. Bug: 798476 Change-Id: Ie44fd3d5b988e1c758b7472af6444d7b260b15ec Reviewed-on: https://chromium-review.googlesource.com/876939 Commit-Queue: Ken Buchanan <kenrb@chromium.org> Reviewed-by: Dave Tapuska <dtapuska@chromium.org> Cr-Commit-Position: refs/heads/master@{#530669} [modify] https://crrev.com/8547da34bf6cdaaece12d13caa13e461fe7fa54a/third_party/WebKit/Source/core/frame/LocalFrameView.cpp [modify] https://crrev.com/8547da34bf6cdaaece12d13caa13e461fe7fa54a/third_party/WebKit/Source/core/frame/LocalFrameView.h [modify] https://crrev.com/8547da34bf6cdaaece12d13caa13e461fe7fa54a/third_party/WebKit/Source/core/input/EventHandler.cpp
,
Jan 20 2018
Listing two more issues that seem to have the same hit testing crash problem. It's not clear to me that all the bugs gab@ marked as duplicates are the same hit testing crash, though. For example, issue 802827 doesn't seem to have crashes in the failures: https://logs.chromium.org/v/?s=chromium%2Fbb%2Ftryserver.chromium.win%2Fwin10_chromium_x64_rel_ng%2F62958%2F%2B%2Frecipes%2Fsteps%2Fcontent_browsertests__with_patch__on_Windows-10-14393%2F0%2Flogs%2FSitePerProcessBrowserTest.InputEventRouterGesturePreventDefaultTargetMapTest%2F0 kenrb@: Can you try to put together a list of all the tests disabled in these various bugs (either duped to this one or listed as blocked on it)? My guess is that many can be re-enabled after r530669 (assuming that fixes the hit testing crash) and the rest may need a different fix. Either way, let's aim to get them all re-enabled in M66. Thanks!
,
Jan 20 2018
Yes I'll sort through these. And that's true, we've seen a few tests that are just racy and failing because of the higher latency in the new hit test system.
,
Jan 22 2018
,
Jan 22 2018
,
Jan 22 2018
Did the fix in comment 61 work?
,
Jan 22 2018
As far as I can tell. There were two tests that had seen flakes from this bug that weren't disabled yet, and I don't any instances since that fix landed. I will be re-enabling all of the other tests that were flaking with the same crash stack shortly.
,
Jan 22 2018
Cool!
,
Jan 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cacd0085c363f70055d6a20a51ad670a1681c411 commit cacd0085c363f70055d6a20a51ad670a1681c411 Author: Ken Buchanan <kenrb@chromium.org> Date: Mon Jan 22 20:38:12 2018 Re-enable various SitePerProcessBrowserTests A bug that was causing flakes in many OOPIF-based browser tests was fixed in r530669. This re-enables several tests that were disabled due to that bug. TBR=msramek@chromium.org Bug: 798476 , 803641 , 803628 , 803627 , 802828 , 802281 Change-Id: Ib37c6761cd1ee46d8e49ef393db21209935862dd Reviewed-on: https://chromium-review.googlesource.com/879101 Commit-Queue: Ken Buchanan <kenrb@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#530981} [modify] https://crrev.com/cacd0085c363f70055d6a20a51ad670a1681c411/chrome/browser/ui/content_settings/content_setting_bubble_model_browsertest.cc [modify] https://crrev.com/cacd0085c363f70055d6a20a51ad670a1681c411/content/browser/site_per_process_browsertest.cc
,
Jan 23 2018
I haven't seen any flakes since the tests were re-enabled.
,
Jan 24 2018
|
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by tbansal@chromium.org
, Jan 2 2018Components: -Tests>Flaky Internals>Sandbox>SiteIsolation