Should iframes be able to capture pointer input
Issue description

Chrome currently lacks any explicit mouse/touch capturing API, but we're in the process of adding one as part of pointer events (issue 196799); see https://w3c.github.io/pointerevents/#setting-pointer-capture. The security question is: is it OK to allow any arbitrary iframe to capture mouse/touch input?

Standards discussion is here: https://github.com/w3c/pointerevents/issues/16, but we'd like to move the detailed discussion to a non-public forum since IE and Edge are already shipping implementations.

Potential attack I can imagine:
- A website has an on-screen keyboard or pin-pad.
- It also embeds an ad (or some other embed) in an iframe.
- A malicious ad wants to track pointer behavior, so it periodically calls `setPointerCapture` on an element in its own frame for about 16ms, then calls `releasePointerCapture`.
- Over time the ad can build up a pattern of mouse/touch events that gives it some insight into the location of the mouse/finger in the main document.
- With enough data the ad can likely predict things like text/numbers being entered into in-page keyboards / pin-pads.

Mitigations:
- In-page keyboards / pin-pads are rare in practice. But tracking the location of input could be bad in a variety of situations (sensitive button clicks perhaps?).
- Pointers can only be captured while "active" (mouse button pressed, touch contacting screen), so most of the time the request will fail.
- The malicious code must constantly probe. In particular, tap/click gestures will be hard to catch.
- When the pointer is captured to the iframe, the input won't be sent to the main page, so if the malicious code is probing for very long the user will notice that input isn't working reliably.
- The pointer ID for touch isn't very predictable (especially if the user hasn't touched the ad frame at all), so the attacker may need to probe a substantial number of possible IDs. The ID is likely predictable for mouse though.

Note that this is separate from the potential user-annoyance concern. I think we all agree that sandboxed iframes should be restricted in their ability to capture. The question here is what should be possible in non-sandboxed iframes.

So our request of the security team is: would we be willing to ship such an API? Or is the risk great enough that we'd require this scenario to be blocked somehow prior to shipping? Note that there could be some web compat implications to breaking such scenarios, but we don't have any data on the potential impact. Edge has been shipping such an API for several years, and they say they feel the risk in practice is low and that implementing a mitigation is potentially non-trivial, so they'd rather not change the spec for this unless there's a good reason.

Example proof-of-concept exploit (try in Edge): http://output.jsbin.com/nadoxi

kenrb: are you perhaps the right person to help us figure this out?
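To make the probing trade-off in the issue description concrete, here is a rough arithmetic sketch in Python. It is not an exploit and uses no browser API. The function name, the fixed probe period, and the assumption that every probe succeeds (the pointer is active and its ID is known) are illustrative simplifications that are not part of the report; only the roughly 16ms capture window comes from the text.

```python
def events_seen_by_prober(event_times_ms, period_ms, window_ms=16):
    """Return the event timestamps that fall inside a capture window.

    Assumption for illustration: the prober holds capture during
    [k * period_ms, k * period_ms + window_ms) for every integer k >= 0
    and releases it for the rest of each period.
    """
    return [t for t in event_times_ms if t % period_ms < window_ms]


# Events inside a window are redirected to the probing frame; the rest
# still reach the main page.
sample = [0, 10, 20, 100, 115, 116, 250]
leaked = events_seen_by_prober(sample, period_ms=100)
```

Under this model, the share of input taken from the main page is roughly window_ms / period_ms. Probing more often leaks more but is also more likely to be noticed as unreliable input.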
,
Apr 26 2016
,
Apr 26 2016
,
Apr 26 2016
,
Apr 26 2016
Note that I encouraged the Edge folks to participate actively on this issue since AFAIK we don't have a better non-public forum for cross-vendor security bug discussion (their bug tracker cannot add non-Microsoft people to private bugs yet).
,
Apr 27 2016
,
Apr 27 2016
Sorry, I'm having a tough time understanding the mechanics of pointer capture. A few questions to clarify my understanding:
1) Can an ad call setPointerCapture at any time or only on specific events, such as mouse clicks? Do those events have to be inside its frame?
2) When setPointerCapture is called successfully, what information will the ad receive until pointer capture is released?
3) What events will cause pointer capture to be released (besides the ad calling release itself)?
,
Apr 27 2016
setPointerCapture() can be called at any time; it does not need to be called from inside an event handler. However, you must provide the pointer ID and that pointer ID must be associated with an active pointer, e.g., a mouse with a pressed button or a finger in contact with the screen. The pointer can be captured to any element. Successfully capturing the pointer retargets the event dispatching. The information gained would be limited to what you can pull from an event object while the pointer is still active. For example, you can determine the location within the page that the pointer is currently occupying, which would allow you to determine which element the pointer is over. Pointer capture will be released if another call to setPointerCapture() occurs for the same pointer ID or if the pointer becomes inactive, e.g., mouse button is released or the finger is removed from the screen.
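The rules in this comment can be summarized as a small state model. The Python sketch below is illustrative only: the class and method names are hypothetical, elements are plain strings, and a failed request simply returns False rather than reproducing the spec's exact error behavior.

```python
class PointerCaptureModel:
    """Toy model of the capture rules described above; not a browser API."""

    def __init__(self):
        self.active = set()   # pointer IDs with a button pressed or finger down
        self.capture = {}     # pointer ID -> element currently holding capture

    def pointer_down(self, pointer_id):
        self.active.add(pointer_id)

    def pointer_up(self, pointer_id):
        # A pointer becoming inactive implicitly releases its capture.
        self.active.discard(pointer_id)
        self.capture.pop(pointer_id, None)

    def set_pointer_capture(self, element, pointer_id):
        # Callable at any time, but only succeeds for an active pointer.
        if pointer_id not in self.active:
            return False
        # A later call for the same pointer ID replaces the earlier capture.
        self.capture[pointer_id] = element
        return True

    def dispatch_target(self, pointer_id, hit_test_target):
        # Capture retargets dispatch away from the hit-tested element.
        return self.capture.get(pointer_id, hit_test_target)
```

In this model, a frame that captures pointer 1 while it is active receives events that would otherwise go to the hit-tested element. That retargeting is the behavior the security question is about.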
,
Apr 27 2016
Enabling iframes to steal events targeted at a parent document certainly doesn't feel right, though I agree that the risk associated with the specific attack you are mentioning is low due to the difficulty of implementing it. I'm curious, what are legitimate use cases for iframed content to use this API? It seems like it would be sound to only allow iframes to do this if their embedder explicitly allows it (via, for instance, iframe sandbox properties), as a signal that they know they are potentially handing over control of all pointer events. Certainly this API empowers iframes to be a nuisance to their embedders, and there is always the potential for more nefarious applications to arise.
,
Apr 27 2016
,
Apr 27 2016
One important note to clarify is that, in the case of cross-domain iframes, the target of the event will not be a node from the other document, i.e. this isn't an XSS issue. There are scenarios involving drag/drop and gesture detection where it could be useful to understand the input from another document. I think Scott had mentioned an example where he had built applications enabling input handoff/coordination between documents. You can imagine valid scenarios where, rather than trying to guess the pointerID, one document postMessages the ID to another document to coordinate input handling or gesture recognition in some form.
,
Apr 27 2016
Thanks for the explanation in #8! As an alternative suggestion to the one in #9 (which also seems reasonable): What if we only allowed setPointerCapture on pointers that first became active in the calling frame?
,
Apr 28 2016
I'm not really sure this merits adding this restriction. But if we do, I would prefer #12. I think we'd introduce a concept of an "associated document" for each pointer.

We would add the following step to Section 5.2.1 [1]:
* If the event name is pointerdown, set the pointer's "associated document" to the Owner Document of the node returned by hit testing.

Then in Section 10.1 [2], change step 2 to:
* If the Element from which this method is invoked does not participate in the tree of the specified pointer's "associated document", throw an exception with the name InvalidStateError.

[1] http://w3c.github.io/pointerevents/#firing-events-using-the-pointerevent-interface
[2] http://w3c.github.io/pointerevents/#setting-pointer-capture
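The two proposed spec steps can be sketched as a minimal Python model. This is an illustration under assumptions, not spec text or Chromium code: documents are plain strings, "participates in the tree of" is reduced to an equality check, and the class and method names are hypothetical.

```python
class InvalidStateError(Exception):
    """Stands in for the DOM exception named in the proposal."""


class AssociatedDocumentModel:
    def __init__(self):
        self.associated_document = {}  # pointer ID -> document of the pointerdown target

    def fire_pointer_event(self, event_name, pointer_id, hit_test_document):
        # Proposed addition to Section 5.2.1: only pointerdown sets the association.
        if event_name == "pointerdown":
            self.associated_document[pointer_id] = hit_test_document

    def set_pointer_capture(self, element_document, pointer_id):
        # Proposed step 2 of Section 10.1: reject elements outside the associated document.
        if self.associated_document.get(pointer_id) != element_document:
            raise InvalidStateError("element is not in the pointer's associated document")
        return True
```

With this rule, a frame the user never pressed in cannot capture the pointer, even after the pointer later moves over it.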
,
Apr 28 2016
I'm punting this to mkwst (and +jww), who might know if somebody in the Chrome Security OWP area has looked at this topic.
,
Apr 28 2016
So I remember now a good scenario we had for allowing this. There's a user experience we call "semantic zoom": https://channel9.msdn.com/Series/Introducing-Windows-8/Semantic-Zoom

In this use case, you recognize the pinch/stretch gesture in JavaScript and use it to switch the UI between two different views. To do this, you need to be able to recognize these gestures on the page no matter where the fingers happened to land, including if one of the fingers happens to be over one document/frame and another over a different document.

Right now, I'm still in favor of not restricting this because of use cases like this. That said, I think I would loosen #13 to only block if the Element is not of the same origin as the associated document of the pointer (e.g. you can still steal same origin pointers). Note that in the semantic zoom scenario, you still want this to work even in cross origin scenarios. So even this proposal would break it.
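The loosened check suggested here compares origins rather than documents. A minimal Python sketch follows, assuming a hypothetical `Document` record with a string origin; neither name is spec or browser API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    origin: str  # an origin serialized as a string, for illustration only


def may_capture(element_doc: Document, associated_doc: Document) -> bool:
    """Allow capture only when the capturing element's document shares an
    origin with the pointer's associated document."""
    return element_doc.origin == associated_doc.origin
```

A frame's own document trivially shares its own origin, so a pointer that went down inside a cross-origin frame can still be captured by that frame. What stays blocked is taking a pointer that went down in a document of a different origin.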
,
May 3 2016
Thanks, the semantic zoom case is interesting. It still requires co-ordination with script running in the iframe though (eg. to know when a pointerdown occurs and to get the pointer ID), right? Since it requires cross-frame co-operation, it's possible to implement without cross-frame capture at all, but relying on capture probably makes it more efficient, right?

We've talked about adding back a "get all active pointers API", right? If we had that, then I'd definitely agree that scenarios like semantic zoom would be more compelling (because it would be possible to implement without any co-ordination with the iframe).

Does the semantic zoom scenario ever rely on the inner iframe capturing input from the outer frame? It seems like in practice these types of scenarios are always about a container document wanting to take control of input currently destined for a sub-document (which is possible already, eg. via transparent overlay divs), while the security concern is strictly the inverse scenario. Perhaps we could get the best of both worlds here by allowing capture to be transferred only to ancestor frames of the current target?

Note that GMail today embeds Google Input Tools (https://www.google.com/inputtools/services/), which provides an on-screen keyboard, handwriting recognition, etc. I don't know of any way for 3rd-party script to run in an iframe hosted by GMail (script should be stripped out from any HTML e-mail and isn't allowed in GMail ads), but it's not hard to imagine some scenarios where it could happen (accidentally or as a legitimate design point).
,
May 3 2016
> Since it requires cross-frame co-operation, it's possible to implement without cross-frame capture at all, but relying on capture probably makes it more efficient, right?

No, not exactly. Taking capture removes the effect of the pointer on its original hit test target. You could hack around this, perhaps, with stopImmediatePropagation(), but there's no guarantee you've avoided running all code in the original event listener path. Capture completely redirects the path and avoids side effects in the frame.

> We've talked about adding back a "get all active pointers API", right? If we had that, then I'd definitely agree that scenarios like semantic zoom would be more compelling (because it would be possible to implement without any co-ordination with the iframe).

I don't see that the requirement of co-ordination makes the given scenario any less compelling. If we care about this security issue (still not concerned it's high enough risk to warrant that), then a get all pointers API would presumably require some sort of cross-frame coordination or approval to work anyway in the cross-domain scenarios, which this scenario is expected to work in.

> Does the semantic zoom scenario ever rely on the inner iframe capturing input from the outer frame? It seems like in practice these types of scenarios are always about a container document wanting to take control of input currently destined for a sub-document (which is possible already, eg. via transparent overlay divs). While the security concern is strictly the inverse scenario.
> Perhaps we could get the best of both worlds here by allowing capture to be transferred only to ancestor frames of the current target

The most common scenario for semantic zoom is that the inner frame takes capture from the outer frame. For example, the List View control in Windows uses Semantic Zoom to switch between macro/micro views of the list. That list view is an inner frame embedded in a larger document (the app frame).

Moreover, there is no implicit principle that the ancestor frame is more trusted. Framing victim sites is a common attack. So I don't see how such a restriction would improve security.
,
Jun 22 2016
Sorry, I missed Ken punting this to me back in April. :/ The information leakage here seems minor, and Jacob's proposal in #15 seems like a pretty complete mitigation: capture could be allowed in same-origin frames (which have complete control over their same-origin parents anyway), as well as cross-origin frames with which a user directly interacts. This would also obviate the need to special-case sandboxed frames. If that mitigation satisfies the various use cases y'all are considering, I think it's a pretty reasonable balance between power and caution. Would y'all be happy with that mitigation strategy? (If that's unacceptable, another avenue to explore would be to mitigate the attacks by making it impossible to persistently poll for events. Perhaps allow a frame some small number of "misses" where no active pointer was available, and then lock down the API to prevent capture. We'd need to persist this count through navigation of a frame and its children (similar to the way we persist sandbox attributes) to avoid trivial bypass, etc. and I'm not sure it's worth the complexity...)
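The parenthetical "misses" idea can be sketched as a per-frame counter. The Python below is illustrative only: the class name and the default threshold of 3 are hypothetical (the comment says only "some small number"), and persisting the count across navigations is not modeled.

```python
class CaptureBudget:
    """Toy per-frame counter: too many capture attempts with no active
    pointer permanently lock the frame out of capturing."""

    def __init__(self, max_misses=3):  # threshold is an assumed value
        self.max_misses = max_misses
        self.misses = 0

    @property
    def locked(self):
        return self.misses >= self.max_misses

    def request_capture(self, pointer_is_active):
        if self.locked:
            return False          # API locked down after too many misses
        if not pointer_is_active:
            self.misses += 1      # a probe that found no active pointer
            return False
        return True
```

A frame that polls blindly exhausts its budget quickly, while a frame that only captures in response to real input never records a miss. To avoid the trivial bypass mentioned above, a real implementation would have to keep the counter attached to the frame across navigations.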
,
Jun 22 2016
Ah, I missed the last sentence of #15: "Note that in the semantic zoom scenario, you still want this to work even in cross origin scenarios. So even this proposal would break it." Can you elaborate? When would a user expect a nested cross-origin frame to navigate/zoom based on their interaction with its container? I watched the video at https://channel9.msdn.com/Series/Introducing-Windows-8/Semantic-Zoom, but the examples there were all fairly integrated application environments; it's not clear to me how you want it to work on the web more widely.
,
Jun 22 2016
Thanks Mike. I think Jacob is saying that if the user pinch-zooms anywhere in the document (even outside the list view UI widget), it's the list view widget that wants to respond to that gesture. So you could imagine some UX where you'd want this even in cross-origin composition cases.

But how would co-ordination work when there are multiple such UI elements in the document? Eg. imagine I have multiple iframes on my page (which don't know about each other) which all want to respond to semantic zoom. Doesn't some code in the containing document need to either arbitrate (choose one over the other) or multiplex (zoom multiple widgets at once)? Whatever that arbitration mechanism is, it seems like it would resolve the issue here. Eg. maybe it's up to the top document to set the capture to the frame to be zoomed, and then that frame starts picking up the events?

Anyway, I think we all agree we need to do something for sandboxed iframes to prevent malicious / attention grabbing frames from annoying the user. Ideally I think we'd find a mitigation that would just work everywhere (without special casing sandboxed iframes).

Jacob, your "associated document" seems unnecessarily strict in that it doesn't allow a pointer to be captured after it moves into the bounds of a new frame. Rather than treat "pointerdown" specially, perhaps the associated document should just be the document of the last target of any pointer event for the pointer? I'd also be fine with relaxations in the multi-pointer case. Eg. if ANY pointer is currently targeted to a frame, then others (at least of the same pointerType) can be re-targeted there too.

Maybe we can also define some way for a frame to explicitly transfer capture to another. Eg. this is a little hacky, but if WindowProxy had setPointerCapture on it, then it could make sense for the containing document to use iframeElement.contentWindow.setPointerCapture(id) to explicitly send capture down into a specific frame (instead of postMessage into it telling the frame to steal capture from it).
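The difference between the pointerdown-only association and the "last target" relaxation proposed in this comment shows up in a short Python sketch. The function name, the rule labels, and the event tuples are hypothetical, and documents are plain strings.

```python
def associated_document(events, rule):
    """Return the associated document for one pointer.

    events: list of (event_name, target_document) in dispatch order.
    rule:   "pointerdown" updates the association only on pointerdown;
            "last-target" updates it on every pointer event.
    """
    doc = None
    for name, target_doc in events:
        if rule == "last-target" or name == "pointerdown":
            doc = target_doc
    return doc


# The pointer goes down in the page, then moves into an embedded frame.
trace = [("pointerdown", "page"), ("pointermove", "page"), ("pointermove", "frame")]
```

For this trace, the pointerdown-only rule leaves the pointer associated with the page, so the frame could not capture it. The last-target rule associates it with the frame once the pointer has moved there.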
,
Jun 29 2016
,
Oct 13 2016
,
Nov 22 2016
,
Nov 29 2016
,
Jan 11 2017
/cc domenic for context on https://github.com/whatwg/html/issues/2259 Perhaps this doesn't need to be Restrict-View anymore? I'm convinced the severity is pretty low.
,
Jan 11 2017
There's more discussion here: https://github.com/w3c/pointerevents/issues/16. It sounds like everyone agrees that the security risk here is very low, so I'm removing the Restriction on this bug to better support public discussion. Feel free to re-add it if you disagree.
,
Jan 11 2017
Oops, yes, the restrict-view was an oversight, it should have been cleared when this was triaged to Type-Bug.
,
Jan 20 2017
,
Feb 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a7a6a91e6090c759c6699e8062a034fb41900059

commit a7a6a91e6090c759c6699e8062a034fb41900059
Author: rbyers <rbyers@chromium.org>
Date: Wed Feb 08 16:22:02 2017

Add UseCounter for setPointerCapture outside dispatch

Attempt to measure whether any sites might break if setPointerCapture became a no-op outside of a context that was dispatching an event for that pointer.

BUG=606896
Review-Url: https://codereview.chromium.org/2635583002
Cr-Commit-Position: refs/heads/master@{#449007}

[modify] https://crrev.com/a7a6a91e6090c759c6699e8062a034fb41900059/third_party/WebKit/Source/core/frame/UseCounter.h
[modify] https://crrev.com/a7a6a91e6090c759c6699e8062a034fb41900059/third_party/WebKit/Source/core/input/PointerEventManager.cpp
[modify] https://crrev.com/a7a6a91e6090c759c6699e8062a034fb41900059/third_party/WebKit/Source/core/input/PointerEventManager.h
[modify] https://crrev.com/a7a6a91e6090c759c6699e8062a034fb41900059/tools/metrics/histograms/histograms.xml
,
Jun 2 2017
,
Nov 10 2017
,
Feb 18 2018
Comment 1 by rbyers@chromium.org
, Apr 26 2016