
Issue 838242

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-07-02
OS: Linux , Windows , Mac
Pri: 2
Type: Bug




chrome.webRequest: resources reported with missing frameIds

Reported by amiag...@gmail.com, Apr 30 2018

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36

Steps to reproduce the problem:
1. Create an extension that listens for all requests using chrome.webRequest.onBeforeRequest.
2. Keep a mapping of tab IDs to frame IDs to frame URLs.
3. Using your mapping, try to associate a frame URL with every request.
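
A minimal sketch of the steps above, not the code from the attached demo. The names frameUrls and recordRequest are hypothetical, and the sketch assumes that only requests of type main_frame and sub_frame tell the listener about a frame:

```javascript
// Hypothetical sketch; frameUrls and recordRequest are illustrative names.

// tabId -> (frameId -> frame URL)
const frameUrls = new Map();

// Process one webRequest `details` object. Returns the URL of the frame the
// request belongs to, or null when this listener never saw that frame.
function recordRequest(details) {
  const { tabId, frameId, type, url } = details;

  if (!frameUrls.has(tabId)) frameUrls.set(tabId, new Map());
  const frames = frameUrls.get(tabId);

  // Requests for the frame documents themselves populate the mapping.
  if (type === 'main_frame' || type === 'sub_frame') {
    if (type === 'main_frame') frames.clear(); // new top-level document
    frames.set(frameId, url);
    return url;
  }

  // Any other resource must belong to a frame recorded earlier.
  return frames.has(frameId) ? frames.get(frameId) : null;
}

// Register the listener only when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener((details) => {
    if (details.tabId < 0) return; // request is not related to a tab
    if (recordRequest(details) === null) {
      console.log('MISSING FRAME DATA', details.tabId, details.frameId, details);
    }
  }, { urls: ['<all_urls>'] });
}
```

With a mapping like this, any resource whose frameId was never seen as a main_frame or sub_frame request ends up in the null branch.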

What is the expected behavior?
You are able to supply context (document/frame URLs) for every request.

What went wrong?
Some requests arrive with a frame ID whose frame never went through chrome.webRequest.onBeforeRequest (and which you therefore do not know about).

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 66.0.3359.139  Channel: n/a
OS Version: 
Flash Version: 

You can see the problem by loading the attached demo extension and watching for MISSING FRAME DATA messages printed in the background page.

This problem seems to happen with and without cache disabled in Dev Tools.

This bug is related to https://github.com/EFForg/privacybadger/issues/1997. I am trying to find a workaround for the attribution problem by comparing initiator URLs to frame URLs, but I found that some requests arrive with unknown-to-me frame IDs, which means I can't verify whether I assigned the correct parent document to those resources.
 
webext_webRequest_docUrl_attribution_demo.zip (1.2 KB)

Comment 1 by amiag...@gmail.com, Apr 30 2018

This is related to Issue 665843 in the sense that we run into the same larger problem, that of correctly and synchronously attributing every request to the tab URL it came from. We get tab ID already, but maintaining an accurate mapping of IDs to URLs is tricky as you can see in Issue 665843 and https://github.com/EFForg/privacybadger/issues/1997.
Labels: Needs-Triage-M66
Labels: Needs-Feedback Triaged-ET
Tested the issue on the reported Chrome version 66.0.3359.139 on Ubuntu 14.04 with the steps below:
1) Launched the reported Chrome version and installed the extension provided in comment#0
2) On the chrome://extensions page, clicked the background page link for the installed extension
3) Developer Tools opened, but no data appeared in the Console

@Reporter: Please see the attached screencast and let us know if we missed anything in verifying the issue. If possible, please provide a screencast of the issue, which would help us understand it better. Any further input would be most helpful.

Thanks!
838242.ogv (2.1 MB)

Comment 4 by amiag...@gmail.com, May 3 2018

You have to visit a page where lots of frames get loaded, such as nytimes.com; sorry if that wasn't clear.
Project Member

Comment 5 by sheriffbot@chromium.org, May 3 2018

Cc: viswa.karala@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding the requester to the cc list.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Components: Platform>Extensions>API
Cc: sindhu.chelamcherla@chromium.org
Labels: M-68 FoundIn-68 Target-68 OS-Mac OS-Windows
Status: Untriaged (was: Unconfirmed)
Able to reproduce this issue on the reported version 66.0.3359.139 and the latest canary 68.0.3429.0 on Mac 10.13.3, Windows 10, and Ubuntu 17.10, i.e., observing "MISSING FRAME DATA FOR tabId=16 frameId=0, docUrls=Object, details= ..." in the background page console.

This issue has been present since M-60, so we are treating it as a non-regression and marking it as Untriaged.

Thanks!
Owner: karandeepb@chromium.org
Status: Assigned (was: Untriaged)

Comment 9 by amiag...@gmail.com, Jun 7 2018

Please let me know if I can help with anything.

Comment 10 by amiag...@gmail.com, Jun 22 2018

Could somebody from the extensions team who knows the chrome.webRequest API please take a look at this?

Privacy Badger is affected more than other extensions by bugs in the webRequest API as Privacy Badger makes blocking decisions based on request attribution (instead of manually composed lists of URL patterns). If Privacy Badger can't correctly attribute a request to the top-level document URL that originated it, Privacy Badger will block and/or allow resources it shouldn't have.
Have you taken into account that not all requests will have a frame ID by design? Not every web request comes from a render frame, e.g. requests made by the browser and those made on behalf of service workers.

Can you describe the issue you are facing in more detail?

Comment 12 by amiag...@gmail.com, Jun 22 2018

If we stick to nytimes.com, what I see is requests that belong to (advertising-related) frames reported with valid (>0) frame IDs, but these IDs reference frames that never went through my webRequest listener. You can see this for yourself by loading the demo extension attached to this issue and visiting nytimes.com.
Labels: Needs-Feedback
NextAction: 2018-07-02
So again this seems WAI (working as intended) to me. Consider a page with the following HTML:

<html>
<body>
 <iframe srcdoc="<img src='https://www.w3schools.com/html/pic_trulli.jpg'/>"></iframe>
</body>
</html>

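If you run your extension on this page, it will print "MISSING FRAME DATA".
Basically, it's possible for a frame to generate network requests even though no request is ever made for the frame itself. Here the iframe's document comes from the srcdoc attribute, so nothing is fetched for the frame, yet the image inside it is requested with that frame's ID.

So I am not sure what the issue is, amiagkov@?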

Comment 14 by amiag...@gmail.com, Jun 26 2018

I'm just trying to figure out how to accurately assign every request to the precise tab URL it came from. This is harder than it seems; going off of tab ID is not enough as the document may have changed in the meantime.

This (correctly attributing requests to top-level documents) is clearly an issue that affects many Chrome extensions. If you visit a resource-rich page and then navigate away from the page while it's still loading, your privacy/ad blocking extension is likely to mis-report resources belonging to the previous site on the site you just navigated to. I could reproduce this problem with Ghostery, uBlock Origin, etc.

In Privacy Badger's case, this common problem is not just a visual nit. Since Privacy Badger learns from browsing, incorrect attribution can lead to incorrect blocking decisions.

Comment 15 by amiag...@gmail.com, Jun 26 2018

Thanks for pointing out that request frame IDs can point to inline frames. This wasn't obvious to me and may be worth noting in the docs.

Comment 16 by amiag...@gmail.com, Jun 26 2018

Can I rely on the initiator property of the request details object? https://developer.chrome.com/extensions/webRequest states:

>The origin where the request was initiated. This does not change through redirects. If this is an opaque origin, the string 'null' will be used.

Are these always tab (top-level document) URLs? What are "opaque origins"?
Status: WontFix (was: Assigned)
>> I'm just trying to figure out how to accurately assign every request to the precise tab URL it came from. This is harder than it seems; going off of tab ID is not enough as the document may have changed in the meantime.

And it's also really hard within Chromium currently. The last I looked into it, some refactoring was needed to support this. 

>> This (correctly attributing requests to top-level documents) is clearly an issue that affects many Chrome extensions. If you visit a resource-rich page and then navigate away from the page while it's still loading, your privacy/ad blocking extension is likely to mis-report resources belonging to the previous site on the site you just navigated to. I could reproduce this problem with Ghostery, uBlock Origin, etc.

Yeah, I think most such extensions rely on some combination of the webRequest and webNavigation APIs to support their use cases. But agreed, it would be better if we could just send the top-level document URL.
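
A rough sketch of what such a webRequest + webNavigation combination might look like. The names knownFrames, onFrameCommitted and frameUrlFor are hypothetical. It assumes that chrome.webNavigation.onCommitted also reports frames whose document did not come from the network (for example an iframe with srcdoc), and it does not remove the event-ordering race discussed in this thread:

```javascript
// Hypothetical sketch; knownFrames, onFrameCommitted and frameUrlFor are
// illustrative names, not part of any Chrome API.

// tabId -> (frameId -> URL of the document committed in that frame)
const knownFrames = new Map();

// Handle a webNavigation.onCommitted event for any frame in any tab.
function onFrameCommitted({ tabId, frameId, url }) {
  if (frameId === 0 || !knownFrames.has(tabId)) {
    // A new top-level document replaces everything known about the tab.
    knownFrames.set(tabId, new Map());
  }
  knownFrames.get(tabId).set(frameId, url);
}

// Look up the frame URL and the top-level document URL for a request.
// Either value is null when the frame (or tab) has not been seen.
function frameUrlFor({ tabId, frameId }) {
  const frames = knownFrames.get(tabId);
  if (!frames) return { frameUrl: null, topUrl: null };
  return {
    frameUrl: frames.get(frameId) || null,
    topUrl: frames.get(0) || null,
  };
}

// Register listeners only when running inside an extension that has the
// webNavigation, webRequest and tabs APIs available.
if (typeof chrome !== 'undefined' && chrome.webNavigation &&
    chrome.webRequest && chrome.tabs) {
  chrome.webNavigation.onCommitted.addListener(onFrameCommitted);
  chrome.tabs.onRemoved.addListener((tabId) => knownFrames.delete(tabId));
  chrome.webRequest.onBeforeRequest.addListener((details) => {
    const { frameUrl, topUrl } = frameUrlFor(details);
    console.log(details.url, 'frame:', frameUrl, 'top:', topUrl);
  }, { urls: ['<all_urls>'] });
}
```

Clearing the per-tab map on every main-frame commit keeps frames of the previous page from being attributed to the new one, but a request that is reported before its frame's commit event would still come back with null values.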

>> Thanks for pointing out that request frame IDs can point to inline frames. This wasn't obvious to me and may be worth noting in the docs.
I think it's implied. The web request API will only notify users of actual network requests.

>> Are these always tab (top-level document) URLs? What are "opaque origins"?
No, this is the origin of the requesting frame. If an iframe to xyz.com makes a request to abc.com, then the initiator for the request would be xyz.com.

>> Can I rely on the initiator property of the request details object?

Not all requests are made by a render frame. So this may be 'null'. There may be other cases I am missing here.
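
A small sketch of how an extension might consume the initiator defensively, given the answers above. The helpers usableInitiator and initiatorMatchesFrame are hypothetical, and treating a missing initiator the same as the string 'null' is an assumption of this sketch:

```javascript
// Hypothetical helpers; neither function is part of any Chrome API.

// Returns the initiator origin string, or null when it cannot be used
// (property missing, or the string 'null' reported for an opaque origin).
function usableInitiator(details) {
  const { initiator } = details;
  if (!initiator || initiator === 'null') return null;
  return initiator;
}

// Compares the initiator with the origin of a frame URL the extension has
// on record. Returns true or false, or null when no comparison is possible.
function initiatorMatchesFrame(details, frameUrl) {
  const origin = usableInitiator(details);
  if (origin === null || !frameUrl) return null;
  try {
    return new URL(frameUrl).origin === origin;
  } catch (e) {
    return null; // frameUrl was not a parseable URL
  }
}
```

For the example above, a request made by the xyz.com iframe compares equal to that iframe's URL, not to the tab's top-level URL, so the initiator alone does not identify the top-level document.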

Will close this for now. Feel free to open a feature request for the top-level document URL and we can track it there.

Comment 18 by amiag...@gmail.com, Jun 26 2018

>I think it's implied. The web request API will only notify users of actual network requests.

Well, yes, but when one looks at the documentation and sees frameId on the details object, one might expect to be able to construct a hierarchy of frames from this information. That's a gotcha: you can't, because frames might be inline in the HTML. I don't mean to argue, but in my experience this is unexpected and not user-friendly. I can see how an extension may want to keep track of frames for other purposes, not just in an attempt to compensate for not being provided tab URLs. But yes, this is getting off track. Thank you for your help; I'll open a new issue.

Comment 19 by amiag...@gmail.com, Jun 26 2018

Filed issue 856766.
The NextAction date has arrived: 2018-07-02
