src of iframe is not restored when opening MHTML
Issue description
Version: M55
What steps will reproduce the problem?
(1) Save any page containing an iframe as MHTML (on Android, use Clank with the offline feature; on desktop, the command line option --save-page-as-mhtml is needed)
(2) Reopen the saved MHTML file
(3) Inspect the src attribute of the <iframe> element.
What is the expected output?
See the original URL in the src attribute.
What do you see instead?
The cid: URI is displayed instead. Using JS console leads to the same result.
> document.getElementsByTagName('iframe')[0].src
Note that other resources like <img> work as expected. The following statement prints the original URL.
> document.getElementsByTagName('img')[0].src
This is needed because DOM distiller scans the .src of iframes to detect interesting embeds.
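The check in step (3) can be scripted. The following is a minimal sketch, not part of the original report: partitionFrameSrcs is a hypothetical helper name, and the cid: value in the usage note is a made-up example rather than output captured from Chrome.

```javascript
// Hypothetical helper: split a list of frame src strings into cid: URIs
// (rewritten by the MHTML round trip) and everything else (preserved).
function partitionFrameSrcs(srcs) {
  const rewritten = [];
  const preserved = [];
  for (const src of srcs) {
    // URI schemes are case-insensitive, so normalize before comparing.
    if (src.trim().toLowerCase().startsWith('cid:')) {
      rewritten.push(src);
    } else {
      preserved.push(src);
    }
  }
  return { rewritten, preserved };
}

// In the console of a reopened MHTML page, one could feed it the live values:
//   partitionFrameSrcs(Array.from(document.querySelectorAll('iframe'), f => f.src));
```

If the bug reproduces, every iframe src should land in `rewritten`, for example a value shaped like `cid:frame-1@example.invalid`.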
Sep 15 2016
DOM distiller is the backend of Reader Mode. It scans the DOM tree to find the main article and relevant embeds, such as YouTube videos or Twitter posts. For normal web pages, a YouTube embed can be detected by looking at the .src of iframes. Since .src is not preserved in MHTML snapshots, this no longer works.

Preserving the original state as much as possible in MHTML is generally a goal worth achieving, but I can see how this became difficult with iframes. If there are no duplicated URLs among these iframes, the URLs can still be used as unique identifiers. If the iframe doesn't navigate away, the original src can still be used. Similarly, if OOPIF is not enabled, we can still use the original URL. However, these conditions would certainly make our logic more complicated, with multiple paths to test, and it might not be worth it in the end if users can't tell the difference.

I tried one possible workaround for Reader Mode: special-casing cid: URIs and peeking inside the frame's contentDocument. However, this didn't work because all frames are considered cross-origin in MHTML. I guess this restriction is hard to remove for security reasons. Another possibility is to preserve the original src in a new attribute like data-original-src, but this feels hacky and non-standard.
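To make the src-based detection and the data-original-src idea concrete, here is a rough sketch. It is not DOM distiller's actual code: the host table, the detectEmbed name, and the originalSrc field (standing in for a data-original-src attribute) are all assumptions for illustration.

```javascript
// Hypothetical host table; the real DOM distiller rules are not shown in this thread.
const EMBED_HOSTS = new Map([
  ['www.youtube.com', 'youtube'],
  ['youtube.com', 'youtube'],
  ['player.vimeo.com', 'vimeo'],
]);

// Returns the embed type for a frame, or null if it cannot be determined.
// `frame` is any object with a `src` string and an optional `originalSrc`
// string (an assumed stand-in for a data-original-src attribute).
function detectEmbed(frame) {
  for (const candidate of [frame.src, frame.originalSrc]) {
    if (!candidate) continue;
    let url;
    try {
      url = new URL(candidate);
    } catch (e) {
      continue; // Not a parseable absolute URL.
    }
    // A cid: URI says nothing about where the frame came from; try the next candidate.
    if (url.protocol === 'cid:') continue;
    const type = EMBED_HOSTS.get(url.hostname);
    if (type) return type;
  }
  return null;
}
```

With only a cid: src the function returns null, which is the failure described above. With the original URL kept in a second field, detection works again without looking inside the frame.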
Sep 16 2016
RE: if OOPIF is not enabled

OOPIFs are always enabled in M55 since r414879 (+nasko@). OOPIFs are in the process of launching in M54 (currently in a 50-50 finch trial for the Beta channel).

RE: If the iframe doesn't navigate away, the original src can still be used.

It is not always possible to see (from inside a renderer process) if a subframe/iframe has navigated away (e.g. if it is a cross-site iframe, i.e. an iframe that can potentially be put into an OOPIF).

----

I wonder in what scenario DOM distiller needs to digest HTML documents embedded in MHTML files. I am guessing (?) that on Android (AFAIK the only platform where DOM distiller works) the majority of MHTML files are generated by the Offline Pages feature. If an MHTML file is generated by the Offline Pages feature, then wouldn't it already be "distilled" before getting serialized into MHTML (and therefore the contents of the MHTML file wouldn't need to be "distilled" again)?

----

At the end of the day, if DOM distiller needs to make decisions based on information coming from cross-site frames, then (IMO) it might not be possible to implement DOM distiller entirely within a single renderer process, i.e. it might be necessary to have the browser process stitch together pieces of information from multiple frames / renderers (potentially belonging to different origins).

I see that the OOPIFs dependencies diagram lists DOM Distiller in the dependencies of the --top-document-isolation launch (AFAIK the target for this launch is Q4 2016 or Q1 2017). OTOH, I don't see a link to a bug or a contact person :-( The dependencies diagram can be found at http://csreis.github.io/oop-iframe-dependencies/ (+creis@).
Sep 16 2016
RE: OOPIFs are always enabled in M55.

Maybe this came out too strongly. "OOPIFs are always enabled in M55" is technically correct (the best kind of "correct"? :-), but right now OOPIFs are only used to isolate Chrome extensions from web content (i.e. we are making --isolate-extensions mode the default in M55). This means that OOPIFs do not currently exist in default Chrome on Android (because there are no Chrome extensions on Android). Isolating web pages from each other will happen at a later milestone (and at that point OOPIFs will be possible on Android; obviously before launching OOPIFs more broadly we need to collectively ensure that more web features are compatible with OOPIFs).
Sep 16 2016
DOM distiller is implemented in Java and compiled to JavaScript. As long as OOPIF preserves the behavior seen by user scripts, enabling it should be transparent to DOM distiller. Since OOPIF shouldn't break user scripts, I'm not too worried about it breaking DOM distiller. I manually tested this combination a while ago, and it went well.

DOM distiller rarely needs to look inside an iframe. For most embeds, like YouTube and Vimeo, looking at the src attribute is enough. We do need to peek inside expanded Twitter embeds to get the Twitter ID, since the embed doesn't set src and the iframe is used like a poor man's shadow DOM. I think it is considered same-origin. Twitter is moving to the real shadow DOM though (issue 602178), so we won't need to access anything inside an iframe once that is fully rolled out.

RE: distilling MHTML

I remember seeing distillation discussions for the Offline Pages feature on Clank, but it seems it's not implemented the way you described. If you save a page offline, it is archived as is. I guess this is more in line with users' expectations. After all, what if the user wants the original page and we only saved the distilled version? If the page is distillable, the Reader Mode prompt shows up when you open that MHTML, so the user gets to view both formats. I think this is the best of both worlds, except that the Reader Mode infobar can be annoying at times, but this is the UI we've got.

---

Given the way DOM distiller recognizes embeds, retaining iframe.src is crucial. I understand that serializing to MHTML is unavoidably a lossy transform. We strip out <script> and <noscript> tags, etc., and iframe.src is one of the things that gets lost. In our architecture, keeping .src is a lot of work, but keeping it elsewhere is hacky. This bug doesn't occur that frequently, so if we don't have a good solution, it's probably fine.
Sep 17 2016
Please feel free to send any Offline Pages questions to me. We are planning to revamp MHTML (maybe even call it by a different name) in the months ahead, for various reasons. If DOM Distiller has ideas about how it can be changed to make it more consumable, we are ready to listen.

It is definitely possible to preserve the original URL in some form. The issue with OOPIF described here is only due to the current implementation, which obviously can be changed (for example, by making the parent frame or the browser process output the Content-Location header of the frame).

Offline Pages indeed do not distill content before saving. That might change in the future though; we have discussed it, and combining the two could provide an excellent feature. Please feel free to meet face-to-face and chat about possible plans here.

I'm changing this bug to be a Feature Request.
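As a sketch of how a consumer might use such a header: MIME parts in an MHTML archive can carry both a Content-ID and a Content-Location header, so if the serializer emitted both for a subframe part, a cid: URI could be mapped back to the original URL. Whether Chrome emits both for subframes is exactly what this bug leaves open, so that is an assumption here, as are the function names and the sample header values.

```javascript
// Parses the header block of one MIME part into a lower-cased name -> value map.
// Simplification: folded (multi-line) header values are not handled.
function parsePartHeaders(headerBlock) {
  const headers = new Map();
  for (const line of headerBlock.split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // Skip blank or malformed lines.
    headers.set(line.slice(0, colon).trim().toLowerCase(), line.slice(colon + 1).trim());
  }
  return headers;
}

// Builds a map from "cid:<id>" URIs to the Content-Location of the same part.
// Parts missing either header are skipped.
function buildCidToLocationMap(headerBlocks) {
  const map = new Map();
  for (const block of headerBlocks) {
    const headers = parsePartHeaders(block);
    const id = headers.get('content-id');
    const location = headers.get('content-location');
    if (!id || !location) continue;
    // Content-ID values are wrapped in angle brackets; cid: URIs omit them.
    map.set('cid:' + id.replace(/^<|>$/g, ''), location);
  }
  return map;
}
```

A lookup such as `map.get(iframeSrc)` would then return the original frame URL for a rewritten src, or undefined when the part did not record one.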
Sep 19 2016
dimich@, for DOM Distiller compatibility, it is best if, from JavaScript's perspective, the archived DOM is indistinguishable from the live one. I understand this is not always easy to achieve 100%. The issues with our current MHTML implementation so far are:
- this iframe.src issue
- all iframes are treated as cross-origin.

In order to make Offline Pages work better with DOM distiller, even if the archived file is the original one, it still makes sense to do a "dry-run" distillation and cache all the detected resources. Some web pages use lazy-loading techniques, so not all the resources are visible to the DOM serializer. DOM distiller has some logic specifically for lazy-loaded images, and this might help when the user decides to distill the offlined page while the device is actually offline. Bling does this for the Reading List feature.
Sep 21 2016
Sep 23 2016
Jan 26 2017
I'm closing this as won't fix since we don't have a plan for it yet. If we ever get to a design doc with specific use cases, we'll open tracking issues as appropriate.
Comment 1 by lukasza@chromium.org, Sep 15 2016