Lazily-loaded main content is not extracted by DomDistiller |
|||||||||||
Issue descriptionVersion: M55 OS: iOS 9.3.5 What steps will reproduce the problem? (0) Enable Reader mode from Google Chrome Canary's experimental settings and restart the app (1) Go to http://rue89.nouvelobs.com/2016/09/19/operation-juncker-lesprit-youtubeurs-mate-com-google-265183 (2) Select Reader mode from the menu What is the expected output? Page should be distilled What do you see instead? No data is returned Please use labels and text to provide additional information.
,
Sep 21 2016
I couldn't reproduce this on DOM distiller ToT on linux, with device emulation set to iPhone 5. Is this an iOS-specific bug?
,
Sep 21 2016
,
Sep 22 2016
Hmm. Attached screenshot from Bling.
,
Sep 22 2016
I also tried evaluating the DOM distiller JS code on Laptop Safari, and it did return expected results. Could you confirm the version of DOM distiller in your build is ToT?
,
Sep 23 2016
Thanks for looking into this. I don't know which version of DOM distiller we have currently in Bling. lod@ or noyau@ would know.
,
Sep 27 2016
We use the same version as the one in Chromium for the relevant channel as we use the same DEPS. Right now in Tot (aka, Canary):
'src/third_party/dom_distiller_js/dist':
Var('chromium_git') + '/external/github.com/chromium/dom-distiller-dist.git' + '@' + 'a018e245289d10291c59ebf827bd748bfeb93343',
,
Sep 27 2016
Repro locally on iPhone simulator. What DOM distiller sees on iPhone is the DOM state in the loading stage (where you see the Newton's cradle animation), not the steady state. The only text DOM distiller sees is "Je récupère votre article, un peu de patience...", so as a result, nothing is extracted. On Android and desktop, the original WebContents is used as the input for DOM distiller. Does it work similarly on iOS as well? If a new WebContents is fed to DOM distiller, waiting for onload() event works for most pages, but probably not this one. That site heavily rely on JS code. When JS is disabled, there's no graceful fallback.
,
Oct 24 2016
Hey, noyau@, do you know whether it is possible to use the DOM in the original page as the input of DOM distiller on iOS? If not, this bug might be tricky to solve.
,
Oct 26 2016
noyau@ is OOO. Maybe lod@ or olivierrobin@ could weigh in on wychen@'s question in #9. Thanks.
,
Oct 26 2016
We often distill the page without even loading it (e.g. user added a URL to reading list from another application) so we cannot reuse the DOM that is displayed. IIUC the code, we create a background WKWebView that will create the DOM, the DOM distiller javascript is injected in the page. It seems we rely on the WebState PageLoaded event to see when the page is loaded. This may be too early (specially if some javascript is used to build the page).
,
Dec 1 2016
,
Dec 20 2016
Any updates on this one?
,
Dec 21 2016
I'd lower the priority for supporting web pages that do not have graceful fallback without JS. PageLoaded is the most obvious event to inject distiller JS, but if the main content is not there at that time, it's hard to detect when exactly the content is loaded. We support lazy-loading to some extent, like for images. However, I feel it's gone too far when the content is also loaded by the JS. This issue is more about when to inject DOM distiller JS code, and less about DOM distiller itself now. Could we reuse the DOM when possible? For example, if the page is already shown when we select the Reader Mode, it might be possible to skip creating the background WebView. This would retain the DOM state and also reduce data usage.
,
Dec 21 2016
Hi We don't always have a web view when distilling the page. We can add items to the reading list that we never loaded (from a long press on a link for example). It is possible to not support these pages, but in this case, there should be a correct error and not a page displaying "No Data Found".
,
Dec 21 2016
I understand we don't always have a web view available, hence the "when possible". It might be too much trouble to introduce another path for this kind of pages. When the distilled content is an empty string, showing "No data found." is the error message. Could you describe more about the "correct error" you have in mind? We can see how we can make the UI more acceptable in this case.
,
Dec 21 2016
We consider reusing the WKWebView when available (this would actually solve crbug.com/675274) but this will not happen on V1. Returning a page containing "No data found." is not an error. The initial page could show "No data found." There should be a flag in the DistilledArticleProto or in the DistillationFinishedCallback that reports the error. Or at least, the returned HTML should be completely empty and not contain a placeholder string.
,
Dec 21 2016
DistilledArticleProto has a field .html, and when it's an empty string, it can be regarded as the error state. To avoid confusing the users, the error message is shown instead of a empty page. "No data found." is injected here: https://cs.chromium.org/chromium/src/components/dom_distiller/core/viewer.cc?sq=package:chromium&dr=C&rcl=1482293976&l=96
,
Dec 21 2016
,
Dec 21 2016
Is there a way to not inject this? Or to add a flag if you inject this?
,
Dec 21 2016
Adding a 2seconds delay between loading and distilling seems to help a lot on this. Mardini, this would make every distilling attempt 2 seconds longer. Is this acceptable?
,
Dec 21 2016
As long as we don't have direct access to a "reader mode" that is triggered explicitly by the user, then yes, this is acceptable. FWIW, we had distillation not working on Washington Post and this was fixed by introducing this delay.
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4af6f965f87724fcfc7b7e2b900522edf57a5212 commit 4af6f965f87724fcfc7b7e2b900522edf57a5212 Author: olivierrobin <olivierrobin@chromium.org> Date: Thu Dec 22 12:28:35 2016 Report empty article as empty. Dom Distiller adds "No data found" to the HTML if the page is empty. This prevents RL to handle correctly the error. BUG= 648923 Review-Url: https://codereview.chromium.org/2594663005 Cr-Commit-Position: refs/heads/master@{#440391} [modify] https://crrev.com/4af6f965f87724fcfc7b7e2b900522edf57a5212/ios/chrome/browser/dom_distiller/distiller_viewer.cc
,
Dec 22 2016
,
Jan 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/31daab21ab182a3272daec45c77626c1628bd4d3 commit 31daab21ab182a3272daec45c77626c1628bd4d3 Author: olivierrobin <olivierrobin@chromium.org> Date: Mon Jan 02 12:32:36 2017 Create distiller files for Reading List. DomDistillerService is used for ReadingList and ReaderMode. The requirements for both feature are different. As DomDistillerService main purpose (cross-platform) is Reader Mode, this CL creates files for the Reading List distillation. These files ensure that: - WebState used for distillation survives at least 10 seconds after the end of distillation to allow loading favicon - There is a 2 seconds delay between end of page load and start of distillation. - Reading List files are in Reading List components. These 2 requirements are irrelevant for Reader Mode as the page is already loaded and the favicon is not used. This CL also restore the behavior of Reader Mode before cl/2529283002. BUG= 648923 Review-Url: https://codereview.chromium.org/2604773002 Cr-Commit-Position: refs/heads/master@{#441060} [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/BUILD.gn [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_factory_ios.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_factory_ios.mm [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_ios.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_ios.mm [delete] https://crrev.com/bfde2fdb4910e7628b949bc39afb3340d249ddd7/components/dom_distiller/ios/favicon_web_state_dispatcher.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/reading_list/ios/BUILD.gn [add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/reading_list/ios/favicon_web_state_dispatcher.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/BUILD.gn [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/distiller_viewer.cc [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/distiller_viewer.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/dom_distiller_service_factory.cc [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/BUILD.gn [rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl.h [rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl.mm [rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl_unittest.mm [add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page.h [add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page.mm [add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page_factory.h [add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page_factory.mm [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service.cc [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service_factory.cc [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader.cc [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader.h [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader_unittest.mm [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/ui/reader_mode/reader_mode_controller.mm [modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/test/BUILD.gn
,
Jan 5 2017
There is a work around for Chrome iOS. Letting the bug open if DOM distiller wants to add a solution.
,
Jan 12 2017
If the article is not in the input DOM, there's really nothing DOM distiller can do to extract it. Closing it.
,
Jan 24 2017
Verified in 58.0.2991.0 dev, iPhone 6 iOS 9.3.5 Tested the url from comment #0. URL is getting distilled and opening correct when device is offline. Looks good. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by justincohen@chromium.org
, Sep 21 2016Status: Assigned (was: Untriaged)