New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 648923 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 2
Type: Bug



Sign in to add a comment

Lazily-loaded main content is not extracted by DomDistiller

Project Member Reported by mard...@chromium.org, Sep 21 2016

Issue description

Version: M55
OS: iOS 9.3.5

What steps will reproduce the problem?
(0) Enable Reader mode from Google Chrome Canary's experimental settings and restart the app
(1) Go to http://rue89.nouvelobs.com/2016/09/19/operation-juncker-lesprit-youtubeurs-mate-com-google-265183
(2) Select Reader mode from the menu

What is the expected output?
Page should be distilled

What do you see instead?
No data is returned


Please use labels and text to provide additional information.

 
Owner: lod@chromium.org
Status: Assigned (was: Untriaged)

Comment 2 by wychen@chromium.org, Sep 21 2016

I couldn't reproduce this on DOM distiller ToT on linux, with device emulation set to iPhone 5. Is this an iOS-specific bug?

Comment 3 by wychen@chromium.org, Sep 21 2016

Cc: -wychen@chromium.org mdjones@chromium.org lod@chromium.org
Owner: wychen@chromium.org
Hmm. Attached screenshot from Bling.
image.png
34.4 KB View Download

Comment 5 by wychen@chromium.org, Sep 22 2016

I also tried evaluating the DOM distiller JS code on Laptop Safari, and it did return expected results. Could you confirm the version of DOM distiller in your build is ToT?
Thanks for looking into this.

I don't know which version of DOM distiller we have currently in Bling. lod@ or noyau@ would know.

Comment 7 by noyau@chromium.org, Sep 27 2016

We use the same version as the one in Chromium for the relevant channel as we use the same DEPS. Right now in Tot (aka, Canary):

  'src/third_party/dom_distiller_js/dist':
    Var('chromium_git') + '/external/github.com/chromium/dom-distiller-dist.git' + '@' + 'a018e245289d10291c59ebf827bd748bfeb93343',

Comment 8 by wychen@chromium.org, Sep 27 2016

Cc: k...@chromium.org
Repro locally on iPhone simulator.

What DOM distiller sees on iPhone is the DOM state in the loading stage (where you see the Newton's cradle animation), not the steady state. The only text DOM distiller sees is "Je récupère votre article, un peu de patience...", so as a result, nothing is extracted.

On Android and desktop, the original WebContents is used as the input for DOM distiller. Does it work similarly on iOS as well? If a new WebContents is fed to DOM distiller, waiting for onload() event works for most pages, but probably not this one. That site heavily rely on JS code. When JS is disabled, there's no graceful fallback.

Comment 9 by wychen@chromium.org, Oct 24 2016

Hey, noyau@, do you know whether it is possible to use the DOM in the original page as the input of DOM distiller on iOS? If not, this bug might be tricky to solve.
Cc: olivierrobin@chromium.org
noyau@ is OOO. Maybe lod@ or olivierrobin@ could weigh in on wychen@'s question in #9.

Thanks.
We often distill the page without even loading it (e.g. user added a URL to reading list from another application) so we cannot reuse the DOM that is displayed.

IIUC the code, we create a background WKWebView that will create the DOM, the DOM distiller javascript is injected in the page.

It seems we rely on the WebState PageLoaded event to see when the page is loaded. This may be too early (specially if some javascript is used to build the page).


Comment 12 by jif@chromium.org, Dec 1 2016

Cc: -lod@chromium.org jif@chromium.org
Labels: Hotlist-ReadingList
Any updates on this one?
Labels: -Pri-1 Pri-2
I'd lower the priority for supporting web pages that do not have graceful fallback without JS. PageLoaded is the most obvious event to inject distiller JS, but if the main content is not there at that time, it's hard to detect when exactly the content is loaded. We support lazy-loading to some extent, like for images. However, I feel it's gone too far when the content is also loaded by the JS.

This issue is more about when to inject DOM distiller JS code, and less about DOM distiller itself now. Could we reuse the DOM when possible? For example, if the page is already shown when we select the Reader Mode, it might be possible to skip creating the background WebView. This would retain the DOM state and also reduce data usage.
Hi

We don't always have a web view when distilling the page.
We can add items to the reading list that we never loaded (from a long press on a link for example).
It is possible to not support these pages, but in this case, there should be a correct error and not a page displaying "No Data Found".
I understand we don't always have a web view available, hence the "when possible". It might be too much trouble to introduce another path for this kind of pages.

When the distilled content is an empty string, showing "No data found." is the error message. Could you describe more about the "correct error" you have in mind? We can see how we can make the UI more acceptable in this case.
We consider reusing the WKWebView when available (this would actually solve crbug.com/675274) but this will not happen on V1.

Returning a page containing "No data found." is not an error. The initial page could show "No data found."
There should be a flag in the DistilledArticleProto or in the DistillationFinishedCallback that reports the error.
Or at least, the returned HTML should be completely empty and not contain a placeholder string.
DistilledArticleProto has a field .html, and when it's an empty string, it can be regarded as the error state.

To avoid confusing the users, the error message is shown instead of a empty page. "No data found." is injected here:
https://cs.chromium.org/chromium/src/components/dom_distiller/core/viewer.cc?sq=package:chromium&dr=C&rcl=1482293976&l=96
Summary: Lazily-loaded main content is not extracted by DomDistiller (was: DomDistiller doesn't return any distilled content on some pages)
Is there a way to not inject this?
Or to add a flag if you inject this?
Cc: mard...@chromium.org
Adding a 2seconds delay between loading and distilling seems to help a lot on this.
Mardini, this would make every distilling attempt 2 seconds longer.
Is this acceptable?
As long as we don't have direct access to a "reader mode" that is triggered explicitly by the user, then yes, this is acceptable. 

FWIW, we had distillation not working on Washington Post and this was fixed by introducing this delay. 
Project Member

Comment 23 by bugdroid1@chromium.org, Dec 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4af6f965f87724fcfc7b7e2b900522edf57a5212

commit 4af6f965f87724fcfc7b7e2b900522edf57a5212
Author: olivierrobin <olivierrobin@chromium.org>
Date: Thu Dec 22 12:28:35 2016

Report empty article as empty.

Dom Distiller adds "No data found" to the HTML if the page is empty.
This prevents RL to handle correctly the error.

BUG= 648923 

Review-Url: https://codereview.chromium.org/2594663005
Cr-Commit-Position: refs/heads/master@{#440391}

[modify] https://crrev.com/4af6f965f87724fcfc7b7e2b900522edf57a5212/ios/chrome/browser/dom_distiller/distiller_viewer.cc

Labels: M-57
Project Member

Comment 25 by bugdroid1@chromium.org, Jan 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/31daab21ab182a3272daec45c77626c1628bd4d3

commit 31daab21ab182a3272daec45c77626c1628bd4d3
Author: olivierrobin <olivierrobin@chromium.org>
Date: Mon Jan 02 12:32:36 2017

Create distiller files for Reading List.

DomDistillerService is used for ReadingList and ReaderMode.
The requirements for both feature are different.
As DomDistillerService main purpose (cross-platform) is Reader Mode,
this CL creates files for the Reading List distillation.
These files ensure that:
- WebState used for distillation survives at least 10 seconds after the
  end of distillation to allow loading favicon
- There is a 2 seconds delay between end of page load and start of
  distillation.
- Reading List files are in Reading List components.

These 2 requirements are irrelevant for Reader Mode as the page is
already loaded and the favicon is not used.
This CL also restore the behavior of Reader Mode before cl/2529283002.

BUG= 648923 

Review-Url: https://codereview.chromium.org/2604773002
Cr-Commit-Position: refs/heads/master@{#441060}

[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/BUILD.gn
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_factory_ios.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_factory_ios.mm
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_ios.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/dom_distiller/ios/distiller_page_ios.mm
[delete] https://crrev.com/bfde2fdb4910e7628b949bc39afb3340d249ddd7/components/dom_distiller/ios/favicon_web_state_dispatcher.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/reading_list/ios/BUILD.gn
[add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/components/reading_list/ios/favicon_web_state_dispatcher.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/BUILD.gn
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/distiller_viewer.cc
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/distiller_viewer.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/dom_distiller/dom_distiller_service_factory.cc
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/BUILD.gn
[rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl.h
[rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl.mm
[rename] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/favicon_web_state_dispatcher_impl_unittest.mm
[add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page.h
[add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page.mm
[add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page_factory.h
[add] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_distiller_page_factory.mm
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service.cc
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/reading_list_download_service_factory.cc
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader.cc
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader.h
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/reading_list/url_downloader_unittest.mm
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/browser/ui/reader_mode/reader_mode_controller.mm
[modify] https://crrev.com/31daab21ab182a3272daec45c77626c1628bd4d3/ios/chrome/test/BUILD.gn

There is a work around for Chrome iOS.
Letting the bug open if DOM distiller wants to add a solution.
Status: Fixed (was: Assigned)
If the article is not in the input DOM, there's really nothing DOM distiller can do to extract it. Closing it.
Status: Verified (was: Fixed)
Verified in 58.0.2991.0 dev,  iPhone 6 iOS 9.3.5

Tested the url from comment #0. URL is getting distilled and opening correct when device is offline. Looks good.

Sign in to add a comment