Issue 676265

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 2
Type: Bug



Allow pagination_algo = "none" in DOM distiller

Project Member Reported by olivierrobin@chromium.org, Dec 21 2016

Issue description

We don't support multipage articles in iOS Reading List (RL), and the RegExps used to find the next page rely on English words.
If a next-page link is found, we distill the second page and then just throw it away.

Allow setting pagination_algo to null (or default to "none" instead of "next" when the option is not set) to avoid costly regex processing.
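A minimal sketch of the requested behavior, in Python for illustration only. The function name `find_next_page`, the link representation, and the regex are all hypothetical and are not taken from the DOM Distiller source; the point is only that a "none" (or null) setting returns before any regex work is done.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stand-in for the English-word heuristics mentioned above.
# This is NOT the pattern DOM Distiller actually uses.
NEXT_LINK_RE = re.compile(r"\b(next|continue|older)\b", re.IGNORECASE)


def find_next_page(
    links: List[Tuple[str, str]],
    pagination_algo: Optional[str] = "next",
) -> Optional[str]:
    """Return the href of a likely next-page link, or None.

    `links` is a list of (link_text, href) pairs.
    When `pagination_algo` is None or "none", detection is skipped
    entirely, so no regex is evaluated.
    """
    if pagination_algo in (None, "none"):
        return None
    for text, href in links:
        if NEXT_LINK_RE.search(text):
            return href
    return None
```

With this shape, a caller that never uses the second page (such as the iOS Reading List case described here) would pass `pagination_algo="none"` and pay nothing for detection.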
 

Comment 1 by wychen@chromium.org, Feb 24 2017

The next-page detection algorithm can be inaccurate at times, especially on non-English pages. However, we tuned it to be biased toward false negatives (not returning a link when there is a next page) rather than false positives (returning a bad next-page link). In this case, would disabling next-page detection still be better than the current heuristics?
Independently of the solution we use for iOS, I think an option to disable pagination would be really useful (especially as pagination involves regexps that can be heavy).
For iOS, I am a little reluctant to distill pages that were not added by the user.
Are there restrictions on which pages can be considered as page 2 (for example, same origin as page 1)?

Comment 3 by wychen@chromium.org, Feb 27 2017

I agree being able to skip pagination detection could be useful if the result is not used.

I'd say there are few false positives in our next-page detection, given that the page contains an article. The candidates do indeed need to be in the same origin as page 1. Our algorithm relies on language-dependent hints, but also on some language-neutral ones such as numeric patterns. In many cases, even if the page is intended for non-English readers, the HTML id and class names are still in English, so the language-dependent hints apply more often than we'd think.
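To illustrate the two language-neutral constraints mentioned above (same origin, numeric patterns), here is a small Python sketch. It is an assumption-laden illustration, not the DOM Distiller algorithm: `same_origin` and `pick_next_page` are hypothetical names, links are assumed to be absolute URLs, and the only numeric hint used is "link text equals the current page number plus one".

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """True when both absolute URLs share scheme, host, and port."""
    return _origin(a) == _origin(b)


def pick_next_page(
    current_url: str,
    links: List[Tuple[str, str]],
    current_page: int = 1,
) -> Optional[str]:
    """Pick a same-origin link whose text is the next page number.

    Hypothetical numeric hint only; no language-dependent words are used.
    """
    wanted = str(current_page + 1)
    for text, href in links:
        if not same_origin(current_url, href):
            continue  # cross-origin candidates are never accepted
        if re.fullmatch(r"[0-9]+", text.strip()) and text.strip() == wanted:
            return href
    return None
```

A cross-origin link labeled "2" is rejected here even though its text matches, which mirrors the same-origin restriction described in this comment.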

Comment 4 by wychen@chromium.org, Feb 27 2017

The false positive rate could be high if the page doesn't contain an article. 

This symptom was mostly suppressed by this CL "Stop fetching the next page if the first page has no content":
https://codereview.chromium.org/1891103002
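The idea behind that CL can be sketched roughly as follows, in Python for illustration. This is not the code from the CL: `distill_article` and the `fetch_and_distill` callback are hypothetical, and the sketch generalizes slightly by stopping at any page that yields no content, which covers the first-page case named in the CL title.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: takes a URL, returns (distilled_text, next_page_url).
FetchFn = Callable[[str], Tuple[str, Optional[str]]]


def distill_article(
    first_url: str,
    fetch_and_distill: FetchFn,
    max_pages: int = 10,
) -> List[str]:
    """Collect distilled pages, following next-page links.

    Stops as soon as a page yields no content, so an empty first page
    means its (possibly bogus) next-page link is never fetched.
    """
    pages: List[str] = []
    url: Optional[str] = first_url
    while url is not None and len(pages) < max_pages:
        content, next_url = fetch_and_distill(url)
        if not content.strip():
            break  # no article here: do not trust or follow next_url
        pages.append(content)
        url = next_url
    return pages
```

Because a page without an article is where false positives are most likely, refusing to follow its next-page link avoids most of the bad fetches described above.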

I am hitting a DCHECK when there is pagination, for example while distilling: https://ar.m.wikipedia.org/wiki/%D8%A5%D8%B3%D8%AD%D8%A7%D9%82_%D9%86%D9%8A%D9%88%D8%AA%D9%86
Thanks for reporting this next page bug!
Note that the DCHECK is a check on mime type, but does not stop or alter the distillation.

Comment 8 by wychen@chromium.org, Mar 16 2017

Labels: Hotlist-GoodFirstBug
Allowing pagination detection to be skipped should be fairly easy, so I am labeling this a Good First Bug.

Is the usefulness of page stitching in iOS Reading List still unclear?
I personally think it's pretty useful to distill and save the whole article if it is divided over multiple pages, and it would help us achieve feature parity with other browsers that support this functionality.
The false-positive next-page link in #c5 is tracked separately here:
https://bugs.chromium.org/p/chromium/issues/detail?id=702424
