Issue 702424

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug


False positive of next page detection in DOM distiller due to URL encoding

Project Member Reported by wychen@chromium.org, Mar 16 2017

Issue description

Chrome Version: M59

There should be no "next page" for https://ar.m.wikipedia.org/wiki/%D8%A5%D8%B3%D8%AD%D8%A7%D9%82_%D9%86%D9%8A%D9%88%D8%AA%D9%86

However, the URL of an image is returned as the next page link.
 

Comment 1 by wychen@chromium.org, Mar 16 2017

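The false match is caused by URL encoding: the common prefix of the two percent-encoded URLs ends in the middle of an escape sequence, so the remainders start with the hex digits 8 and 9, which are read as consecutive page numbers.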

The verbose log is:

18)https://ar.m.wikipedia.org/wiki/%D9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg, txt=[], dbg=[-> https://ar.m.wikipedia.org/wiki/%D9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg; txt+class+id= image view-border-box ; score=25: posParent -  mw-mf-page-center; remains: 8%A5%D8%B3%D8%AD%D8%A7%D9%82_%D9%86%D9%8A%D9%88%D8%AA%D9%86, 9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg; remains: 8, 9; score=50: diff = 1; found: score=50, txt=[], https://ar.m.wikipedia.org/wiki/%D9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg]

This can be fixed by URL-decoding both URLs to Unicode before comparing them.
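
To make the log above easier to follow, here is a minimal sketch of how a char-by-char common prefix can split a percent escape. It is not the dom-distiller code: the class `EncodedPrefixDemo` and its helpers `commonPrefixLength` and `leadingNumber` are hypothetical names invented for illustration, and the real heuristic in PagingLinksFinder involves more scoring than this.

```java
// Illustrative sketch only; names are hypothetical, not from dom-distiller.
final class EncodedPrefixDemo {
    // Length of the longest common prefix, compared one char at a time.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Value of the leading ASCII decimal digits of s, or -1 if there are none.
    static int leadingNumber(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i == 0 ? -1 : Integer.parseInt(s.substring(0, i));
    }

    public static void main(String[] args) {
        String page = "https://ar.m.wikipedia.org/wiki/"
                + "%D8%A5%D8%B3%D8%AD%D8%A7%D9%82_%D9%86%D9%8A%D9%88%D8%AA%D9%86";
        String link = "https://ar.m.wikipedia.org/wiki/"
                + "%D9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg";

        // The shared prefix ends after "%D", inside the escape sequence.
        int k = commonPrefixLength(page, link);
        String pageRest = page.substring(k);
        String linkRest = link.substring(k);

        System.out.println("remains: " + pageRest + ", " + linkRest);
        // The leading hex digits are misread as page numbers that differ by 1.
        System.out.println("numbers: " + leadingNumber(pageRest)
                + ", " + leadingNumber(linkRest));
    }
}
```

With the two URLs from this report, the remainders begin with `8%A5` and `9%85`, matching the `remains: 8, 9` and `diff = 1` entries in the verbose log.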

Comment 2 by wychen@chromium.org, Mar 16 2017

Summary: False positive of next page detection in DOM distiller due to URL encoding (was: False positive of next page detection in DOM distiller)

Comment 3 by bugdroid1@chromium.org, Oct 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/dom-distiller/+/0bde3157d739fc184452db5f33519b719e87145f

commit 0bde3157d739fc184452db5f33519b719e87145f
Author: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Date: Fri Oct 13 23:02:58 2017

Fix false positive of next page detection

When calculating the page number difference, the URLs were encoded,
so finding common prefix char-by-char can cut in the middle of
the escape sequence.

Now the URLs are URL decoded first.

Bug:  702424 
Change-Id: I0e4b8d1f7c45dabe62d72deb357156d50cd1e2a9
Reviewed-on: https://chromium-review.googlesource.com/719700
Reviewed-by: Matthew Jones <mdjones@chromium.org>

[modify] https://crrev.com/0bde3157d739fc184452db5f33519b719e87145f/javatests/org/chromium/distiller/PagingLinksFinderTest.java
[modify] https://crrev.com/0bde3157d739fc184452db5f33519b719e87145f/java/org/chromium/distiller/StringUtil.java
[modify] https://crrev.com/0bde3157d739fc184452db5f33519b719e87145f/java/org/chromium/distiller/PagingLinksFinder.java
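
The effect of decoding first, as described in the commit message, can be sketched roughly as follows. This is not the committed change: the class `DecodedPrefixDemo` and its helpers are hypothetical, and using `java.net.URLDecoder` is an assumption made to keep the example runnable on a plain JVM (Java 10 or later). Note that `URLDecoder` also turns `+` into a space, which does not matter for the URLs in this report.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical, not from dom-distiller.
final class DecodedPrefixDemo {
    // Assumption: percent-decoding as UTF-8 stands in for the real decoder.
    static String decode(String url) {
        return URLDecoder.decode(url, StandardCharsets.UTF_8);
    }

    // Length of the longest common prefix, compared one char at a time.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Value of the leading ASCII decimal digits of s, or -1 if there are none.
    static int leadingNumber(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i == 0 ? -1 : Integer.parseInt(s.substring(0, i));
    }

    public static void main(String[] args) {
        String page = decode("https://ar.m.wikipedia.org/wiki/"
                + "%D8%A5%D8%B3%D8%AD%D8%A7%D9%82_%D9%86%D9%8A%D9%88%D8%AA%D9%86");
        String link = decode("https://ar.m.wikipedia.org/wiki/"
                + "%D9%85%D9%84%D9%81:GodfreyKneller-IsaacNewton-1689.jpg");

        // After decoding, the shared prefix ends at "/wiki/" and the
        // remainders start with Arabic letters, not hex digits.
        int k = commonPrefixLength(page, link);
        System.out.println("remains: " + page.substring(k)
                + ", " + link.substring(k));
        System.out.println("numbers: " + leadingNumber(page.substring(k))
                + ", " + leadingNumber(link.substring(k)));
    }
}
```

Because neither remainder starts with a digit once the URLs are decoded, there is no page number difference to compute and the image link is no longer a next page candidate in this sketch.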

Comment 4 by wychen@chromium.org, Oct 15 2017

Status: Fixed (was: Assigned)

Comment 5 by bugdroid1@chromium.org, Oct 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c47f4cf8bc009b3a3c9f73b3a3ffb3bb4faff102

commit c47f4cf8bc009b3a3c9f73b3a3ffb3bb4faff102
Author: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Date: Mon Oct 16 19:22:27 2017

Roll DOM Distiller JavaScript distribution package

Diff since last roll:
https://github.com/chromium/dom-distiller/compare/8de0cacfed...0bde3157d7

Picked up changes:
0bde315 Fix false positive of next page detection
f1d9b2d Upload dom-distiller changes to Gerrit by default
4405dfd Fix build status badge image on GitHub

Bug:  702424 
Change-Id: Ie3beed6d18a2316ae829cf72c45657f1355bcaed
Reviewed-on: https://chromium-review.googlesource.com/719483
Reviewed-by: Matthew Jones <mdjones@chromium.org>
Commit-Queue: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#509136}
[modify] https://crrev.com/c47f4cf8bc009b3a3c9f73b3a3ffb3bb4faff102/DEPS
[modify] https://crrev.com/c47f4cf8bc009b3a3c9f73b3a3ffb3bb4faff102/third_party/dom_distiller_js/README.chromium
