
Issue 695867

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 2
Type: Bug



Omitting a header in distilled version

Project Member Reported by mard...@chromium.org, Feb 24 2017

Issue description

Chrome Version: M58 Canary
OS: iOS 10.2.1

What steps will reproduce the problem?
- In Bling M57 or M58, go to http://m.20minutes.fr/amp/a/2009539
- Add the page to the reading list
- Go offline

What is the expected result?
- All headers in the page should be present in the distilled version. However, "Sur le travail, plutôt d’accord, sauf sur le revenu universel" (roughly: "On work, mostly in agreement, except on universal income") is missing.

What happens instead?
- The distilled page looks fine except that one header is missing (see attachments)

Note: Distilled-version screenshots from Firefox and Safari are attached as well.

 
ONLINE-CHROME.PNG (384 KB)
DISTILLED-CHROME.PNG (313 KB)
DISTILLED-SAFARI.PNG (136 KB)
Distilled Firefox attached.
DISTILLED-FIREFOX.PNG (126 KB)

Comment 2 by wychen@chromium.org, Feb 24 2017

The last link is also missing.

I looked into what went wrong.

1) One known issue is that when an article ends with many links, they are often dropped. The classification algorithm tends to reject blocks whose link density is too high. This is hard to solve: without understanding the text content, it is hard to tell whether a list of links is part of the article or just a "list of related articles", which is common on news sites.

2) The heading detection wrongly classified the links as headings. This inadvertently enabled header fusion, which retained most of the list by accident. I think fixing this part would actually remove the whole list and make things worse in this particular case.
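
To make the interaction between points 1) and 2) concrete, here is a minimal Python sketch. It is not the distiller's actual code: the `Block` type, the `keep_flags` function, the 0.33 threshold, and the rule that a detected heading escapes the link-density check are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    linked_words: int         # number of words that sit inside link elements
    is_heading: bool = False  # result of a (possibly wrong) heading detector


def link_density(block: Block) -> float:
    """Fraction of the block's words that are link text."""
    total = len(block.text.split())
    return block.linked_words / total if total else 0.0


def keep_flags(blocks, max_link_density=0.33):
    """Hypothetical classifier: drop link-heavy blocks unless flagged as headings."""
    flags = []
    for block in blocks:
        too_linky = link_density(block) > max_link_density
        # Simplifying assumption: anything detected as a heading is fused
        # with the article content and so escapes the link-density check.
        flags.append(block.is_heading or not too_linky)
    return flags


article = [
    Block("Body paragraph of the article with plain text only", 0),
    Block("Related story one", 3, is_heading=True),  # link misdetected as a heading
    Block("Related story two", 3),                   # link not detected as a heading
]
print(keep_flags(article))  # [True, True, False]
```

Under these assumptions, the misdetected "heading" survives while an identical link block without the heading flag is dropped, which is why correcting the heading detection alone would likely remove the whole list.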
