Omitting a header in distilled version
Issue description

Chrome Version: M58 Canary
OS: iOS 10.2.1

What steps will reproduce the problem?
- In Bling M57 or M58, go to http://m.20minutes.fr/amp/a/2009539
- Add the page to the reading list
- Go offline

What is the expected result?
- All headers in the page need to be there. However, "Sur le travail, plutôt d’accord, sauf sur le revenu universel" seems to be missing.

What happens instead?
- The page looks fine except that there is a header missing (see attachments).

Note: Attached distilled version screenshots for Firefox and Safari as well.
Feb 24 2017
The last link is also missing. I looked into what went wrong.

1) One known issue is that when there are lots of links at the end of an article, they are often ignored. The classification algorithm tends to reject a block when its link density is too high. This is kind of hard to solve: without reading the text content, it's hard to know whether a list of links is part of the article or just a "list of related articles", which is popular on news sites.

2) The heading detection wrongly classified the links as headings. This inadvertently enabled header fusion, which retains most of the list by accident. I think fixing this part would actually remove the whole list and make things worse in this particular case.
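To make point 1 concrete, here is a minimal sketch of a link-density check. It is not the distiller's actual code: the `TextBlock` type, the `keeps` function, and the 0.33 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A run of text from the page (hypothetical type, for illustration)."""
    num_words: int          # total words in the block
    num_linked_words: int   # words that sit inside link elements


def link_density(block: TextBlock) -> float:
    """Fraction of the block's words that are link text."""
    if block.num_words == 0:
        return 0.0
    return block.num_linked_words / block.num_words


def keeps(block: TextBlock, max_link_density: float = 0.33) -> bool:
    """Keep a block only if its link density is at or below the threshold.

    The threshold value is an assumption, not a value taken from the distiller.
    """
    return link_density(block) <= max_link_density


blocks = [
    TextBlock(num_words=120, num_linked_words=4),  # ordinary article paragraph
    TextBlock(num_words=30, num_linked_words=30),  # list made only of links
]
print([keeps(b) for b in blocks])  # [True, False]
```

A check like this sees only word counts, so a list of links that belongs to the article and a "related articles" list look identical to it, which is why the case is hard to fix.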
Comment 1 by mard...@chromium.org, Feb 24 2017
Attachment: 126 KB