
Issue 645690


Issue metadata

Status: Assigned
OS: All
Pri: 3
Type: Bug




Fast article element detection should ignore small elements

Project Member Reported by wychen@chromium.org, Sep 10 2016

Issue description

Version: M54

What steps will reproduce the problem?
(1) Run DOM distiller on http://japanese.engadget.com/2016/09/09/3dcg-saya2016/

What is the expected output?
Extracted content.

What do you see instead?
No data is extracted.

In the fast path, the only detected article element is this one:

<header class="header container" itemscope="" itemtype="http://schema.org/BlogPosting">

Its dimensions are roughly 400x100 px on mobile and roughly 800x70 px on desktop. The fast path should filter out such small elements.
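A minimal sketch of the proposed filter follows. It assumes each fast-path candidate can be reduced to its rendered width and height. The `Candidate` model, the `MIN_HEIGHT_PX` and `MIN_AREA_PX2` thresholds, and the `pick_article_root` helper are all hypothetical and are not DOM Distiller code. Returning `None` stands for "no usable article element, fall back to the regular extraction path".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical model of an element the fast path treats as an article root."""
    description: str
    width_px: float
    height_px: float


# Hypothetical thresholds, chosen only for illustration (not from DOM Distiller).
MIN_HEIGHT_PX = 200.0
MIN_AREA_PX2 = 100_000.0


def is_large_enough(c: Candidate) -> bool:
    """Reject short banner-like elements such as a page <header>."""
    return c.height_px >= MIN_HEIGHT_PX and c.width_px * c.height_px >= MIN_AREA_PX2


def pick_article_root(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the largest candidate that passes the size check, or None to
    signal that the fast path should give up and defer to full extraction."""
    large = [c for c in candidates if is_large_enough(c)]
    if not large:
        return None
    return max(large, key=lambda c: c.width_px * c.height_px)
```

With these thresholds, the `<header itemtype="http://schema.org/BlogPosting">` element from this page (about 800x70 on desktop, 400x100 on mobile) is rejected, so the fast path would no longer commit to it as the only article element.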
 

Comment 1 by wychen@chromium.org, Sep 10 2016

Components: UI>Browser>ReaderMode
This is similar to issue 616954, which is about invisible <article> elements.
Blocking: 687071
Status: Assigned (was: Untriaged)
Naively making article detection more accurate would adversely affect the quality evaluation. The key difference is that the title is usually no longer within the root element, so the "expand to title" step no longer works properly.
Blocking: -687071
