
Issue 647083

Issue metadata

Status: Assigned
OS: All
Pri: 3
Type: Bug

Main content missing because layout table misclassified as data

Reported by wychen@chromium.org (project member), Sep 14 2016

Issue description

Version: M55

What steps will reproduce the problem?
(1) Distill https://www.grc.nasa.gov/www/k-12/airplane/bga.html

What is the expected output?
Main content extracted.

What do you see instead?
"No data found"

User feedback:
https://feedback.corp.google.com/product/282/neutron?lView=rd&lReport=18197533560

Quick diagnosis:
The table that holds the main content matches none of the table-classification rules. It therefore falls through to the default classification, data table, and the main content is not extracted.

Possible fix:
- If the table contains an <hr> element, classify it as a layout table.
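
The following is a minimal Python sketch of the proposed rule. It assumes a simplified, ordered rule list that ends in a "data" default. Only two things come from this report: the fall-through-to-data behavior and the <hr> check. The class `TableFeatures`, the function `classify_table`, the `use_hr_rule` flag, the three earlier rules, and the sample table are all hypothetical illustrations. None of them is DOM Distiller's actual table classifier.

```python
from html.parser import HTMLParser


class TableFeatures(HTMLParser):
    """Hypothetical feature collector for one <table> HTML fragment.

    Illustration only; this is not DOM Distiller's table classifier.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0             # current <table> nesting depth
        self.nested_table = False  # saw a <table> inside the outer table
        self.has_th = False        # outer table has header cells
        self.has_hr = False        # an <hr> appears anywhere inside the table
        self.rows = 0              # <tr> count of the outer table only

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase.
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested_table = True
        elif tag == "hr" and self.depth >= 1:
            self.has_hr = True
        elif self.depth == 1:
            if tag == "th":
                self.has_th = True
            elif tag == "tr":
                self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def classify_table(html, use_hr_rule=True):
    """Apply hypothetical rules in order; fall back to "data" as the report describes."""
    f = TableFeatures()
    f.feed(html)
    f.close()
    if f.has_th:
        return "data"    # header cells suggest a real data table
    if f.nested_table:
        return "layout"  # tables nested in tables are usually layout
    if f.rows <= 1:
        return "layout"  # a single row rarely holds tabular data
    if use_hr_rule and f.has_hr:
        return "layout"  # the fix proposed in this issue
    return "data"        # default when no rule matches


# Hypothetical two-row layout table with an <hr> in its content cell.
page = ("<table><tr><td>Title</td></tr>"
        "<tr><td><p>Text</p><hr><p>More text</p></td></tr></table>")
print(classify_table(page, use_hr_rule=False))  # data
print(classify_table(page))                      # layout
```

Without the new rule, the sample table matches nothing and defaults to "data", which mirrors the reported failure. With the rule enabled, it is classified as "layout". In this sketch the <hr> check comes after the <th> check, so a table with header cells still counts as data. Where the check belongs in the real rule order is a judgment call.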

Status: Assigned (was: Untriaged)