Failure to distill an article due to invisible <article>
Issue description
Version: M52

What steps will reproduce the problem?
(1) Enable DOM distiller
(2) Distill http://www.industrial-lasers.com/articles/print/volume-20/issue-6/features/going-with-the-flo.html

What is the expected output?
Extracted article.

What do you see instead?
Empty content.
Jun 5 2016
The page contains one <article> element, which is invisible. Our fast path wrongly picks that <article> element and treats it as the root for the DOM walker. Filtering out invisible, or even small, article candidates should fix this. Alternatively, if the fast path results in empty content, redo distillation without the fast path.
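The two fixes proposed above can be sketched together as follows. This is only an illustration on a toy tree, not the dom-distiller code: the `Node` class, the function names, and the `MIN_ARTICLE_AREA` threshold are all hypothetical, and `visible` is assumed to already reflect the element's effective visibility.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical size threshold (width * height); not taken from dom-distiller.
MIN_ARTICLE_AREA = 100 * 100


@dataclass
class Node:
    """Toy stand-in for a DOM element; not the real DOM API."""
    tag: str
    text: str = ""
    visible: bool = True  # assumed to be the element's effective visibility
    width: int = 0
    height: int = 0
    children: List["Node"] = field(default_factory=list)


def walk_text(root: Node) -> str:
    """Collect text from visible nodes only, skipping hidden subtrees."""
    if not root.visible:
        return ""
    parts = [root.text] if root.text else []
    for child in root.children:
        child_text = walk_text(child)
        if child_text:
            parts.append(child_text)
    return " ".join(parts)


def find_articles(root: Node) -> List[Node]:
    """Return every <article> node in document order."""
    found = [root] if root.tag == "article" else []
    for child in root.children:
        found.extend(find_articles(child))
    return found


def is_usable_candidate(node: Node) -> bool:
    """Reject hidden or very small <article> elements."""
    return node.visible and node.width * node.height >= MIN_ARTICLE_AREA


def pick_fast_path_root(root: Node) -> Optional[Node]:
    """Take the fast path only when exactly one usable <article> exists."""
    candidates = [a for a in find_articles(root) if is_usable_candidate(a)]
    return candidates[0] if len(candidates) == 1 else None


def distill(root: Node) -> str:
    """Try the fast path first; fall back to the full document if it is empty."""
    fast_root = pick_fast_path_root(root)
    if fast_root is not None:
        content = walk_text(fast_root)
        if content:
            return content
    # Fallback: redo the walk from the document root without the fast path.
    return walk_text(root)


# A page shaped like the one in this bug: the only <article> is hidden.
page = Node("body", children=[
    Node("article", text="hidden teaser", visible=False),
    Node("div", text="main article text", width=600, height=800),
])
print(distill(page))
```

In this sketch the hidden <article> never becomes the walker's root, so the text of the visible <div> is returned instead of empty content. The empty-result fallback is kept as a second line of defense for candidates that pass the filter but still yield nothing.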
Jun 6 2016
This CL fixes this issue: https://codereview.chromium.org/1411603004/
Jun 22 2016
Jun 23 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/96665328d45790abbbcc5136fd94b1b3c4954c9c

commit 96665328d45790abbbcc5136fd94b1b3c4954c9c
Author: wychen <wychen@chromium.org>
Date: Thu Jun 23 01:24:03 2016

Roll DOM Distiller JavaScript distribution package

Diff since last roll:
https://github.com/chromium/dom-distiller/compare/0adf24afe4...54d05ba208

Picked up changes:
54d05ba Discard hidden articles when using fast path
f6d2dc1 Support extraction of lazily-loaded images
11fdddc Strip "target" attribute from anchor elements
65c0b6d Fix for LeadImage getting images after last relevant content
0455a46 Support deprecated <object> API of Youtube embeds
8637690 Fix some warnings in Eclipse

BUG=478142,481111,544962,597366,601811,616954

Review-Url: https://codereview.chromium.org/2087233004
Cr-Commit-Position: refs/heads/master@{#401500}

[modify] https://crrev.com/96665328d45790abbbcc5136fd94b1b3c4954c9c/DEPS
[modify] https://crrev.com/96665328d45790abbbcc5136fd94b1b3c4954c9c/third_party/dom_distiller_js/README.chromium
Sep 6 2016
Comment 1 by wychen@chromium.org, Jun 2 2016
Owner: wychen@chromium.org