DOM distiller fails on some folded pages
Reported by
yangxiao...@gmail.com,
Dec 29 2016
Issue description
UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36

Steps to reproduce the problem:
1. Start Chrome with --enable-dom-distiller.
2. Load https://m.sohu.com/n/477121843/?wscrid=1137_4
3. Click the element at XPath /html/body/section[1]/article/div[4]/a (something like a "show less" button in Chinese).
4. The hidden content becomes visible.
5. Distill the page. The content revealed in step 4 is not distilled.

What is the expected behavior?
The hidden content should be recognized as content.

What went wrong?
The hidden content is missing from the DOM distiller result.

Did this work before? N/A
Does this work in other browsers? Yes

Chrome version: 53.0.2785.116 Channel: stable
OS Version:
Flash Version: Shockwave Flash 24.0 r0
,
Dec 29 2016
DOM distiller can correctly get the lower part if the hidden content is already expanded. I think what happened was: that URL was distilled once when the lower part was hidden. After expansion, reader mode returns the cached old version.
,
Dec 30 2016
I opened another page similar to the one in the bug description, expanded the folded content first, and then distilled the page. The hidden content was still missing.
,
Dec 30 2016
I tried a few pages, and they all worked fine.
https://m.sohu.com/n/477121843/?wscrid=1137_4
https://m.sohu.com/n/557066524/?wscrid=95360_20
https://m.sohu.com/n/477353601/?wscrid=95360_9
https://m.sohu.com/n/477367845/?wscrid=95360_1
I'd need more info to reproduce this bug.
,
Dec 30 2016
I have attached three screenshots.
Step 1: Start Chrome: chrome --enable-dom-distiller
Step 2: Open https://m.sohu.com/n/477367845/?wscrid=95360_1
Step 3: Click 'show more' (see the screenshots before_expand.png and after_expand.png).
Step 4: Click 'distill page' in the Chrome menu (after_distill.png).
,
Dec 30 2016
Can you try using the Chrome extension to distill the page and see if it's reproducible?
,
Dec 30 2016
I got a white page in extension mode, as described here: https://github.com/chromium/dom-distiller/issues/8
,
Jan 3 2017
I tried the latest dom-distiller code in extension mode, and it distills the hidden content successfully. Sorry, maybe there is a bug in M53 or, as you said before, the cached content was shown.
,
Jan 3 2017
I did not notice you were using an old version. Can you upgrade to Chrome 55? (Any non-current Chrom(ium) has many known security issues now.)
,
Jan 3 2017
Thanks for reporting this bug. It looks like this bug is no longer reproducible.
One possible improvement is to support extracting the whole article even if it is not expanded. For this particular site, this can be done with the following tweaks:
- Keep traversing nodes with id="rest_content", even if it is not visible.
- Handle lazily-loaded <img> with attribute original-hidden.
If these heuristics are general enough, we can consider adding them.
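The two tweaks above can be illustrated with a rough sketch. This is not DOM Distiller's actual code. The `Node` class, the `extract` function, and the constant names are hypothetical, and the sketch assumes that the original-hidden attribute carries the real image URL while src holds only a placeholder.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified DOM node. `visible` models the node's own style
# only; it is not inherited from ancestors.
@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    visible: bool = True
    children: list = field(default_factory=list)

# Site-specific hints taken from the comment above.
FORCE_TRAVERSE_IDS = {"rest_content"}
LAZY_IMG_ATTR = "original-hidden"  # assumed to hold the real image URL

def extract(node, out=None):
    """Collect (kind, value) pairs in document order."""
    if out is None:
        out = []
    forced = node.attrs.get("id") in FORCE_TRAVERSE_IDS
    # Tweak 1: skip invisible subtrees unless the node is explicitly allowed.
    if not node.visible and not forced:
        return out
    if node.tag == "img":
        # Tweak 2: prefer the lazy-load attribute over the placeholder src.
        src = node.attrs.get(LAZY_IMG_ATTR) or node.attrs.get("src")
        if src:
            out.append(("img", src))
        return out
    if node.text:
        out.append(("text", node.text))
    for child in node.children:
        extract(child, out)
    return out

# Usage: a folded article with a hidden "rest_content" block and a hidden ad.
article = Node("article", children=[
    Node("p", text="intro"),
    Node("div", attrs={"id": "rest_content"}, visible=False, children=[
        Node("p", text="rest"),
        Node("img", attrs={"src": "placeholder.gif",
                           "original-hidden": "real.jpg"}),
    ]),
    Node("div", visible=False, children=[Node("p", text="ad")]),
])
result = extract(article)
```

In this sketch the folded block is extracted while the other hidden block is still skipped, which is why the id allow-list is kept narrow rather than ignoring visibility everywhere.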
,
Jul 11 2017
Comment 1 by wychen@chromium.org, Dec 29 2016
Components: UI>Browser>ReaderMode
Owner: wychen@chromium.org
Status: Available (was: Unconfirmed)