
Issue 677359

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug




DOM Distiller fails on some folded pages

Reported by yangxiao...@gmail.com, Dec 29 2016

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36

Steps to reproduce the problem:
1. Start Chrome with --enable-dom-distiller.
2. Load https://m.sohu.com/n/477121843/?wscrid=1137_4
3. Click the element at XPath /html/body/section[1]/article/div[4]/a (a button labeled something like "show less" in Chinese).
4. The hidden content becomes visible.
5. Distill the page. The content revealed in step 4 is not distilled.

What is the expected behavior?
The hidden content should be recognized as content. 

What went wrong?
The hidden content is missing from the DOM Distiller result.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 53.0.2785.116  Channel: stable
OS Version: 
Flash Version: Shockwave Flash 24.0 r0
 

Comment 1 by wychen@chromium.org, Dec 29 2016

Cc: mdjones@chromium.org k...@chromium.org
Components: UI>Browser>ReaderMode
Owner: wychen@chromium.org
Status: Available (was: Unconfirmed)

Comment 2 by wychen@chromium.org, Dec 29 2016

DOM distiller can correctly get the lower part if the hidden content is already expanded. I think what happened was: that URL was distilled once when the lower part was hidden. After expansion, reader mode returns the cached old version.
I opened another page similar to the one in the bug description, expanded the folded content first, and then distilled the page. The hidden content was still missing.
I have attached three screenshots. 

step 1: start Chrome: chrome --enable-dom-distiller
step 2: open https://m.sohu.com/n/477367845/?wscrid=95360_1
step 3: click 'show more' (see the screenshots before_expand.png and after_expand.png)
step 4: click 'distill page' in the Chrome menu (after_distill.png)


before_expand.png (30.3 KB)
after_expand.png (64.3 KB)
after_distill.png (61.0 KB)

Comment 6 by wychen@chromium.org, Dec 30 2016

Can you try using Chrome extension to distill the page and see if it's reproducible?
I got a white page in extension mode, as described here: https://github.com/chromium/dom-distiller/issues/8
I tried the latest dom-distiller code in extension mode, and it distills the hidden content successfully. Sorry, maybe there is a bug in M53 or, as you said before, the cached content was shown.

Comment 9 by phistuck@gmail.com, Jan 3 2017

I did not notice you are using an old version. Can you upgrade to Chrome 55?
(Any non-current Chrom(ium) has many known security issues now)
Thanks for reporting this bug. It looks like this bug is no longer reproducible.

One possible improvement is to support extracting the whole article even if it is not expanded. For this particular site, this can be done with the following tweaks:
- Keep traversing nodes with id="rest_content", even if they are not visible.
- Handle lazily-loaded <img> with attribute original-hidden.

If these heuristics are general enough, we can consider adding them.
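
As a rough illustration of the two tweaks above, here is a minimal Python sketch over a toy node tree. It is not DOM Distiller's actual code or API. The Node class, the extract function, and the ALWAYS_TRAVERSE_IDS allowlist are hypothetical names made up for this example. Only the id "rest_content" and the attribute name original-hidden come from this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical allowlist: subtrees with these ids are traversed even when hidden.
ALWAYS_TRAVERSE_IDS = {"rest_content"}
# Attribute named in this issue as carrying the real URL of a lazily-loaded image.
LAZY_SRC_ATTR = "original-hidden"


@dataclass
class Node:
    """Toy stand-in for a DOM element (an assumption, not a real DOM API)."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    visible: bool = True
    children: List["Node"] = field(default_factory=list)


def extract(node: Node, force: bool = False) -> List[str]:
    """Collect text and image URLs in document order."""
    # Once an allowlisted id is seen, keep traversing its whole subtree.
    force = force or node.attrs.get("id") in ALWAYS_TRAVERSE_IDS
    if not node.visible and not force:
        return []  # default behaviour: skip hidden subtrees
    out: List[str] = []
    if node.tag == "img":
        # Prefer the lazy-load attribute over the placeholder src.
        src = node.attrs.get(LAZY_SRC_ATTR) or node.attrs.get("src")
        if src:
            out.append(f"[img {src}]")
    elif node.text:
        out.append(node.text)
    for child in node.children:
        out.extend(extract(child, force))
    return out


page = Node("article", children=[
    Node("p", text="Visible intro."),
    Node("div", {"id": "rest_content"}, visible=False, children=[
        Node("p", text="Folded paragraph.", visible=False),
        Node("img", {"src": "placeholder.gif", "original-hidden": "photo.jpg"},
             visible=False),
    ]),
    Node("div", {"id": "ad"}, visible=False, children=[
        Node("p", text="Hidden ad.", visible=False),
    ]),
])

print(extract(page))
```

In this sketch the folded paragraph and the lazily-loaded image are kept, while the other hidden subtree is still skipped. Whether such an id allowlist is general enough beyond this one site is exactly the open question raised above.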
Status: WontFix (was: Available)
