New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 685583 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , iOS
Pri: 2
Type: Bug



Sign in to add a comment

uk.businessinsider.com article not distilled because content can be hidden on visibilitychange

Project Member Reported by olivierrobin@chromium.org, Jan 26 2017

Issue description

Hi

Articles from uk.businessinsider.com are not distilled.
IIUC it is because the page does not have an article element.

But on Android, if you set reader-mode-heuristics to 'Always', the distilled page is correct (but not detected as available by "formatted article" or "seems to be an article" heuristic).

Is this a configuration issue?

 
Components: UI>Browser>ReaderMode
Labels: M-57 Hotlist-ReadingList
Worth noting that Business Insiders articles are distilled fine on Firefox and Safari on iOS. 
Let me clarify this a bit. You said that on Clank, the distillability detection is wrong, but if set to Always, the distilled page is correct. However, on Bling, the distilled page is wrong. Is this what you meant?

I tried to use iPhone user agent, but the distilled page looked find.
Correct.
On Chrome iOS, distilled version is empty.
Cc: k...@chromium.org
I can reproduce this issue on iPhone emulator. Strangely, if I use Inspector to inject DOM distiller JS code, it extracts the article correctly. The culprit might be in the glue logic, not in the DOM distiller code. I'll have to dig deeper.
Cc: olivierrobin@chromium.org
Hi wychen@, 

Any updates on this one? Business Insider is quite an important site to distill correctly. If the issue is on the Bling side maybe olivierrobin@ or eugenebut@ could take a look? 

Thanks.
Cc: pinkerton@chromium.org
+ Pink for visibility

Comment 8 by wychen@chromium.org, Feb 15 2017

Given that injecting DOM distiller JS code through Inspector on iPhone emulator can extract the article correctly, having someone looking on the Bling side might be more efficient.
I can take a look if you if there is something specific to look for.
I don't know where the culprit is yet, so this will be a debugging process ahead. It might make sense to do the following:
- Get Bling to the state of reproducing this issue
- Use the exact same copy of DOM distiller JS code in Bling, and inject it through Safari Inspector to Bling. It should distill this time.
- Dump the JSON output of the distiller JS code used inside Bling and see what's different. If the output is the same, then the culprit would be in the glue logic.
JSON in ReadingList

{
    1 = "What you need to know on Wall Street right now";
    10 =     (
    );
    2 =     {
        1 = "";
    };
    3 =     {
    };
    5 =     {
        1 = "What you need to know on Wall Street right now";
        2 = Article;
        3 = "http://uk.businessinsider.com/wall-street-biggest-stories-february-15-2017";
        4 = "Finance Insider is Business Insider's midday summary of the top stories of the past 24 hours.";
        5 = "Business Insider";
        6 = "";
        7 = "";
        8 =         {
            1 = "";
            2 = "";
            3 = "";
            4 = "";
            5 =             (
                "http://uk.businessinsider.com/author/business-insider"
            );
        };
        9 =         (
                        {
                1 = "http://static4.uk.businessinsider.com/image/5807bfb8dd089523498b49da-1190-625/what-you-need-to-know-on-wall-street-right-now.jpg";
                2 = "";
                3 = "";
                4 = "";
                5 = 0;
                6 = 0;
            }
        );
    };
    6 =     {
        1 = "0.5650000000023283";
        2 = "4.014999999999418";
        3 = "2.379999999990105";
        4 = "0.03500000000349246";
        5 = "22.64999999999418";
        6 =         (
                        {
                1 = OpenGraphProtocolParser;
                2 = "0.1350000000093132";
            },
                        {
                1 = SchemaOrgParserAccessor;
                2 = "0.04499999999825377";
            },
                        {
                1 = IEReadingViewParser;
                2 = "0.03500000000349246";
            },
                        {
                1 = "OpenGraphProtocolParser.findPrefixes";
                2 = "0.4149999999935972";
            },
                        {
                1 = "OpenGraphProtocolParser.parseMetaTags";
                2 = "1.100000000005821";
            },
                        {
                1 = "OpenGraphProtocolParser.imageParser.verify";
                2 = "0.005000000004656613";
            },
                        {
                1 = "OpenGraphProtocolParser.checkRequired";
                2 = "0.2050000000017462";
            },
                        {
                1 = "OpenGraphProtocolParser.parse";
                2 = "3.089999999996508";
            },
                        {
                1 = Pagination;
                2 = "4.919999999998254";
            },
                        {
                1 = "SchemaOrgParser.parse";
                2 = "0.430000000007567";
            }
        );
    };
    7 =     {
        1 = "DomDistiller debug level: 0\n";
    };
    8 =     {
        1 = 0;
    };
    9 = auto;
}
Using webinspector

{1: "What you need to know on Wall Street right now", 2: {1: "<img src=\"http://static3.uk.businessinsider.com/im…=\"Boe Hartman\" width=\"115\" height=\"86\"></LI></UL>"}, 3: {}, 5: Object, 6: Object, 7: {1: "DomDistiller debug level: 0↵"}, 8: {1: 429}, 9: "auto", 10: [{1: "http://static3.uk.businessinsider.com/image/57338ae6848fb611197b7148-50/business-insider.jpg"}, {1: "http://static6.uk.businessinsider.com/image/58a487…ed-a-big-promotion-in-its-new-online-business.jpg"}]} = $1

More formatted webinspector
https://screenshot.googleplex.com/p5PQna9aG66
So if I understand correctly, the HTML is in the field "2".
So it is present when running the script in inspector, but not when distilling the page.
Cc: wychen@chromium.org
Owner: olivierrobin@chromium.org
Status: Started (was: Assigned)
I did more debugging.
The page is only distilled if the WKWebView is in the view hierarchy.
Not sure what difference it makes in the page, but if I add the WKWebView while distilling, it works.

What does it mean to be in the view hierarchy? Does this affect the visibility of HTML elements? Distiller ignores invisible ones. You can pass "{2:9}" as the option argument to get more debugging info.
We need to add the view using
[applicationWindow addSubview:wkWebView] for the distillation to work.

You can see my fix here https://codereview.chromium.org/2695313004/ .
I guess this particular page monitors some UI events and only renders if it is displayed.
This finding is awesome! It never would have occurred to me that not being in the view hierarchy could cause this.
Labels: OS-Linux
Turns out desktop Chrome also have this symptom, but Clank is OK. Not a top priority to fix desktop though.
I investigated more.
The page uses document.visibilityState and document.hidden API.
The result is

"visible" if the page is in the view hierarchy
"prerender" if the page is not.

businessinsider uses this value to not render the whole page in prerender mode
(it sets body = hidden on prerender).
This javascript
Object.defineProperty(document, 'hidden', {value: false});
Object.defineProperty(document, 'visibilityState', {value: 'visible'});
var event = new Event('visibilitychange');
document.dispatchEvent(event);

make the distillation work.
I don't know if Dom distiller would consider adding this to the script.
I don't know either if this is a better/more performant/more robust solution than adding webView to the view hierarchy.

Please comment if you think this is a better solution.
I'm not sure if the JavaScript method would work on platforms other than iOS. We inject the distiller JS code in an isolated world, which only share the DOM state with the main world, where the web page JS code resides. AFAIK, the events and properties are not part of the shared states.
Project Member

Comment 24 by bugdroid1@chromium.org, Feb 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/47b54ff088d9c52ece1c28b06e647e64ec8014d7

commit 47b54ff088d9c52ece1c28b06e647e64ec8014d7
Author: olivierrobin <olivierrobin@chromium.org>
Date: Mon Feb 20 17:46:11 2017

[iOS Reading List] Add distilling WKWebView to the view hierarchy.

Some pages (e.g. uk.businessinsider.com) are not rendered correctly
unless the view is added in the view hierarchy.
Add temporarily the WKWebView to the view hierarchy during distillation.
Move it out ouf the screen to make it invisible to the user.

BUG= 685583 

Review-Url: https://codereview.chromium.org/2695313004
Cr-Commit-Position: refs/heads/master@{#451643}

[modify] https://crrev.com/47b54ff088d9c52ece1c28b06e647e64ec8014d7/ios/chrome/browser/reading_list/reading_list_distiller_page.mm

The way they hide the content is by adding class name "hidden" to <body>. One way to deal with this in DOM distiller is to ignore visibility info of <body> when traversing it. This heuristic is not very general though, since they could've applied the visibility change to other nodes, like <article> or something.
Summary: uk.businessinsider.com article not distilled because content can be hidden on visibilitychange (was: uk.businessinsider.com article not distilled)
Re #c25: OK, this won't work because attribute "visibility: hidden" is inherited, so dealing with <body> is not enough. If "opacity: 0" or "display: none" is used instead, this would work.
Status: Fixed (was: Started)
Status: Verified (was: Fixed)
Verified in 60.0.3112.7 canary, iPhone 6 plus iOS 10.2.1, iPad mini 10.3 beta 7

uk.businessinsider.com articles are getting distilled.
Looks good.

Sign in to add a comment