uk.businessinsider.com article not distilled because content can be hidden on visibilitychange |
||||||||||
Issue descriptionHi Articles from uk.businessinsider.com are not distilled. IIUC it is because the page does not have an article element. But on Android, if you set reader-mode-heuristics to 'Always', the distilled page is correct (but not detected as available by "formatted article" or "seems to be an article" heuristic). Is this a configuration issue?
,
Jan 26 2017
Worth noting that Business Insiders articles are distilled fine on Firefox and Safari on iOS.
,
Feb 1 2017
Let me clarify this a bit. You said that on Clank, the distillability detection is wrong, but if set to Always, the distilled page is correct. However, on Bling, the distilled page is wrong. Is this what you meant? I tried to use iPhone user agent, but the distilled page looked find.
,
Feb 1 2017
Correct. On Chrome iOS, distilled version is empty.
,
Feb 2 2017
I can reproduce this issue on iPhone emulator. Strangely, if I use Inspector to inject DOM distiller JS code, it extracts the article correctly. The culprit might be in the glue logic, not in the DOM distiller code. I'll have to dig deeper.
,
Feb 15 2017
Hi wychen@, Any updates on this one? Business Insider is quite an important site to distill correctly. If the issue is on the Bling side maybe olivierrobin@ or eugenebut@ could take a look? Thanks.
,
Feb 15 2017
+ Pink for visibility
,
Feb 15 2017
Given that injecting DOM distiller JS code through Inspector on iPhone emulator can extract the article correctly, having someone looking on the Bling side might be more efficient.
,
Feb 15 2017
I can take a look if you if there is something specific to look for.
,
Feb 15 2017
I don't know where the culprit is yet, so this will be a debugging process ahead. It might make sense to do the following: - Get Bling to the state of reproducing this issue - Use the exact same copy of DOM distiller JS code in Bling, and inject it through Safari Inspector to Bling. It should distill this time. - Dump the JSON output of the distiller JS code used inside Bling and see what's different. If the output is the same, then the culprit would be in the glue logic.
,
Feb 15 2017
JSON in ReadingList
{
1 = "What you need to know on Wall Street right now";
10 = (
);
2 = {
1 = "";
};
3 = {
};
5 = {
1 = "What you need to know on Wall Street right now";
2 = Article;
3 = "http://uk.businessinsider.com/wall-street-biggest-stories-february-15-2017";
4 = "Finance Insider is Business Insider's midday summary of the top stories of the past 24 hours.";
5 = "Business Insider";
6 = "";
7 = "";
8 = {
1 = "";
2 = "";
3 = "";
4 = "";
5 = (
"http://uk.businessinsider.com/author/business-insider"
);
};
9 = (
{
1 = "http://static4.uk.businessinsider.com/image/5807bfb8dd089523498b49da-1190-625/what-you-need-to-know-on-wall-street-right-now.jpg";
2 = "";
3 = "";
4 = "";
5 = 0;
6 = 0;
}
);
};
6 = {
1 = "0.5650000000023283";
2 = "4.014999999999418";
3 = "2.379999999990105";
4 = "0.03500000000349246";
5 = "22.64999999999418";
6 = (
{
1 = OpenGraphProtocolParser;
2 = "0.1350000000093132";
},
{
1 = SchemaOrgParserAccessor;
2 = "0.04499999999825377";
},
{
1 = IEReadingViewParser;
2 = "0.03500000000349246";
},
{
1 = "OpenGraphProtocolParser.findPrefixes";
2 = "0.4149999999935972";
},
{
1 = "OpenGraphProtocolParser.parseMetaTags";
2 = "1.100000000005821";
},
{
1 = "OpenGraphProtocolParser.imageParser.verify";
2 = "0.005000000004656613";
},
{
1 = "OpenGraphProtocolParser.checkRequired";
2 = "0.2050000000017462";
},
{
1 = "OpenGraphProtocolParser.parse";
2 = "3.089999999996508";
},
{
1 = Pagination;
2 = "4.919999999998254";
},
{
1 = "SchemaOrgParser.parse";
2 = "0.430000000007567";
}
);
};
7 = {
1 = "DomDistiller debug level: 0\n";
};
8 = {
1 = 0;
};
9 = auto;
}
,
Feb 15 2017
Using webinspector
{1: "What you need to know on Wall Street right now", 2: {1: "<img src=\"http://static3.uk.businessinsider.com/im…=\"Boe Hartman\" width=\"115\" height=\"86\"></LI></UL>"}, 3: {}, 5: Object, 6: Object, 7: {1: "DomDistiller debug level: 0↵"}, 8: {1: 429}, 9: "auto", 10: [{1: "http://static3.uk.businessinsider.com/image/57338ae6848fb611197b7148-50/business-insider.jpg"}, {1: "http://static6.uk.businessinsider.com/image/58a487…ed-a-big-promotion-in-its-new-online-business.jpg"}]} = $1
,
Feb 15 2017
More formatted webinspector https://screenshot.googleplex.com/p5PQna9aG66
,
Feb 16 2017
So if I understand correctly, the HTML is in the field "2". So it is present when running the script in inspector, but not when distilling the page.
,
Feb 16 2017
,
Feb 16 2017
I did more debugging. The page is only distilled if the WKWebView is in the view hierarchy. Not sure what difference it makes in the page, but if I add the WKWebView while distilling, it works.
,
Feb 16 2017
What does it mean to be in the view hierarchy? Does this affect the visibility of HTML elements? Distiller ignores invisible ones. You can pass "{2:9}" as the option argument to get more debugging info.
,
Feb 16 2017
We need to add the view using [applicationWindow addSubview:wkWebView] for the distillation to work. You can see my fix here https://codereview.chromium.org/2695313004/ . I guess this particular page monitors some UI events and only renders if it is displayed.
,
Feb 16 2017
This finding is awesome! It never would have occurred to me that not being in the view hierarchy could cause this.
,
Feb 17 2017
Turns out desktop Chrome also have this symptom, but Clank is OK. Not a top priority to fix desktop though.
,
Feb 17 2017
I investigated more. The page uses document.visibilityState and document.hidden API. The result is "visible" if the page is in the view hierarchy "prerender" if the page is not. businessinsider uses this value to not render the whole page in prerender mode (it sets body = hidden on prerender).
,
Feb 17 2017
This javascript
Object.defineProperty(document, 'hidden', {value: false});
Object.defineProperty(document, 'visibilityState', {value: 'visible'});
var event = new Event('visibilitychange');
document.dispatchEvent(event);
make the distillation work.
I don't know if Dom distiller would consider adding this to the script.
I don't know either if this is a better/more performant/more robust solution than adding webView to the view hierarchy.
Please comment if you think this is a better solution.
,
Feb 17 2017
I'm not sure if the JavaScript method would work on platforms other than iOS. We inject the distiller JS code in an isolated world, which only share the DOM state with the main world, where the web page JS code resides. AFAIK, the events and properties are not part of the shared states.
,
Feb 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/47b54ff088d9c52ece1c28b06e647e64ec8014d7 commit 47b54ff088d9c52ece1c28b06e647e64ec8014d7 Author: olivierrobin <olivierrobin@chromium.org> Date: Mon Feb 20 17:46:11 2017 [iOS Reading List] Add distilling WKWebView to the view hierarchy. Some pages (e.g. uk.businessinsider.com) are not rendered correctly unless the view is added in the view hierarchy. Add temporarily the WKWebView to the view hierarchy during distillation. Move it out ouf the screen to make it invisible to the user. BUG= 685583 Review-Url: https://codereview.chromium.org/2695313004 Cr-Commit-Position: refs/heads/master@{#451643} [modify] https://crrev.com/47b54ff088d9c52ece1c28b06e647e64ec8014d7/ios/chrome/browser/reading_list/reading_list_distiller_page.mm
,
Feb 22 2017
The way they hide the content is by adding class name "hidden" to <body>. One way to deal with this in DOM distiller is to ignore visibility info of <body> when traversing it. This heuristic is not very general though, since they could've applied the visibility change to other nodes, like <article> or something.
,
Feb 22 2017
,
Feb 22 2017
Re #c25: OK, this won't work because attribute "visibility: hidden" is inherited, so dealing with <body> is not enough. If "opacity: 0" or "display: none" is used instead, this would work.
,
Feb 23 2017
,
May 30 2017
Verified in 60.0.3112.7 canary, iPhone 6 plus iOS 10.2.1, iPad mini 10.3 beta 7 uk.businessinsider.com articles are getting distilled. Looks good. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by olivierrobin@chromium.org
, Jan 26 2017