DOMDistiller should handle the folded subheadings and edit links on wikipedia |
|||||||||||
Issue descriptionOS: iOS10 platform: iPhone 6s What steps will reproduce the problem? (1) Open a wikipedia page with formula (e.g. https://en.m.wikipedia.org/wiki/Velocity) (2) Distill it What is the expected output? Formula should have been saved What do you see instead? The formula are not saved
,
Sep 19 2016
Would you elaborate what you mean by being saved? DOM distiller emitted 41 image URLs. Do you already know which formula is missing?
,
Sep 20 2016
I did see another issue though. The headings are all gone, and the editing links are shown instead.
,
Dec 1 2016
,
Dec 22 2016
Adding screenshots for distilled versions of https://en.m.wikipedia.org/wiki/Velocity from Firefox and from Chrome Canary on iOS. Note that FF distilled the content that was collapsed but showed the edit links. Chrome didn't distill the content under the subheading (and didn't show the edit button). The best behaviour would be to distill all the content, show the images, and not have links to edit.
,
Dec 22 2016
,
Dec 23 2016
The result of Bling in #c5 is different from comment in #c3 because of device width. For wider devices, the sections would be expanded, and the headings come with an editing link.
,
Dec 28 2016
For folded sections, we might be able to use aria-expanded attribute as a hint.
,
Jan 17 2017
,
Feb 23 2017
gambard@, when you said the formula are not saved, did you refer to the narrow case similar to #c5, where the sections are collapsed, with no edit links in the original page? From device emulation, iPhone 5 and iPhone 6 are the narrow case, while iPhone 6 Plus is the wide case. Looks like we need to properly handle both cases just for iPhone.
,
Mar 1 2017
Yes, I think it was the narrow case.
,
Mar 2 2017
,
Mar 2 2017
Issue 697884 has been merged into this issue.
,
Mar 2 2017
Found the criteria in wikipedia's JS code:
isWideScreen: memoize(function() {
var val = parseInt(mw.config.get('wgMFDeviceWidthTablet'), 10);
return window.innerWidth >= val || window.innerHeight >= val;
})
wgMFDeviceWidthTablet is '720px'
The height of iPhone 6 Plus is 736.
According to http://mydevice.io/devices/, wikipedia should probably use 800px as the threshold, and name it something like DeviceLongEdgeTablet.
,
Mar 3 2017
We could probably add a workaround for wikipedia to wait for the permanent solution in dom distiller.
Injecting
"(() => {"
" var s = document.createElement('style');"
" s.innerHTML='.client-js .collapsible-block { display: block }';"
" document.head.appendChild(s);"
"})()";
works (in wikipedia current version).
,
Mar 3 2017
Unhiding folded sections and extracting attributes of the lazily loaded images are doable quickly, but tuning the content extraction algorithm requires much more quality validation. Known issues include: - If a subheading has a "main article" or "see also" link below, the subheading and the link below would be gone. - If the link density is high, like reference sections or list of links in general, some content could be missing. - Inline images like formula or symbols becomes block images. - Tagline is missing. If we unhide folded sections without addressing these quality issues, it might give users false sense of completeness of the extraction results, which might be worse than it is now IMO. I think unhiding the folded section can be done in DOM distiller, so that it's not platform-dependent.
,
Mar 3 2017
There are a few stages of improvement. 0. Current status: shows nothing inside folded sections. Obviously broken. 1. Unhide the sections either by https://codereview.chromium.org/2730863002/ or in DOM distiller: get *most* contents inside sections, but no images. 2. Handle lazily loaded images 3. Handle "edit" link 4. Handle subheadings with links below 5. Handle sections with high link density The ideal stage 5 could take a week or two including proper quality assessment, but stage up to 3 can be done within days. The thing is, do we want to ship in stage 1~3? Having more content extracted is generally better, but giving false sense of correctness or completeness is more subtle and kind of dangerous. WDYT?
,
Mar 3 2017
I think we want to five the best experience. The idea of https://codereview.chromium.org/2730863002/ is to be a temporary workaround as mardini@ wanted a fast solution. Personnally, I think that getting to step 3 is already good. mardini, WDYT?
,
Mar 3 2017
To better assess the quality, you can give it a try and do some side-by-side comparisons. 1. https://codereview.chromium.org/2726123004/ 2. https://codereview.chromium.org/2729143003/ 3. https://codereview.chromium.org/2729233002/ The quality issue might not be obvious if you only look at the distilled content, without comparing with the original. This is the dangerous part.
,
Mar 3 2017
OK. One step back. Stage 2 (lazily loaded images) would have performance implications, and requires more time to properly benchmark and mitigate. I'm afraid it won't be enough time for M57. Unhiding folded section and "edit" link should be low-risk.
,
Mar 7 2017
+ Claude Doing point#1 in comment#17 is better than nothing (get *most* contents inside sections, but no images.). Is that happening for M57? Thanks.
,
Mar 7 2017
According to https://bugs.chromium.org/p/chromium/issues/detail?id=692553#c17, looks like Olivier's fix for wikipedia would be cherry-picked.
,
Mar 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2 commit f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2 Author: olivierrobin <olivierrobin@chromium.org> Date: Tue Mar 07 20:07:02 2017 [Reading List] Expand all Wikipedia sections on distillation. Most part of Wikipedia articles is hidden when the page loads. DOM distiller will eventually handle this issue, but for the time being, this is a workaround. The workaround set the style to block before distilling the page. BUG=647667, 699215 Review-Url: https://codereview.chromium.org/2730863002 Cr-Commit-Position: refs/heads/master@{#455186} [modify] https://crrev.com/f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2/ios/chrome/browser/reading_list/reading_list_distiller_page.h [modify] https://crrev.com/f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2/ios/chrome/browser/reading_list/reading_list_distiller_page.mm
,
Mar 8 2017
Correct! Olivier's fix for wikipedia will be cherry-picked into M57.
,
Mar 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d7135f8eb25e16eadab36355f6684836122c24ef commit d7135f8eb25e16eadab36355f6684836122c24ef Author: Olivier Robin <olivierrobin@chromium.org> Date: Wed Mar 08 08:26:30 2017 [Reading List] Expand all Wikipedia sections on distillation. Most part of Wikipedia articles is hidden when the page loads. DOM distiller will eventually handle this issue, but for the time being, this is a workaround. The workaround set the style to block before distilling the page. BUG=647667, 699215 Review-Url: https://codereview.chromium.org/2730863002 Cr-Commit-Position: refs/heads/master@{#455186} (cherry picked from commit f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2) Review-Url: https://codereview.chromium.org/2733323004 . Cr-Commit-Position: refs/branch-heads/2987@{#799} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/d7135f8eb25e16eadab36355f6684836122c24ef/ios/chrome/browser/reading_list/reading_list_distiller_page.h [modify] https://crrev.com/d7135f8eb25e16eadab36355f6684836122c24ef/ios/chrome/browser/reading_list/reading_list_distiller_page.mm
,
Mar 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3ff6fa0d932afd0d27a11f4bd1de9b04586b35b8 commit 3ff6fa0d932afd0d27a11f4bd1de9b04586b35b8 Author: Olivier Robin <olivierrobin@chromium.org> Date: Wed Mar 08 08:30:55 2017 [Reading List] Expand all Wikipedia sections on distillation. Most part of Wikipedia articles is hidden when the page loads. DOM distiller will eventually handle this issue, but for the time being, this is a workaround. The workaround set the style to block before distilling the page. BUG=647667, 699215 Review-Url: https://codereview.chromium.org/2730863002 Cr-Commit-Position: refs/heads/master@{#455186} (cherry picked from commit f2b4f5683185e5bfdc90c42f97d49ed3e06ca1f2) Review-Url: https://codereview.chromium.org/2737743003 . Cr-Commit-Position: refs/branch-heads/3029@{#59} Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471} [modify] https://crrev.com/3ff6fa0d932afd0d27a11f4bd1de9b04586b35b8/ios/chrome/browser/reading_list/reading_list_distiller_page.h [modify] https://crrev.com/3ff6fa0d932afd0d27a11f4bd1de9b04586b35b8/ios/chrome/browser/reading_list/reading_list_distiller_page.mm
,
Mar 10 2017
,
Mar 10 2017
Issue is fixed. Checked on chrome dev version 57.0.2987.101 on iPhone 6 plus with iOS 10.3 iPhone 7 with iOS 10.2.1 iPhone 7 plus with iOS 10.3 following the steps mentioned in #0 and issue 697887 . Verified that content under subsections are displayed in offline mode. (Images under subsections are not displayed). Headings of subsections are displayed.
,
Mar 14 2017
More on the "edit" links. On desktop version, we want to remove the "[edit]" part, and its HTML is: <span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=Velocity&action=edit&section=1" title="Edit section: Constant velocity vs acceleration">edit</a><span class="mw-editsection-bracket">]</span></span> On mobile version, we want to remove the edit icon, and its HTML is: <a href="/w/index.php?title=Velocity&action=edit&section=1" title="Edit section: Constant velocity vs acceleration" data-section="1" class="mw-ui-icon mw-ui-icon-element mw-ui-icon-edit-enabled edit-page">Edit</a> Note that there are "red links" (links to an entry that's not there yet) that also have "action=edit" in the href.
,
Mar 15 2017
Issue is fixed. Checked on chrome beta version 58.0.3029.19 on iPhone 6 plus with iOS 10.3 beta 6 iPhone 6s plus with iOS 10.2.1 iPhone 7 plus with iOS 10.2.1 following the steps mentioned in #0 and issue 697887 . Verified that content under subsections are displayed in offline mode. (Images under subsections are not displayed).
,
Apr 4 2017
Any update on the DOM distiller fix? Will it make it to M58?
,
Apr 4 2017
I'll land and roll the following CLs this week, and then cherry-pick to M58. Sounds good? https://codereview.chromium.org/2726123004/ https://codereview.chromium.org/2729233002/ https://codereview.chromium.org/2729143003/
,
Apr 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e7dcd5dedea72f0619d7523bbca7344794f677b2 commit e7dcd5dedea72f0619d7523bbca7344794f677b2 Author: wychen <wychen@chromium.org> Date: Tue Apr 11 02:44:20 2017 Roll DOM Distiller JavaScript distribution package Diff since last roll: https://github.com/chromium/dom-distiller/compare/489c6609cb...8de0cacfed Picked up changes: 8de0cac Handle image lazy loading on Wikipedia 9c2021c Ignore the "Edit" links in wiki pages cd538d1 Unhide folded sections in wiki pages 4d66ebb Make sure images are loaded before running tests 2d7b629 Make sure lazily-loaded images have absolute URL BUG=647667 Review-Url: https://codereview.chromium.org/2809883002 Cr-Commit-Position: refs/heads/master@{#463503} [modify] https://crrev.com/e7dcd5dedea72f0619d7523bbca7344794f677b2/DEPS [modify] https://crrev.com/e7dcd5dedea72f0619d7523bbca7344794f677b2/third_party/dom_distiller_js/README.chromium
,
Feb 15 2018
,
Jan 7
If I understand correctly, the underlying problem is not really to do with device width. That's merely the trigger. There will still be a class of devices narrower where sections don't expand by default, and presumably distiller is meant to work correctly there as well. I don't know what HTML or DOM the distiller uses as source, but I suspect it might be taking the active DOM from the web page (where JS was enabled and may've loaded without issues and made changes to the DOM). I think that because the mobile site in question does have sections expanded by default when JS is disabled, JS failed to load, or the browser doesn't pass our JS feature-test. If that is true, then the distiller is taking an active document (as served with JS enabled and successfully run), and then takes it out of context where JS and most of CSS are missing. In theory, progressive enhancement techniques should allow that to work correctly. But, at the same time, I also think it is quite reasonable for a site to make changes to the DOM /after/ the JS loads that wouldn't be compatible with being cloned and re-opened without that same JS active. Is there a @media query or other mechanism that sites can use to aid tools in doing this and/or if not already, should distiller be operating on the original source with <noscript> considered, or something like that. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by stkhapugin@chromium.org
, Sep 16 2016Owner: wychen@chromium.org
Status: Assigned (was: Untriaged)