Page distillation incorrect (doubles the first page)
Issue description

Chrome Version: M58.0.3012.0
OS: iOS 10.2.x, iPad and iPhone

- Go to http://www.stereophile.com/content/auralic-altair-da-processor#8t9PKsgtrkf1kF2V.97
- Add it to the reading list and make sure it is distilled (green checkbox).
- Turn on airplane mode.
- Load the page from the reading list to get the distilled version.

Expected:
- The article is spread across two pages; both pages get distilled.

Actual:
- The first page of the article is duplicated: the text and images from page 1 appear twice.
- Page 2 is not included.
,
Feb 21 2017
,
Feb 21 2017
Works on Linux 57.0.2987.54 Beta as expected. On Clank, only first page is shown. Couldn't repro the issue described above by using device emulation in Chrome. I'll have to get an iPhone emulator. On desktop version of the page, the link says "Page 2". On mobile version (tested on Clank), it's "View More", so the next page algorithm doesn't recognize it.
,
Feb 21 2017
It might be too late to cherry-pick for M57. Punting to M58.
,
Feb 23 2017
I built ToT and tried it on iPhone simulator. On both reader mode and reading list, only the 1st page is distilled. The result is the same as on Clank. It's still wrong, but in a different way.
Is this still reproducible on your side? If so, would you mind dumping the JSON output of distiller? Its [3] should be {}, denoting "next page URL" not found.
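For anyone checking a dump by hand, here is a minimal Python sketch of the check described above. It assumes the dump is either a JSON object keyed by stringified field numbers ("1", "2", "3", ...) or a plain JSON array; that layout is an assumption, and the helper names `next_page_info` and `has_next_page` are hypothetical, not part of the distiller.

```python
import json


def next_page_info(dump_text):
    """Return entry 3 of a distiller JSON dump, or {} if it is absent.

    Assumption: the dump is a JSON object keyed by stringified field
    numbers, or a JSON array. Neither layout is confirmed in this thread.
    """
    result = json.loads(dump_text)
    if isinstance(result, dict):
        return result.get("3", {})
    if isinstance(result, list) and len(result) > 3:
        return result[3]
    return {}


def has_next_page(dump_text):
    """True when entry 3 is non-empty, i.e. a next page URL was found."""
    return bool(next_page_info(dump_text))
```

An empty entry 3 would confirm that the next page link was not detected, as opposed to the second page being found but failing to load.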
,
Feb 23 2017
As for not finding the next page link, one possible heuristic is to consider the <link rel="next"> URL. However, we don't want to use it directly, since it might not be what we want. Adding bonus points to that URL might work, but I'll need to run this through our validation dataset.
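A rough Python sketch of the "bonus points" idea, for illustration only. It is not the DOM distiller's actual scoring code: the weights, the threshold, the text pattern, and every name (`score_candidates`, `pick_next_page`, `REL_NEXT_BONUS`, `TEXT_MATCH_SCORE`) are hypothetical, and hrefs are compared as exact strings without URL normalization.

```python
import re
from html.parser import HTMLParser

TEXT_MATCH_SCORE = 50  # hypothetical weight for link text like "Next" or "Page 2"
REL_NEXT_BONUS = 25    # hypothetical bonus when <link rel="next"> has the same href


class _LinkCollector(HTMLParser):
    """Collects <link rel="next"> hrefs and (href, text) pairs for anchors."""

    def __init__(self):
        super().__init__()
        self.rel_next = set()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "next" in rels and attr.get("href"):
            self.rel_next.add(attr["href"])
        elif tag == "a" and attr.get("href"):
            self._href = attr["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def score_candidates(html):
    """Score each anchor href; rel="next" adds a bonus instead of deciding alone."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    scores = {}
    for href, text in collector.anchors:
        score = 0
        if re.search(r"\b(next|page\s*\d+)\b", text, re.IGNORECASE):
            score += TEXT_MATCH_SCORE
        if href in collector.rel_next:
            score += REL_NEXT_BONUS
        scores[href] = max(score, scores.get(href, 0))
    return scores


def pick_next_page(html, threshold=20):
    """Return the best-scoring href, or None if nothing reaches the threshold."""
    scores = score_candidates(html)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

In this sketch a "View More" anchor scores nothing on its text, but it can still be picked when its href matches the <link rel="next"> URL. A rel="next" URL with no matching anchor is never used on its own.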
,
Feb 23 2017
I noticed the doubling on iPad, so phone behavior could be different. Neither shows the second page, however.
,
Feb 23 2017
I did a quick test on iPad Air 2 simulator. The page looks like desktop version, so there's a link saying "Page 2". The next page URL is extracted as expected, and Reader Mode shows two pages stitched together correctly.
,
Feb 24 2017
Hmm, I tried it again and you're right, both pages are correctly distilled with no doubling. However, this time the images for the second page are broken placeholders. I reproduced the doubling a couple of times before I filed the bug on my iPad, not sure why it's suddenly different. I did update the dev channel, so perhaps something got fixed in the meantime. I'm now on 58.0.3019.0dev (was on .3012 before).
,
Feb 24 2017
Pagination is not supported on iOS at the moment. I filed a bug to disable it completely because, IIUC, it is tied to locale and does not support non-English pages (crbug.com/676265).
,
Feb 24 2017
We haven't changed the next page detection algorithm for a while, so I'm not sure what happened. In case you are able to reproduce it in the future, it might be useful to archive the page as MHTML, zip it, and attach it to the bug. I couldn't repro the broken placeholders. On the second page, both images are correctly extracted. They don't look lazily loaded either. One small tip: writing in the format "issue 676265" would turn it into a link. Strangely, crbug URLs are not converted into links on the web interface.
,
Feb 24 2017
I tried this again (clear reading list, add page, turn on airplane mode, load page) and now the second page isn't distilled at all. It ends after the section on Roon Relief at footnote 1. How can I save the page from my iPad in MHTML format so you can look at it?
,
Feb 24 2017
By "load page" in comment 12, I mean "load page from reading list".
,
Feb 27 2017
An easier way to test would be loading in Reader Mode on Bling. It takes slightly fewer steps to trigger DOM distiller, and ideally the distillation behaves the same. To save as MHTML, you'll need the command-line option --save-page-as-mhtml. This is not available on phones, so it's better to check whether the problem is still reproducible with device emulation on a desktop or laptop. Now that the behavior on iPad is the same as desktop, and iPhone is the same as Clank, this is expected behavior, so you don't need to archive the page unless things change.
,
Mar 16 2017
Cannot reproduce; closing for now. Feel free to reopen if it reproduces again. Thanks for your bug report!
Comment 1 by fgor...@chromium.org
, Feb 21 2017