Regression: PDFs with OCR don't display correctly
Reported by
ya...@impossibledreams.net,
May 9 2018
Issue description
UserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Example URL: http://hebrewbooks.org/pdfpager.aspx?req=24642&st=&pgnum=216&hilite=
Steps to reproduce the problem:
1. Visit site provided.
What is the expected behavior?
Image of the page should display with the OCR text remaining invisible
What went wrong?
The OCR text gets displayed instead
Does it occur on multiple sites: Yes
Is it a problem with a plugin? Yes PDFium
Did this work before? Yes 52.0.2743.116
Does this work in other browsers? Yes
Chrome version: 65.0.3325.181 Channel: stable
OS Version: Ubuntu
Flash Version: disabled
,
May 9 2018
Bisected to r515815 = 09cd842d6982d4816f29392c872004ef6e04795d = https://crrev.com/c/765052 "Roll src/third_party/pdfium/ 8baea3c69..9fa503624 (2 commits)"
Suspecting https://pdfium.googlesource.com/pdfium.git/+/6e4656f88fba94f706e0e42d1b548e28f6645594
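For anyone retracing a bisect that lands on a PDFium roll: the roll title carries the old and new PDFium revisions, and the autoroll commit quoted later in this thread links a `+log/old..new` page on pdfium.googlesource.com. A minimal sketch of turning a roll title into that kind of log URL follows. The function name `roll_log_url`, the regular expression, and the default `repo` value are illustrative assumptions, not part of any Chromium tooling.

```python
import re

# Assumed shape of an autoroll title, based on the two titles quoted in this thread.
ROLL_TITLE = re.compile(
    r"Roll (?P<path>\S+) (?P<old>[0-9a-f]+)\.\.(?P<new>[0-9a-f]+) \((?P<count>\d+) commits?\)"
)


def roll_log_url(title, repo="https://pdfium.googlesource.com/pdfium.git"):
    """Build a '+log/old..new' URL for the revision range named in a roll title.

    Hypothetical helper: the URL shape mirrors the '+log/' link in the
    autoroll commit message, but nothing here queries the server.
    """
    match = ROLL_TITLE.search(title)
    if match is None:
        raise ValueError("not a recognized roll title: %r" % title)
    return "%s/+log/%s..%s" % (repo, match.group("old"), match.group("new"))


if __name__ == "__main__":
    print(roll_log_url("Roll src/third_party/pdfium/ 8baea3c69..9fa503624 (2 commits)"))
```

The resulting page should list the commits inside the roll, which is where a suspect CL would be picked from.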
,
May 9 2018
npm, can you take a look at this? The suspect CL is one of yours.
,
May 9 2018
It most certainly is my CL. Bug reporter, do you have a PNG showing what you expect the correct rendering of this PDF to look like?
,
May 9 2018
,
May 9 2018
The issue appears when viewing the whole PDF: http://download.hebrewbooks.org/downloadhandler.ashx?req=24642
,
May 9 2018
I imagine the PDF attached in the bug report cannot be added to our tests?
,
May 9 2018
The original one-page PDF is public domain, so you can use it for testing if you want.
,
May 10 2018
The one-page PDF does not display the issue. I have attached the 33 MB PDF. The issue appears on pages 41, 54, 216, and perhaps other pages. Sumatra, Acrobat, pdf.js, and Foxit render it without issues.
,
May 10 2018
The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/401e618415d424f8a8b48f15e6710fa0e75d0615

commit 401e618415d424f8a8b48f15e6710fa0e75d0615
Author: Nicolas Pena <npm@chromium.org>
Date: Thu May 10 16:47:16 2018

Remove a completeness check from CJBig2_GRRDProc::DecodeTemplate0Opt

https://pdfium-review.googlesource.com/c/pdfium/+/18333 introduced several checks to prevent timeouts in JBig2. One of these is breaking the PDF in the bug, so this CL removes that check.

Bug: chromium:841200
Change-Id: Ia75c699b7fddc26f0353b0d64349898c4d1f744d
Reviewed-on: https://pdfium-review.googlesource.com/32250
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>

[modify] https://crrev.com/401e618415d424f8a8b48f15e6710fa0e75d0615/core/fxcodec/jbig2/JBig2_GrrdProc.cpp
,
May 10 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/511f71f2d113829989df1949d1a9343f1d0019f8

commit 511f71f2d113829989df1949d1a9343f1d0019f8
Author: pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu May 10 20:13:17 2018

Roll src/third_party/pdfium/ 95061379c..80302c77a (5 commits)

https://pdfium.googlesource.com/pdfium.git/+log/95061379c945..80302c77a854

$ git log 95061379c..80302c77a --date=short --no-merges --format='%ad %ae %s'
2018-05-10 rharrison Use test_dir instead of 'pdfium' for source type
2018-05-10 thestig Add CPDF_Transparency.
2018-05-10 thestig Make GetTestDataDir() work in a non-standalone checkout.
2018-05-10 tsepez Fix destruction order in CPDF_Dibsource.
2018-05-10 npm Remove a completeness check from CJBig2_GRRDProc::DecodeTemplate0Opt

Created with:
roll-dep src/third_party/pdfium
BUG=chromium:841513, chromium:840695, chromium:841200

The AutoRoll server is located here: https://pdfium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

TBR=dsinclair@chromium.org

Change-Id: I579c4a7663af521bb842f5e0f309f2bcd71732f3
Reviewed-on: https://chromium-review.googlesource.com/1054263
Reviewed-by: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#557647}

[modify] https://crrev.com/511f71f2d113829989df1949d1a9343f1d0019f8/DEPS
,
May 10 2018
The fix above was for the original PDF (1 page). I will need to check that it fixes all the broken pages of the PDF in #9.
,
May 11 2018
Compared the pages from #9. They render incorrectly in Chrome Stable but correctly in Canary, so marking as Fixed.
,
May 11 2018
Thank you!
Comment 1 by junov@chromium.org
, May 9 2018