Issue 841200

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug



Regression: PDFs with OCR don't display correctly

Reported by ya...@impossibledreams.net, May 9 2018

Issue description

UserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0

Example URL:
http://hebrewbooks.org/pdfpager.aspx?req=24642&st=&pgnum=216&hilite=

Steps to reproduce the problem:
1. Visit the example URL above.

What is the expected behavior?
The image of the page should display, with the OCR text remaining invisible.

What went wrong?
The OCR text is displayed instead.

Does it occur on multiple sites: Yes

Is it a problem with a plugin? Yes, PDFium

Did this work before? Yes, in 52.0.2743.116

Does this work in other browsers? Yes

Chrome version: 65.0.3325.181  Channel: stable
OS Version: Ubuntu
Flash Version: disabled
 
Attachment: HebrewBooksOrg_24642_page_216.pdf (108 KB)

Comment 1 by junov@chromium.org, May 9 2018

Components: -Blink Internals>Plugins>PDF

Comment 2 by woxxom@gmail.com, May 9 2018

Bisected to r515815 = 09cd842d6982d4816f29392c872004ef6e04795d = https://crrev.com/c/765052
"Roll src/third_party/pdfium/ 8baea3c69..9fa503624 (2 commits)"

Suspecting https://pdfium.googlesource.com/pdfium.git/+/6e4656f88fba94f706e0e42d1b548e28f6645594
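
For readers unfamiliar with how a regression gets narrowed down to a single revision like r515815, here is a minimal sketch of the binary search behind a bisect. It is not the tool used in this thread. The function names, the revision bounds, and the `renders_incorrectly` stand-in are all hypothetical; a real check would build or download each revision and inspect how the PDF renders.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first revision in (good, bad] for which is_bad() is True.

    Assumes `good` is known to work, `bad` is known to be broken, and the
    bug stays present in every revision after the one that introduced it.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


# Hypothetical stand-in for "open the PDF in this revision and check whether
# the OCR text layer is visible". The threshold simply mirrors the revision
# reported in this comment so the sketch can run on its own.
def renders_incorrectly(revision: int) -> bool:
    return revision >= 515815


# The starting bounds are made up for illustration.
print(first_bad_revision(500000, 520000, renders_incorrectly))
```

Each step halves the candidate range, so a range of 20,000 revisions needs about 15 checks.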
Owner: npm@chromium.org
npm, can you take a look at this? The suspect CL is one of yours.

Comment 4 by npm@chromium.org, May 9 2018

It most certainly is my CL. Bug reporter, do you have a PNG showing what you expect the correct rendering of this PDF to look like?

Comment 5 by woxxom@gmail.com, May 9 2018

Attachment: expected.png (197 KB)
The issue appears when viewing the whole PDF:

http://download.hebrewbooks.org/downloadhandler.ashx?req=24642

Attachment: 519A0F11-19BE-4D3D-8B6B-713AB1D6A6AA.png (555 KB)

Comment 7 by npm@chromium.org, May 9 2018

Status: Started (was: Unconfirmed)
I imagine the PDF attached in the bug report cannot be added to our tests?
The original one-page PDF is public domain, so you can use it for testing if you want.
The one-page PDF will not display the issue.
I have attached the 33 MB PDF. The issue appears on pages 41, 54, 216, and perhaps other pages.
Sumatra, Acrobat, pdf.js, and Foxit render it without issues.
Attachment: Hebrewbooks_org_24642.pdf (31.6 MB)

Comment 10 by bugdroid1@chromium.org, May 10 2018

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium/+/401e618415d424f8a8b48f15e6710fa0e75d0615

commit 401e618415d424f8a8b48f15e6710fa0e75d0615
Author: Nicolas Pena <npm@chromium.org>
Date: Thu May 10 16:47:16 2018

Remove a completeness check from CJBig2_GRRDProc::DecodeTemplate0Opt

https://pdfium-review.googlesource.com/c/pdfium/+/18333 introduced
several checks to prevent timeouts in JBig2. One of these is breaking
the PDF in the bug, so this CL removes that check.

Bug: chromium:841200
Change-Id: Ia75c699b7fddc26f0353b0d64349898c4d1f744d
Reviewed-on: https://pdfium-review.googlesource.com/32250
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Nicolás Peña Moreno <npm@chromium.org>

[modify] https://crrev.com/401e618415d424f8a8b48f15e6710fa0e75d0615/core/fxcodec/jbig2/JBig2_GrrdProc.cpp

Comment 11 by bugdroid1@chromium.org, May 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/511f71f2d113829989df1949d1a9343f1d0019f8

commit 511f71f2d113829989df1949d1a9343f1d0019f8
Author: pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu May 10 20:13:17 2018

Roll src/third_party/pdfium/ 95061379c..80302c77a (5 commits)

https://pdfium.googlesource.com/pdfium.git/+log/95061379c945..80302c77a854

$ git log 95061379c..80302c77a --date=short --no-merges --format='%ad %ae %s'
2018-05-10 rharrison Use test_dir instead of 'pdfium' for source type
2018-05-10 thestig Add CPDF_Transparency.
2018-05-10 thestig Make GetTestDataDir() work in a non-standalone checkout.
2018-05-10 tsepez Fix destruction order in CPDF_Dibsource.
2018-05-10 npm Remove a completeness check from CJBig2_GRRDProc::DecodeTemplate0Opt

Created with:
  roll-dep src/third_party/pdfium
BUG=chromium:841513, chromium:840695, chromium:841200


The AutoRoll server is located here: https://pdfium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=dsinclair@chromium.org

Change-Id: I579c4a7663af521bb842f5e0f309f2bcd71732f3
Reviewed-on: https://chromium-review.googlesource.com/1054263
Reviewed-by: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#557647}
[modify] https://crrev.com/511f71f2d113829989df1949d1a9343f1d0019f8/DEPS

Comment 12 by npm@chromium.org, May 10 2018

The fix above was for the original PDF (1 page). I will need to check that it fixes all the broken pages of the PDF in #9.

Comment 13 by npm@chromium.org, May 11 2018

Status: Fixed (was: Started)
Compared pages in #9. They render incorrectly in Chrome Stable but correctly on Canary, so marking as Fixed.
Thank you!
