
Issue 738325

Starred by 3 users

Issue metadata

Status: Duplicate
Merged: issue 739381
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows
Pri: 1
Type: Bug-Regression




Regression: Content of the pdf is seen different

Project Member Reported by keerthan...@techmahindra.com, Jun 30 2017

Issue description

Chrome Version: 61.0.3145.0
OS: Ubuntu 14.04, Windows

What steps will reproduce the problem?
(1) Launch Chrome and navigate to http://cb.vu/unixtoolbox.pdf
(2) Observe the rendered PDF content.


Expected: The PDF should render as shown in the attached expected screenshot.
Actual: The PDF content renders differently.

This is a regression first seen in M-61.

Manual Bisect Info:
====================
Good Build: 61.0.3144.0
Bad Build: 61.0.3145.0

 
EXpectedPDF.png (157 KB)
ActualPdf.png (152 KB)
Labels: OS-Windows
Status: Untriaged (was: Unconfirmed)
Able to reproduce the issue on Ubuntu 14.04 using the latest Chrome version, 61.0.3145.0.
Unable to reproduce the issue on Mac.
Labels: hasbisect-per-revision ReleaseBlock-Beta M-61
Owner: bunge...@chromium.org
Status: Assigned (was: Untriaged)
Per-revision bisect results:

Good build: 61.0.3144.0 (Revision: 483234).
Bad build: 61.0.3145.0 (Revision: 483574).

You are probably looking for a change made after 483452 (known good), but no later than 483453 (first known bad).

---------------
https://chromium.googlesource.com/chromium/src/+log/f2c67e43244683696e038cc51eb7d0dbd5c68e00..4d6320ef3dee4ac9f29055b6b9d75497ea67796f

Based on the CL above, assigning the issue to the concerned owner.

@bungeman: Could you please look into this issue? Apologies if it is unrelated to your changes; if so, please assign it to the appropriate owner.

Reviewed-on: https://chromium-review.googlesource.com/550379

Adding ReleaseBlock-Beta since this is a recent regression in M-61. Please remove it if anyone feels otherwise.

Thanks.
I am able to reproduce this and have further bisected the FreeType roll. The commit causing this appears to be:

https://chromium.googlesource.com/chromium/src/third_party/freetype2/+/75cb071b3fbfa2315c5d458fee2bb465a14568ae
[sfnt] Synthesize a Unicode charmap if one is missing.

I do not yet know why this is happening, but will investigate.
Cc: caryclark@chromium.org
Indeed, commenting out FT_CONFIG_OPTION_POSTSCRIPT_NAMES in third_party/freetype/include/freetype-custom-config/ftoption.h seems to 'fix' this issue. It's not yet clear whether this is what we should do, or whether there is some other underlying issue.

Comment 7 by ajha@chromium.org, Jul 4 2017

This is marked as a Beta blocker, and M-61 will probably branch on 07/20. It would be good to have this fixed before the branch point; please plan the fix accordingly.

Thank you!
I've extracted the font in question from the pdf, it's a subsetted (by Prince, it looks like) Verdana which has only a Mac Roman encoding which looks strange (non-standard / compacted). For example the code point 0x34 in Mac Roman normally maps to '4' but in this font maps instead to a glyph that looks like 'T'. The resulting bad rendering seems to be a result of using a Unicode map created from a Mac Roman map, but the Mac Roman map wasn't actually Mac Roman.
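To make the "Mac Roman map that wasn't actually Mac Roman" point concrete, here is a toy model in Python. It is not FreeType's synthesis code. Only the pairing of code 0x34 with a 'T'-shaped glyph comes from the report; the other codes, all glyph IDs, and the names FONT_CMAP, GLYPH_SHAPES, SYNTH_UNICODE and mac_roman_char are hypothetical. Python's built-in mac_roman codec stands in for the Mac Roman encoding.

```python
# Toy model of the subsetted font (hypothetical data except 0x34 -> 'T').
# This is NOT FreeType's implementation; it only illustrates the mismatch.

# The font's own "Mac Roman" cmap: character code -> glyph ID. The subsetter
# assigned codes compactly, so they are not genuine Mac Roman.
FONT_CMAP = {0x34: 1, 0x35: 2, 0x36: 3}

# What each glyph actually draws.
GLYPH_SHAPES = {1: "T", 2: "h", 3: "e"}


def mac_roman_char(code: int) -> str:
    """Interpret a single-byte code as genuine Mac Roman."""
    return bytes([code]).decode("mac_roman")


# A Unicode map derived by trusting the Mac Roman label:
# Unicode code point -> glyph ID.
SYNTH_UNICODE = {ord(mac_roman_char(code)): gid for code, gid in FONT_CMAP.items()}

for code, gid in sorted(FONT_CMAP.items()):
    print(f"code 0x{code:02X}: Mac Roman says {mac_roman_char(code)!r}, "
          f"glyph {gid} draws {GLYPH_SHAPES[gid]!r}")
```

In this model, looking up U+0034 ('4') in the derived map returns the 'T'-shaped glyph, while U+0054 ('T') is not mapped at all.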

If I copy and paste from the bad rendering, the text is copied correctly. Putting a breakpoint inside pdfium that prints the text it is asked to draw as ASCII shows that, interpreted as ASCII, the text reads exactly like the bad rendering.

Inside the pdf itself it appears that the character codes actually are this gibberish (if you try to interpret the content of the pdf as Mac Roman then it would look like the bad version). However, it appears the pdf uses a /ToUnicode to map this not-Mac-Roman to Unicode (which is why things used to work and copy-paste still works).
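The two readings of the same character codes described above can be sketched as follows. The codes and /ToUnicode entries are hypothetical illustrative values, not extracted from unixtoolbox.pdf, and the helper names are made up for this sketch. Because Mac Roman and ASCII agree below 0x80, the "interpret as ASCII" view and the "interpret as Mac Roman" view give the same result here.

```python
# Hypothetical content-stream codes and /ToUnicode entries; the values are
# illustrative, not extracted from the real PDF.
CONTENT_CODES = bytes([0x34, 0x35, 0x36])
TO_UNICODE = {0x34: "T", 0x35: "h", 0x36: "e"}


def decode_as_mac_roman(codes: bytes) -> str:
    """Read the codes as genuine Mac Roman (identical to ASCII below 0x80)."""
    return codes.decode("mac_roman")


def decode_with_to_unicode(codes: bytes, to_unicode: dict[int, str]) -> str:
    """Map each code through a /ToUnicode-style table, as a text extractor would."""
    return "".join(to_unicode[code] for code in codes)


print(decode_as_mac_roman(CONTENT_CODES))                 # 456
print(decode_with_to_unicode(CONTENT_CODES, TO_UNICODE))  # The
```

The first line corresponds to the bad rendering and the second to what copy-paste produces.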
Cc: dsinclair@chromium.org
Components: -UI Internals>Plugins>PDF
I think the issue here may be that CPDF_TrueTypeFont::LoadGlyphMap uses UnicodeFromCharCode (aka /ToUnicode) as a kind of last resort instead of as the primary source of truth (which seems to be required by pdf32000 9.10.2). Indeed, if I locally change CPDF_TrueTypeFont::LoadGlyphMap to do the UnicodeFromCharCode bit at the top when bToUnicode is true then everything works.
Cc: npm@chromium.org
npm@ for comments on the ToUnicode usage.

Comment 11 by npm@chromium.org, Jul 5 2017

Cc: lemzw...@googlemail.com
ToUnicode maps charcodes to Unicode. m_GlyphIndex maps charcodes to glyphs; there are no Unicode values in there (the primary purpose of ToUnicode maps is to allow text extraction, and of course to allow the PDF to use drawing commands with Unicode characters instead of charcodes). FreeType is supposed to be able to do charcodes->glyphs via FT_Get_Char_Index, but it seems that an incorrect charmap was introduced in that CL, making this unreliable. Is this intended, or am I misunderstanding something?
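The separation described above (one table for drawing, another for extraction) can be sketched with the same toy data. This is a hypothetical model, not PDFium's code: FONT_CMAP stands in for the charcode-to-glyph lookup that FT_Get_Char_Index performs against the font's own charmap, and render and extract_text are made-up names.

```python
# Hypothetical toy data: the font's own code-keyed cmap and the PDF's
# /ToUnicode map are two separate tables. Not PDFium's implementation.
FONT_CMAP = {0x34: 1, 0x35: 2, 0x36: 3}          # charcode -> glyph ID
GLYPH_SHAPES = {1: "T", 2: "h", 3: "e"}          # glyph ID -> what it draws
TO_UNICODE = {0x34: "T", 0x35: "h", 0x36: "e"}   # charcode -> Unicode text


def render(codes: bytes) -> str:
    """Drawing path: charcode -> glyph directly, no Unicode involved."""
    return "".join(GLYPH_SHAPES[FONT_CMAP[code]] for code in codes)


def extract_text(codes: bytes) -> str:
    """Copy-paste path: charcode -> Unicode via /ToUnicode only."""
    return "".join(TO_UNICODE[code] for code in codes)


codes = bytes([0x34, 0x35, 0x36])
print(render(codes))        # The
print(extract_text(codes))  # The
```

In this model the two paths share no table, so an incorrect Unicode charmap can only affect rendering if the drawing path is routed through it.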
Mergedinto: 739381
Status: Duplicate (was: Assigned)
That's a good summary of what is happening, yes. However, the sticky bit here is that the font had a real not-really-Mac-Roman cmap and a virtual (synthesized) not-really-Unicode charmap. Then the PDF had a /ToUnicode key on the text. In the end, I think PDFium really is doing the right thing here, and it was just the bad virtual not-really-Unicode map that was throwing things off.
