New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 813705 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows , Chrome , Mac
Pri: 2
Type: Bug



Sign in to add a comment

Chrome browser issue: Chinese characters are not displayed properly in Chrome Browser PDF viewer.

Project Member Reported by ryutas@chromium.org, Feb 20 2018

Issue description

Chrome version: 64.0.3282.144 (Official Build) (64-bit) 
OS version: Windows 10, MacOS10.13.3
Case#:14981856

Description: Chinese characters are not displayed properly in Chrome Browser PDF viewer after upgrade to version 64.
Sample PDF : https://drive.google.com/open?id=16fOSrr4_GHFuILlkXh12MyhsL31XDMVz

Steps to reproduce: 
On version 63 the Chinese character are displayed correctly:  https://drive.google.com/open?id=15gdxOCLSdYeX8VQbs_048y1laX68WgzN
However, once upgrade to version 64, the same PDF file is displaying the same character incorrectly: https://drive.google.com/open?id=1FzGPYYlKNSD6V1gAW6V1G6HsMwC6xHlk


Checked the same file opened through Firefox PDF viewers and it OK to display: https://drive.google.com/open?id=13qSaFgYiTnzXiLRRWWVVdOT7UHY5DNLU
Here is the same file opened through Adobe PDF viewers and it is also OK: https://drive.google.com/open?id=1yHJ6UOQ_qCtiqScPIoT6kvL7J6wSHkf6


- What's the expected behavior? 
For the correct Chinese character to show 

- What's the actual result? 
Browser display different Chinese character on v64 

- Does it affect all devices? Yes 
- Is this reproducible in Beta? Yes 
- Is this reproducible in Canary? Yes 
- Is this reproducible in Incognito? Yes 
- Is this reproducible on different browsers? No 

This information was provided by the customer: 
- the font ""Arial Unicode MS"" is not embedded in the PDF document.
- the CID to GID map refers to an identity mapping for Unicode 
- the requested string is <*4f60* *597d *0020 003a 0020 0042 004f 004e 004a 
004f 0055 0052>Tj 
- when copying the text from the displayed PDF and pasting it to the Google 
Chrome address bar it appears as expected. 

Sample PDF :https://drive.google.com/open?id=16fOSrr4_GHFuILlkXh12MyhsL31XDMVz
OK in Google Chrome 63 : https://drive.google.com/open?id=15gdxOCLSdYeX8VQbs_048y1laX68WgzN
Not OK in Google Chrome 64: https://drive.google.com/open?id=1FzGPYYlKNSD6V1gAW6V1G6HsMwC6xHlk
OK in Adobe Acrobat Reader:https://drive.google.com/open?id=1yHJ6UOQ_qCtiqScPIoT6kvL7J6wSHkf6
OK in Firefox:https://drive.google.com/open?id=13qSaFgYiTnzXiLRRWWVVdOT7UHY5DNLU


 
Owner: thestig@chromium.org
Status: Assigned (was: Untriaged)
This reproduced reliably on Mac. On Windows, the Chinese characters came up blank for some reason. Maybe I don't have the right fonts installed?

Bisected to https://chromium.googlesource.com/chromium/src//+log/5eede2ad..66811f65, so very likely due to r514528 which rolled in https://pdfium.googlesource.com/pdfium.git/+/979e916f
1And we can't just revert, because the CL that caused this change was to fix a different CJK rendering error -  bug 781785 .

Comment 3 by geom...@gmail.com, Feb 22 2018

Hi I'm the customer who reported the issue to Chris.
It's a stupid question, but can't you just revert so what was working works again ? It is affecting our activities in Mainland China (wrong chinese characters) and in Europa (currency character for euros "€" is not displayed).

Here is the font "Arial Unicode MS" : https://up.neissa.org/ARIALUNI
But to use it you have to own a Microsoft Office licence (or maybe some other conditions, it is a Microsoft product).
Reverting will of course make you happy, but it will make someone else's bug reappear. I want to fix this bug and get the PDF Viewer in state where the PDFs from both parties are displayed correctly.

As a general note, to repeat what I said on  bug 781785 :

In the PDF, the fonts mentioned are not embedded, so it's up to the PDF viewer to do best effort font substitution. Thus there is no guarantee the PDF will display correctly. e.g. What happens if the OS does not have the "Arial Unincode MS" font? The software that generated the PDF should embed those fonts.

Comment 5 by geom...@gmail.com, Feb 27 2018

Yes of course please fix both issues, I didn't mean not to. Please simply take in consideration that it is deeply affecting our business. If it takes months to fix both issues I would really appreciate if you revert the patch and restore the other issue. I may seem very selfish but we are rushed out by many of our customers and final users, which is a loss of time and energy for everybody.

It might be off topic but we are not embeddind a 22MB font in a 22KB file for bandwidth optimisation, and we have chosen not to embed a subset for performances (avoiding looking up which characters are being used). Having the font in the OS is a prerequisite, not having it ends up with the display of random characters. The behavior is compliant with the PDF 1.7 specification and Adobe Acrobat Reader handles it this way (as does Google Chrome <= 64).

That being said, we are still loving Google Chrome and the Chromium project, you guys really do magic every day! Thank you.

Comment 6 by geom...@gmail.com, Mar 26 2018

Hi thestig@chromium.org,

It has been a month now, is everything OK ? I am sure you have a lot to do besides this issue but it is deeply affecting us.
If you are struggling with the specifications I may be able to help you (I once made a PDF writer), please feel free to ask.

Best regards,
geompse@gmail.com
Cc: npm@chromium.org
I have nearly 150 bugs assigned to me, so it takes a bit of time to circle back to this. I looked into this and I have a CL [1] that fixes this bug without breaking  bug 781785 . However, it does not make a lot of sense to me, as the PDF spec says the CIDToGIDMap entry is only valid when the associated TrueType font is embedded in the PDF.

npm: Any thoughts on this?

[1] https://pdfium-review.googlesource.com/29710

Comment 8 by geom...@gmail.com, Apr 4 2018

Yes, this is extremely confusing because the PDF 1.7 reference states clearly : "This entry [CIDToGIDMap] may appear only in a Type 2 CIDFont whose associated TrueType font program is embedded in the PDF file [..]." and "If the TrueType font program is not embedded but is referenced by name, the Type 2 CIDFont dictionary must not contain a CIDToGIDMap entry, since it is not meaningful to refer to glyph indices in an external font program.".

Anyway the Adobe PDF Reader uses this dictionnary when present with no embedded font program, exactly has you did in the CL above. This is also what some other PDF parsers do.

There are other digressions from the PDF reference to handle invalid implementations (retro compatibility) or broken PDF files (i.e. partial upload), even in PDFium. The philosophy in PDF parsing is generaly to not go against the reference (here it will work if the PDF complies to the reference, either embedded font program with the CIDToGIDMap entry or non embedded font program without it, and also work if the PDF does not). It may be worth addind a comment in the code stating explicitely that it does comply with it while not respecting it and why.

Thanks for your time.
Labels: -M-65 -M-64 M-67
My impression is that fonts and glyph mappings are complicated, and PDF generators don't always write out PDFs according to the spec. Some PDF viewers are accommodating to out of spec PDFs, so now users expect other PDF viewers to be able to view the same out of spec PDFs, which leads to this bug and the mess we are in.

The code in CPDF_CIDFont is rather brittle, and we do not have a sufficiently large corpus of test PDFs to catch all the exceptional cases. Did you generate the test PDF for this bug? If you can give us permission to add it, or something like it to our public test corpus, that will help us do automate testing and make sure these render correctly. The test PDF, as is, is still a bit hard to test with since not all of our machines have the Arial Unicode MS font installed.

Comment 10 by npm@chromium.org, Apr 4 2018

We should not revert my CL because treating a substitute font with a matching name as if it was embedded is wrong. This is kind of what the PDF is assuming, it's terrible practice. It is much preferred to embed a subset of the font. By embedding just a subset, the file size shouldn't increase too much while still providing the necessary information to the PDF viewer. But maybe the PDF creator used does not have that ability.

It seems that using the CIDToGIDMap even if the font file is not embedded is OK because it will only affect bad PDFs which include the map even though the font is not embedded. It's unclear to me how many PDFs use CIDtoGID with non-embedded fonts, and these are the ones that could potentially regress.

Comment 11 by geom...@gmail.com, Apr 5 2018

You are both right, it is quite wrong but it has been done in the past.

thestig : I made the PDF sample specifically for this issue, and I give the permission to you and to anyone else to use it, store it, alter it, publish it.

npm : someone at Google may have this statistic on a worldwide scale, but it has been 10 years now since our 150000 users generate PDF files like this on a daily basis so there are at least millions of thoose out there in the wild.
I sent the proposed CL out for review. It fixes this bug without triggering  bug 781785  again. But who knows what other crazy PDFs are out there that may break as we tweak this font code.
Project Member

Comment 13 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium/+/cd7b4ab7519c31a27c8d31e0cb694257fea84fd1

commit cd7b4ab7519c31a27c8d31e0cb694257fea84fd1
Author: Lei Zhang <thestig@chromium.org>
Date: Tue Apr 10 21:49:05 2018

Load CIDToGIDMap stream for CID fonts if it exists.

BUG= chromium:813705 

Change-Id: I57265aad11d8f717f6ce33cc09241f4bef6381db
Reviewed-on: https://pdfium-review.googlesource.com/29710
Reviewed-by: Nicolás Peña Moreno <npm@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>

[modify] https://crrev.com/cd7b4ab7519c31a27c8d31e0cb694257fea84fd1/core/fpdfapi/font/cpdf_cidfont.cpp

Status: Fixed (was: Assigned)
This will be fixed in Chrome 67.

Please note this code is a bit brittle. If you consider this issue (or anything else Google Chrome does) to be business critical, please consider installing a non-stable Google Chrome channel and occasionally testing to make sure the non-stable channel works for you. See https://support.google.com/chrome/a/answer/6025002?hl=en

Comment 15 by geom...@gmail.com, Apr 11 2018

This will be fixed in Chrome 67.
> Thank you very much

[..] please consider installing a non-stable Google Chrome channel and occasionally testing [..]
> That is a good idea unfortunately we do not have the resources for it (human nor time). Hopefuly addind the PDF sample to the Pdfium test cases will prevent this behavior from being reverted again.
Project Member

Comment 16 by bugdroid1@chromium.org, Apr 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ff4cbba00a2f6f40c62e3c64526e860d995cd2e1

commit ff4cbba00a2f6f40c62e3c64526e860d995cd2e1
Author: pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Wed Apr 11 08:08:14 2018

Roll src/third_party/pdfium/ 4027887ee..6bebd2e3c (10 commits)

https://pdfium.googlesource.com/pdfium.git/+log/4027887ee29a..6bebd2e3cfb7

$ git log 4027887ee..6bebd2e3c --date=short --no-merges --format='%ad %ae %s'
2018-04-11 hnakashima Avoid stack overflow when loading CPDF_Function.
2018-04-10 thestig Add static_asserts for FX_RECT and FX_COLORREF.
2018-04-10 thestig Load CIDToGIDMap stream for CID fonts if it exists.
2018-04-10 rharrison Roll DEPS for Clang and build
2018-04-10 rharrison Add an assert to make sure all code is included in static lib
2018-04-10 thestig Remove CFX_Rect.
2018-04-10 thestig Change CFX_RenderDevice::FillRect() to take FX_RECT by const-ref.
2018-04-10 thestig Change FillRectWithBlend methods to take FX_RECT by const-ref.
2018-04-10 hnakashima Implement CPDFSDK_XFAWidgetHandler::OnKillFocus.
2018-04-10 hnakashima Break down CXFA_FFWidget::On{L|R}ButtonDown() into two steps.

Created with:
  roll-dep src/third_party/pdfium
BUG= chromium:830221 , chromium:813705 , chromium:820256 , chromium:820256 


The AutoRoll server is located here: https://pdfium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=dsinclair@chromium.org

Change-Id: Iae9f069e452471eac037e59354fa106060058af5
Reviewed-on: https://chromium-review.googlesource.com/1005962
Reviewed-by: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#549817}
[modify] https://crrev.com/ff4cbba00a2f6f40c62e3c64526e860d995cd2e1/DEPS

Sign in to add a comment