Chrome browser issue: Chinese characters are not displayed properly in Chrome Browser PDF viewer. |
||||
Issue descriptionChrome version: 64.0.3282.144 (Official Build) (64-bit) OS version: Windows 10, MacOS10.13.3 Case#:14981856 Description: Chinese characters are not displayed properly in Chrome Browser PDF viewer after upgrade to version 64. Sample PDF : https://drive.google.com/open?id=16fOSrr4_GHFuILlkXh12MyhsL31XDMVz Steps to reproduce: On version 63 the Chinese character are displayed correctly: https://drive.google.com/open?id=15gdxOCLSdYeX8VQbs_048y1laX68WgzN However, once upgrade to version 64, the same PDF file is displaying the same character incorrectly: https://drive.google.com/open?id=1FzGPYYlKNSD6V1gAW6V1G6HsMwC6xHlk Checked the same file opened through Firefox PDF viewers and it OK to display: https://drive.google.com/open?id=13qSaFgYiTnzXiLRRWWVVdOT7UHY5DNLU Here is the same file opened through Adobe PDF viewers and it is also OK: https://drive.google.com/open?id=1yHJ6UOQ_qCtiqScPIoT6kvL7J6wSHkf6 - What's the expected behavior? For the correct Chinese character to show - What's the actual result? Browser display different Chinese character on v64 - Does it affect all devices? Yes - Is this reproducible in Beta? Yes - Is this reproducible in Canary? Yes - Is this reproducible in Incognito? Yes - Is this reproducible on different browsers? No This information was provided by the customer: - the font ""Arial Unicode MS"" is not embedded in the PDF document. - the CID to GID map refers to an identity mapping for Unicode - the requested string is <*4f60* *597d *0020 003a 0020 0042 004f 004e 004a 004f 0055 0052>Tj - when copying the text from the displayed PDF and pasting it to the Google Chrome address bar it appears as expected. Sample PDF :https://drive.google.com/open?id=16fOSrr4_GHFuILlkXh12MyhsL31XDMVz OK in Google Chrome 63 : https://drive.google.com/open?id=15gdxOCLSdYeX8VQbs_048y1laX68WgzN Not OK in Google Chrome 64: https://drive.google.com/open?id=1FzGPYYlKNSD6V1gAW6V1G6HsMwC6xHlk OK in Adobe Acrobat Reader:https://drive.google.com/open?id=1yHJ6UOQ_qCtiqScPIoT6kvL7J6wSHkf6 OK in Firefox:https://drive.google.com/open?id=13qSaFgYiTnzXiLRRWWVVdOT7UHY5DNLU
,
Feb 21 2018
1And we can't just revert, because the CL that caused this change was to fix a different CJK rendering error - bug 781785 .
,
Feb 22 2018
Hi I'm the customer who reported the issue to Chris. It's a stupid question, but can't you just revert so what was working works again ? It is affecting our activities in Mainland China (wrong chinese characters) and in Europa (currency character for euros "€" is not displayed). Here is the font "Arial Unicode MS" : https://up.neissa.org/ARIALUNI But to use it you have to own a Microsoft Office licence (or maybe some other conditions, it is a Microsoft product).
,
Feb 22 2018
Reverting will of course make you happy, but it will make someone else's bug reappear. I want to fix this bug and get the PDF Viewer in state where the PDFs from both parties are displayed correctly. As a general note, to repeat what I said on bug 781785 : In the PDF, the fonts mentioned are not embedded, so it's up to the PDF viewer to do best effort font substitution. Thus there is no guarantee the PDF will display correctly. e.g. What happens if the OS does not have the "Arial Unincode MS" font? The software that generated the PDF should embed those fonts.
,
Feb 27 2018
Yes of course please fix both issues, I didn't mean not to. Please simply take in consideration that it is deeply affecting our business. If it takes months to fix both issues I would really appreciate if you revert the patch and restore the other issue. I may seem very selfish but we are rushed out by many of our customers and final users, which is a loss of time and energy for everybody. It might be off topic but we are not embeddind a 22MB font in a 22KB file for bandwidth optimisation, and we have chosen not to embed a subset for performances (avoiding looking up which characters are being used). Having the font in the OS is a prerequisite, not having it ends up with the display of random characters. The behavior is compliant with the PDF 1.7 specification and Adobe Acrobat Reader handles it this way (as does Google Chrome <= 64). That being said, we are still loving Google Chrome and the Chromium project, you guys really do magic every day! Thank you.
,
Mar 26 2018
Hi thestig@chromium.org, It has been a month now, is everything OK ? I am sure you have a lot to do besides this issue but it is deeply affecting us. If you are struggling with the specifications I may be able to help you (I once made a PDF writer), please feel free to ask. Best regards, geompse@gmail.com
,
Apr 4 2018
I have nearly 150 bugs assigned to me, so it takes a bit of time to circle back to this. I looked into this and I have a CL [1] that fixes this bug without breaking bug 781785 . However, it does not make a lot of sense to me, as the PDF spec says the CIDToGIDMap entry is only valid when the associated TrueType font is embedded in the PDF. npm: Any thoughts on this? [1] https://pdfium-review.googlesource.com/29710
,
Apr 4 2018
Yes, this is extremely confusing because the PDF 1.7 reference states clearly : "This entry [CIDToGIDMap] may appear only in a Type 2 CIDFont whose associated TrueType font program is embedded in the PDF file [..]." and "If the TrueType font program is not embedded but is referenced by name, the Type 2 CIDFont dictionary must not contain a CIDToGIDMap entry, since it is not meaningful to refer to glyph indices in an external font program.". Anyway the Adobe PDF Reader uses this dictionnary when present with no embedded font program, exactly has you did in the CL above. This is also what some other PDF parsers do. There are other digressions from the PDF reference to handle invalid implementations (retro compatibility) or broken PDF files (i.e. partial upload), even in PDFium. The philosophy in PDF parsing is generaly to not go against the reference (here it will work if the PDF complies to the reference, either embedded font program with the CIDToGIDMap entry or non embedded font program without it, and also work if the PDF does not). It may be worth addind a comment in the code stating explicitely that it does comply with it while not respecting it and why. Thanks for your time.
,
Apr 4 2018
My impression is that fonts and glyph mappings are complicated, and PDF generators don't always write out PDFs according to the spec. Some PDF viewers are accommodating to out of spec PDFs, so now users expect other PDF viewers to be able to view the same out of spec PDFs, which leads to this bug and the mess we are in. The code in CPDF_CIDFont is rather brittle, and we do not have a sufficiently large corpus of test PDFs to catch all the exceptional cases. Did you generate the test PDF for this bug? If you can give us permission to add it, or something like it to our public test corpus, that will help us do automate testing and make sure these render correctly. The test PDF, as is, is still a bit hard to test with since not all of our machines have the Arial Unicode MS font installed.
,
Apr 4 2018
We should not revert my CL because treating a substitute font with a matching name as if it was embedded is wrong. This is kind of what the PDF is assuming, it's terrible practice. It is much preferred to embed a subset of the font. By embedding just a subset, the file size shouldn't increase too much while still providing the necessary information to the PDF viewer. But maybe the PDF creator used does not have that ability. It seems that using the CIDToGIDMap even if the font file is not embedded is OK because it will only affect bad PDFs which include the map even though the font is not embedded. It's unclear to me how many PDFs use CIDtoGID with non-embedded fonts, and these are the ones that could potentially regress.
,
Apr 5 2018
You are both right, it is quite wrong but it has been done in the past. thestig : I made the PDF sample specifically for this issue, and I give the permission to you and to anyone else to use it, store it, alter it, publish it. npm : someone at Google may have this statistic on a worldwide scale, but it has been 10 years now since our 150000 users generate PDF files like this on a daily basis so there are at least millions of thoose out there in the wild.
,
Apr 5 2018
I sent the proposed CL out for review. It fixes this bug without triggering bug 781785 again. But who knows what other crazy PDFs are out there that may break as we tweak this font code.
,
Apr 10 2018
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium/+/cd7b4ab7519c31a27c8d31e0cb694257fea84fd1 commit cd7b4ab7519c31a27c8d31e0cb694257fea84fd1 Author: Lei Zhang <thestig@chromium.org> Date: Tue Apr 10 21:49:05 2018 Load CIDToGIDMap stream for CID fonts if it exists. BUG= chromium:813705 Change-Id: I57265aad11d8f717f6ce33cc09241f4bef6381db Reviewed-on: https://pdfium-review.googlesource.com/29710 Reviewed-by: Nicolás Peña Moreno <npm@chromium.org> Commit-Queue: Lei Zhang <thestig@chromium.org> [modify] https://crrev.com/cd7b4ab7519c31a27c8d31e0cb694257fea84fd1/core/fpdfapi/font/cpdf_cidfont.cpp
,
Apr 10 2018
This will be fixed in Chrome 67. Please note this code is a bit brittle. If you consider this issue (or anything else Google Chrome does) to be business critical, please consider installing a non-stable Google Chrome channel and occasionally testing to make sure the non-stable channel works for you. See https://support.google.com/chrome/a/answer/6025002?hl=en
,
Apr 11 2018
This will be fixed in Chrome 67. > Thank you very much [..] please consider installing a non-stable Google Chrome channel and occasionally testing [..] > That is a good idea unfortunately we do not have the resources for it (human nor time). Hopefuly addind the PDF sample to the Pdfium test cases will prevent this behavior from being reverted again.
,
Apr 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ff4cbba00a2f6f40c62e3c64526e860d995cd2e1 commit ff4cbba00a2f6f40c62e3c64526e860d995cd2e1 Author: pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Wed Apr 11 08:08:14 2018 Roll src/third_party/pdfium/ 4027887ee..6bebd2e3c (10 commits) https://pdfium.googlesource.com/pdfium.git/+log/4027887ee29a..6bebd2e3cfb7 $ git log 4027887ee..6bebd2e3c --date=short --no-merges --format='%ad %ae %s' 2018-04-11 hnakashima Avoid stack overflow when loading CPDF_Function. 2018-04-10 thestig Add static_asserts for FX_RECT and FX_COLORREF. 2018-04-10 thestig Load CIDToGIDMap stream for CID fonts if it exists. 2018-04-10 rharrison Roll DEPS for Clang and build 2018-04-10 rharrison Add an assert to make sure all code is included in static lib 2018-04-10 thestig Remove CFX_Rect. 2018-04-10 thestig Change CFX_RenderDevice::FillRect() to take FX_RECT by const-ref. 2018-04-10 thestig Change FillRectWithBlend methods to take FX_RECT by const-ref. 2018-04-10 hnakashima Implement CPDFSDK_XFAWidgetHandler::OnKillFocus. 2018-04-10 hnakashima Break down CXFA_FFWidget::On{L|R}ButtonDown() into two steps. Created with: roll-dep src/third_party/pdfium BUG= chromium:830221 , chromium:813705 , chromium:820256 , chromium:820256 The AutoRoll server is located here: https://pdfium-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=dsinclair@chromium.org Change-Id: Iae9f069e452471eac037e59354fa106060058af5 Reviewed-on: https://chromium-review.googlesource.com/1005962 Reviewed-by: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: pdfium-chromium-autoroll <pdfium-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#549817} [modify] https://crrev.com/ff4cbba00a2f6f40c62e3c64526e860d995cd2e1/DEPS |
||||
►
Sign in to add a comment |
||||
Comment 1 by thestig@chromium.org
, Feb 21 2018Status: Assigned (was: Untriaged)