Soft hyphens in text turn into white spaces in PDF
Reported by
kwk...@vivliostyle.com,
May 15 2017
Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0

Steps to reproduce the problem:
1. Open the attached HTML file
2. Print to PDF
3. Open the PDF, copy text "hyphenation" and paste it somewhere

What is the expected behavior?
The pasted text should be "hyphenation", containing no white spaces.

What went wrong?
The pasted text is "hy phen a tion", containing white spaces at locations where soft hyphens exist in the source HTML file.

Did this work before? N/A
Does this work in other browsers? Yes

Chrome version: 58.0.3029.110 (Official Build) (64bit)
Channel: stable
OS Version: OS X 10.12
Flash Version: Shockwave Flash 24.0 r0

Searching a text on a PDF viewer is also impossible due to this problem.
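For readers without the attachment, the expected and observed paste results can be sketched in a few lines of Python. The soft hyphen positions below are an assumption inferred from the observed output "hy phen a tion"; the attached HTML file itself is not reproduced in this thread.

```python
# Sketch of expected vs. observed copy/paste text.
# ASSUMPTION: soft hyphen positions are inferred from the reported
# output "hy phen a tion", not taken from the attached HTML file.
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, written &shy; in HTML

source_text = "hy" + SOFT_HYPHEN + "phen" + SOFT_HYPHEN + "a" + SOFT_HYPHEN + "tion"

# Expected: soft hyphens are invisible break hints and leave no trace.
expected = source_text.replace(SOFT_HYPHEN, "")

# Observed: each soft hyphen comes back as a space character.
observed = source_text.replace(SOFT_HYPHEN, " ")

print(repr(expected))
print(repr(observed))
```

A search for "hyphenation" fails against the observed string for the same reason the paste looks wrong: the extracted text contains spaces that the source never had.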
,
May 15 2017
Able to reproduce the issue on Mac 10.12.4, Windows 7 and Linux Ubuntu 14.04 using Chrome stable version 58.0.3029.110 and canary 60.0.3100.0 with the steps mentioned in comment #0. This is a non-regression issue, observed since M30 (30.0.1599.0). Marking it as Untriaged to get more input from the dev team. Thanks.
,
May 15 2017
Please try to reproduce with Chrome m59. (currently the beta channel.) I believe this will be better with m59+.
,
May 16 2017
It seems that it reproduces with Chrome m59 (beta) and m60 (Canary).
,
May 11 2018
This bug is marked as duplicate of 514494, which is marked as fixed. However, the original issue still persists in Chrome 66.
,
May 11 2018
Confirmed this is still occurring in 66.0.3359.139.
,
Sep 11
Took another look at this. It looks like the &shy; in the HTML, which indicates a potential hyphenation location, is being turned into Unicode 0x0003 when printing to PDF. 0x0003 is an end-of-text marker, which is wrong. The correct Unicode character is 0x00AD, soft hyphen. Opening this PDF in Acrobat gives the same behaviour as Chrome, i.e. copying 'hyphenation' pastes as 'hy phen a tion'. My understanding is that the Chrome PDF viewer is working as intended with this PDF, but there is a bug in printing HTML -> PDF. Since this is a PDF printing/generation side issue, I am going to send it to thestig@.
,
Sep 11
Redirecting to halcanary@ to see if this is a SkPDF issue.
,
Sep 12
I tested this with Chrome/dev on my Macbook {Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.9 Safari/537.36}.
Here's the relevant section from the PDF's content stream:
BT
/F0 16 Tf
1 0 0 -1 8 22 Tm
<004B005C0053004B0048005100440057004C00520051> Tj
ET
These translate to the following Unicode code points via the ToUnicode table:
Glyphs: 004B 005C 0053 004B 0048 0051 0044 0057 004C 0052 0051
Text:   h    y    p    h    e    n    a    t    i    o    n
Seems right to me.
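The decoding above can be checked mechanically. This is a minimal sketch, assuming 2-byte glyph IDs (four hex digits each) and a ToUnicode mapping limited to the pairs listed in this comment; `TO_UNICODE` and `decode_tj` are illustrative names, not part of any PDF library.

```python
# Sketch: decode a hex Tj operand through a hand-built ToUnicode mapping.
# ASSUMPTION: glyph IDs are 2 bytes wide, and the mapping holds only the
# glyph/character pairs quoted in the comment above.
TO_UNICODE = {
    0x004B: "h", 0x005C: "y", 0x0053: "p", 0x0048: "e", 0x0051: "n",
    0x0044: "a", 0x0057: "t", 0x004C: "i", 0x0052: "o",
}

def decode_tj(hex_operand):
    """Turn '<004B005C...>' into text using TO_UNICODE."""
    digits = hex_operand.strip("<>")
    glyph_ids = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
    return "".join(TO_UNICODE[g] for g in glyph_ids)

mac_text = decode_tj("<004B005C0053004B0048005100440057004C00520051>")
print(mac_text)
```

On this Mac output the string decodes with no extra characters between the syllables, which matches the "seems right" observation.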
,
Sep 18
I confirmed that the problem does not occur on Chrome 71 (Canary). Thank you for the fix. Can I expect that the fix will be released in the next stable version (m70)?
,
Sep 18
#13 - you can install Chrome beta to confirm this.
,
Sep 18
I tried on Linux with r592000, which is 71.0.3556.0, and got:
BT
/F0 16 Tf
1 0 0 -1 8 23 Tm
<004B005C0003> Tj
16 0 Td
<0053004B004800510003> Tj
31.101563 0 Td
<00440003> Tj
7.1015625 0 Td
<0057004C00520051> Tj
ET
,
Sep 18
Same result with 70.0.3538.16 on Linux. I'll try Mac and Windows and see if there's some platform-specific behavior here.
,
Sep 18
It does vary for me based on platform.
Chrome Mac 71.0.3555.0:
<004B005C0053004B0048005100440057004C00520051> Tj
Chrome Windows 70.0.3538.16: (via Remote Desktop, which may make a difference)
BT
/F0 16 Tf
1 0 0 -1 8 23 Tm
<004B> Tj
7 0 Td
<005C> Tj
7 0 Td
<0003> Tj
0 0 Td
<0053004B> Tj
15 0 Td
<0048> Tj
7 0 Td
<0051> Tj
7 0 Td
<0003> Tj
0 0 Td
<0044> Tj
7 0 Td
<0003> Tj
0 0 Td
<0057> Tj
4 0 Td
<004C> Tj
3 0 Td
<00520051> Tj
ET
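To compare the three platforms side by side, the glyph IDs can be pulled out of each content stream with a small script. This is a rough sketch, not a PDF parser: it assumes every text operand is a hex string followed by `Tj` and that glyph IDs are 2 bytes wide, which holds for the streams quoted in this thread. The helper name `glyph_ids` is made up for illustration.

```python
import re

# Matches hex string operands of the Tj operator, e.g. "<004B005C> Tj".
TJ_OPERAND = re.compile(r"<([0-9A-Fa-f]+)>\s*Tj")

def glyph_ids(content_stream):
    """Collect 2-byte glyph IDs from all Tj operands, in order."""
    ids = []
    for operand in TJ_OPERAND.findall(content_stream):
        ids.extend(int(operand[i:i + 4], 16) for i in range(0, len(operand), 4))
    return ids

MAC = "<004B005C0053004B0048005100440057004C00520051> Tj"
LINUX = ("BT /F0 16 Tf 1 0 0 -1 8 23 Tm <004B005C0003> Tj 16 0 Td "
         "<0053004B004800510003> Tj 31.101563 0 Td <00440003> Tj "
         "7.1015625 0 Td <0057004C00520051> Tj ET")
WINDOWS = ("BT /F0 16 Tf 1 0 0 -1 8 23 Tm <004B> Tj 7 0 Td <005C> Tj 7 0 Td "
           "<0003> Tj 0 0 Td <0053004B> Tj 15 0 Td <0048> Tj 7 0 Td <0051> Tj "
           "7 0 Td <0003> Tj 0 0 Td <0044> Tj 7 0 Td <0003> Tj 0 0 Td "
           "<0057> Tj 4 0 Td <004C> Tj 3 0 Td <00520051> Tj ET")

mac, linux, windows = glyph_ids(MAC), glyph_ids(LINUX), glyph_ids(WINDOWS)

# Linux and Windows split the text differently but emit the same glyphs;
# dropping glyph 0x0003 from either one yields the Mac sequence.
same_glyphs = linux == windows
extra_only = [g for g in linux if g != 0x0003] == mac
print(same_glyphs, extra_only)
```

So the platform difference comes down to the three extra 0003 glyphs; the way the text is chunked into separate `Tj` calls does not change the glyph sequence.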
,
Jan 21
(2 days ago)
Reproduced on Linux with Chrome 73.
Why is Blink printing space glyphs? Here's what Skia is receiving:
"glyphs" : [ 75, 92, 3, 83, 75, 72, 81, 3, 68, 3, 87, 76, 82, 81 ],
"positions" : [ 0, 8, 16, 16, 24, 32, 39.1015625, 47.1015625, 47.1015625,
54.203125, 54.203125, 58.6484375, 63.09375, 71.09375 ]
glyphId 3 translates to U+0020 (SPACE).
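One detail worth pulling out of that dump: each glyph 3 sits at the same x position as the glyph after it, so it takes up no width on the page. The sketch below uses the arrays exactly as quoted; reading a zero advance as "this glyph stands in for the soft hyphen" is an interpretation, not something the thread confirms.

```python
# Sketch: find glyphs whose advance is zero, using the arrays quoted above.
glyphs = [75, 92, 3, 83, 75, 72, 81, 3, 68, 3, 87, 76, 82, 81]
positions = [0, 8, 16, 16, 24, 32, 39.1015625, 47.1015625, 47.1015625,
             54.203125, 54.203125, 58.6484375, 63.09375, 71.09375]

# A glyph has zero advance when the next glyph starts at the same x.
zero_advance = [
    (index, glyphs[index])
    for index in range(len(glyphs) - 1)
    if positions[index + 1] == positions[index]
]
print(zero_advance)
```

Every zero-advance entry is glyph 3, and every glyph 3 in the run has zero advance, so the spaces are invisible on the page but still end up in the extracted text.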
,
Yesterday
(27 hours ago)
halcanary: What part of Blink is sending the spaces to Skia? We should be adding a Blink>Foo component here.
,
Yesterday
(27 hours ago)
Whichever part is creating a SkTextBlob.
,
Yesterday
(26 hours ago)
I suspect that fixing http://crbug.com/738643 would fix this.
Comment 1 by ranjitkan@chromium.org
, May 15 2017