
Issue 722156

Starred by 3 users

Issue metadata

Status: Available
Merged: issue 514494
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 3
Type: Bug




Soft hyphens in text turn into white spaces in PDF

Reported by kwk...@vivliostyle.com, May 15 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0

Steps to reproduce the problem:
1. Open the attached HTML file
2. Print to PDF
3. Open the PDF, copy text "hyphenation" and paste it somewhere

What is the expected behavior?
The pasted text should be "hyphenation", containing no white spaces.

What went wrong?
The pasted text is "hy phen a tion", containing white spaces at locations where soft hyphens exist in the source HTML file.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 58.0.3029.110 (Official Build) (64bit)  Channel: stable
OS Version: OS X 10.12
Flash Version: Shockwave Flash 24.0 r0

Searching a text on a PDF viewer is also impossible due to this problem.
 
soft-hyphen.html
291 bytes View Download
soft-hyphen-chrome.pdf
22.9 KB Download
Labels: Needs-Triage-M58
Components: Internals>Plugins>PDF Internals>Skia>PDF
Labels: -Hotlist-Interop
Labels: M-60 OS-Linux OS-Windows
Status: Untriaged (was: Unconfirmed)
Able to reproduce the issue on Mac 10.12.4, Windows 7, and Ubuntu 14.04 using Chrome stable 58.0.3029.110 and canary 60.0.3100.0 with the steps mentioned in comment #0.
This is a non-regression issue, observed since M30 (30.0.1599.0). Marking it as Untriaged to get more input from the dev team.

Thanks..

Comment 4 by weili@chromium.org, May 15 2017

Mergedinto: 514494
Status: Duplicate (was: Untriaged)
Please try to reproduce with Chrome m59 (currently the beta channel). I believe this will be better with m59+.
It seems that it reproduces with Chrome m59 (beta) and m60 (Canary).
soft-hyphen-m59.pdf
22.9 KB Download
soft-hyphen-m60.pdf
22.9 KB Download
This bug is marked as duplicate of 514494, which is marked as fixed.
However, the original issue still persists in Chrome 66.
Owner: rharrison@chromium.org
Status: Unconfirmed (was: Duplicate)
Components: -Internals>Skia>PDF Internals>Printing
Labels: -Pri-2 -M-60 -Needs-Triage-M58 M-66 Pri-3
Status: Assigned (was: Unconfirmed)
Confirmed this is still occurring in 66.0.3359.139.
Components: -Internals>Plugins>PDF
Owner: thestig@chromium.org
Took another look at this. It looks like the &shy; in the HTML, which indicates a potential hyphenation location, is being turned into Unicode 0x0003 when printing to PDF. 0x0003 is an end-of-text marker, which is wrong. The correct Unicode character is 0x00AD, soft hyphen.

Opening this PDF in Acrobat has the same behaviour as Chrome, i.e. copying 'hyphenation' will paste out as 'hy phen a tion'. My understanding is that the Chrome PDF viewer is working as intended with this PDF, but there is a bug in printing HTML -> PDF.

Since this is a PDF printing/generation side issue, I am going to send it to thestig@.
Components: Internals>Skia>PDF
Labels: -M-66
Owner: halcanary@chromium.org
Redirecting to halcanary@ to see if this is a SkPDF issue.
I tested this with Chrome/dev on my Macbook {Mozilla/5.0 (Macintosh; Intel Mac
OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.9
Safari/537.36}.

Here's the relevant section from the PDF's content stream:

    BT
    /F0 16 Tf
    1 0 0 -1 8 22 Tm
    <004B005C0053004B0048005100440057004C00520051> Tj
    ET

These translate to the following Unicode code points via the ToUnicode table:

    Glyphs: 004B 005C 0053 004B 0048 0051 0044 0057 004C 0052 0051
    Text:      h    y    p    h    e    n    a    t    i    o    n

Seems right to me.
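
For anyone who wants to repeat this check on their own PDF, here is a minimal Python sketch of the decoding step. It assumes two bytes per glyph ID, as in the hex string above. The constant offset of 29 between glyph ID and code point is only an observation from this listing (0x004B = 75 maps to "h" = 104), not a general rule; a real tool should read the font's ToUnicode CMap instead. The function names are made up for illustration.

```python
def decode_tj_hex(hex_string: str) -> list[int]:
    """Split a PDF hex string such as <004B005C> into 16-bit glyph IDs."""
    cleaned = hex_string.strip().strip("<>")
    return [int(cleaned[i:i + 4], 16) for i in range(0, len(cleaned), 4)]


# ASSUMPTION: in this particular sample, code point = glyph ID + 29.
# This stands in for the ToUnicode table and will not hold for other fonts.
GLYPH_TO_CODEPOINT_OFFSET = 29


def glyphs_to_text(glyph_ids: list[int]) -> str:
    """Map glyph IDs to characters using the sample-specific offset."""
    return "".join(chr(g + GLYPH_TO_CODEPOINT_OFFSET) for g in glyph_ids)


mac_tj = "<004B005C0053004B0048005100440057004C00520051>"
glyph_ids = decode_tj_hex(mac_tj)
print(glyph_ids)
print(glyphs_to_text(glyph_ids))  # hyphenation
```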

Soft hyphen.pdf
12.6 KB Download
I confirmed that the problem does not occur on Chrome 71 (Canary).
Thank you for the fix.
Can I expect that the fix will be released in the next stable version (m70)?
#13 - you can install Chrome beta to confirm this.
I tried on Linux with r592000, which is 71.0.3556.0, and got:

BT
/F0 16 Tf
1 0 0 -1 8 23 Tm
<004B005C0003> Tj
16 0 Td <0053004B004800510003> Tj
31.101563 0 Td <00440003> Tj
7.1015625 0 Td <0057004C00520051> Tj
ET

Soft hyphen.pdf
18.2 KB Download
Same result with 70.0.3538.16 on Linux. I'll try Mac and Windows and see if there's some platform-specific behavior here.
It does vary for me based on platform.

Chrome Mac 71.0.3555.0:

<004B005C0053004B0048005100440057004C00520051> Tj


Chrome Windows 70.0.3538.16: (via Remote Desktop, which may make a difference)

BT
/F0 16 Tf
1 0 0 -1 8 23 Tm
<004B> Tj
7 0 Td <005C> Tj
7 0 Td <0003> Tj
0 0 Td <0053004B> Tj
15 0 Td <0048> Tj
7 0 Td <0051> Tj
7 0 Td <0003> Tj
0 0 Td <0044> Tj
7 0 Td <0003> Tj
0 0 Td <0057> Tj
4 0 Td <004C> Tj
3 0 Td <00520051> Tj
ET
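
To compare these streams without reading hex by eye, a rough Python sketch like the one below can pull the glyph runs out of each `Tj` operator. It is not a PDF parser: it assumes an uncompressed content stream, hex strings only, and two bytes per glyph. It also assumes the "glyph ID + 29 = code point" relationship seen in the Mac output, under which glyph 0003 lands on U+0020. The helper names are hypothetical.

```python
import re

# ASSUMPTION: code point = glyph ID + 29, as observed in the Mac sample.
# A real tool should use the font's ToUnicode CMap instead.
OFFSET = 29


def tj_runs(content_stream: str) -> list[list[int]]:
    """Return one list of 16-bit glyph IDs per `<...> Tj` operator."""
    runs = []
    for hex_string in re.findall(r"<([0-9A-Fa-f]+)>\s*Tj", content_stream):
        runs.append([int(hex_string[i:i + 4], 16)
                     for i in range(0, len(hex_string), 4)])
    return runs


def stream_text(content_stream: str) -> str:
    """Concatenate all Tj runs into text using the sample-specific offset."""
    return "".join(chr(g + OFFSET)
                   for run in tj_runs(content_stream) for g in run)


# The Linux r592000 stream quoted earlier in this thread.
linux_stream = """
BT
/F0 16 Tf
1 0 0 -1 8 23 Tm
<004B005C0003> Tj
16 0 Td <0053004B004800510003> Tj
31.101563 0 Td <00440003> Tj
7.1015625 0 Td <0057004C00520051> Tj
ET
"""

print(tj_runs(linux_stream))
print(repr(stream_text(linux_stream)))  # 'hy phen a tion'
```

Each run ends with glyph 0003 exactly where the source HTML has a soft hyphen, which matches the pasted text in the original report.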

Comment 18 by halcanary@google.com, Jan 21 (2 days ago)

Cc: halcanary@chromium.org
Owner: ----
Status: Available (was: Assigned)
Reproduced on Linux with Chrome 73.

Why is Blink printing space glyphs?  Here's what Skia is receiving:

    "glyphs" : [ 75, 92, 3, 83, 75, 72, 81, 3, 68, 3, 87, 76, 82, 81 ],
    "positions" : [ 0, 8, 16, 16, 24, 32, 39.1015625, 47.1015625, 47.1015625,
                    54.203125, 54.203125, 58.6484375, 63.09375, 71.09375 ]

glyphId 3 translates to U+0020 (SPACE).
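
A small Python sketch makes the pattern in that dump easier to see: each glyph 3 sits at the same x position as the glyph after it, i.e. it has zero advance. The arrays are copied from the dump above. The offset of 29 used to turn glyph IDs into characters is an assumption inferred from this font's samples in the thread (75 is "h", 3 is U+0020), and dropping the glyphs is shown only to illustrate the expected text, not as a proposed fix.

```python
glyphs = [75, 92, 3, 83, 75, 72, 81, 3, 68, 3, 87, 76, 82, 81]
positions = [0, 8, 16, 16, 24, 32, 39.1015625, 47.1015625, 47.1015625,
             54.203125, 54.203125, 58.6484375, 63.09375, 71.09375]

SPACE_GLYPH = 3  # glyphId 3 translates to U+0020, per the dump above
OFFSET = 29      # ASSUMPTION: code point = glyph ID + 29 for this font only

# Advance of glyph i = x of glyph i+1 minus x of glyph i.
# The last glyph has no successor, so it gets no advance here.
advances = [nxt - cur for cur, nxt in zip(positions, positions[1:])]

zero_advance = [i for i, adv in enumerate(advances) if adv == 0]
space_indexes = [i for i, g in enumerate(glyphs) if g == SPACE_GLYPH]

print(zero_advance)   # [2, 7, 9]
print(space_indexes)  # [2, 7, 9]

as_sent = "".join(chr(g + OFFSET) for g in glyphs)
without_spaces = "".join(chr(g + OFFSET) for g in glyphs if g != SPACE_GLYPH)
print(repr(as_sent))         # 'hy phen a tion'
print(repr(without_spaces))  # 'hyphenation'
```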



crbug-722156_Chrome_73-0-3680-0.pdf
22.7 KB Download

Comment 19 by halcanary@google.com, Jan 21 (2 days ago)

Components: -Internals>Skia>PDF

Comment 20 by halcanary@google.com, Yesterday (29 hours ago)

Cc: -halcanary@chromium.org halcanary@google.com

Comment 21 by thestig@chromium.org, Yesterday (27 hours ago)

halcanary: What part of Blink is sending the spaces to Skia? We should be adding a Blink>Foo component here.

Comment 22 by halcanary@google.com, Yesterday (27 hours ago)

Whichever part is creating a SkTextBlob.

Comment 23 by halcanary@google.com, Yesterday (26 hours ago)

Cc: bunge...@chromium.org

Comment 24 by halcanary@google.com, Yesterday (26 hours ago)

I suspect that fixing http://crbug.com/738643 would fix this.
