
Issue 624035

Starred by 5 users

Issue metadata

Status: Archived
Owner: ----
Closed: Sep 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 2
Type: Bug

Blocked on:
issue 58402




Find in pdf should find text broken across lines

Reported by mr.ber...@gmail.com, Jun 28 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36

Example URL:
http://www.dfg.de/formulare/54_01/54_01_de.pdf

Steps to reproduce the problem:
1. Open http://www.dfg.de/formulare/54_01/54_01_de.pdf
2. Search for "Ba-sismodul" or "Basismodul" (both without quotation marks)

What is the expected behavior?
At least one is found (ideally, both).

What went wrong?
Neither is found. Adobe Reader DC, however, does find "Ba-sismodul".

Does it occur on multiple sites: N/A

Is it a problem with a plugin? Yes pdf

Did this work before? No 

Does this work in other browsers? Yes 

Chrome version: 51.0.2704.106  Channel: stable
OS Version: 10.0
Flash Version: Shockwave Flash 22.0 r0

Note that the correct word is "Basismodul"; it appears as "Ba-sismodul" only because it is hyphenated at the end of a line.

For context, Adobe Reader DC and Chrome PDF behave differently when copying text with a hyphen at the end of the line:

Adobe copies all hyphens:
- "Ba-sismodul" stays "Ba-sismodul" (incorrect)
- "Noether-Programms" (page 2, line 1) stays "Noether-Programms" (correct)

Chrome removes all hyphens:
- "Ba-sismodul" becomes "Basismodul" (nice!)
- "Noether-Programms" becomes "NoetherProgramms" (booh)

Why am I posting this context? Because Adobe Reader at least finds "Ba-sismodul", which is consistent with the text that you obtain when you copy from Adobe Reader. That is: copy "Ba-sismodul" from the PDF, press Ctrl-F, Ctrl-V, Enter, and the text is found.

The same is not true in Chrome: "Ba-sismodul" becomes "Basismodul", but "Basismodul" is not found in the pdf. This is what I propose to fix.

Of course, "Noether-Programms" should also be found, so the hyphenated variant should be findable as well (and so, then, should "Ba-sismodul").
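
To illustrate the request, here is a minimal sketch of a search that accepts both spellings. It is not how Chrome or PDFium implement find-in-page; it assumes the page text is already available as a list of lines, and the helper names `build_search_variants` and `find_in_lines` are made up for this example. The idea is to search two joined versions of the text: one with end-of-line hyphens dropped and one with them kept.

```python
def build_search_variants(lines):
    """Return (dehyphenated, hyphen_kept) strings built from a list of text lines.

    Hypothetical helper: a line ending in "-" is glued to the next line,
    once without the hyphen and once with it. All other line breaks
    become a single space.
    """
    dehyphenated = []
    hyphen_kept = []
    for i, line in enumerate(lines):
        text = line.strip()
        is_last = i == len(lines) - 1
        if text.endswith("-") and not is_last:
            dehyphenated.append(text[:-1])  # "Ba-" + "sismodul" -> "Basismodul"
            hyphen_kept.append(text)        # "Ba-" + "sismodul" -> "Ba-sismodul"
        else:
            separator = "" if is_last else " "
            dehyphenated.append(text + separator)
            hyphen_kept.append(text + separator)
    return "".join(dehyphenated), "".join(hyphen_kept)


def find_in_lines(lines, query):
    """Return True if the query occurs in either joined variant."""
    dehyphenated, hyphen_kept = build_search_variants(lines)
    return query in dehyphenated or query in hyphen_kept


# Made-up sample lines that mimic the two cases from the report.
sample = ["Antrag auf ein Ba-", "sismodul des Noether-", "Programms"]
print(find_in_lines(sample, "Basismodul"))
print(find_in_lines(sample, "Ba-sismodul"))
print(find_in_lines(sample, "Noether-Programms"))
```

Searching both variants avoids having to decide whether a given end-of-line hyphen belongs to the word, at the cost of also matching "NoetherProgramms".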
 
Cc: rnimmagadda@chromium.org
Components: Internals>Plugins>PDF, UI>Browser>FindInPage
Labels: -Type-Compat M-52 OS-Linux OS-Mac Type-Bug
Status: Untriaged (was: Unconfirmed)
Able to reproduce this issue on Windows 7, Mac (10.11.5) and Ubuntu Trusty (14.04) with Google Chrome Stable version 51.0.2704.106.

This is a non-regression issue; it has existed since M30 (30.0.1549.0).

Note: Firefox has the same behavior.
Attachment: 624035.mov (14.4 MB)

Comment 2 by sheriffbot@chromium.org, Jun 29 2016

Labels: -M-52 M-53 MovedFrom-52
Moving this nonessential bug to the next milestone.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 3 by sheriffbot@chromium.org, Jul 1 2016

Labels: -M-53 MovedFrom-53
This issue has been moved once and is lower than Pri-1. Removing the milestone.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 4 by npm@chromium.org, Oct 25 2016

Status: Available (was: Untriaged)
Blockedon: 58402
First, about the problem of "Noether-Programms" becoming "NoetherProgramms", there isn't much we could do to differentiate this from the case of "Ba-sismodul" without getting language specific.

As for search not finding occurrences split across lines, this is related to bug 58402. However, it should not be a duplicate because this report includes the complication of a hyphen in the middle of the word.
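
For the plain split-across-lines case (no hyphen involved), one possible approach is sketched below. It is only an illustration under stated assumptions, not the PDFium implementation: the text is assumed to be available as a list of lines, and `find_across_lines` is a hypothetical name. The lines are joined with a single space, and each character of the joined text remembers where it came from, so a match can be reported as a (line index, column) pair for highlighting.

```python
def find_across_lines(lines, query):
    """Return the (line_index, column) of each match start.

    Hypothetical helper: lines are joined with one space per line break,
    so a query such as "find text" matches even when "find" ends one line
    and "text" starts the next.
    """
    if not query:
        return []

    joined_chars = []
    origin = []  # origin[k] is the (line_index, column) of joined character k
    for i, line in enumerate(lines):
        for column, char in enumerate(line):
            joined_chars.append(char)
            origin.append((i, column))
        if i < len(lines) - 1:
            # Synthetic space standing in for the line break.
            joined_chars.append(" ")
            origin.append((i, len(line)))

    joined = "".join(joined_chars)
    matches = []
    start = joined.find(query)
    while start != -1:
        matches.append(origin[start])
        start = joined.find(query, start + 1)
    return matches


# Sample text taken from this issue's title, split over two lines.
sample = ["Find in pdf should find", "text broken across lines"]
print(find_across_lines(sample, "find text"))
```

Handling the hyphenated case on top of this would additionally require deciding whether to drop or keep the hyphen before the line break, which is the complication noted above.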
Status: Archived (was: Available)
Archiving old bugs that haven't been actively assigned in over 180 days.

If you feel this issue should still be addressed, feel free to reopen it or to file a new issue. Thanks!
