
Issue 654578

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 2
Type: Bug

Blocked on:
issue 788103




Find in page in a PDF does not highlight correctly

Project Member Reported by thestig@chromium.org, Oct 10 2016

Issue description

Version: Chrome 55
OS: Linux

What steps will reproduce the problem?
(1) Open http://www.acrotex.net/blog/wp-content/uploads/2011/07/pdfblog_24.pdf
(2) Press ctrl+f, search for "roun"
(3) Now type d to search for "round"

What is the expected output?

The only occurrence of "round" is in the snippet "been around since".

After (2) "roun" is highlighted.
After (3) "round" is highlighted.

What do you see instead?

After (2) "roun" is highlighted.
After (3) "und s" is highlighted.

 
Project Member

Comment 1 by sheriffbot@chromium.org, Oct 11 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: rharrison@chromium.org
Status: Assigned (was: Untriaged)
Confirmed that this is still happening. I suspect this is an issue with how the PDF viewer gets ranges of text from the PDFium API.
Owner: hnakashima@chromium.org
Blockedon: 788103
Cc: rharrison@chromium.org
Issue 788811 has been merged into this issue.
Cc: -rharrison@chromium.org hnakashima@chromium.org
Labels: -Hotlist-Recharge-Cold
Owner: rharrison@chromium.org
Status: Started (was: Assigned)
Feels like working on bug 788103 flushed out the cause of this as well. I bet if we write a test PDF that exhibits this bug, we can use it as a test for both bugs.
Project Member

Comment 8 by bugdroid1@chromium.org, Nov 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/99088a45b34bccdbb7fc16bdaf0952ce966d17ef

commit 99088a45b34bccdbb7fc16bdaf0952ce966d17ef
Author: Ryan Harrison <rharrison@chromium.org>
Date: Thu Nov 30 20:43:07 2017

Add conversion between index spaces

The lack of conversion was causing an offset error because some of the
numbers being used were in the character list index space and some of
them were in the text buffer index space. This CL combined with
https://pdfium-review.googlesource.com/c/pdfium/+/20014 in PDFium
resolves outstanding issues with Find highlights in PDFs with control
characters in the text body.

BUG= chromium:654578 

Change-Id: I5f600a59926f137ed0a0901711a3ff57d3e42e34
Reviewed-on: https://chromium-review.googlesource.com/801310
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520666}
[modify] https://crrev.com/99088a45b34bccdbb7fc16bdaf0952ce966d17ef/pdf/pdfium/pdfium_engine.cc
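
For readers unfamiliar with the two index spaces named in the commit message, here is a minimal sketch of the kind of offset error it describes. It is not the PDFium or Chrome code. It assumes, purely for illustration, that control characters occupy slots in the character list but are left out of the text buffer; the sample string and the names `build_text_buffer` and `char_index_from_text_index` are hypothetical.

```python
# Toy model of two index spaces (an illustration, not PDFium's data structures).
# Assumption: control characters exist in the character list but are skipped
# when the searchable text buffer is built.

def is_control(ch: str) -> bool:
    """Treat code points below U+0020 as control characters."""
    return ord(ch) < 0x20


def build_text_buffer(char_list: str):
    """Return the text buffer plus, for each text index, its char list index."""
    text_chars = []
    char_indices = []
    for char_index, ch in enumerate(char_list):
        if not is_control(ch):
            text_chars.append(ch)
            char_indices.append(char_index)
    return "".join(text_chars), char_indices


def char_index_from_text_index(char_indices, text_index: int) -> int:
    """Convert a text buffer index into a character list index."""
    return char_indices[text_index]


# Hypothetical page content with two control characters at the front.
char_list = "\x02\x02been around since"
text_buffer, char_indices = build_text_buffer(char_list)

# The search runs on the text buffer, so the hit is a text buffer index.
query = "round"
text_start = text_buffer.find(query)

# Mixing spaces: using the text buffer index directly on the character list.
naive = char_list[text_start:text_start + len(query)]

# Converting first lands on the right characters.
char_start = char_index_from_text_index(char_indices, text_start)
converted = char_list[char_start:char_start + len(query)]

print(repr(naive))      # ' arou'
print(repr(converted))  # 'round'
```

In this toy the unconverted range is shifted by the number of control characters that precede the match. The direction and size of the shift in the real bug depend on details of the viewer and PDFium that this sketch does not model.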

Project Member

Comment 9 by bugdroid1@chromium.org, Nov 30 2017

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium/+/8b357e7504ea804293983453540ae91c9fc57922

commit 8b357e7504ea804293983453540ae91c9fc57922
Author: Ryan Harrison <rharrison@chromium.org>
Date: Thu Nov 30 21:02:41 2017

Rewrite lower level details of extracting text from page

The current implementation of text extraction was difficult to
understand, duplicated logic that existed in other methods, and wasn't
clear about the units the inputs were in. It also didn't handle
control characters correctly.

The new implementation leans on the methods for converting indices
between the text buffer index and character list index spaces to avoid
duplication of code. It also makes it clear to the reader that inputs
are in the character list index space. Finally, it fixes issues being
seen in Chrome with respect of ranges being slightly off.

This CL also adds a test for extracting text that has control
characters.

BUG= pdfium:942 , chromium:654578 

Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>

[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/fpdfsdk/fpdftext_embeddertest.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpagefind.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/fpdfsdk/fpdftext.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_linkextract.cpp
[add] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/testing/resources/control_characters.pdf
[add] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/testing/resources/control_characters.in
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpage.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpage.h
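
The approach described in this commit message, taking inputs in the character list index space and relying on index conversion to reach the text buffer, can be sketched roughly as follows. This is a toy illustration under the same assumption as before (control characters are present in the character list and absent from the text buffer), not the actual `cpdf_textpage.cpp` implementation. The function names `build_index`, `text_index_from_char_index`, and `extract_text` are hypothetical.

```python
from bisect import bisect_left

# Toy extraction routine whose inputs are character list indices.
# Assumption: the text buffer omits control characters (code points < U+0020).


def build_index(char_list: str):
    """Return the text buffer and the char list index of each text character."""
    text_chars = []
    char_indices = []
    for char_index, ch in enumerate(char_list):
        if ord(ch) >= 0x20:
            text_chars.append(ch)
            char_indices.append(char_index)
    return "".join(text_chars), char_indices


def text_index_from_char_index(char_indices, char_index: int) -> int:
    """Map a char list index to the first text index at or after it."""
    return bisect_left(char_indices, char_index)


def extract_text(char_list: str, start: int, count: int) -> str:
    """Extract text for `count` characters starting at char list index `start`."""
    text_buffer, char_indices = build_index(char_list)
    start = max(start, 0)
    end = min(start + count, len(char_list))
    if end <= start:
        return ""
    # Convert both ends of the range once, then slice the text buffer.
    text_start = text_index_from_char_index(char_indices, start)
    text_end = text_index_from_char_index(char_indices, end)
    return text_buffer[text_start:text_end]


sample = "ab\x02cd\x03ef"
print(extract_text(sample, 1, 4))  # bcd
```

Converting both ends of the range in one place keeps the unit of every input explicit, which is the readability point the commit message makes.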

Labels: -Pri-3 M-64 Pri-2
Status: Fixed (was: Started)
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8f0e7748d905b50d30e4d6f92c3a5aed58a888c4

commit 8f0e7748d905b50d30e4d6f92c3a5aed58a888c4
Author: pdfium-deps-roller@chromium.org <pdfium-deps-roller@chromium.org>
Date: Fri Dec 01 00:40:28 2017

Roll src/third_party/pdfium/ fee910e6f..1980f10ff (15 commits)

https://pdfium.googlesource.com/pdfium.git/+log/fee910e6f81f..1980f10ff2b8

$ git log fee910e6f..1980f10ff --date=short --no-merges --format='%ad %ae %s'
2017-11-30 dsinclair Simplify XDP parsing code
2017-11-30 dsinclair Rename XFA_ATTRIBUTEENUM to XFA_AttributeEnum enum class
2017-11-30 dsinclair Move packet information into simple parser
2017-11-30 dsinclair Make parsers work off XFA_PacketType enum
2017-11-30 dsinclair A CXFA_Node can only be in one packet
2017-11-30 dsinclair Cleanup XFA packet code
2017-11-30 rharrison Rewrite lower level details of extracting text from page
2017-11-30 dsinclair Create CXFA_Node::NameToAttributeEnum
2017-11-30 dsinclair Move setting of XML content back to specific set methods
2017-11-30 dsinclair Rename GetAttributeEnumById to CXFA_Node::AttributeEnumToName
2017-11-30 dsinclair Remove the packets from attribute data.
2017-11-30 dsinclair Generate XFA node attribute information
2017-11-30 thestig Fix GBK2K-H CMap usage.
2017-11-30 thestig Use initializer list in CPDF_DataAvail ctor.
2017-11-30 thestig Relax checks in CFX_FaceCache::LoadGlyphPath().

Created with:
  roll-dep src/third_party/pdfium
BUG= 654578 , 788864 


The AutoRoll server is located here: https://pdfium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=dsinclair@chromium.org

Change-Id: Ic87fbd3ca5dbec12418aa60db84ae9e894431881
Reviewed-on: https://chromium-review.googlesource.com/802188
Reviewed-by: <pdfium-deps-roller@chromium.org>
Commit-Queue: <pdfium-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520774}
[modify] https://crrev.com/8f0e7748d905b50d30e4d6f92c3a5aed58a888c4/DEPS

Cc: rbasuvula@chromium.org
Labels: TE-Verified-M64 TE-Verified-64.0.3282.0
Tested the issue on Ubuntu 14.04, Windows 7 and 10, and Mac OS 10.12.6 using the latest Chrome Canary (M64, 64.0.3282.0), following the steps in the original report. Find in page now highlights the expected text in the PDF, hence adding the TE-Verified label.

Please find the screencast (Ubuntu 14.04) attached for reference.

Thank you!
654578.ogv (2.6 MB)
