Find in page in a PDF does not highlight correct
Issue description

Version: Chrome 55
OS: Linux

What steps will reproduce the problem?
(1) Open http://www.acrotex.net/blog/wp-content/uploads/2011/07/pdfblog_24.pdf
(2) Press Ctrl+F and search for "roun".
(3) Now type "d" to search for "round".

What is the expected output?
The only occurrence of the search term is in the snippet "been around since".
After (2), "roun" is highlighted.
After (3), "round" is highlighted.

What do you see instead?
After (2), "roun" is highlighted.
After (3), "und s" is highlighted.
Oct 11 2017
Confirmed that this is still happening. I suspect the issue is in how the PDF viewer gets ranges of text from the PDFium API.
Nov 29 2017
It feels like working on bug 788103 flushed out the cause of this as well. I bet that if we write a test PDF that exhibits this bug, we can use it as a test for both bugs.
Nov 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/99088a45b34bccdbb7fc16bdaf0952ce966d17ef

commit 99088a45b34bccdbb7fc16bdaf0952ce966d17ef
Author: Ryan Harrison <rharrison@chromium.org>
Date: Thu Nov 30 20:43:07 2017

Add conversion between index spaces

The lack of conversion was causing an offset error because some of the numbers being used were in the character list index space and some of them were in the text buffer index space.

This CL combined with https://pdfium-review.googlesource.com/c/pdfium/+/20014 in PDFium resolves outstanding issues with Find highlights in PDFs with control characters in the text body.

BUG=chromium:654578
Change-Id: I5f600a59926f137ed0a0901711a3ff57d3e42e34
Reviewed-on: https://chromium-review.googlesource.com/801310
Reviewed-by: dsinclair <dsinclair@chromium.org>
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520666}

[modify] https://crrev.com/99088a45b34bccdbb7fc16bdaf0952ce966d17ef/pdf/pdfium/pdfium_engine.cc
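To illustrate the kind of offset error the commit message describes, here is a minimal Python sketch of two index spaces. It is a simplified model, not PDFium's or Chrome's actual code: the names `is_control`, `build_text_buffer`, and `find_char_range` are hypothetical, and the assumption that the text buffer simply omits control characters while the character list keeps them is made only for illustration.

```python
from typing import List, Optional, Tuple


def is_control(ch: str) -> bool:
    # Hypothetical rule for this sketch: anything below U+0020 is a control character.
    return ord(ch) < 0x20


def build_text_buffer(char_list: List[str]) -> Tuple[str, List[int]]:
    """Build the searchable text and a map from text-buffer index to char-list index."""
    buffer_chars: List[str] = []
    buf_to_char: List[int] = []
    for char_index, ch in enumerate(char_list):
        if is_control(ch):
            continue  # present in the character list, absent from the text buffer
        buffer_chars.append(ch)
        buf_to_char.append(char_index)
    return "".join(buffer_chars), buf_to_char


def find_char_range(char_list: List[str], query: str) -> Optional[Tuple[int, int]]:
    """Search in text-buffer space, then convert the match to char-list space."""
    text, buf_to_char = build_text_buffer(char_list)
    start = text.find(query)
    if start < 0 or not query:
        return None
    end = start + len(query) - 1
    # Without this conversion, the highlight would be shifted by the number
    # of control characters that precede the match.
    return buf_to_char[start], buf_to_char[end]


chars = list("been a\x02\x02round since")
first, last = find_char_range(chars, "round")
highlighted = "".join(chars[first:last + 1])
```

In this model the match starts at index 6 in the text buffer but at index 8 in the character list, so using the text-buffer index directly against the character list would select the wrong characters.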
Nov 30 2017
The following revision refers to this bug:
https://pdfium.googlesource.com/pdfium/+/8b357e7504ea804293983453540ae91c9fc57922

commit 8b357e7504ea804293983453540ae91c9fc57922
Author: Ryan Harrison <rharrison@chromium.org>
Date: Thu Nov 30 21:02:41 2017

Rewrite lower level details of extracting text from page

The current implementation of text extraction was difficult to understand, duplicated logic that existed in other methods, and wasn't clear about the units the inputs were in. It also didn't handle control characters correctly.

The new implementation leans on the methods for converting indices between the text buffer index and character list index spaces to avoid duplication of code. It also makes it clear to the reader that inputs are in the character list index space. Finally, it fixes issues being seen in Chrome with respect to ranges being slightly off.

This CL also adds a test for extracting text that has control characters.

BUG=pdfium:942,chromium:654578
Change-Id: Id9d1f360c2d7492c7b5a48d6c9ae29f530892742
Reviewed-on: https://pdfium-review.googlesource.com/20014
Commit-Queue: Ryan Harrison <rharrison@chromium.org>
Reviewed-by: dsinclair <dsinclair@chromium.org>
Reviewed-by: Henrique Nakashima <hnakashima@chromium.org>

[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/fpdfsdk/fpdftext_embeddertest.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpagefind.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/fpdfsdk/fpdftext.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_linkextract.cpp
[add] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/testing/resources/control_characters.pdf
[add] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/testing/resources/control_characters.in
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpage.cpp
[modify] https://crrev.com/8b357e7504ea804293983453540ae91c9fc57922/core/fpdftext/cpdf_textpage.h
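The approach described in this commit message, taking inputs in the character list index space and converting them to the text buffer index space before slicing, can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the PDFium implementation: `is_control`, `text_index_from_char_index`, and `extract_text` are hypothetical names, and control characters are assumed to be the only entries missing from the text buffer.

```python
from typing import List


def is_control(ch: str) -> bool:
    # Hypothetical rule for this sketch: anything below U+0020 is a control character.
    return ord(ch) < 0x20


def text_index_from_char_index(char_list: List[str], char_index: int) -> int:
    """Convert a char-list index into the matching text-buffer index.

    The text-buffer index equals the number of non-control characters
    that come before char_index in the character list.
    """
    return sum(1 for ch in char_list[:char_index] if not is_control(ch))


def extract_text(char_list: List[str], start: int, count: int) -> str:
    """Extract text for a range given in character list index space."""
    if start < 0 or count <= 0 or start >= len(char_list):
        return ""
    end = min(start + count, len(char_list))
    text_buffer = "".join(ch for ch in char_list if not is_control(ch))
    # Both ends are converted with the same helper, so the two index
    # spaces are never mixed in a single slice.
    buf_start = text_index_from_char_index(char_list, start)
    buf_end = text_index_from_char_index(char_list, end)
    return text_buffer[buf_start:buf_end]


chars = list("Hello\x02, world!")
prefix = extract_text(chars, 0, 7)   # covers "Hello", the control character, and ","
middle = extract_text(chars, 5, 3)   # starts on the control character itself
```

Routing every range through one conversion helper is the design point here: the caller states its units once, and the slice is always taken in text-buffer space.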
Dec 1 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8f0e7748d905b50d30e4d6f92c3a5aed58a888c4

commit 8f0e7748d905b50d30e4d6f92c3a5aed58a888c4
Author: pdfium-deps-roller@chromium.org <pdfium-deps-roller@chromium.org>
Date: Fri Dec 01 00:40:28 2017

Roll src/third_party/pdfium/ fee910e6f..1980f10ff (15 commits)

https://pdfium.googlesource.com/pdfium.git/+log/fee910e6f81f..1980f10ff2b8

$ git log fee910e6f..1980f10ff --date=short --no-merges --format='%ad %ae %s'
2017-11-30 dsinclair Simplify XDP parsing code
2017-11-30 dsinclair Rename XFA_ATTRIBUTEENUM to XFA_AttributeEnum enum class
2017-11-30 dsinclair Move packet information into simple parser
2017-11-30 dsinclair Make parsers work off XFA_PacketType enum
2017-11-30 dsinclair A CXFA_Node can only be in one packet
2017-11-30 dsinclair Cleanup XFA packet code
2017-11-30 rharrison Rewrite lower level details of extracting text from page
2017-11-30 dsinclair Create CXFA_Node::NameToAttributeEnum
2017-11-30 dsinclair Move setting of XML content back to specific set methods
2017-11-30 dsinclair Rename GetAttributeEnumById to CXFA_Node::AttributeEnumToName
2017-11-30 dsinclair Remove the packets from attribute data.
2017-11-30 dsinclair Generate XFA node attribute information
2017-11-30 thestig Fix GBK2K-H CMap usage.
2017-11-30 thestig Use initializer list in CPDF_DataAvail ctor.
2017-11-30 thestig Relax checks in CFX_FaceCache::LoadGlyphPath().

Created with:
roll-dep src/third_party/pdfium
BUG=654578,788864

The AutoRoll server is located here: https://pdfium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

TBR=dsinclair@chromium.org
Change-Id: Ic87fbd3ca5dbec12418aa60db84ae9e894431881
Reviewed-on: https://chromium-review.googlesource.com/802188
Reviewed-by: <pdfium-deps-roller@chromium.org>
Commit-Queue: <pdfium-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#520774}

[modify] https://crrev.com/8f0e7748d905b50d30e4d6f92c3a5aed58a888c4/DEPS
Dec 1 2017
Tested the issue on Ubuntu 14.04, Windows 7 and 10, and Mac OS 10.12.6 using the latest Chrome Canary, M64 (64.0.3282.0), by following the steps in the original comment. Find in page now highlights the expected text in the PDF, so I am adding the TE-Verified label. Please find the screencast (Ubuntu 14.04) for reference. Thank you!
Comment 1 by sheriffbot@chromium.org, Oct 11 2017
Status: Untriaged (was: Available)