Monorail Project: pdfium
Issue 492 PDF Tagging and highlighting does not seem to work as it does in Adobe Acrobat
Starred by 6 users Reported by weboracl...@gmail.com, May 10 2016
Status: Accepted
Owner: ----
Labels: Api



What steps will reproduce the problem?

1. Open the following URL
https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-1

in Internet Explorer 9/10/11. Environment: Windows 7 with Adobe Acrobat installed. If you don't have this configuration, you can obtain a VM here: https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/windows/


2. You will see that the word "morbi" is highlighted.


What is the expected output? What do you see instead?

Open the same URL in Chrome (tested on most versions up to Version 50.0.2661.94 m) and you will see that the word "morbi" is not highlighted.

Expected: the link should bring the user to the page where the tag was created and highlight the tagged text.

What version of the product are you using? On what operating system?

Chrome (Tested on most versions up to Version 50.0.2661.94 m)

Please provide any additional information below.

All tagged items in this document:

Tag-1 => p1 (475,512,690,681) 'morbi'

https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-1

Tag-2 => p2 (430,472,685,672) 'adipiscing'

https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-2

Tag-3 => p3 (475,472,690,679) 'commodo'

https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-3

Tag-4 => p4 (475,543,678,668) 'viverra' 

https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-4



Here is the content of the (simplified) OpenAction script:

this.disclosed = true;
this.syncAnnotScan();   // make sure all annotations have been scanned
var u = this.URL;       // e.g. .../test.pdf?calcrtid=Tag-1&hilite=true
var args = u.split('calcrtid=');
var tags;
if (args.length > 1) {
    // The tag name is the text after "calcrtid=" up to the next '&'.
    tags = args[1].split('&');
    // The named destination has the same name as the tag.
    this.gotoNamedDest(tags[0]);
    var a = this.getAnnot(this.pageNum, tags[0]);
    var hilite = (u.indexOf('hilite=true') == -1) ? 0 : 1;
    if (hilite) {
        var selected = '';
        if (a != null) {
            selected = a.name;
        }
        // Show every highlight; mark the one matching the tag as selected.
        var list = this.getAnnots();
        for (var ix = 0; ix < list.length; ++ix) {
            var ann = list[ix];
            if (ann.type == 'Highlight') {
                var nm = ann.name;
                ann.hidden = false;
                if (nm == selected) {
                    ann.contents = nm + ' (SELECTED)';
                } else {
                    ann.contents = nm;
                }
            }
        }
    }
    if (a != null) {
        // Un-hide the tagged annotation and scroll its rectangle into view.
        a.hidden = false;
        var rect = a.rect;
        if (rect != null) {
            this.scroll(rect[2] + 100, rect[3] + 100);
        }
    }
    this.closeDoc(true);
}
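To see what the highlight loop above does in isolation, here is a minimal sketch that runs the same logic over plain JavaScript objects instead of Acrobat Annotation objects. The function name `applyHighlights` and the sample annotations are hypothetical; only the `type`, `name`, `hidden` and `contents` fields come from the script.

```javascript
// Hypothetical stand-ins for Acrobat Annotation objects: plain objects with
// the fields the OpenAction script reads and writes.
function applyHighlights(annots, selectedName) {
  for (const ann of annots) {
    if (ann.type !== 'Highlight') {
      continue; // only highlight annotations are touched
    }
    ann.hidden = false; // make every highlight visible
    ann.contents = ann.name === selectedName
      ? ann.name + ' (SELECTED)' // mark the one matching the tag
      : ann.name;
  }
  return annots;
}

const annots = [
  { type: 'Highlight', name: 'Tag-1', hidden: true, contents: '' },
  { type: 'Highlight', name: 'Tag-2', hidden: true, contents: '' },
  { type: 'Text', name: 'Note', hidden: true, contents: '' },
];
applyHighlights(annots, 'Tag-1');
```

Non-highlight annotations (the `Text` entry) are left exactly as they were, as in the original loop.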


This functionality is very important for many users; in my case, more than 250,000. Thank you.


 
Project Member Comment 1 by thestig@chromium.org, May 10 2016
Status: Accepted
Is the Javascript actually relevant, or is it just a case of highlight annotations not displaying?

FWIW, opening the PDF in Safari does not show any highlighting either.
Hi, yes, the OpenAction script is for highlighting.
Comment 3 by brz...@gmail.com, May 11 2016
The key part of the OpenAction script is capturing the tag name from the URL.
The tag name is used to navigate to the proper page (a goto action using the named destination matching the tag name), turn on the proper highlighting (un-hide the annotation matching the tag name), and scroll the annotation rectangle into view.

Chrome does not provide the document URL (this.URL) to the script, so the tag is unknown; as a result, there is no highlighting and the annotation is not scrolled into view.

Using the parameter #nameddest=Tag-4 with Chrome does navigate to the correct page, but the parameter value is not available to the OpenAction script, so the proper annotation cannot be identified. Thus, the highlighting does not appear, and the rectangle is not scrolled into view.
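A minimal sketch of the tag-extraction step, written as a pure function so it can run outside a PDF viewer. The name `parseTagRequest` is hypothetical. Unlike the original script, it also ends the tag at a '#' and returns no tag when the URL is missing or empty, which is the situation described above for Chrome.

```javascript
// Hypothetical helper mirroring the script's URL handling.
// `url` is what the script would read from this.URL; it may be undefined
// or empty when the viewer does not expose the document URL.
function parseTagRequest(url) {
  if (typeof url !== 'string' || url.length === 0) {
    return { tag: null, hilite: false }; // no URL: the tag is unknown
  }
  const args = url.split('calcrtid=');
  if (args.length < 2) {
    return { tag: null, hilite: false }; // no calcrtid parameter
  }
  // The tag runs up to the next '&' or '#' (or the end of the string).
  const tag = args[1].split(/[&#]/)[0];
  const hilite = url.indexOf('hilite=true') !== -1;
  return { tag: tag, hilite: hilite };
}

const base = 'https://storage.googleapis.com/pdf-test/test.pdf';
const r1 = parseTagRequest(base + '?calcrtid=Tag-1');
const r2 = parseTagRequest(base + '?calcrtid=Tag-2&hilite=true');
const r3 = parseTagRequest(undefined);
```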
Attaching an example of what is expected when I navigate to
https://storage.googleapis.com/pdf-test/test.pdf?calcrtid=Tag-2
 
Tag-2 => p2 (430,472,685,672) 'adipiscing'


expected-pdf-page-2.jpg (280 KB)
Project Member Comment 5 by bugdroid1@chromium.org, Aug 8 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/33c4cdb4efbacb73151c982549151ea4e545eff8

commit 33c4cdb4efbacb73151c982549151ea4e545eff8
Author: tonikitoo <tonikitoo@igalia.com>
Date: Mon Aug 08 17:52:51 2016

Add support for the Document::URL property getter.

As per the Acrobat JavaScript reference at [1]:

"
This property specifies the document's URL.
".

IE/Acrobat supports it, and implementing it
is one step toward supporting Acrobat JS
scripts such as the one in [2].

[1] http://partners.adobe.com/public/developer/en/acrobat/sdk/5186AcroJS.pdf
[2] https://bugs.chromium.org/p/pdfium/issues/detail?id=492

BUG=492

Review-Url: https://codereview.chromium.org/2219183002

[modify] https://crrev.com/33c4cdb4efbacb73151c982549151ea4e545eff8/fpdfsdk/javascript/Document.cpp
[modify] https://crrev.com/33c4cdb4efbacb73151c982549151ea4e545eff8/fpdfsdk/javascript/Document.h
[modify] https://crrev.com/33c4cdb4efbacb73151c982549151ea4e545eff8/testing/resources/javascript/document_props.in
[modify] https://crrev.com/33c4cdb4efbacb73151c982549151ea4e545eff8/testing/resources/javascript/document_props_expected.txt

Project Member Comment 6 by bugdroid1@chromium.org, Aug 19 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/618cb1f3e561b5d2a1dea9ec4653804f0da7267c

commit 618cb1f3e561b5d2a1dea9ec4653804f0da7267c
Author: tonikitoo <tonikitoo@igalia.com>
Date: Fri Aug 19 03:10:17 2016

Add initial Document::getAnnot support

This CL implements the first step toward supporting
annotation manipulation in PDFium: Document::getAnnot.

The method takes two arguments, an integer (page number)
and a string (annotation name).
When called, it iterates over the annotations on
the given page number, searching for the one whose name
matches the string in the second parameter.
If found, an Annot instance (see Annot.cpp/.h, added by this
CL) is bound to a JavaScript object and returned.

Using the use cases described in bug [1] as an initial test case,
the CL adds support for the following Annotation object properties:

- hidden
- name
- type

The idea is to keep evolving the implementation with more methods
and properties in follow-up CLs.

[1] https://bugs.chromium.org/p/pdfium/issues/detail?id=492

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2260663002

[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/BUILD.gn
[add] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/fpdfsdk/javascript/Annot.cpp
[add] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/fpdfsdk/javascript/Annot.h
[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/fpdfsdk/javascript/Document.cpp
[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/fpdfsdk/javascript/cjs_runtime.cpp
[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/pdfium.gyp
[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/testing/resources/javascript/document_methods.in
[modify] https://crrev.com/618cb1f3e561b5d2a1dea9ec4653804f0da7267c/testing/resources/javascript/document_methods_expected.txt
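A sketch of the lookup this commit describes for getAnnot (scan one page's annotations and return the one whose name matches), assuming a hypothetical in-memory model where `pages[i]` is an array of annotation records. It returns `null` when nothing matches, consistent with the `a != null` checks in the reporter's script; it is an illustration in JavaScript, not PDFium's C++ code.

```javascript
// Hypothetical document model: pages[i] holds the annotation records of page i.
function getAnnot(pages, pageNum, name) {
  if (!Number.isInteger(pageNum) || pageNum < 0 || pageNum >= pages.length) {
    return null; // invalid page index: nothing to search
  }
  for (const annot of pages[pageNum]) {
    if (annot.name === name) {
      return annot; // first annotation on that page whose name matches
    }
  }
  return null; // no annotation with that name on the given page
}

const pages = [
  [{ name: 'Tag-1', type: 'Highlight', hidden: true }],
  [{ name: 'Tag-2', type: 'Highlight', hidden: true }],
];
const found = getAnnot(pages, 1, 'Tag-2');
```

Because the search is limited to one page, passing a stale page number (for example, the page shown before a pending scroll completes) yields `null` even though the annotation exists elsewhere in the document.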

Project Member Comment 7 by bugdroid1@chromium.org, Aug 19 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/bb5fa043a7ef2de165c7903548e5663a6f8bcf9a

commit bb5fa043a7ef2de165c7903548e5663a6f8bcf9a
Author: tonikitoo <tonikitoo@igalia.com>
Date: Fri Aug 19 18:18:29 2016

Stub out Document::syncAnnotScan method.

The Acrobat JavaScript API reference [1] says:

"
syncAnnotScan guarantees that all annotations will be scanned
by the time this method returns.
(..)
Normally a background task runs that examines every page and
looks for annotations during idle times.
"

The quoted statement describes specifically how Acrobat implements
this method.
Although neither the method itself nor the background scanner
task is implemented in PDFium (as of August 2016),
not having ::syncAnnotScan at least stubbed out can be considered
harmful, since its absence makes Acrobat JS scripts that call it
fail silently.

Given that, and following a stub-out pattern present in other
methods including ::addAnnot and ::addField, this CL provides
a stubbed-out implementation of Document::syncAnnotScan.

[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_api_reference.pdf

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2265553002

[modify] https://crrev.com/bb5fa043a7ef2de165c7903548e5663a6f8bcf9a/fpdfsdk/javascript/Document.cpp
[modify] https://crrev.com/bb5fa043a7ef2de165c7903548e5663a6f8bcf9a/fpdfsdk/javascript/Document.h
[modify] https://crrev.com/bb5fa043a7ef2de165c7903548e5663a6f8bcf9a/testing/resources/javascript/document_methods.in
[modify] https://crrev.com/bb5fa043a7ef2de165c7903548e5663a6f8bcf9a/testing/resources/javascript/document_methods_expected.txt
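The practical effect of the stub can be illustrated with two hypothetical document objects, one without `syncAnnotScan` and one with a no-op stub. This is only a JavaScript model of the behavior the commit message describes, not PDFium code: calling the missing method throws a TypeError, so the rest of the script never runs, while the stub lets execution continue.

```javascript
// Hypothetical script body: everything after syncAnnotScan() depends on
// that call not throwing.
function runScript(doc) {
  doc.syncAnnotScan();        // TypeError if the method does not exist
  doc.ranToCompletion = true; // only reached if the call above succeeded
}

const withoutStub = {};
const withStub = {
  // No-op stub: there is no background scanner to wait for.
  syncAnnotScan() {},
};

let error = null;
try {
  runScript(withoutStub);
} catch (e) {
  error = e; // a viewer that swallows script errors would hide this
}
runScript(withStub);
```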

Project Member Comment 8 by bugdroid1@chromium.org, Aug 24 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/ade4b495433751ac853f2d677b9e1da94d0d6bf7

commit ade4b495433751ac853f2d677b9e1da94d0d6bf7
Author: tonikitoo <tonikitoo@igalia.com>
Date: Wed Aug 24 17:37:00 2016

Lazily generate an "AP" when an Annot's hidden state changes

Now that Document::getAnnot works and annotation instances
can have their properties changed, consider the following
scenario:

- A PDF has an annotation without an AP, and
CPVT_GenerateAP is called to generate one.
- However, the annotation also has its hidden flag set (/F 2),
so CPVT_GenerateAP bails out early without generating an AP.
- When the PDF's JavaScript runs, it acquires an instance of
this annotation object, bound to JS using Document::getAnnot(),
and sets its "hidden" flag to false.
- At this point, the annotation should get drawn, but it is
not, because its "AP" was never generated.

This CL fixes this scenario by making PDFium able to generate
APs lazily, if needed.

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2265313002

[modify] https://crrev.com/ade4b495433751ac853f2d677b9e1da94d0d6bf7/core/fpdfdoc/cpdf_annot.cpp
[modify] https://crrev.com/ade4b495433751ac853f2d677b9e1da94d0d6bf7/core/fpdfdoc/include/cpdf_annot.h
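A rough JavaScript model of the lazy-AP scenario, assuming a hypothetical annotation record where `flags` stands for /F and `ap` for /AP. The bit value 2 for the Hidden flag matches the /F 2 mentioned above; `generateAP` and `setHidden` are illustrative names, and the string stored in `ap` is only a placeholder for a real appearance stream.

```javascript
// Hidden annotation flag: bit value 2 in /F.
const HIDDEN_FLAG = 2;

// Hypothetical AP generator that, like the scenario above, skips hidden annots.
function generateAP(annot) {
  if (annot.flags & HIDDEN_FLAG) {
    return; // early bail-out: no AP for a hidden annotation
  }
  annot.ap = 'AP for ' + annot.name; // placeholder appearance stream
}

// Changing the hidden state generates the missing AP on demand.
function setHidden(annot, hidden) {
  if (hidden) {
    annot.flags |= HIDDEN_FLAG;
  } else {
    annot.flags &= ~HIDDEN_FLAG;
    if (annot.ap === null) {
      generateAP(annot); // lazily create the AP the first time it is needed
    }
  }
}

const annot = { name: 'Tag-1', flags: HIDDEN_FLAG, ap: null };
generateAP(annot);       // bails out: still no AP
setHidden(annot, false); // un-hiding triggers lazy generation
```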

Project Member Comment 9 by bugdroid1@chromium.org, Aug 26 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a

commit 3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a
Author: tonikitoo <tonikitoo@igalia.com>
Date: Fri Aug 26 15:37:10 2016

Extend pdfium_test capability so that more Javascript can be executed

In [1], the lack of pdfium_test support for some application-level
hooks became apparent.
More specifically, the missing implementation of the FFI_GetPage hook,
called when 'this.getAnnot()' is executed in an Acrobat JS context,
makes it non-trivial to write JS tests that manipulate PDF annotations.

[1] https://codereview.chromium.org/2265313002/

Here is the failing call stack in pdfium_test:

0 ::RenderPdf                              (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction              (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction         (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen  (fpdfsdk/fsdk_actionhandler.cpp)
  <----v8---->
4 Document::getAnnot                       (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView            (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage         (fpdfsdk/include/fsdk_mgr.h)

(frame 6 returns nullptr, and getAnnot call in frame 4 bails)

This CL extends the pdfium_test app with an FFI_GetPage hook implementation.

Basically, FFI_GetPage returns an FPDF_PAGE instance.
In pdfium_test, FPDF_PAGE instances were only created on demand
when a page was about to be rendered, and were then discarded.

Since FFI_GetPage can be called by JS before pages are rendered,
this CL moves the page creation code into a helper function and caches
the FPDF_PAGE instances it creates in a map, so they are not
recreated needlessly.

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2277063003

[modify] https://crrev.com/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a/samples/pdfium_test.cc
[add] https://crrev.com/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a/testing/resources/pixel/bug_492.pdf
[add] https://crrev.com/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a/testing/resources/pixel/bug_492.pdf.0.png
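The caching idea (create a page on first request, then reuse it) can be sketched with a `Map`. `loadPage` and `getPage` are hypothetical names; `loadPage` stands in for whatever actually loads an FPDF_PAGE in pdfium_test, and it counts calls so the effect of the cache is visible.

```javascript
// Hypothetical page loader; counts how many times a page is really loaded.
let loadCount = 0;
function loadPage(index) {
  loadCount += 1;
  return { index: index };
}

const pageCache = new Map();

// Return the cached page for `index`, loading it only on the first request.
function getPage(index) {
  let page = pageCache.get(index);
  if (page === undefined) {
    page = loadPage(index);
    pageCache.set(index, page);
  }
  return page;
}

const p0 = getPage(0);      // e.g. requested by JS before rendering
const p0again = getPage(0); // later requested by the renderer
```

Both calls return the same object, so JS and the renderer operate on one page instance instead of two.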

Project Member Comment 11 by bugdroid1@chromium.org, Aug 27 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3e0099107a8a8e3f6d685f220530b047c9c90105

commit 3e0099107a8a8e3f6d685f220530b047c9c90105
Author: tonikitoo <tonikitoo@igalia.com>
Date: Sat Aug 27 11:05:49 2016

Allow the PDF engine to return the page index it is scrolling to

Chromium/PDFium behave differently from IE/Acrobat when it
comes to handling JavaScript that scrolls to a given
page index.

For instance, let's assume a PDF file is showing its page
'0' and the following JS runs:

  this.pageNum = 1;
  app.alert(this.pageNum);

The output of the alert in IE/Acrobat is '1', whereas it is
'0' in Chromium/PDFium.
This happens because of the asynchronous way Chromium's PDF
plugin handles the "scroll to page X" request.

Also, a similar behavior difference is seen on other
Acrobat JS APIs, including Document::gotoNamedDest, where
the same code path is taken to scroll to a given page index.

This CL adds an "optional" class member variable that
caches the page index the PDF is going to be scrolled
to, and allows the PDF plugin to return the target page
index even before it has finished handling the scroll
request.

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2271263002
Cr-Commit-Position: refs/heads/master@{#414921}

[modify] https://crrev.com/3e0099107a8a8e3f6d685f220530b047c9c90105/pdf/pdfium/pdfium_engine.cc
[modify] https://crrev.com/3e0099107a8a8e3f6d685f220530b047c9c90105/pdf/pdfium/pdfium_engine.h
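A minimal model of the "optional" member this CL describes, assuming a hypothetical `Viewer` class where `pendingPage` is `null` when no scroll is pending. While a scroll request is outstanding, `pageNum` reports the target page, which matches the IE/Acrobat behavior in the `this.pageNum = 1` example above. This is a JavaScript illustration, not the C++ code in pdfium_engine.cc.

```javascript
// Hypothetical viewer: scrolling completes asynchronously, so the target
// page is remembered in `pendingPage` until the scroll finishes.
class Viewer {
  constructor() {
    this.currentPage = 0;
    this.pendingPage = null; // null means "no scroll in progress"
  }

  requestScroll(page) {
    this.pendingPage = page; // record the target immediately
  }

  finishScroll() {
    if (this.pendingPage !== null) {
      this.currentPage = this.pendingPage;
      this.pendingPage = null;
    }
  }

  // What a script reading this.pageNum would see.
  get pageNum() {
    return this.pendingPage !== null ? this.pendingPage : this.currentPage;
  }
}

const v = new Viewer();
v.requestScroll(1);
const seenByScript = v.pageNum; // 1, not the stale 0
v.finishScroll();
```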

Project Member Comment 12 by bugdroid1@chromium.org, Aug 29 2016
The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/5283e674fecf3732d89a8f7f144545af2301ccec

commit 5283e674fecf3732d89a8f7f144545af2301ccec
Author: tonikitoo <tonikitoo@igalia.com>
Date: Mon Aug 29 16:15:47 2016

Fix the test case added in https://codereview.chromium.org/2277063003/

In [1], a mistake was made in the way the test case
testing/resources/pixel/bug_492.pdf was generated.

This CL aims at fixing this mistake by:

1- continuing to use the new pdfium_test capability
   introduced by [1],
2- adding a proper .in file from which the test case's
   .pdf file is generated.

[1] https://codereview.chromium.org/2277063003/

BUG=pdfium:492

Review-Url: https://codereview.chromium.org/2286023002

[add] https://crrev.com/5283e674fecf3732d89a8f7f144545af2301ccec/testing/resources/pixel/bug_492.in
[delete] https://crrev.com/548ea2f7d0836866c5a5eea20dd707f713e51469/testing/resources/pixel/bug_492.pdf
[modify] https://crrev.com/5283e674fecf3732d89a8f7f144545af2301ccec/testing/resources/pixel/bug_492.pdf.0.png

Project Member Comment 13 by dsinclair@chromium.org, Aug 31 2016
Labels: Api