New issue
Advanced search Search tips

Issue 638513 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 2
Type: Bug

Blocked on:
issue 657897



Sign in to add a comment

PDF: can't display 147MB PDF stored localy

Reported by bau...@gmail.com, Aug 17 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.57 Safari/537.36

Steps to reproduce the problem:
1. Download PDF  https://www.microsoft.com/en-US/download/details.aspx?id=53314
2. open it with Chrome

What is the expected behavior?
display PDF without use 300MB and fastly (work fine with FoxitReader, use only 67MB)

What went wrong?
PDF not displayed, progress bar hang in the end; or display 1/1 but screen stay grey; or just stay grey without progress bar.

Did this work before? N/A 

Chrome version: 53.0.2785.57  Channel: beta
OS Version: 6.3
Flash Version: disabled
 
Chrome1.jpg
36.0 KB View Download
Chrome2.jpg
34.4 KB View Download
Components: Internals>Plugins>PDF
Components: -UI
Labels: OS-Chrome OS-Linux OS-Mac
Status: Available (was: Unconfirmed)
As a different test, I ran a debug build of pdfium_test and it chewed 8 minutes of CPU time and still didn't finish. Will try a release build and see how long that takes. Someone will then need to look and see where the time is spent. I bet it's in the PDF parser.
Cc: npm@chromium.org
Release build of pdfium_test without XFA took just under 8 minutes to... render 28365 pages.

npm: Interested in looking at a performance issue and playing with perf tools? i.e. install the linux-tools-common package on Ubuntu, which comes with an executable called perf.
Alternatively, Instruments on the Mac works really well for looking into these kinds of things. (Or valgrind's callgrind)

Comment 5 by npm@chromium.org, Sep 23 2016

Owner: npm@chromium.org
Status: Assigned (was: Available)

Comment 6 by npm@chromium.org, Oct 13 2016

These are the most time-consuming methods for pdfium_test:
33.9% CPDF_Document::FindPDFPage
14.6% CPDF_StreamContentParser::Parse
21.7% CFX_RenderDevice::DrawNormalText
15.4% CPDF_TextPage::ProcessTextObject

Couldn't really find the pdfium usage when running Chrome. The pdf does eventually open, but it hangs on grey for way too long. The most likely culprits for this from the above would be CPDF_Document::FindPDFPage or CPDF_StreamContentParser::Parse.
Project Member

Comment 7 by bugdroid1@chromium.org, Oct 18 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/7c29e27dae139a205755c1a29b7f3ac8b36ec0da

commit 7c29e27dae139a205755c1a29b7f3ac8b36ec0da
Author: npm <npm@chromium.org>
Date: Tue Oct 18 17:38:20 2016

Traverse PDF page tree only once in CPDF_Document

In our current implementation of CPDF_Document::GetPage, we traverse
the PDF page tree until we find the index we are looking for. This is
slow when we do calls GetPage(0), GetPage(1), ... since in this case
the page tree will be traversed n times if there are n pages. This CL
makes sure the page tree is only traversed once.

Time to load the PDF from the bug below in chrome official build:
Before this CL: 1 minute 40 seconds
After this CL: 5 seconds

BUG= chromium:638513 

Review-Url: https://codereview.chromium.org/2414423002

[modify] https://crrev.com/7c29e27dae139a205755c1a29b7f3ac8b36ec0da/core/fpdfapi/parser/cpdf_document.cpp
[modify] https://crrev.com/7c29e27dae139a205755c1a29b7f3ac8b36ec0da/core/fpdfapi/parser/cpdf_document.h

Comment 8 by npm@chromium.org, Oct 18 2016

Status: Fixed (was: Assigned)
This can always be improved, but load time should be reasonable now.
Project Member

Comment 9 by bugdroid1@chromium.org, Oct 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/991471745f77e6f6ddc261e3648b834e78cf1655

commit 991471745f77e6f6ddc261e3648b834e78cf1655
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Tue Oct 18 21:24:51 2016

Roll src/third_party/pdfium/ 878dd5b12..7c29e27da (2 commits).

https://pdfium.googlesource.com/pdfium.git/+log/878dd5b121b3..7c29e27dae13

$ git log 878dd5b12..7c29e27da --date=short --no-merges --format='%ad %ae %s'
2016-10-18 npm Traverse PDF page tree only once in CPDF_Document
2016-10-18 thestig Add a test case for bug 494057.

BUG= 638513 ,494057

TBR=dsinclair@chromium.org

Review-Url: https://codereview.chromium.org/2428993002
Cr-Commit-Position: refs/heads/master@{#426049}

[modify] https://crrev.com/991471745f77e6f6ddc261e3648b834e78cf1655/DEPS

Comment 10 by mark@chromium.org, Oct 20 2016

Blockedon: 657897

Comment 11 by npm@chromium.org, Oct 21 2016

Status: Assigned (was: Fixed)
Reopening as CL had to be reverted.
Project Member

Comment 12 by bugdroid1@chromium.org, Oct 21 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168

commit 3cad596c55cd4db64e002aba118904f65c385168
Author: npm <npm@chromium.org>
Date: Fri Oct 21 23:02:15 2016

Add CPDF_Document::GetPage() unittests

Added a nontrivial page tree and a test that pages are being fetched
properly, both when requested in order and in reverse order. This will
help prevent introducing bugs while changing the way the page tree is
processed.

BUG= chromium:638513 

Review-Url: https://chromiumcodereview.appspot.com/2435783006

[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn
[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h
[add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp

Project Member

Comment 13 by bugdroid1@chromium.org, Oct 21 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168

commit 3cad596c55cd4db64e002aba118904f65c385168
Author: npm <npm@chromium.org>
Date: Fri Oct 21 23:02:15 2016

Add CPDF_Document::GetPage() unittests

Added a nontrivial page tree and a test that pages are being fetched
properly, both when requested in order and in reverse order. This will
help prevent introducing bugs while changing the way the page tree is
processed.

BUG= chromium:638513 

Review-Url: https://chromiumcodereview.appspot.com/2435783006

[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn
[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h
[add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp

Project Member

Comment 14 by bugdroid1@chromium.org, Oct 21 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168

commit 3cad596c55cd4db64e002aba118904f65c385168
Author: npm <npm@chromium.org>
Date: Fri Oct 21 23:02:15 2016

Add CPDF_Document::GetPage() unittests

Added a nontrivial page tree and a test that pages are being fetched
properly, both when requested in order and in reverse order. This will
help prevent introducing bugs while changing the way the page tree is
processed.

BUG= chromium:638513 

Review-Url: https://chromiumcodereview.appspot.com/2435783006

[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn
[modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h
[add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp

Project Member

Comment 15 by bugdroid1@chromium.org, Oct 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/41176b1fbb13e3c55f60b35ccdb74a5f501e0960

commit 41176b1fbb13e3c55f60b35ccdb74a5f501e0960
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Sat Oct 22 04:56:20 2016

Roll src/third_party/pdfium/ 5532835ae..3cad596c5 (2 commits).

https://pdfium.googlesource.com/pdfium.git/+log/5532835aeb88..3cad596c55cd

$ git log 5532835ae..3cad596c5 --date=short --no-merges --format='%ad %ae %s'
2016-10-21 npm Add CPDF_Document::GetPage() unittests
2016-10-21 tsepez Remove dead code in CPDF_CustomAccess

BUG= 638513 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

TBR=dsinclair@chromium.org

Review-Url: https://chromiumcodereview.appspot.com/2440263002
Cr-Commit-Position: refs/heads/master@{#426974}

[modify] https://crrev.com/41176b1fbb13e3c55f60b35ccdb74a5f501e0960/DEPS

Project Member

Comment 16 by bugdroid1@chromium.org, Oct 26 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/d3a2009d75eac3cda442f545ef0865afae7b35cf

commit d3a2009d75eac3cda442f545ef0865afae7b35cf
Author: npm <npm@chromium.org>
Date: Wed Oct 26 18:03:43 2016

Traverse PDF page tree only once in CPDF_Document

Try 2: main fix was recursively popping elements from the stack. Since
the Traverse method can be called on non-root nodes from GetPage(), we
have to make sure to properly update the parents.

Try 1 at https://codereview.chromium.org/2414423002/

In our current implementation of CPDF_Document::GetPage, we traverse
the PDF page tree until we find the index we are looking for. This is
slow when we do calls GetPage(0), GetPage(1), ... since in this case
the page tree will be traversed n times if there are n pages. This CL
makes sure the page tree is only traversed once.

Time to load the PDF from the bug below in chrome official build:
Before this CL: around 1 minute 25 seconds
After this CL: around 4 seconds

BUG= chromium:638513 

Review-Url: https://codereview.chromium.org/2442403002

[modify] https://crrev.com/d3a2009d75eac3cda442f545ef0865afae7b35cf/core/fpdfapi/parser/cpdf_document.cpp
[modify] https://crrev.com/d3a2009d75eac3cda442f545ef0865afae7b35cf/core/fpdfapi/parser/cpdf_document.h

Comment 17 by npm@chromium.org, Oct 26 2016

Status: Fixed (was: Assigned)
Let's try again
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/57d1a4fe85b070baff03ce36f156ee99a71b0cc1

commit 57d1a4fe85b070baff03ce36f156ee99a71b0cc1
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Wed Oct 26 21:45:11 2016

Roll src/third_party/pdfium/ 16b703cb5..d3a2009d7 (2 commits).

https://pdfium.googlesource.com/pdfium.git/+log/16b703cb504a..d3a2009d75ea

$ git log 16b703cb5..d3a2009d7 --date=short --no-merges --format='%ad %ae %s'
2016-10-26 npm Traverse PDF page tree only once in CPDF_Document
2016-10-26 tsepez Avoid some FX_BOOL/bool noise in fx_codec_fax.cpp

BUG= 638513 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

TBR=dsinclair@chromium.org

Review-Url: https://codereview.chromium.org/2455713002
Cr-Commit-Position: refs/heads/master@{#427822}

[modify] https://crrev.com/57d1a4fe85b070baff03ce36f156ee99a71b0cc1/DEPS

Cc: rbasuvula@chromium.org
Labels: Needs-Feedback
Tested the issue on Windows-10, Ubuntu 14.04 and Mac OS 10.12 using chrome latest Dev M56-56.0.2902.0 by following steps mentioned in the original comment. Observed that PDF is opening in 2-3 seconds and attaching the screen cast for reference. 

Could you please look into it and provide us the feedback. 

Thank you in Advance!
638513.ogv
8.9 MB View Download
Project Member

Comment 20 by bugdroid1@chromium.org, Oct 28 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/900f421e29daf2ab62de3ae8dc821f031bc7bdb3

commit 900f421e29daf2ab62de3ae8dc821f031bc7bdb3
Author: npm <npm@chromium.org>
Date: Fri Oct 28 21:30:44 2016

Revert of Traverse PDF page tree only once in CPDF_Document Try 2 (patchset #3 id:40001 of https://codereview.chromium.org/2442403002/ )

Reason for revert:
Not quite right yet.

Original issue's description:
> Traverse PDF page tree only once in CPDF_Document
>
> Try 2: main fix was recursively popping elements from the stack. Since
> the Traverse method can be called on non-root nodes from GetPage(), we
> have to make sure to properly update the parents.
>
> Try 1 at https://codereview.chromium.org/2414423002/
>
> In our current implementation of CPDF_Document::GetPage, we traverse
> the PDF page tree until we find the index we are looking for. This is
> slow when we do calls GetPage(0), GetPage(1), ... since in this case
> the page tree will be traversed n times if there are n pages. This CL
> makes sure the page tree is only traversed once.
>
> Time to load the PDF from the bug below in chrome official build:
> Before this CL: around 1 minute 25 seconds
> After this CL: around 4 seconds
>
> BUG= chromium:638513 
>
> Committed: https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b35cf

TBR=tsepez@chromium.org,weili@chromium.org,thestig@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG= chromium:638513 

Review-Url: https://codereview.chromium.org/2461063003

[modify] https://crrev.com/900f421e29daf2ab62de3ae8dc821f031bc7bdb3/core/fpdfapi/parser/cpdf_document.cpp
[modify] https://crrev.com/900f421e29daf2ab62de3ae8dc821f031bc7bdb3/core/fpdfapi/parser/cpdf_document.h

Cc: -npm@chromium.org
Labels: -Needs-Feedback
Status: Started (was: Fixed)
Project Member

Comment 22 by bugdroid1@chromium.org, Oct 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/20945fd7014d65578c88a08f9d8e90d4877cee6d

commit 20945fd7014d65578c88a08f9d8e90d4877cee6d
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Sun Oct 30 00:09:30 2016

Roll src/third_party/pdfium/ dc0401a0d..747dcf775 (4 commits).

https://pdfium.googlesource.com/pdfium.git/+log/dc0401a0da44..747dcf775c5c

$ git log dc0401a0d..747dcf775 --date=short --no-merges --format='%ad %ae %s'
2016-10-29 dsinclair Revert of Change FX_BOOL definition from int to bool. (patchset #15 id:250001 of https://codereview.chromium.org/2453473003/ )
2016-10-28 tsepez Change FX_BOOL definition from int to bool.
2016-10-28 tsepez Fix FX_BOOL / int issue in fx_skia_device.cpp
2016-10-28 npm Revert of Traverse PDF page tree only once in CPDF_Document Try 2 (patchset #3 id:40001 of https://codereview.chromium.org/2442403002/ )

BUG= 638513 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

TBR=dsinclair@chromium.org

Review-Url: https://codereview.chromium.org/2459173002
Cr-Commit-Position: refs/heads/master@{#428617}

[modify] https://crrev.com/20945fd7014d65578c88a08f9d8e90d4877cee6d/DEPS

Project Member

Comment 23 by bugdroid1@chromium.org, Nov 4 2016

The following revision refers to this bug:
  https://pdfium.googlesource.com/pdfium.git/+/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f

commit ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f
Author: npm <npm@chromium.org>
Date: Fri Nov 04 19:54:51 2016

Traverse PDF page tree only once in CPDF_Document Try 3

Now, we do not start traversal from where we were at, but from the top.
This makes the code less prone to bugs, as now there is no need to call
methods to recursively fix things. This will save a lot of time when
the trees are rather flat, as in the PDF file in the bug. It can still
be slow, for instance if we have a chain of page nodes, and the last
in the chain contains all of the pages (this is artificial).

Try 2 at https://codereview.chromium.org/2442403002/

Also added test where Try 2 would have failed.

Tested the pdf from the bug on my Mac:
With this CL: load in 21 seconds
Without this CL: did not load in 4 minutes, got tired of waiting

BUG= chromium:638513 

Review-Url: https://codereview.chromium.org/2470803003

[modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document.cpp
[modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document.h
[modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document_unittest.cpp

Project Member

Comment 24 by bugdroid1@chromium.org, Nov 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ab77e5ae52017574204e37b9f9dfccea6fad6513

commit ab77e5ae52017574204e37b9f9dfccea6fad6513
Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org>
Date: Fri Nov 04 23:09:46 2016

Roll src/third_party/pdfium/ 761eed284..ec64cee9a (2 commits).

https://pdfium.googlesource.com/pdfium.git/+log/761eed284e12..ec64cee9accc

$ git log 761eed284..ec64cee9a --date=short --no-merges --format='%ad %ae %s'
2016-11-04 npm Traverse PDF page tree only once in CPDF_Document Try 3
2016-11-04 tsepez Reland "Remove CPDF_Object::Release() in favor of direct delete"

BUG= 638513 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

TBR=dsinclair@chromium.org

Review-Url: https://codereview.chromium.org/2480883002
Cr-Commit-Position: refs/heads/master@{#430059}

[modify] https://crrev.com/ab77e5ae52017574204e37b9f9dfccea6fad6513/DEPS

Comment 25 by npm@chromium.org, Nov 8 2016

Status: Fixed (was: Started)

Sign in to add a comment