PDF: can't display 147MB PDF stored localy
Reported by
bau...@gmail.com,
Aug 17 2016
|
|||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.57 Safari/537.36 Steps to reproduce the problem: 1. Download PDF https://www.microsoft.com/en-US/download/details.aspx?id=53314 2. open it with Chrome What is the expected behavior? display PDF without use 300MB and fastly (work fine with FoxitReader, use only 67MB) What went wrong? PDF not displayed, progress bar hang in the end; or display 1/1 but screen stay grey; or just stay grey without progress bar. Did this work before? N/A Chrome version: 53.0.2785.57 Channel: beta OS Version: 6.3 Flash Version: disabled
,
Sep 21 2016
As a different test, I ran a debug build of pdfium_test and it chewed 8 minutes of CPU time and still didn't finish. Will try a release build and see how long that takes. Someone will then need to look and see where the time is spent. I bet it's in the PDF parser.
,
Sep 21 2016
Release build of pdfium_test without XFA took just under 8 minutes to... render 28365 pages. npm: Interested in looking at a performance issue and playing with perf tools? i.e. install the linux-tools-common package on Ubuntu, which comes with an executable called perf.
,
Sep 22 2016
Alternatively, Instruments on the Mac works really well for looking into these kinds of things. (Or valgrind's callgrind)
,
Sep 23 2016
,
Oct 13 2016
These are the most time-consuming methods for pdfium_test: 33.9% CPDF_Document::FindPDFPage 14.6% CPDF_StreamContentParser::Parse 21.7% CFX_RenderDevice::DrawNormalText 15.4% CPDF_TextPage::ProcessTextObject Couldn't really find the pdfium usage when running Chrome. The pdf does eventually open, but it hangs on grey for way too long. The most likely culprits for this from the above would be CPDF_Document::FindPDFPage or CPDF_StreamContentParser::Parse.
,
Oct 18 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/7c29e27dae139a205755c1a29b7f3ac8b36ec0da commit 7c29e27dae139a205755c1a29b7f3ac8b36ec0da Author: npm <npm@chromium.org> Date: Tue Oct 18 17:38:20 2016 Traverse PDF page tree only once in CPDF_Document In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: 1 minute 40 seconds After this CL: 5 seconds BUG= chromium:638513 Review-Url: https://codereview.chromium.org/2414423002 [modify] https://crrev.com/7c29e27dae139a205755c1a29b7f3ac8b36ec0da/core/fpdfapi/parser/cpdf_document.cpp [modify] https://crrev.com/7c29e27dae139a205755c1a29b7f3ac8b36ec0da/core/fpdfapi/parser/cpdf_document.h
,
Oct 18 2016
This can always be improved, but load time should be reasonable now.
,
Oct 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/991471745f77e6f6ddc261e3648b834e78cf1655 commit 991471745f77e6f6ddc261e3648b834e78cf1655 Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Tue Oct 18 21:24:51 2016 Roll src/third_party/pdfium/ 878dd5b12..7c29e27da (2 commits). https://pdfium.googlesource.com/pdfium.git/+log/878dd5b121b3..7c29e27dae13 $ git log 878dd5b12..7c29e27da --date=short --no-merges --format='%ad %ae %s' 2016-10-18 npm Traverse PDF page tree only once in CPDF_Document 2016-10-18 thestig Add a test case for bug 494057. BUG= 638513 ,494057 TBR=dsinclair@chromium.org Review-Url: https://codereview.chromium.org/2428993002 Cr-Commit-Position: refs/heads/master@{#426049} [modify] https://crrev.com/991471745f77e6f6ddc261e3648b834e78cf1655/DEPS
,
Oct 20 2016
,
Oct 21 2016
Reopening as CL had to be reverted.
,
Oct 21 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168 commit 3cad596c55cd4db64e002aba118904f65c385168 Author: npm <npm@chromium.org> Date: Fri Oct 21 23:02:15 2016 Add CPDF_Document::GetPage() unittests Added a nontrivial page tree and a test that pages are being fetched properly, both when requested in order and in reverse order. This will help prevent introducing bugs while changing the way the page tree is processed. BUG= chromium:638513 Review-Url: https://chromiumcodereview.appspot.com/2435783006 [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h [add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp
,
Oct 21 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168 commit 3cad596c55cd4db64e002aba118904f65c385168 Author: npm <npm@chromium.org> Date: Fri Oct 21 23:02:15 2016 Add CPDF_Document::GetPage() unittests Added a nontrivial page tree and a test that pages are being fetched properly, both when requested in order and in reverse order. This will help prevent introducing bugs while changing the way the page tree is processed. BUG= chromium:638513 Review-Url: https://chromiumcodereview.appspot.com/2435783006 [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h [add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp
,
Oct 21 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/3cad596c55cd4db64e002aba118904f65c385168 commit 3cad596c55cd4db64e002aba118904f65c385168 Author: npm <npm@chromium.org> Date: Fri Oct 21 23:02:15 2016 Add CPDF_Document::GetPage() unittests Added a nontrivial page tree and a test that pages are being fetched properly, both when requested in order and in reverse order. This will help prevent introducing bugs while changing the way the page tree is processed. BUG= chromium:638513 Review-Url: https://chromiumcodereview.appspot.com/2435783006 [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/BUILD.gn [modify] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document.h [add] https://crrev.com/3cad596c55cd4db64e002aba118904f65c385168/core/fpdfapi/parser/cpdf_document_unittest.cpp
,
Oct 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/41176b1fbb13e3c55f60b35ccdb74a5f501e0960 commit 41176b1fbb13e3c55f60b35ccdb74a5f501e0960 Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Sat Oct 22 04:56:20 2016 Roll src/third_party/pdfium/ 5532835ae..3cad596c5 (2 commits). https://pdfium.googlesource.com/pdfium.git/+log/5532835aeb88..3cad596c55cd $ git log 5532835ae..3cad596c5 --date=short --no-merges --format='%ad %ae %s' 2016-10-21 npm Add CPDF_Document::GetPage() unittests 2016-10-21 tsepez Remove dead code in CPDF_CustomAccess BUG= 638513 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls TBR=dsinclair@chromium.org Review-Url: https://chromiumcodereview.appspot.com/2440263002 Cr-Commit-Position: refs/heads/master@{#426974} [modify] https://crrev.com/41176b1fbb13e3c55f60b35ccdb74a5f501e0960/DEPS
,
Oct 26 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/d3a2009d75eac3cda442f545ef0865afae7b35cf commit d3a2009d75eac3cda442f545ef0865afae7b35cf Author: npm <npm@chromium.org> Date: Wed Oct 26 18:03:43 2016 Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. Try 1 at https://codereview.chromium.org/2414423002/ In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG= chromium:638513 Review-Url: https://codereview.chromium.org/2442403002 [modify] https://crrev.com/d3a2009d75eac3cda442f545ef0865afae7b35cf/core/fpdfapi/parser/cpdf_document.cpp [modify] https://crrev.com/d3a2009d75eac3cda442f545ef0865afae7b35cf/core/fpdfapi/parser/cpdf_document.h
,
Oct 26 2016
Let's try again
,
Oct 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/57d1a4fe85b070baff03ce36f156ee99a71b0cc1 commit 57d1a4fe85b070baff03ce36f156ee99a71b0cc1 Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Wed Oct 26 21:45:11 2016 Roll src/third_party/pdfium/ 16b703cb5..d3a2009d7 (2 commits). https://pdfium.googlesource.com/pdfium.git/+log/16b703cb504a..d3a2009d75ea $ git log 16b703cb5..d3a2009d7 --date=short --no-merges --format='%ad %ae %s' 2016-10-26 npm Traverse PDF page tree only once in CPDF_Document 2016-10-26 tsepez Avoid some FX_BOOL/bool noise in fx_codec_fax.cpp BUG= 638513 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls TBR=dsinclair@chromium.org Review-Url: https://codereview.chromium.org/2455713002 Cr-Commit-Position: refs/heads/master@{#427822} [modify] https://crrev.com/57d1a4fe85b070baff03ce36f156ee99a71b0cc1/DEPS
,
Oct 27 2016
Tested the issue on Windows-10, Ubuntu 14.04 and Mac OS 10.12 using chrome latest Dev M56-56.0.2902.0 by following steps mentioned in the original comment. Observed that PDF is opening in 2-3 seconds and attaching the screen cast for reference. Could you please look into it and provide us the feedback. Thank you in Advance!
,
Oct 28 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/900f421e29daf2ab62de3ae8dc821f031bc7bdb3 commit 900f421e29daf2ab62de3ae8dc821f031bc7bdb3 Author: npm <npm@chromium.org> Date: Fri Oct 28 21:30:44 2016 Revert of Traverse PDF page tree only once in CPDF_Document Try 2 (patchset #3 id:40001 of https://codereview.chromium.org/2442403002/ ) Reason for revert: Not quite right yet. Original issue's description: > Traverse PDF page tree only once in CPDF_Document > > Try 2: main fix was recursively popping elements from the stack. Since > the Traverse method can be called on non-root nodes from GetPage(), we > have to make sure to properly update the parents. > > Try 1 at https://codereview.chromium.org/2414423002/ > > In our current implementation of CPDF_Document::GetPage, we traverse > the PDF page tree until we find the index we are looking for. This is > slow when we do calls GetPage(0), GetPage(1), ... since in this case > the page tree will be traversed n times if there are n pages. This CL > makes sure the page tree is only traversed once. > > Time to load the PDF from the bug below in chrome official build: > Before this CL: around 1 minute 25 seconds > After this CL: around 4 seconds > > BUG= chromium:638513 > > Committed: https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b35cf TBR=tsepez@chromium.org,weili@chromium.org,thestig@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG= chromium:638513 Review-Url: https://codereview.chromium.org/2461063003 [modify] https://crrev.com/900f421e29daf2ab62de3ae8dc821f031bc7bdb3/core/fpdfapi/parser/cpdf_document.cpp [modify] https://crrev.com/900f421e29daf2ab62de3ae8dc821f031bc7bdb3/core/fpdfapi/parser/cpdf_document.h
,
Oct 28 2016
,
Oct 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/20945fd7014d65578c88a08f9d8e90d4877cee6d commit 20945fd7014d65578c88a08f9d8e90d4877cee6d Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Sun Oct 30 00:09:30 2016 Roll src/third_party/pdfium/ dc0401a0d..747dcf775 (4 commits). https://pdfium.googlesource.com/pdfium.git/+log/dc0401a0da44..747dcf775c5c $ git log dc0401a0d..747dcf775 --date=short --no-merges --format='%ad %ae %s' 2016-10-29 dsinclair Revert of Change FX_BOOL definition from int to bool. (patchset #15 id:250001 of https://codereview.chromium.org/2453473003/ ) 2016-10-28 tsepez Change FX_BOOL definition from int to bool. 2016-10-28 tsepez Fix FX_BOOL / int issue in fx_skia_device.cpp 2016-10-28 npm Revert of Traverse PDF page tree only once in CPDF_Document Try 2 (patchset #3 id:40001 of https://codereview.chromium.org/2442403002/ ) BUG= 638513 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls TBR=dsinclair@chromium.org Review-Url: https://codereview.chromium.org/2459173002 Cr-Commit-Position: refs/heads/master@{#428617} [modify] https://crrev.com/20945fd7014d65578c88a08f9d8e90d4877cee6d/DEPS
,
Nov 4 2016
The following revision refers to this bug: https://pdfium.googlesource.com/pdfium.git/+/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f commit ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f Author: npm <npm@chromium.org> Date: Fri Nov 04 19:54:51 2016 Traverse PDF page tree only once in CPDF_Document Try 3 Now, we do not start traversal from where we were at, but from the top. This makes the code less prone to bugs, as now there is no need to call methods to recursively fix things. This will save a lot of time when the trees are rather flat, as in the PDF file in the bug. It can still be slow, for instance if we have a chain of page nodes, and the last in the chain contains all of the pages (this is artificial). Try 2 at https://codereview.chromium.org/2442403002/ Also added test where Try 2 would have failed. Tested the pdf from the bug on my Mac: With this CL: load in 21 seconds Without this CL: did not load in 4 minutes, got tired of waiting BUG= chromium:638513 Review-Url: https://codereview.chromium.org/2470803003 [modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document.cpp [modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document.h [modify] https://crrev.com/ec64cee9acccd0d1e574bbbd8aa91b08c1cf254f/core/fpdfapi/parser/cpdf_document_unittest.cpp
,
Nov 4 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ab77e5ae52017574204e37b9f9dfccea6fad6513 commit ab77e5ae52017574204e37b9f9dfccea6fad6513 Author: pdfium-deps-roller <pdfium-deps-roller@chromium.org> Date: Fri Nov 04 23:09:46 2016 Roll src/third_party/pdfium/ 761eed284..ec64cee9a (2 commits). https://pdfium.googlesource.com/pdfium.git/+log/761eed284e12..ec64cee9accc $ git log 761eed284..ec64cee9a --date=short --no-merges --format='%ad %ae %s' 2016-11-04 npm Traverse PDF page tree only once in CPDF_Document Try 3 2016-11-04 tsepez Reland "Remove CPDF_Object::Release() in favor of direct delete" BUG= 638513 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls TBR=dsinclair@chromium.org Review-Url: https://codereview.chromium.org/2480883002 Cr-Commit-Position: refs/heads/master@{#430059} [modify] https://crrev.com/ab77e5ae52017574204e37b9f9dfccea6fad6513/DEPS
,
Nov 8 2016
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by durga.behera@chromium.org
, Aug 19 2016