
Issue 701427

Starred by 3 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




PDF text sometimes renders wrong in test

Project Member Reported by krasin@chromium.org, Mar 14 2017

Issue description

Chrome Version: tip
OS: Linux x86-64

What steps will reproduce the problem?

I was unable to reproduce this issue locally. It sometimes shows up on the CFI Linux bot, with the following test cases failing:
https://build.chromium.org/p/chromium.fyi/builders/CFI%20Linux/builds/7651

PDFExtensionTest.PdfAccessibilityEnableLater
PDFExtensionTest.PdfAccessibility
PDFExtensionTest.PdfAccessibilityInOOPIF
PDFExtensionTest.PdfAccessibilityInIframe

There is a lot of log spam in the test output, but the immediate cause of the test failures looks like the following:

../../chrome/browser/pdf/pdf_extension_test.cc:659: Failure
Value of: kExpectedPDFAXTree == ax_tree_dump
  Actual: false
Expected: true
Expected:
embeddedObject
  group
    region 'Page 1'
      paragraph
        staticText '1 First Section
'
          inlineTextBox '1 '
          inlineTextBox 'First Section
'
      paragraph
        staticText 'This is the first section.
1'
          inlineTextBox 'This is the first section.
'
          inlineTextBox '1'
    region 'Page 2'
      paragraph
        staticText '1.1 First Subsection
'
          inlineTextBox '1.1 '
          inlineTextBox 'First Subsection
'
      paragraph
        staticText 'This is the first subsection.
2'
          inlineTextBox 'This is the first subsection.
'
          inlineTextBox '2'
    region 'Page 3'
      paragraph
        staticText '2 Second Section
'
          inlineTextBox '2 '
          inlineTextBox 'Second Section
'
      paragraph
        staticText '3'
          inlineTextBox '3'


Actual:
embeddedObject
  group
    region 'Page 1'
      paragraph
        staticText '1 First Section
'
          inlineTextBox '1 '
          inlineTextBox 'First Section
'
      paragraph
        staticText 'This is the rst section.
1'
          inlineTextBox 'This is the rst section.
'
          inlineTextBox '1'
    region 'Page 2'
      paragraph
        staticText '1.1 First Subsection
'
          inlineTextBox '1.1 '
          inlineTextBox 'First Subsection
'
      paragraph
        staticText 'This is the rst subsection.
2'
          inlineTextBox 'This is the rst subsection.
'
          inlineTextBox '2'
    region 'Page 3'
      paragraph
        staticText '2 Second Section
'
          inlineTextBox '2 '
          inlineTextBox 'Second Section
'
      paragraph
        staticText '3'
          inlineTextBox '3'

As you can see, the main difference is "This is the rst section." and "This is the rst subsection." instead of "This is the first section." and "This is the first subsection.". It reminds me of race conditions, but I am not sure if that's what happens here.

If anyone has any hints, please post them here.

 
Cc: dmazz...@chromium.org

Comment 2 by krasin@chromium.org, Mar 14 2017

If this test were reproducible, this would be the way to build it:

GYP_DEFINES='buildtype=Official' gclient sync
gn gen out/cfi '--args=is_debug=false is_cfi=true is_component_build=false' --check
ninja -C out/cfi browser_tests # Will take ~40 minutes at the last link step
./out/cfi/browser_tests --gtest_filter=PDFExtensionTest.PdfAccessibility

I believe this test failure has nothing to do with CFI and something to do with timing. No evidence, though.

Comment 3 by raymes@chromium.org, Mar 14 2017

Owner: dmazz...@chromium.org
Status: Assigned (was: Untriaged)
dmazzoni added these tests. Over to him.
This is a very mysterious failure. I can't think of what could cause this type of corruption.

Is it reasonable to assume these changes are CFI-related?


Comment 6 by krasin@chromium.org, Mar 15 2017

CFI (in this incarnation) does a very simple thing: if it does not like a virtual call, it simply aborts the process with a UD2 instruction. Not only do I not observe any aborts here, but CFI failures are also very deterministic.

Another possibility is a miscompilation of some sort, but such issues are deterministic as well.

What happens during the test output generation? Are there any threads or processes that communicate with each other, or is it just a bunch of functions invoked sequentially?
Yes, this test is asynchronous. Data is sent from the PDF process to the render process, and from the render process to the browser process.

Almost everything is done with strings and simple data structures, though, so it's not clear how we could get errors such as that one.

I'll try to reproduce locally with those compile flags.

Comment 8 by krasin@chromium.org, Mar 16 2017

Great news! The test is now failing on 'ThinLTO Linux ToT' bot, which has nothing to do with CFI:
https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1307

The failure message is the same.

It really feels like a race condition of some sort.

Comment 9 by krasin@chromium.org, Mar 17 2017

I plan to disable these tests from running on all buildbots, as the tests are broken and no action has been taken for 3 days:

PDFExtensionTest.PdfAccessibilityEnableLater
PDFExtensionTest.PdfAccessibility
PDFExtensionTest.PdfAccessibilityInOOPIF
PDFExtensionTest.PdfAccessibilityInIframe

I will create a CL for that soon. Please object if there are reasons not to do that.
I sent https://codereview.chromium.org/2751973009/ for a review.

Comment 11 by bugdroid1@chromium.org, Mar 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a5082d6ce45219eba13fae950a5fbdda07fe3442

commit a5082d6ce45219eba13fae950a5fbdda07fe3442
Author: krasin <krasin@chromium.org>
Date: Sat Mar 18 00:12:51 2017

Disable 4 PDFExtensionTest test cases as they fail on multiple bots.

BUG=701427

Review-Url: https://codereview.chromium.org/2751973009
Cr-Commit-Position: refs/heads/master@{#457908}

[modify] https://crrev.com/a5082d6ce45219eba13fae950a5fbdda07fe3442/chrome/browser/pdf/pdf_extension_test.cc

Owner: raymes@chromium.org
Interesting - I can reproduce this locally, but when I open the PDF in Chrome it's broken in a similar way. See attached screenshot.

Visually it shows "This is the  rst section" instead of "This is the first section".

Possibly a font issue?

I don't understand why this would be working on some bots but not others. Either way this looks like the bug is not in accessibility code or in the test, but the accessibility test is surfacing a real error somewhere.

Reassigning to raymes@ to triage and tell me if this looks like a real bug, or a known issue to work around. If the latter, I'll modify the test to make it tolerant of this issue.

pdf_screenshot.png
25.0 KB
Specifically this looks like an issue with the "fi" ligature.

Hi Dominic,

Thank you for digging into this. That definitely moves us one step closer to understanding the real issue.
I created this change to re-enable the tests.

https://codereview.chromium.org/2760053002


Comment 16 by bugdroid1@chromium.org, Mar 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9d1abef69b6441eba82f137f2b38ef3d6935182a

commit 9d1abef69b6441eba82f137f2b38ef3d6935182a
Author: dmazzoni <dmazzoni@chromium.org>
Date: Wed Mar 22 19:24:09 2017

Re-enable 4 PDF accessibility tests by making them more robust.

Work around issues where the PDF plug-in is returning inconsistent string
on different platforms, regarding whitespace and "fi" ligatures.

BUG=701427

Review-Url: https://codereview.chromium.org/2760053002
Cr-Commit-Position: refs/heads/master@{#458835}

[modify] https://crrev.com/9d1abef69b6441eba82f137f2b38ef3d6935182a/chrome/browser/pdf/pdf_extension_test.cc

Owner: dsinclair@chromium.org
Summary: PDF text sometimes renders wrong in test (was: PDFExtensionTest mysterious failures)
Thanks for looking at it dmazzoni. This seems like a rendering issue in the plugin then. Assigning to dsinclair.
Owner: npm@chromium.org
npm@ can you check if this is something weird in the font code?

Comment 19 by npm@chromium.org, Mar 27 2017

I'm unable to reproduce. When I open test-bookmark.pdf on Chrome 57.0.2987.110 or 59.0.3047.0, I see the correct text. On which Chrome version and OS were you able to see a problem?

Comment 20 by npm@chromium.org, Mar 27 2017

Components: Internals>Plugins>PDF
Labels: Needs-Feedback
See above, it was reproducing consistently on some of our bots.

On my Linux workstation, the bug reproduces with a vanilla open-source Chromium build, but not with an official Google Chrome build. That may be a coincidence but I wonder if Chromium doesn't include something useful for dealing with ligatures...

Comment 22 by npm@chromium.org, Mar 28 2017

I've looked at this and don't see a problem. I also don't see how the rendering could be flaky.
* The fonts are embedded, so character rendering should be pretty consistent.
* The only strange thing about the "fi" is that it is represented by "\014" in the PDF. This is allowed under Table 3.2 of PDF spec 1.7, it's octal for charcode 12. But we handle that correctly.
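
To illustrate the octal escape mentioned above, here is a minimal Python sketch that decodes `\ddd` escapes in a PDF literal string. The helper name `decode_octal_escapes` is hypothetical and has nothing to do with PDFium's code; it handles only octal escapes, not the other backslash sequences the spec defines.

```python
import re

# Hypothetical helper, not PDFium code: decodes only the \ddd octal escapes
# of a PDF literal string (one to three octal digits per escape).
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace each backslash-octal escape with the single byte it encodes."""
    return _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8) & 0xFF]), raw
    )


# "\014" is octal for 12, so "fi" in this PDF is stored as the single byte 0x0C.
decoded = decode_octal_escapes(rb"This is the \014rst section.")
print(decoded)
```

Byte 0x0C only maps to the "fi" glyph through the embedded font's own encoding, so if that font could not be loaded, a substitute font would plausibly have nothing to draw for it. That is a guess, not something established in this thread.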

The only thing I can think of is this: freetype was updated for the bots, it does not like the embedded fonts anymore, and we have to find substitutes. We then fail to render properly with these. But that still doesn't explain the rendering in #12. I'm stuck until I can reproduce that.
Anything you'd like me to check locally since I was able to reproduce it?

Comment 24 by npm@chromium.org, Mar 28 2017

Bad rendering reproduces for you on a clean ToT build, correct? If it does reproduce consistently, a bisect would help (probably something close to the date of this bug report?).

What do you get if you run:
freetype-config --ftversion
I just tried bisecting. The builds I got from the archive all worked fine - everything was "good". But when I try my own trunk build of Chrome from the same machine, it fails.

> freetype-config --ftversion
2.5.2

My gn args:
is_component_build = true
is_debug = false
use_goma = true

As another data point, I tried to reproduce this bug on many machines (my desktops and Google Compute Engine instances of various sorts), to no avail. This is something about the system.

Comment 27 by npm@chromium.org, Mar 30 2017

Cc: npm@chromium.org
Labels: -Needs-Feedback
Owner: ----
Status: Untriaged (was: Assigned)
That's also my ftversion. So I don't know what the problem is.
Cc: drott@chromium.org
This is something wacky with freetype. Using the system freetype (the default on linux) I have this example failing. The failure is in FT_New_Memory_Face returning Freetype Error Code 2 (which I believe is Unknown_File_Format).

If I then set pdf_bundle_freetype = true to force the use of our internal freetype the file works correctly and I get the 'fi'.

As far as I can tell (from the dpkg version) my system freetype is the same as npm@'s so I don't know why it would fail for me and work for npm@.

Adding drott@ in case there is something about freetype that we're missing here?

(It looks like my system freetype is 2.5.2-1ubuntu2.6 and the internal one is listed as VER-2-7-1-updates)

Comment 29 by drott@chromium.org, Mar 30 2017

I'm guessing your system FreeType is missing Type1 and/or, specifically, type1cid module support. I saw issues with these accessibility tests when moving Chromium to a shared FreeType, because the test file seems to use Type1 fonts. Without Type1 font support it at least fell back to Arial or some other sans-serif font. I did not check what difference disabling type1cid makes.

Even if you have identical version numbers of FreeType, perhaps your system FreeTypes differ in module configuration and the things they compile in? You can experiment with this by removing modules and files from third_party/BUILD.gn to force-reproduce the same error. The differences between Chromium's FreeType and PDFium's were in type1.c, type1cid.c and psaux.c, and

FT_USE_MODULE( FT_Driver_ClassRec, t1_driver_class )
FT_USE_MODULE( FT_Driver_ClassRec, t1cid_driver_class )
FT_USE_MODULE( FT_Module_Class, psaux_module_class )

respectively.
Can we just fail all PDF tests if you try to build with system freetype?

We could just have the test suite fail with an error message saying to rebuild with our built-in FreeType. That should probably be the bot configuration.

Alternatively, could we at least spew a message to the console when this happens, explaining that we didn't get Type-1 font support and that PDF bugs should be expected unless you fix FreeType?

Comment 31 by npm@chromium.org, Mar 31 2017

We can't fail tests if using system freetype, that's the default for Linux. Unless we start shipping Freetype on Linux as well, I think it makes sense to test with system freetype.

I think it is reasonable to add a message when Freetype fails to load an embedded font with Unknown_File_Format. But as far as I know we don't have this message spewing set up for internal PDFium methods.

For now, your test could probably check for "first" on OS!=Linux, but keep "*rst" on Linux.
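
A rough Python sketch of the wildcard idea above, assuming the expected tree is written as one pattern per line, with `*` standing in for text that may or may not come through. The names `EXPECTED_PATTERNS` and `tree_matches` are made up for illustration; this is not the code in pdf_extension_test.cc.

```python
from fnmatch import fnmatchcase

# Hypothetical per-line patterns; "*" tolerates a missing "fi".
EXPECTED_PATTERNS = [
    "staticText 'This is the *rst section.",
    "inlineTextBox 'This is the *rst section.",
]


def tree_matches(dump_lines, patterns):
    """Return True if every dump line matches the pattern at the same position."""
    if len(dump_lines) != len(patterns):
        return False
    return all(
        fnmatchcase(line.strip(), pattern)
        for line, pattern in zip(dump_lines, patterns)
    )


good = [
    "staticText 'This is the first section.",
    "inlineTextBox 'This is the first section.",
]
broken = [
    "staticText 'This is the rst section.",
    "inlineTextBox 'This is the rst section.",
]
print(tree_matches(good, EXPECTED_PATTERNS), tree_matches(broken, EXPECTED_PATTERNS))
```

The trade-off is that a wildcard this loose also accepts other text before "rst", so it hides the rendering bug rather than detecting it.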
Status: Available (was: Untriaged)
I do not understand the effect on whitespace or "fi" ligatures, but I do have a program that reads PDFs created by Chrome and converted to text by PDFBox, and it has broken with Chrome updates. The first time was sometime around October 2016; I have lost the details. The last time was around 6/9. The creator of the reports ran them on 6/10 and 6/12, and in that time the report format changed (I suspect with Chrome 59.0.3071.86 (Official Build)).

The October change involved x'C2A0' becoming a space or vice versa. The 59.0.3071.86 change was similar, but I am more familiar with it. To correct my program, I had to make two changes to get similar results in either format. I changed the text converted from the PDF as follows:

change x'20C2A00A' to x'0A', then
change x'C2A0' to x'20'

In java:
        text = text.replace(" " + TAGC2A0 + "\n" , "\n");           // Replace SP + NBSP + LF with LF
        text = text.replace(TAGC2A0, " ");                          // Replace NBSP with SP
        
It then works as before.
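
For readers who want to try the same normalization outside Java, here is a small Python sketch of the two replacements above. It assumes the extracted text has already been decoded from UTF-8, so x'C2A0' appears as the single character U+00A0; the names `NBSP` and `normalize_nbsp` are illustrative only.

```python
NBSP = "\u00a0"  # U+00A0; its UTF-8 encoding is the byte pair C2 A0


def normalize_nbsp(text: str) -> str:
    """Apply the two replacements described above, in the same order."""
    text = text.replace(" " + NBSP + "\n", "\n")  # SP + NBSP + LF -> LF
    return text.replace(NBSP, " ")                # any remaining NBSP -> SP


sample = "Total \u00a0\nItem\u00a0A"
print(repr(normalize_nbsp(sample)))
```

The order matters: running the second replacement first would turn the SP + NBSP + LF sequence into two spaces before the line feed, and the first rule would no longer match.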

I have no control over the created report (only read).  It was created on a windows machine.

In looking at the logs of changes for this chrome release, I found this bug report.  Have no idea if related, but only saw this and one other report related to PDFs.
I hope it's okay to jump in here. I have some more details and a different repro case that might shed some light.

When using a web font, I see the same behaviour, where certain characters appear to be automatically treated as a pair in the browser (e.g. "fi" or "fl"). If you try to select them with your mouse, they can only be selected as a pair.

If you print this page to PDF, the pair does not appear to get saved as text. The rendering of the PDF looks correct however - so perhaps it's being converted to an image instead? Unfortunately the real impact of this issue is that it breaks the ability to search the PDF for words containing those character pairs.

Chrome: Version 63.0.3239.132
OS: Windows 10

To test this:
1. Open the attached HTML in Chrome, which uses a web font from fonts.googleapis.com
2. Notice you can select the word "verification" letter by letter on the first line (no font specified) but that the characters "fi" get selected as a pair in the second line (using a Google web font).
3. Print the page to PDF
4. Open the resulting PDF and search/find for "verification". Note that the web font version is not found.
5. If you select the web font "verification" and copy/paste it into a text editor, the "fi" pair is missing.


I've attached the PDF as well if you just want to inspect that. 
webfont_encoding_issue.html
406 bytes
webfont_encoding_issue.pdf
42.0 KB

Comment 35 by sheriffbot@chromium.org, Jan 14

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
