Headless Chrome Puppeteer generated PDF does not show some Unicode fonts on Acrobat
Reported by
ali.ra...@veeva.com,
Dec 13
Issue description
Steps to reproduce
1. Use the HTML with Headless Chrome to generate a PDF.
2. Open the PDF using Acrobat Reader, and notice that a dialog box reports missing fonts. Note that the PDF output differs between Windows systems (e.g., the PDF that I attached was generated on my system and shows some fonts missing, while on another Windows system other fonts were missing). It seems that Headless Chrome embeds fonts differently from what Acrobat expects.
Tell us about your environment:
Puppeteer version: 1.10.0
Platform / OS version: Win10
URLs (if applicable):
Node.js version: 10.13.0
What steps will reproduce the problem?
Use the code below to render the HTML file to generate a PDF.
Please include code that reproduces the issue.
==========================================================
const puppeteer = require('puppeteer');

// Shared browser handle so generatePdf() can close it when done.
let browser;

async function createHeadlessChromeInstance() {
  browser = await puppeteer.launch({
    ignoreHTTPSErrors: false
  });
  const page = await browser.newPage();
  // Set viewport to a fixed size.
  await page.setViewport({width: 750, height: 600});
  return page;
}

async function generatePdf() {
  const navigationTimeout = 60000; // 1 min
  const page = await createHeadlessChromeInstance();
  await page.goto('file:///c:/testdata_font.htm', {
    timeout: navigationTimeout,
    waitUntil: ['load', 'networkidle2']
  });
  await page.waitFor(500);

  // Generate a PDF for the current page.
  const pdfProperties = {
    path: 'output.pdf',
    // Puppeteer documents margin as an object with top/right/bottom/left.
    margin: {top: '1in', right: '1in', bottom: '1in', left: '1in'},
    displayHeaderFooter: false,
    printBackground: true,
    width: '8.27in',
    height: '11.7in'
  };
  await page.pdf(pdfProperties);
  await browser.close();
}

async function main() {
  try {
    await generatePdf();
  } catch (e) {
    // Log the failure instead of exiting silently.
    console.error(e);
    return process.exit(1);
  }
  console.log("pdf conversion completed successfully.");
  return process.exit(0);
}

main();
==========================================================
The HTML is:
==========================================================
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<h1 style="font-family:Aldhabi;">This is a Aldhabi الخطوط العربية النصي</h1>
<h1 style="font-family:Arabic Typesetting;">Arabic Typesetting الخطوط العربية النصي</h1>
<h1 style="font-family:Shonar Bangla;">This is a Bangla Supplemental Fonts:</h1>
<h1 style="font-family:Shonar Bangla;">This is a বাংলায় টেক্সট</h1>
<h1 style="font-family:DengXian ;">This is a DengXian 中文文本</h1>
<h1 style="font-family:KaiTi;">This is a KaiTi 中文文本</h1>
<h1 style="font-family:DFKai-SB;">This is a DFKai-SB 中文文本</h1>
<h1 style="font-family:Aparajita;">This is a Aparajita तेक्स्त इन देवनगरी</h1>
<h1 style="font-family:Sanskrit Text;">This is a Sanskrit Text तेक्स्त इन देवनगरी</h1>
<h1 style="font-family:FrankRuehl;">This is a FrankRuehl טקסט בעברית</h1>
<h1 style="font-family:Meiryo;">This is a Meiryo テキストは日本語です</h1>
<h1 style="font-family:Tunga;">This is a Tunga ಪಠ್ಯವು ಕನ್ನಡದಲ್ಲಿದೆ</h1>
<h1 style="font-family:Batang;">This is a Batang 텍스트는 한국에있다</h1>
<h1 style="font-family:Karthika;">This is a Karthika ടെക്സ്റ്റ് മലയാളത്തിലാണ്</h1>
<h1 style="font-family:Gautami;">This is a Gautami, టెక్స్ట్ టెలోగిలో ఉంది</h1>
<h1 style="font-family:DilleniaUPC;">This is a DilleniaUPC ข้อความเป็นภาษาไทย</h1>
<h1 style="font-family:Latha;">This is a Latha தமிழ் மொழியில் உள்ளது</h1>
</body>
</html>
==========================================================
What is the expected result?
A PDF file that is displayed correctly in Acrobat Reader.
What happens instead?
The PDF (output.pdf) is displayed incorrectly in Acrobat Reader. If I open the PDF with the Chrome browser or some other PDF reader, the fonts are shown. I have attached a screenshot of the issue (headless_font_not_showing.png).
If I open the HTML directly in the Chrome browser and use "print to pdf", the resulting PDF works fine with Acrobat.
Also, if I open output.pdf with the Chrome browser (the fonts show fine) and copy all contents to the clipboard, pasting them into another text processor results in missing text as well. So it appears that the PDF contains those Unicode fonts in a way that is not portable across products.
I was redirected from https://github.com/GoogleChrome/puppeteer/issues/3668
Dec 14
Some notes: The PDF output for Unicode fonts differs between computers. The output.pdf that I attached was generated on my Win10 machine, but on a Windows Server 2016 machine with a different font set installed, the PDF shows other fonts missing. So I'd be curious to know whether there's a way to control fonts with Headless Chrome, for example by making the PDF functionality use a different set of fonts instead of the system defaults, or by embedding Unicode fonts in a way that works across products.
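One approach that could be tried for the question above is to stop relying on system-installed fonts and load a bundled font file through a CSS `@font-face` rule before calling `page.pdf()`. The following is only a sketch and has not been verified against this bug: `buildFontFaceCss` and `useBundledFont` are hypothetical helper names, and the family name and font URL passed to them are placeholders. It relies on Puppeteer's `page.addStyleTag` and `page.evaluate`; whether it changes how Acrobat handles the embedded fonts is an open question.

```javascript
// Hypothetical helper: build an @font-face rule for a font file shipped with the app.
function buildFontFaceCss(family, fontUrl) {
  return `@font-face { font-family: "${family}"; src: url("${fontUrl}"); }`;
}

// Hypothetical helper: inject the rule into an already-loaded Puppeteer page,
// then wait until the browser reports that font loading has settled.
async function useBundledFont(page, family, fontUrl) {
  await page.addStyleTag({content: buildFontFaceCss(family, fontUrl)});
  await page.evaluate(() => document.fonts.ready.then(() => true));
}
```

In the script above, this would be called between `page.goto(...)` and `page.pdf(...)`, for example `await useBundledFont(page, 'KaiTi', fontUrl)` with `fontUrl` pointing at a font file you are licensed to ship.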
Dec 14
Dec 17
This could be a bug in Adobe Reader. Have you tried contacting Adobe?
Dec 18
Yes, this could look like an Adobe bug, but as I mentioned here (https://github.com/GoogleChrome/puppeteer/issues/3668#issuecomment-447133532), the issue is not seen when the PDF is generated directly by the Chrome browser's save-as-PDF functionality. Also, when the headless-Chrome-generated PDF (attached output.pdf) is opened in the Chrome browser, all fonts are shown properly, yet copying the contents to the clipboard and pasting them into other text-processing apps results in missing text as well. So there appears to be a difference in how Unicode fonts are processed by headless Chromium's PDF generation versus the Chrome browser's save-as-PDF functionality. Does this look like a Chromium bug?
Dec 18
Yesterday I tried opening the PDF in Adobe Acrobat Pro to see if its "Preflight" validator would tell me anything. It did not explain the error.
Dec 19
TL;DR: On some systems, the generated PDF shows errors in the Preflight tool. On other systems, the Preflight report shows a Unicode value that is "not valid". So the Chromium-generated PDF appears to have issues with Unicode characters.
Long story: I have attached a PDF (testdata_font_win2016.pdf) generated on Windows 2016 with the same HTML and the same Puppeteer code. It does show errors when the "Preflight" validator is used (inflight_shows_errors.png attached). The earlier file (output.pdf) that I had attached does not show errors in the Preflight tool, but if you generate a report using Preflight (testdata_font_report.pdf attached) and open page 10 and subsequent pages, you'll see that the glyphs used in the PDF show Unicode values such as "U+4E2D" for some characters that are not displayed in the PDF. And from this page (https://www.fileformat.info/info/unicode/char/4E2D/index.htm), it appears that this is not a valid Unicode character, whatever that means.
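A quick way to double-check what a code point from the Preflight report refers to is to convert it back to a character in Node.js and see whether it occurs in the test page. The sketch below uses only built-in string methods; `describeCodePoint` is a hypothetical helper name, and the sample string is copied from the DengXian/KaiTi lines of the HTML above.

```javascript
// Sample text copied from the DengXian/KaiTi headings of the test HTML.
const sample = '中文文本';

// Hypothetical helper: map a code point back to its character and
// report whether that character occurs in the sample text.
function describeCodePoint(codePoint) {
  const character = String.fromCodePoint(codePoint);
  const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
  return {
    label: `U+${hex}`,
    character,
    inSample: sample.includes(character)
  };
}

console.log(describeCodePoint(0x4E2D));
```

As far as I can tell, U+4E2D is the first character of the 中文文本 sample, so the snippet should report it as present in the test page. That would suggest the code point itself is an ordinary CJK character and that the problem lies in how the glyph is mapped or embedded.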
Dec 19
+npm FYT. BTW is Puppeteer really required for this bug? Can one just load the HTML, print normally in Chrome instead, and generate a similarly problematic PDF via Save As PDF?
Dec 19
>> Can one just load the HTML, print normally in Chrome instead, and generate a similarly problematic PDF via Save As PDF?
No, the problem is not seen when you do "Save As PDF" directly via the browser. But it is seen with Puppeteer, which is used in some of our workflows to convert HTML to PDF.
Jan 10
As per comment#9, the issue seems to be specific to Puppeteer, which is out of scope for TE; hence adding "TE-NeedsTriageHelp" and requesting the respective team to look into the issue and help with further triage. Thanks!
Comment 1 by susan.boorgula@chromium.org, Dec 14
Labels: Needs-Milestone