The Subject header is incorrectly encoded
Reported by firstspa...@gmail.com, Dec 14 2017
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36

Steps to reproduce the problem:
1. Save any page as an mhtml file.
2. If the subject contains characters outside printable ASCII, the mhtml file will contain something like this:

Subject: =?utf-8?Q?=C2=AB=D0=9B=D1=83=D1=87=D1=88=D0=B5 =D0=B1=D1=8B =D0=BF=D1=80=D0=BE=D0=BC=
=D0=BE=D0=BB=D1=87=D0=B0=D0=BB=C2=BB: =D0=B2 =D0=A1=D0=BE=D0=B2=D1=84=D0=B5=
=D0=B4=D0=B5 =D0=BE=D1=82=D0=B2=D0=B5=D1=82=D0=B8=D0=BB=D0=B8 =D0=BD=D0=B0 =
=D1=81=D0=BB=D0=BE=D0=B2=D0=B0 =D0=BF=D0=BE=D1=81=D0=BB=D0=B0 =D0=A1=D0=A8=
=D0=90 =D0=BE =D0=9A=D1=80=D1=8B=D0=BC=D0=B5: =D0=AF=D0=BD=D0=B4=D0=B5=D0=
=BA=D1=81.=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=B8?=

What is the expected behavior?
In accordance with RFC 2047:

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF **SPACE**) may be used.

So the correct encoding should look like this:

Subject: =?UTF-8?B?0K/QvdC00LXQutGBLtCd0L7QstC+0YHRgtC4OiDQk9C70LDQstC90Ys=?=
 =?UTF-8?B?0LUg0L3QvtCy0L7RgdGC0Lgg0YHQtdCz0L7QtNC90Y8sINGB0LDQvNGL0LUg0YE=?=
 =?UTF-8?B?0LLQtdC20LjQtSDQuCDQv9C+0YHQu9C10LTQvdC40LUg0L3QvtCy0L7RgdGC0Lgg?=
 =?UTF-8?B?0KDQvtGB0YHQuNC4INC+0L3Qu9Cw0LnQvQ==?=

What went wrong?
In the current version of the encoding (without a leading space at the beginning of every new line), some existing mhtml parsers cannot parse the file correctly.

Did this work before? N/A

Chrome version: 63.0.3239.84 Channel: stable
OS Version: 10.0
Flash Version:
Dec 15 2017
Unable to reproduce this issue on the reported version 63.0.3239.84 using Windows 10 with the steps below.
1. Opened https://bugs.chromium.org/p/chromium/issues/detail?id=794835 >> selected "Save as" from the context menu and saved with the extension mhtml.
2. Added the above-mentioned text to the saved html file, opened it in the browser, and observed a blank page.
@Reporter: Please provide a sample URL to check this issue, and also let us know where to check it. This would help in further triaging of this issue. Thanks!
Dec 15 2017
For example: https://news.yandex.ru/
Dec 15 2017
I believe this issue is related to the Blink>SavePage component, but I don't know how to set it. Comment 3 contains a good sample URL. When I save https://news.yandex.ru/ as "WebPage, single file", the resulting mhtml file contains:

Subject: =?utf-8?Q?=D0=AF=D0=BD=D0=B4=D0=B5=D0=BA=D1=81.=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=
=D0=B8: =D0=93=D0=BB=D0=B0=D0=B2=D0=BD=D1=8B=D0=B5 =D0=BD=D0=BE=D0=B2=D0=BE=
=D1=81=D1=82=D0=B8 =D1=81=D0=B5=D0=B3=D0=BE=D0=B4=D0=BD=D1=8F, =D1=81=D0=B0=
=D0=BC=D1=8B=D0=B5 =D1=81=D0=B2=D0=B5=D0=B6=D0=B8=D0=B5 =D0=B8 =D0=BF=D0=BE=
=D1=81=D0=BB=D0=B5=D0=B4=D0=BD=D0=B8=D0=B5 =D0=BD=D0=BE=D0=B2=D0=BE=D1=81=
=D1=82=D0=B8 =D0=A0=D0=BE=D1=81=D1=81=D0=B8=D0=B8 =D0=BE=D0=BD=D0=BB=D0=B0=
=D0=B9=D0=BD?=

Note: "WebPage, single file" can be enabled with chrome://flags/#save-page-as-mhtml
Dec 15 2017
Thank you for providing more feedback. Adding requester "sc00335628@techmahindra.com" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 15 2017
jianli@ - could you PTAL? AFAIK the encoding was introduced in your https://chromium-review.googlesource.com/c/chromium/src/+/701262/4/third_party/WebKit/Source/platform/mhtml/MHTMLArchive.cpp
Dec 15 2017
I am working on it. Currently it seems to affect only third-party MHTML parsers reading an MHTML file that Chrome saved for a web page with a long subject. To help us evaluate the impact of this issue, which MHTML parser is affected?
Dec 15 2017
The problem was first detected by a user of the FAR file manager. FAR has a plugin named Observer, which lets the user extract objects (images, styles and so on) from mhtml files and from any other MIME files. The 7-Zip plugin eDecoder also fails on such files. In any case, it would be better if Chrome (and all Chromium-based browsers) created correct mhtml files in accordance with the RFC. All other browsers I tested (old Opera, FF before 57, and IE) create mhtml files with RFC-correct headers.
Jan 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a97e4b274530e17656f7d493f4eb950502311e08

commit a97e4b274530e17656f7d493f4eb950502311e08
Author: Jian Li <jianli@chromium.org>
Date: Fri Jan 19 00:56:13 2018

Encode Subject header correctly per RFC 2047

Thee're some differences between RFC 2047 which we should be used to encode header value and RFC 2045 for body:
1) Use CRLF+SPACE for soft line break.
2) SPACE and TAB should always be encoded.
3) Multiple encoded text should be used

Did manual test with FAR file manager w/ Observer plugin and 7-Zip w/ eDecoder plugin.

Bug: 794835
Change-Id: I5b87b7392d2208dd58bf512c7ee59c87bc32a85a
Reviewed-on: https://chromium-review.googlesource.com/835009
Reviewed-by: Xianzhu Wang <wangxianzhu@chromium.org>
Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Jian Li <jianli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#530371}

[modify] https://crrev.com/a97e4b274530e17656f7d493f4eb950502311e08/third_party/WebKit/Source/core/frame/MHTMLTest.cpp
[add] https://crrev.com/a97e4b274530e17656f7d493f4eb950502311e08/third_party/WebKit/Source/core/testing/data/mhtml/soft_line_break.mht
[modify] https://crrev.com/a97e4b274530e17656f7d493f4eb950502311e08/third_party/WebKit/Source/platform/mhtml/MHTMLArchive.cpp
[modify] https://crrev.com/a97e4b274530e17656f7d493f4eb950502311e08/third_party/WebKit/Source/platform/text/QuotedPrintable.cpp
[modify] https://crrev.com/a97e4b274530e17656f7d493f4eb950502311e08/third_party/WebKit/Source/platform/text/QuotedPrintable.h
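To make the three points in the commit message concrete, here is a minimal Python sketch of Q encoding for a header value. It is an illustration under stated assumptions, not the code in QuotedPrintable.cpp: the function name `encode_subject_q` is hypothetical, only ASCII letters and digits are left literal (a conservative choice; other printable characters could stay literal too), and each character's bytes are kept inside a single encoded-word.

```python
def encode_subject_q(text, charset="utf-8", max_len=75):
    """Q-encode text as one or more encoded-words of at most max_len characters."""
    prefix = f"=?{charset}?Q?"
    suffix = "?="
    budget = max_len - len(prefix) - len(suffix)  # room left for encoded-text
    words = []
    current = ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            piece = ch  # safe to leave literal
        else:
            # Everything else becomes =XX per byte, so SPACE and TAB are
            # always encoded (point 2 of the commit message).
            piece = "".join(f"={byte:02X}" for byte in ch.encode(charset))
        # Start a new encoded-word rather than exceeding the limit (point 3).
        if current and len(current) + len(piece) > budget:
            words.append(current)
            current = ""
        current += piece
    if current:
        words.append(current)
    # Separate encoded-words with CRLF + SPACE (point 1).
    return "\r\n ".join(prefix + word + suffix for word in words)


title = "Главные новости сегодня самые свежие и последние новости России"
print("Subject: " + encode_subject_q(title))
```

Because a new encoded-word is started before a character would overflow the budget, no encoded-word exceeds 75 characters and no multi-byte character is split between two of them.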
Jan 24 2018
Tested this issue on Windows 10 on the latest Canary build 66.0.3330.0 with the following steps.
1. Launched Chrome and enabled the flag #save-page-as-mhtml.
2. Navigated to the page https://news.yandex.ru/ -> Save As -> file name: filename.mhtml and Save As type: Web Page, Single File.
3. Clicking the .mhtml file opens the page, but it is not clear where to check the Subject header encoding.
Attached is the screen cast for reference.
jianli@ Can you please check and confirm whether we missed anything in reproducing the issue? Also, please tell us where to check whether the Subject header is encoded correctly. Thanks.
Jan 24 2018
I can confirm that 66.0.3330.0 creates correct mhtml files. If you open the created mhtml file in Notepad, you will see the correct Subject header. Now it looks like this:

From: <Saved by Blink>
Snapshot-Content-Location: https://news.yandex.ru/...
Subject: =?utf-8?Q?=D0=9D=D0=BE=D0=B2=D0=B0=D0=BA=20=D0=BE=D0=B1=D1=81=D1=83=D0=B4?=
 =?utf-8?Q?=D0=B8=D0=BB=20=D1=81=20=D0=A8=D0=B5=D1=84=D1=87=D0=BE=D0=B2=D0?=
 =?utf-8?Q?=B8=D1=87=D0=B5=D0=BC=20=D0=BF=D0=BE=D1=81=D1=82=D0=B0=D0=B2=D0?=
 =?utf-8?Q?=BA=D0=B8=20=D1=80=D0=BE=D1=81=D1=81=D0=B8=D0=B9=D1=81=D0=BA=D0?=
 =?utf-8?Q?=BE=D0=B3=D0=BE=20=D0=B3=D0=B0=D0=B7=D0=B0=20=D0=B2=20=D0=95=D0?=
 =?utf-8?Q?=B2=D1=80=D0=BE=D0=BF=D1=83:=20=D0=AF=D0=BD=D0=B4=D0=B5=D0=BA?=
 =?utf-8?Q?=D1=81.=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=B8?=
Date: Thu, 24 Jan 2018 16:49:38 -0000
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----MultipartBoundary--SebsyQyHZ2k8hatblxDRVj7a0TP45ntfqebghJPQs2----"

Thanks.
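Besides reading the header in Notepad, it can be decoded programmatically. The sketch below is one way to do that in Python, assuming every encoded-word in the value uses the same charset; `decode_encoded_words` and `q_to_bytes` are illustrative names, not part of any tool mentioned in this thread. In the header above some encoded-words end in the middle of a UTF-8 character (for example `=D0?=` followed by `=?utf-8?Q?=B8`), so the sketch joins the bytes of all encoded-words before decoding them as text.

```python
import base64
import re

# =?charset?encoding?encoded-text?=  with capture groups for the three parts.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def q_to_bytes(encoded_text):
    """Undo Q encoding: '_' stands for SPACE, '=XX' is one byte in hex."""
    data = encoded_text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def decode_encoded_words(value):
    """Decode a header value made of encoded-words that share one charset."""
    charsets = set()
    raw = b""
    for charset, encoding, text in ENCODED_WORD.findall(value):
        charsets.add(charset.lower())
        if encoding.upper() == "Q":
            raw += q_to_bytes(text)
        else:
            raw += base64.b64decode(text)
    if len(charsets) != 1:
        raise ValueError("expected encoded-words in exactly one charset")
    # Decode only after joining, since a character may span two encoded-words.
    return raw.decode(charsets.pop())


# Subject value copied from the comment above, folded with CRLF + SPACE.
SUBJECT_VALUE = "\r\n ".join([
    "=?utf-8?Q?=D0=9D=D0=BE=D0=B2=D0=B0=D0=BA=20=D0=BE=D0=B1=D1=81=D1=83=D0=B4?=",
    "=?utf-8?Q?=D0=B8=D0=BB=20=D1=81=20=D0=A8=D0=B5=D1=84=D1=87=D0=BE=D0=B2=D0?=",
    "=?utf-8?Q?=B8=D1=87=D0=B5=D0=BC=20=D0=BF=D0=BE=D1=81=D1=82=D0=B0=D0=B2=D0?=",
    "=?utf-8?Q?=BA=D0=B8=20=D1=80=D0=BE=D1=81=D1=81=D0=B8=D0=B9=D1=81=D0=BA=D0?=",
    "=?utf-8?Q?=BE=D0=B3=D0=BE=20=D0=B3=D0=B0=D0=B7=D0=B0=20=D0=B2=20=D0=95=D0?=",
    "=?utf-8?Q?=B2=D1=80=D0=BE=D0=BF=D1=83:=20=D0=AF=D0=BD=D0=B4=D0=B5=D0=BA?=",
    "=?utf-8?Q?=D1=81.=D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=B8?=",
])

print(decode_encoded_words(SUBJECT_VALUE))
```

A decoder that converts each encoded-word to text on its own would fail on this value, which is why the bytes are accumulated first.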
Jan 24 2018
Your change meets the bar and is auto-approved for M65. Please go ahead and merge the CL to branch 3325 manually. Please contact milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Jan 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bf03fcb2773f803f40f6ffce834677cce0af20d4

commit bf03fcb2773f803f40f6ffce834677cce0af20d4
Author: Jian Li <jianli@chromium.org>
Date: Wed Jan 24 22:11:05 2018

[M65 Merge] Encode Subject header correctly per RFC 2047

Thee're some differences between RFC 2047 which we should be used to encode header value and RFC 2045 for body:
1) Use CRLF+SPACE for soft line break.
2) SPACE and TAB should always be encoded.
3) Multiple encoded text should be used

Did manual test with FAR file manager w/ Observer plugin and 7-Zip w/ eDecoder plugin.

TBR=jianli@chromium.org
(cherry picked from commit a97e4b274530e17656f7d493f4eb950502311e08)

Bug: 794835
Change-Id: I5b87b7392d2208dd58bf512c7ee59c87bc32a85a
Reviewed-on: https://chromium-review.googlesource.com/835009
Reviewed-by: Xianzhu Wang <wangxianzhu@chromium.org>
Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Jian Li <jianli@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#530371}
Reviewed-on: https://chromium-review.googlesource.com/884512
Reviewed-by: Jian Li <jianli@chromium.org>
Cr-Commit-Position: refs/branch-heads/3325@{#73}
Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369}

[modify] https://crrev.com/bf03fcb2773f803f40f6ffce834677cce0af20d4/third_party/WebKit/Source/core/frame/MHTMLTest.cpp
[add] https://crrev.com/bf03fcb2773f803f40f6ffce834677cce0af20d4/third_party/WebKit/Source/core/testing/data/mhtml/soft_line_break.mht
[modify] https://crrev.com/bf03fcb2773f803f40f6ffce834677cce0af20d4/third_party/WebKit/Source/platform/mhtml/MHTMLArchive.cpp
[modify] https://crrev.com/bf03fcb2773f803f40f6ffce834677cce0af20d4/third_party/WebKit/Source/platform/text/QuotedPrintable.cpp
[modify] https://crrev.com/bf03fcb2773f803f40f6ffce834677cce0af20d4/third_party/WebKit/Source/platform/text/QuotedPrintable.h
Jan 25 2018
Tested this issue on Windows 10 on the latest Chrome build 65.0.3325.18 by following the steps mentioned in comment #12. After saving the page as an .mhtml file and opening the file in Notepad, the Subject header appears as described in comment #13. Attached is the screen cast for reference. Adding the TE-verified labels, as the fix is working as intended. Thanks.
Comment 1 by rych...@gmail.com, Dec 15 2017