Files in a ZIP file titled in Japanese characters cannot be opened on Chrome OS
Issue description
ChromeOS version: 65.0.3325.209, 64.0.3282.190, 63.0.3239.140, 62.0.3202.97, beta (66.0.3359.102)
ChromeOS device model: any Chrome OS device
Case#: 15520130

Description: Files in a ZIP file titled in Japanese characters cannot be opened on Chrome OS.

Steps to reproduce:
1) On Japanese Windows, create a folder with a Japanese-character name and compress it to a ZIP file.
2) Share the created ZIP file.
3) Open the ZIP file on any managed Chromebook:
3.1) Access the ZIP file on Google Drive and try to open it.
3.2) Download the ZIP file and uncompress/open it on the device.
The message "Nothing to see here" is shown even though the ZIP file contains a folder/file.

[All troubleshooting steps already taken]
- The customer tried to uncompress the same ZIP file on managed Chromebooks running the older Chrome 64, with the same result. (I also tested several other OS versions but could not resolve the issue.)
- The issue does not reproduce when a file with a Japanese-character name is compressed locally on the Chrome OS device with "Zip selection".
- In a local test, a ZIP file titled in Japanese characters and created on English macOS opened correctly.
- As a workaround, the customer found that when the same ZIP is saved with an English (alphabetic) title instead of a Japanese one, it opens normally on Chrome OS devices. (Locally, renaming the Japanese-titled ZIP file to English did not fix the issue.)

Current behavior / reproduction: The customer cannot uncompress/open a ZIP file titled in Japanese characters.
Expected behavior: The customer should be able to uncompress/open a ZIP file titled in Japanese characters.
Similar case: crbug.com/423842 (I filed a new bug because crbug.com/423842 is an old case.)
Sample ZIP files:
- From the customer: https://drive.google.com/open?id=1_PKEct2rWETorOWFQyefLO8rKtD6qp2t
- Created on Japanese Windows (MS JP OS): https://drive.google.com/open?id=1bz-Fax8uZKlhvDPUoPf9JEyid_abT-3M
Log file: https://drive.google.com/open?id=1POtk2DwmC_G8E6Y5rFj--F2xKEzeh6dz
Screenshot: https://drive.google.com/open?id=1bRHF2DaI1j1wjbkJ_o0A8E96i7IBH_jb
,
Apr 21 2018
,
Apr 23 2018
I'm assigning this to Yamaguchi-san for his assessment. Yamaguchi, can you confirm whether this is reproducible with Zip Archiver? I suspect this is related to how Windows deals with filenames in UTF-16: https://cs.chromium.org/chromium/src/base/files/file_path.h?l=5-16&rcl=3a8e2cfea264a669eb21f46d322aff83471bb0fc
,
Apr 23 2018
> From the customer: https://drive.google.com/open?id=1_PKEct2rWETorOWFQyefLO8rKtD6qp2t
> Created MS JP OS: https://drive.google.com/open?id=1bz-Fax8uZKlhvDPUoPf9JEyid_abT-3M

The filenames in these files seem to be encoded in Shift JIS. Neither file sets general purpose bit 11 in the header, which indicates that the names are in CP437. See https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, APPENDIX D - Language Encoding (EFS).

Having said that, I think it'd be much better to handle such cases, because it looks like users can easily create ZIP files affected by this issue on other platforms.
,
Apr 23 2018
This is also reproducible with Zip Archiver.

> - The customer found as a workaround when the same ZIP is saved with alphabetic English title instead of using Japanese Character, the same can be opened from ZIP file on Chrome OS devices normally.
> ( I renamed the same ZIP file titled in Japanese character to English, but could not fix the issue at local)

I assumed that, in the description of this issue, "ZIP file titled in Japanese character" means "a file packed in the ZIP file has Japanese characters in its file name", NOT "the .zip file's own filename has Japanese characters". Let me know if I got it wrong.
,
Apr 23 2018
The files are dropped from the file metadata list because the PNaCl module of Zip Archiver and ZIP unpacker uses the filename as a JavaScript object's field name, so such entries are dropped when the list is passed to the JS side: https://cs.chromium.org/chromium/src/chrome/browser/resources/chromeos/zip_archiver/cpp/volume.cc?type=cs&q=file:zip_archiver+createentry+%22entry_name+%3D+entry_path%22&sq=package:chromium&l=74

Therefore this is not specific to Japanese or Shift JIS, but can happen with many encodings other than UTF-8 or CP437. (Even when it didn't happen, we'd still need to guess and convert encodings to make the names readable for human beings, or escape some characters to make other parts of our system work.)

Unassigned for triage.
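The failure mode can be modeled in a few lines. This is a hypothetical Python model, not the actual `volume.cc` logic: entries are keyed by their UTF-8-decoded names, so an entry whose raw name is Shift_JIS cannot become a key and disappears.

```python
def build_entry_table(raw_names):
    """Hypothetical model: key entries by their name decoded as UTF-8.

    An entry whose raw name is not valid UTF-8 cannot become a key and is
    skipped, so its folder looks empty ("Nothing to see here").
    """
    table = {}
    for raw in raw_names:
        try:
            key = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # silently dropped, mirroring the reported symptom
        table[key] = raw
    return table

names = ["新しいフォルダー/".encode("cp932"), b"readme.txt"]
print(sorted(build_entry_table(names)))  # ['readme.txt']
```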
,
Apr 23 2018
,
May 1 2018
,
May 10 2018
> Even when it didn't happen, we'd still need to guess and convert encodings to make it readable for human beings,

Yes, that's what needs to be done. The ZIP spec's CP437 rule is useless and should be ignored, because a lot of zip archiving tools (esp. on Windows) use the system default encoding/code page (e.g. Shift_JIS/Windows-932 on Japanese Windows, Big5/Windows-950 on Traditional Chinese Windows, Windows-1252 on Western European Windows, etc.) when bit 11 is unset. So we have to detect the encoding of the "byte sequence" in the name field (using the Compact Encoding Detector, CED) and convert that sequence to Unicode, assuming it is in the detected encoding. See https://cs.chromium.org/chromium/src/third_party/ced and base/i18n/encoding_detection.h.
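Detect-then-convert can be sketched as follows. There is no CED binding here, so the detector is a pluggable callable, an assumption standing in for CED or `base/i18n/encoding_detection.h`; `decode_entry_name` is a hypothetical helper, not Chromium code.

```python
from typing import Callable, Optional

EFS_FLAG = 1 << 11  # general purpose bit 11

def decode_entry_name(raw: bytes, flags: int,
                      detect: Callable[[bytes], Optional[str]]) -> str:
    """Turn a raw ZIP name field into Unicode text.

    `detect` stands in for an encoding detector such as CED: it returns a
    Python codec name, or None when it has no confident guess.
    """
    if flags & EFS_FLAG:
        return raw.decode("utf-8")
    # Bit 11 unset: ignore the CP437 rule and trust the detector if it answers.
    encoding = detect(raw) or "cp437"
    return raw.decode(encoding, errors="replace")

# Example with a stub detector that always answers Windows-932.
raw = "新しいフォルダー".encode("cp932")
print(decode_entry_name(raw, 0, lambda _: "cp932"))  # 新しいフォルダー
```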
,
May 11 2018
Issue 423842 has been merged into this issue.
,
May 11 2018
Once the encoding is determined (either by CompactEncodingDetector or by assuming a legacy encoding based on the UI language [1]), the Web Encoding API can be used to decode instead of PNaCl. See https://encoding.spec.whatwg.org/.
[1] e.g. If the UI language is Japanese, assume Shift_JIS.
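A minimal sketch of the UI-language fallback, assuming a hypothetical `LEGACY_ENCODING_BY_UI_LANGUAGE` table built only from the examples in this thread. Python codec names stand in for the WHATWG labels (such as "shift_jis") that `TextDecoder` would take in the extension.

```python
from typing import Optional

# Hypothetical table: UI language -> legacy Windows code page that local zip
# tools are expected to use (examples taken from this thread).
LEGACY_ENCODING_BY_UI_LANGUAGE = {
    "ja": "cp932",     # Shift_JIS / Windows-932
    "zh-TW": "cp950",  # Big5 / Windows-950
    "de": "cp1252",    # Western European / Windows-1252
    "fr": "cp1252",
}

def guess_legacy_encoding(ui_language: str, detected: Optional[str] = None) -> str:
    """Prefer a detector's answer; otherwise fall back on the UI language."""
    if detected:
        return detected
    table = LEGACY_ENCODING_BY_UI_LANGUAGE
    # Try the full tag ("zh-TW") first, then the base language ("ja-JP" -> "ja").
    return table.get(ui_language) or table.get(ui_language.split("-")[0], "cp437")

raw = "新しいフォルダー".encode("cp932")
print(raw.decode(guess_legacy_encoding("ja-JP")))  # 新しいフォルダー
```

Falling back to CP437 for unmapped languages simply keeps the spec's default.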
,
May 11 2018
,
May 11 2018
CP437 may actually be used by some zip archiving tools on Windows in (Western) European languages. That could be an issue because CED does not detect CP437, IIRC. To test whether CP437 is used:
1. On Windows, set the default code page for non-Unicode programs to Windows-1252 in the Control Panel.
2. Create files whose names contain accented Latin letters as used in German/French.
3. Zip them up.
4. Examine the byte sequences of the file names.
I hope it's Windows-1252 (an ISO-8859-1 superset) rather than CP437.
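Step 4 can be scripted: given the name as typed and the bytes found in the archive, check which code page reproduces it. A hypothetical Python helper, assuming only CP437 and Windows-1252 are candidates:

```python
def matching_codepages(raw: bytes, expected: str,
                       candidates=("cp437", "cp1252")) -> list:
    """Return the candidate code pages under which `raw` decodes to `expected`."""
    matches = []
    for codepage in candidates:
        try:
            if raw.decode(codepage) == expected:
                matches.append(codepage)
        except UnicodeDecodeError:
            pass  # byte undefined in this code page (e.g. 0x81 in Windows-1252)
    return matches

# Bytes as they might appear in the archive's name field.
print(matching_codepages(b"Gr\x81\xe1e.txt", "Grüße.txt"))  # ['cp437']
print(matching_codepages(b"Gr\xfc\xdfe.txt", "Grüße.txt"))  # ['cp1252']
```

For example, `ü` is byte 0x81 in CP437 but 0xFC in Windows-1252, so a single accented letter is enough to tell them apart.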
,
May 22 2018
yamaguchi@: Can we get an update on this bug? I've bumped the priority to P1 since it has caused a lot of frustration both internally and externally. Our support folks in Japan encounter this issue regularly. Thanks.
,
May 22 2018
- Theoretically we cannot recover the original file name 100% of the time (e.g. when the file name is very short and in an encoding different from the user's locale).
- We have a pending partial fix, which will emit garbled file names but make the files accessible: https://chromium-review.googlesource.com/c/chromium/src/+/1039122
- However, since that is still poor UX, we are going to try another approach to recover the original file name. Since the priority has been bumped, I'll give it a try today and decide which fix we will apply for M68.
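Why recovery cannot be guaranteed: a short byte string is often valid in several encodings at once. A Python illustration, with the candidate list chosen arbitrarily for the example:

```python
def interpretations(raw: bytes, codecs=("cp932", "cp1252", "cp437")) -> dict:
    """Map each codec to the text it yields for `raw`, skipping strict failures."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            pass
    return result

# The same two bytes are a well-formed name in all three encodings.
for codec, text in interpretations(b"\x82\xa9").items():
    print(codec, text)
```

The bytes `82 A9` read as か in Windows-932, ‚© in Windows-1252 and é⌐ in CP437; nothing in the bytes themselves says which was intended.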
,
May 22 2018
There was logic in ZIP unpacker to guess the encoding based on what Windows uses. Isn't it working?
,
May 22 2018
I have seen that code, but judging from the final result it does not seem to work properly. It might have been missed when we turned the extension into a component extension (Zip Archiver). I will take a look at that part first.
,
May 22 2018
> There was logic to guess encoding in ZIP unpacker based on what Windows encoding uses. Isn't it working?

The extension has a map from locale to a default encoding, but it is not used anymore. I guess we used it when we were using libarchive, but the minizip version that we use doesn't accept that parameter.
,
May 22 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/730acb5f58b5d30189afda725351879cd848c3d0

commit 730acb5f58b5d30189afda725351879cd848c3d0
Author: Tatsuhisa Yamaguchi <yamaguchi@google.com>
Date: Tue May 22 09:17:08 2018

Convert file names to UTF-8 when it's declared CP437.

ZIP files can have non-UTF8 characters in the filename (and comments). Encoding is declared by the language encoding flag (EFS) in general purpose bit flag. Before this change we assumed that it's always in UTF-8. When the raw data is not valid UTF-8, such archive could not be read because we use file name as a property name of Javascript object.

This change ensures that file names are valid UTF-8 byte sequence before using it as field name.

In practice, some ZIP files have the flag 0 but using other nonstandard encodings (like Shift-JIS) in the file names. In such case such file names will still look garbled, but at least becomes accessible.

Bug: 834544
Cq-Include-Trybots: master.tryserver.chromium.linux:closure_compilation
Change-Id: Ib07e572b509353350c83aa4d81e6fa88b3f1d9b5
Reviewed-on: https://chromium-review.googlesource.com/1039122
Commit-Queue: Tatsuhisa Yamaguchi <yamaguchi@chromium.org>
Reviewed-by: Yuki Awano <yawano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#560519}

[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/chromeos/BUILD.gn
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/BUILD.gn
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/BUILD.gn
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/char_coding.cc
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/char_coding.h
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume.cc
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive.h
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive_minizip.cc
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive_minizip.h
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/test/char_coding_test.cc
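The behavior described in the commit message can be modeled as below. This is a Python sketch, not a port of `char_coding.cc`; `entry_name_to_utf8` is a hypothetical name, and replacing malformed bytes in names that do set bit 11 is an assumption.

```python
EFS_FLAG = 1 << 11  # general purpose bit 11 ("language encoding flag")

def entry_name_to_utf8(raw: bytes, flags: int) -> bytes:
    """Return the entry name as a byte string that is always valid UTF-8."""
    if flags & EFS_FLAG:
        # Declared UTF-8: keep it, replacing any malformed sequences (assumption).
        return raw.decode("utf-8", errors="replace").encode("utf-8")
    # Declared CP437: every byte value maps to a character, so this cannot fail.
    return raw.decode("cp437").encode("utf-8")

sjis_name = "新しいテキスト ドキュメント.txt".encode("cp932")
print(entry_name_to_utf8(sjis_name, 0).decode("utf-8"))
# ÉVé╡éóâeâLâXâg âhâLâàâüâôâg.txt
```

A Shift_JIS name therefore becomes accessible but garbled, exactly the trade-off the commit message describes.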
,
May 24 2018
The partial fix has landed and will ship in M68. Files with non-UTF-8 names (e.g. Shift_JIS) will become accessible, but with garbled names. Filename extensions will remain readable because some encodings (incl. Shift_JIS) encode ASCII characters as the same single bytes. For example, "新しいテキスト ドキュメント.txt" is shown as "ÉVé╡éóâeâLâXâg âhâLâàâüâôâg.txt".

An additional fix, which recovers the original file name, is currently planned for M69; filed Issue 846195. Let us know if this plan is not sufficient.
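Because CP437 maps each of the 256 byte values to a distinct character, the garbled name still carries the original bytes, so a later fix can re-encode and decode again once the real encoding is known. A minimal Python sketch, assuming the archiver used Windows-932 (Shift_JIS); `recover_name` is a hypothetical helper, not the planned implementation:

```python
def recover_name(garbled: str, encoding: str = "cp932") -> str:
    """Undo a CP437 misreading: re-encode to the original bytes, then decode
    them with the encoding the archiver is assumed to have used."""
    original_bytes = garbled.encode("cp437")  # lossless: CP437 is a 1:1 byte map
    return original_bytes.decode(encoding)

print(recover_name("ÉVé╡éóâeâLâXâg âhâLâàâüâôâg.txt"))
# 新しいテキスト ドキュメント.txt
```

This only works if the encoding guess is right, which is why detection (or a UI-language fallback) is still needed.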
Comment 1 by elijahtaylor@chromium.org, Apr 20 2018