
Issue 834544

Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




ZIP file titled in Japanese character cannot open attached file from Chrome OS.

Project Member Reported by ryutas@chromium.org, Apr 19 2018

Issue description

ChromeOS version: 65.0.3325.209, 64.0.3282.190, 63.0.3239.140, 62.0.3202.97 beta (66.0.3359.102)
ChromeOS device model: any Chrome OS device.
Case#: 15520130

Description: Files in a ZIP file titled in Japanese characters cannot be opened on Chrome OS.

Steps to reproduce: 
1) Create a folder titled in Japanese characters and zip it on Japanese Windows.
2) Share the created ZIP file.
3) Open the created ZIP file on any managed Chromebook.
3.1) Access the ZIP file titled in Japanese characters on Google Drive and try to open it.
3.2) Download the ZIP file and uncompress/open it on the device.
- The message "Nothing to see here" is shown even though there is a folder/file in the ZIP file.

[All troubleshooting steps already taken] 
- The customer tried to uncompress the same ZIP file titled in Japanese characters on managed Chromebooks running an older version, Chrome 64, but the result was the same. (I also tested several different OS versions, but could not resolve the issue.)
- We could not reproduce the issue when a file with a Japanese name was compressed locally on the Chrome OS device with "Zip selection".
- In a local test, a ZIP folder (titled in Japanese characters) created on English Mac OS works OK.

- The customer found a workaround: when the same ZIP is saved with an alphabetic English title instead of Japanese characters, it can be opened normally on Chrome OS devices.
(I renamed the same ZIP file titled in Japanese characters to English, but could not fix the issue locally.)

Current Behavior / Reproduction: The customer is not able to uncompress/open the ZIP file titled in Japanese characters.

Expected Behavior: The customer should be able to uncompress/open the ZIP file titled in Japanese characters.

Similar case: crbug.com/423842
(I created a new crbug because crbug.com/423842 is an old issue.)


Sample ZIP files
From the customer: https://drive.google.com/open?id=1_PKEct2rWETorOWFQyefLO8rKtD6qp2t
Created on Japanese Windows: https://drive.google.com/open?id=1bz-Fax8uZKlhvDPUoPf9JEyid_abT-3M

Log file
https://drive.google.com/open?id=1POtk2DwmC_G8E6Y5rFj--F2xKEzeh6dz

Screenshot
https://drive.google.com/open?id=1bRHF2DaI1j1wjbkJ_o0A8E96i7IBH_jb

 
Components: Platform>Apps>FileManager
Status: Unconfirmed (was: Untriaged)
Cc: noel@chromium.org
Labels: CrOSFilesFeature-Zip
Owner: yamaguchi@chromium.org
I'm assigning to Yamaguchi-san for his assessment.

Yamaguchi, can you confirm if this is reproducible with Zip Archiver?

I suspect this is related to how Windows deals with filenames in UTF-16:
https://cs.chromium.org/chromium/src/base/files/file_path.h?l=5-16&rcl=3a8e2cfea264a669eb21f46d322aff83471bb0fc
Status: Started (was: Unconfirmed)
> From the customer: https://drive.google.com/open?id=1_PKEct2rWETorOWFQyefLO8rKtD6qp2t
> Created on Japanese Windows: https://drive.google.com/open?id=1bz-Fax8uZKlhvDPUoPf9JEyid_abT-3M

The filenames in these files seem to be encoded in Shift JIS.
Neither file sets general purpose bit 11 (the language encoding flag) in its headers, which per the spec indicates CP437.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT APPENDIX D - Language Encoding (EFS)
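For illustration, a minimal C++ sketch of where bit 11 lives, assuming a well-formed local file header at the start of the buffer (fixed part of 30 bytes, flags at offset 6, name length at offset 26, name at offset 30, per APPNOTE.TXT). `ParseLocalHeaderName` and `LocalHeaderName` are hypothetical names, not Zip Archiver code. The central directory entry carries the same flag at a different offset; this sketch only reads the local header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: what a local file header says about the name.
struct LocalHeaderName {
  bool utf8_flag;        // general purpose bit 11 (EFS): name declared UTF-8
  std::string raw_name;  // name bytes exactly as stored; encoding not guaranteed
};

// Reads a little-endian 16-bit field at byte offset `off`.
static uint16_t ReadU16(const std::vector<uint8_t>& b, size_t off) {
  return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

// Parses the local file header at the start of `buf`.
// Returns std::nullopt if the buffer is too short or the signature is wrong.
std::optional<LocalHeaderName> ParseLocalHeaderName(
    const std::vector<uint8_t>& buf) {
  constexpr size_t kFixedSize = 30;
  if (buf.size() < kFixedSize) return std::nullopt;
  // Signature 0x04034b50, stored little-endian as "PK\x03\x04".
  if (buf[0] != 0x50 || buf[1] != 0x4B || buf[2] != 0x03 || buf[3] != 0x04)
    return std::nullopt;
  const uint16_t flags = ReadU16(buf, 6);
  const size_t name_len = ReadU16(buf, 26);
  if (buf.size() < kFixedSize + name_len) return std::nullopt;
  LocalHeaderName out;
  out.utf8_flag = (flags & (1u << 11)) != 0;
  out.raw_name.assign(buf.begin() + kFixedSize,
                      buf.begin() + kFixedSize + name_len);
  return out;
}
```

When the flag is unset the spec says the name is CP437, but as with these sample files the bytes may actually be Shift JIS, so `raw_name` alone does not identify the encoding.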

Having said that, I think it'd be much better to cover such cases, because it looks like users can easily create zip files affected by this issue on some other platforms.
This is also reproducible with Zip Archiver.


> - The customer found a workaround: when the same ZIP is saved with an alphabetic English title instead of Japanese characters, it can be opened normally on Chrome OS devices.
> (I renamed the same ZIP file titled in Japanese characters to English, but could not fix the issue locally.)

I assumed that, in the description of this issue, "ZIP file titled in Japanese character" means
 "a file packed in the ZIP file has Japanese characters in its file name",
but NOT "the .zip file itself has Japanese characters in its filename".
Let me know if I got it wrong.
Cc: weifangsun@chromium.org
Owner: ----
Status: Available (was: Started)
The files are dropped from the file metadata list because the pNaCl module of Zip Archiver and ZIP Unpacker uses the filename as a JavaScript object's field name, and such entries are dropped when passed to the JS side.

https://cs.chromium.org/chromium/src/chrome/browser/resources/chromeos/zip_archiver/cpp/volume.cc?type=cs&q=file:zip_archiver+createentry+%22entry_name+%3D+entry_path%22&sq=package:chromium&l=74

Therefore this is not specific to the Japanese language or Shift JIS encoding; it can happen with many encodings other than UTF-8 or CP437.
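The failure mode can be illustrated with a strict UTF-8 validity check. This is a minimal C++ sketch, not the actual Zip Archiver code, assuming (as the later commit message states) that names must be well-formed UTF-8 before they are used as JavaScript property names. `IsValidUtf8` is a hypothetical helper: a Shift_JIS name such as 0x90 0x56 ("新") fails at its very first byte, which is a stray continuation byte in UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if `s` is well-formed UTF-8: no stray continuation bytes,
// no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  size_t i = 0;
  const size_t n = s.size();
  while (i < n) {
    const uint8_t b0 = static_cast<uint8_t>(s[i]);
    if (b0 < 0x80) { ++i; continue; }  // ASCII
    size_t len;
    uint32_t cp;
    if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
    else return false;  // 0x80..0xC1 or 0xF5..0xFF cannot start a sequence
    if (i + len > n) return false;  // truncated sequence
    for (size_t k = 1; k < len; ++k) {
      const uint8_t b = static_cast<uint8_t>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // expected a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (len == 3 && cp < 0x800) return false;          // overlong 3-byte form
    if (len == 4 && cp < 0x10000) return false;        // overlong 4-byte form
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate
    if (cp > 0x10FFFF) return false;                   // out of range
    i += len;
  }
  return true;
}
```

A name that fails this check has to be converted (or escaped) before it can safely cross into JS, which is why the problem is not limited to Shift JIS.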

(Even when it doesn't happen, we'd still need to guess and convert encodings to make the names readable for human beings, or escape some characters to make other parts of our system work.)

Unassigned for triage.
Cc: yamaguchi@chromium.org
Owner: yamaguchi@chromium.org
Status: Started (was: Available)

Comment 9 by js...@chromium.org, May 10 2018

> Even when it doesn't happen, we'd still need to guess and convert encodings to make the names readable for human beings,

Yes, that's what needs to be done. The ZIP spec's CP437 rule is useless in practice and should be ignored, because a lot of zip archiving tools (esp. on Windows) use the system default encoding/codepage when bit 11 is unset (e.g. Shift_JIS/Windows-932 on Japanese Windows, Big5/Windows-950 on Traditional Chinese Windows, Windows-1252 on Western European Windows).

So, we have to detect the encoding (using Compact Encoding Detector) of the byte sequence in the name field and convert that sequence to Unicode, assuming it is in the detected encoding.

See https://cs.chromium.org/chromium/src/third_party/ced and base/i18n/encoding_detection.h.
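CED itself is a statistical detector and is not reproduced here. As a much cruder stand-in, the hypothetical `LooksLikeShiftJis` below only checks that a byte sequence is structurally valid Shift_JIS: ASCII, half-width katakana (0xA1-0xDF), or a lead byte (0x81-0x9F, 0xE0-0xFC) followed by a trail byte (0x40-0x7E, 0x80-0xFC). It is a sketch of one kind of signal a detector can use, not a replacement for CED.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Heuristic only (NOT Compact Encoding Detector): true if `s` is structurally
// well-formed Shift_JIS and contains at least one double-byte character.
bool LooksLikeShiftJis(const std::string& s) {
  bool saw_double_byte = false;
  for (size_t i = 0; i < s.size(); ++i) {
    const uint8_t b = static_cast<uint8_t>(s[i]);
    // ASCII or single-byte half-width katakana.
    if (b < 0x80 || (b >= 0xA1 && b <= 0xDF)) continue;
    const bool lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    if (!lead || i + 1 >= s.size()) return false;  // bad or truncated lead
    const uint8_t t = static_cast<uint8_t>(s[i + 1]);
    const bool trail = (t >= 0x40 && t <= 0x7E) || (t >= 0x80 && t <= 0xFC);
    if (!trail) return false;
    saw_double_byte = true;
    ++i;  // skip the trail byte
  }
  return saw_double_byte;
}
```

Structural checks are ambiguous: many valid UTF-8 names (e.g. 0xE6 0x96 0xB0 for "新") also pass this test, so a caller would want to test for well-formed UTF-8 first and only then fall back to legacy encodings.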

Comment 10 by js...@chromium.org, May 11 2018

Cc: satorux@chromium.org dpolukhin@chromium.org mtomasz@chromium.org
Issue 423842 has been merged into this issue.

Comment 11 by js...@chromium.org, May 11 2018

Once the encoding is determined (either by CompactEncodingDetector or by assuming a legacy encoding based on the UI language [1]), the Web Encoding API can be used to decode instead of PNaCl.
See https://encoding.spec.whatwg.org/.



[1] e.g. If the UI language is Japanese, assume Shift_JIS. 
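A minimal sketch of the fallback in [1], assuming the UI language is available as a BCP 47 tag such as "ja" or "zh-TW". The hypothetical `LegacyEncodingForUiLanguage` maps it to the WHATWG label of the Windows ANSI code page typically used for that language; such a label is what would be handed to the Web Encoding API on the JS side, e.g. `new TextDecoder("shift_jis")`. The table is illustrative, not exhaustive.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: UI language tag -> WHATWG encoding label of the
// Windows "ANSI" code page commonly used with that language.
std::string LegacyEncodingForUiLanguage(const std::string& ui_language) {
  // Full tags that need special handling (Traditional vs Simplified Chinese).
  static const std::map<std::string, std::string> kExact = {
      {"zh-TW", "big5"}, {"zh-HK", "big5"}, {"zh-CN", "gbk"}};
  // Primary language subtag -> code page label.
  static const std::map<std::string, std::string> kByPrimary = {
      {"ja", "shift_jis"},    {"ko", "euc-kr"},       {"zh", "gbk"},
      {"ru", "windows-1251"}, {"uk", "windows-1251"}, {"bg", "windows-1251"},
      {"pl", "windows-1250"}, {"cs", "windows-1250"}, {"hu", "windows-1250"},
      {"el", "windows-1253"}, {"tr", "windows-1254"}, {"he", "windows-1255"},
      {"ar", "windows-1256"}, {"lt", "windows-1257"}, {"vi", "windows-1258"},
      {"th", "windows-874"}};
  if (auto it = kExact.find(ui_language); it != kExact.end()) return it->second;
  // "ru-RU" -> "ru"; a tag without '-' is used as-is.
  const std::string primary = ui_language.substr(0, ui_language.find('-'));
  if (auto it = kByPrimary.find(primary); it != kByPrimary.end())
    return it->second;
  return "windows-1252";  // Western European default
}
```

As noted earlier in this thread, a guess based on the UI language cannot be right for every archive (e.g. a short Shift_JIS name opened under a French UI), which is why detection is preferred when it gives a confident answer.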

Comment 12 by js...@chromium.org, May 11 2018

Cc: jsb...@chromium.org

Comment 13 by js...@chromium.org, May 11 2018

CP437 may actually be used by some zip archiving tools on Windows for (Western) European languages. That could be an issue because CED does not detect CP437, IIRC.

To test whether CP437 is used or not:

1. Set the default code page for non-Unicode apps to Windows-1252 in the Control Panel on Windows.
2. Create files whose names contain accented Latin letters as used in German/French.
3. Zip them up.
4. Examine the byte sequences of the file names.

I hope it's windows-1252 (an ISO-8859-1 superset) rather than CP437.
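Step 4 can be partly automated. A rough heuristic, assuming the names are known to contain only Latin letters, digits and punctuation: in CP437 the accented letters sit at 0x80-0xA5 while 0xB0-0xDF are box-drawing characters, whereas in windows-1252 the accented letters sit at 0xC0-0xFF and 0x80-0x9F is mostly punctuation. The hypothetical `GuessLatinCodePage` counts votes for each; 0xE1 is skipped because it is "ß" in CP437 but "á" in windows-1252.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class LatinCodePage { kUnknown, kCp437, kWindows1252 };

// Heuristic guess between CP437 and windows-1252 for a Latin-script name.
LatinCodePage GuessLatinCodePage(const std::string& name) {
  int cp437_votes = 0;
  int cp1252_votes = 0;
  for (char c : name) {
    const uint8_t b = static_cast<uint8_t>(c);
    if (b < 0x80 || b == 0xE1) continue;  // ASCII, or ambiguous (ß vs á)
    if (b <= 0xA5) {
      ++cp437_votes;  // CP437 accented letters (é = 0x82, ü = 0x81, ...)
    } else if (b >= 0xC0 && b != 0xD7 && b != 0xF7) {
      ++cp1252_votes;  // windows-1252 letters, excluding × and ÷
    }
  }
  if (cp437_votes > cp1252_votes) return LatinCodePage::kCp437;
  if (cp1252_votes > cp437_votes) return LatinCodePage::kWindows1252;
  return LatinCodePage::kUnknown;
}
```

For example, "café.txt" is stored as 0x82 for "é" under CP437 and as 0xE9 under windows-1252, so the two cases land in different vote buckets.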


Comment 14 by eryen@chromium.org, May 22 2018

Cc: eryen@chromium.org
Labels: -Pri-2 Pri-1
yamaguchi@: Can we get an update on this bug?
I've bumped the priority on this bug to P1 since it has caused a lot of frustration both internally and externally. Our support folks in Japan encounter this issue on a regular basis.

Thanks.
Labels: M-68
- Theoretically, we cannot always recover the original file name with 100% certainty (e.g. when the file name is very short and in an encoding different from the user's locale).
- We have a pending partial fix, which will show garbled file names but make the files accessible. https://chromium-review.googlesource.com/c/chromium/src/+/1039122
- However, since that is still a bad UX, we are going to try another approach to recover the original file name. Since the priority has been bumped, I'll give it a try today and decide which fix we will apply for M68.
There was logic in ZIP Unpacker to guess the encoding based on the Windows default encoding. Isn't it working?
I have seen that code, but judging from the final result it doesn't seem to work perfectly.
It might have been missed when we made the extension a component extension (Zip Archiver).
I will take a look at that part first.
> There was logic in ZIP Unpacker to guess the encoding based on the Windows default encoding. Isn't it working?
The extension has a map from locale to a default encoding, but no longer uses it. I guess we used it when we were using libarchive, but the minizip version that we use doesn't accept that parameter.

Comment 19 by bugdroid1@chromium.org, May 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/730acb5f58b5d30189afda725351879cd848c3d0

commit 730acb5f58b5d30189afda725351879cd848c3d0
Author: Tatsuhisa Yamaguchi <yamaguchi@google.com>
Date: Tue May 22 09:17:08 2018

Convert file names to UTF-8 when it's declared CP437.

ZIP files can have non-UTF8 characters in the filename (and comments).
Encoding is declared by the language encoding flag (EFS) in general
purpose bit flag.
Before this change we assumed that it's always in UTF-8. When the raw
data is not valid UTF-8, such archive could not be read because we use
file name as a property name of Javascript object. This change ensures
that file names are valid UTF-8 byte sequence before using it as field
name.

In practice, some ZIP files have the flag set to 0 but use other nonstandard
encodings (like Shift-JIS) in the file names. In such cases the file
names will still look garbled, but at least they become accessible.

Bug:  834544 
Cq-Include-Trybots: master.tryserver.chromium.linux:closure_compilation
Change-Id: Ib07e572b509353350c83aa4d81e6fa88b3f1d9b5
Reviewed-on: https://chromium-review.googlesource.com/1039122
Commit-Queue: Tatsuhisa Yamaguchi <yamaguchi@chromium.org>
Reviewed-by: Yuki Awano <yawano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#560519}
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/chromeos/BUILD.gn
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/BUILD.gn
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/BUILD.gn
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/char_coding.cc
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/char_coding.h
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume.cc
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive.h
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive_minizip.cc
[modify] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/cpp/volume_archive_minizip.h
[add] https://crrev.com/730acb5f58b5d30189afda725351879cd848c3d0/chrome/browser/resources/chromeos/zip_archiver/test/char_coding_test.cc

Status: Fixed (was: Started)
The partial fix has landed and will ship in M68. Files with non-UTF-8 names (e.g. Shift_JIS) will become accessible, but with garbled names. Filename extensions will stay readable, because some encodings (incl. Shift_JIS) are compatible with ASCII within the ASCII range.
For example: "ÉVé╡éóâeâLâXâg âhâLâàâüâôâg.txt" for "新しいテキスト ドキュメント.txt"
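The garbled example is what a CP437 reading of Shift_JIS bytes produces: 0x90 0x56 ("新") becomes "ÉV", 0x82 0xB5 ("し") becomes "é╡", and the ASCII ".txt" passes through unchanged. A minimal sketch of that CP437-to-UTF-8 step (hypothetical `Cp437ToUtf8`, not the contents of char_coding.cc), assuming bytes 0x00-0x7F are treated as plain ASCII rather than CP437's graphic glyphs for control codes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unicode code points for CP437 bytes 0x80-0xFF.
static const char16_t kCp437High[128] = {
    0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,  // 0x80
    0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,
    0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,  // 0x90
    0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192,
    0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1, 0x00AA, 0x00BA,  // 0xA0
    0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB,
    0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,  // 0xB0
    0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,
    0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,  // 0xC0
    0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,
    0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,  // 0xD0
    0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,
    0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,  // 0xE0
    0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,
    0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,  // 0xF0
    0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0,
};

// Appends a BMP code point (< 0x10000) to `out` as UTF-8.
static void AppendUtf8(char32_t cp, std::string& out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Decodes every byte of `raw` as CP437 and returns the UTF-8 result.
std::string Cp437ToUtf8(const std::string& raw) {
  std::string out;
  for (char c : raw) {
    const uint8_t b = static_cast<uint8_t>(c);
    const char32_t cp = b < 0x80 ? static_cast<char32_t>(b)
                                 : static_cast<char32_t>(kCp437High[b - 0x80]);
    AppendUtf8(cp, out);
  }
  return out;
}
```

Since CP437 assigns a distinct character to every byte value, the raw bytes can in principle be recovered from such a garbled name.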

An additional fix, which recovers the original file name, is currently planned for M69. Filed Issue 846195.
Let us know if this plan is not considered sufficient.
