Issue 1148 Downloads with Unicode filenames not displaying properly
Starred by 16 users Reported by jacob....@gmail.com, Sep 4 2008
Status: Fixed
Owner: ----
Closed: May 2013
Cc: xlyuan@chromium.org, wtc@chromium.org, asanka@chromium.org, darin@chromium.org, paul@chromium.org
Components:
OS: All
Pri: 2
Type: Bug

Blocking:
issue 68204


Product Version      : 0.2.149.27 (1583)
URLs (if applicable) :
Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
Safari 3:
    Firefox 3: OK
         IE 7: OK

What steps will reproduce the problem?
1. Go to any Chinese website that offers downloads with Chinese filenames.
2. The downloaded file's name is not shown in Unicode.

What is the expected result?
Filename will remain as unicode.

What happens instead?
Non unicode characters appearing.

Please provide any additional information below. Attach a screenshot if 
possible.
 
 
Attachment: Unicode.jpg (98.3 KB)
Hi, thank you very much for the report. Can you please provide a URL which can 
reproduce this bug?

I tried this on my English Vista with the URL
http://www.google.cn/search?as_q=&complete=1&hl=zh-CN&newwindow=1&num=10&btnG=Google+搜索&as_epq=&as_oq=的&as_eq=&lr=&cr=&as_ft=i&as_filetype=doc&as_qdr=all&as_occt=title&as_dt=i&as_sitesearch=&as_rights=

When I download the fourth file on this page, the Chinese file name shows correctly. 
Please refer to the attached picture.
Attachment: File Name.jpg (36.2 KB)
Yes, you're right... not all Chinese websites with Chinese filenames have this 
problem. I'll try to find out which Chinese sites reproduce it.

Apparently, it occurs mainly on Chinese torrent sites, like this one:
http://bbsmovie.com/thread-491770-1-1.html
http://bbsmovie.com/attachment.php?aid=370658
http://bbsmovie.com/attachment.php?aid=373087

Comment 3 by js...@chromium.org, Sep 5 2008
Labels: -Area-Unknown Area-I18N
Status: Assigned
Comment 4 by js...@chromium.org, Sep 5 2008
Hi Jacob,

What's your OS locale? It's kind of strange that IE7 works but Chrome does not. If it's 
not Simplified Chinese (to be precise, if the default OS codepage is not 
Windows-936), neither IE7 nor Chrome would work. 

If it's Simplified Chinese (the default OS codepage = 936), both should work. (I'm 
currently writing this in SC Windows XP and filenames come out correctly in Chrome 
for files in the urls given by you above.)

I know how Firefox does it (I implemented FF's filename handling code :-)) regardless 
of the OS default codepage.  Once we part with WinHTTP, I plan to do what FF does 
(more or less).

You can check the OS codepage by doing the following:

1. Go to Control Panel - Regional and Language options
2. In Advanced tab, see what's selected for "Languages for non-Unicode programs"

Note that that value can be selected independently of your OS UI language. That is, 
even on English XP, you can select 'Simplified Chinese' there and vice versa. 
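
The same setting can also be read programmatically. The sketch below assumes Python, which is not part of this thread. On Windows, `locale.getpreferredencoding(False)` normally reports the ANSI codepage (for example `cp936` for Simplified Chinese), unless Python's UTF-8 mode is enabled. The helper name `os_default_encoding` is made up for illustration.

```python
import locale


def os_default_encoding() -> str:
    """Return the locale's preferred encoding without changing the locale.

    On Windows this usually corresponds to the "Languages for non-Unicode
    programs" setting, e.g. "cp1252" or "cp936".
    """
    return locale.getpreferredencoding(False)


current_encoding = os_default_encoding()
```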





I'm sorry... it seems that it now won't work on IE6/7 either, with my Vista Business or 
Ultimate.

It only works on FF3. A thousand apologies.

For Unicode, I would suggest using WinINet or Winsock 2. IIRC, WinHTTP doesn't 
support FTP.


Labels: Mstone-X
Labels: Area-BrowserUI I18N are
Labels: -are -area-i18n
Comment 9 by js...@chromium.org, Sep 30 2008
Labels: -Mstone-X Mstone-1.0
This should be in 1.0. With the new HTTP stack, it should be possible to do what FF 
does. 

To Jacob: You can make it work with Chrome and IE7 if you switch your default system 
locale to Simplified Chinese (assuming that you mainly visit English and Simplified 
Chinese web sites). I forget the detailed steps in Vista (and my Vista VM is not 
working at the moment). It must be somewhere in Control Panel - Languages(?)





Comment 10 by js...@chromium.org, Sep 30 2008
It's in Control Panel - Regional and Language options - Administrative tab. You can 
change 'language for non-Unicode programs' to Simplified Chinese.
Labels: -Mstone-1.0 Mstone-1.1
Since this depends on new-http, we'll move this to 1.1
Labels: Mstone-2.0
Comment 13 by jon@chromium.org, Apr 3 2009
Labels: JonMoved Mstone-2.1
Moving from milestone 2 to milestone 2.1
Comment 14 by js...@chromium.org, Apr 20 2009
Status: Started
Comment 15 by js...@chromium.org, Apr 29 2009
http://codereview.chromium.org/83002 is the first half of the fix. It takes care of 
downloads started from the 'Save As' context menu. It's all on the Chromium side. 

The other half (downloads started by clicking a link) needs some WebKit changes. 
Comment 16 by js...@chromium.org, Apr 29 2009
Labels: Area-WebKit
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=15113 

------------------------------------------------------------------------
r15113 | jungshik@google.com | 2009-05-01 15:51:50 -0700 (Fri, 01 May 2009) | 22 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/download/download_manager.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/download/download_manager.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/download/download_manager_unittest.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/download/save_package.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/history/download_types.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/net/chrome_url_request_context.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/renderer_host/download_resource_handler.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/tab_contents/render_view_context_menu.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/tab_contents/tab_contents.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/tab_contents/tab_contents_view_win.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/common/os_exchange_data.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/common/render_messages.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/renderer/render_view.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/renderer/render_view.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util_unittest.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/net/url_request/url_request_context.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/context_menu.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/context_menu_client_impl.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/resource_handle_impl.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/webview_delegate.h?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/test_shell/test_webview_delegate.cc?r1=15113&r2=15112
   M http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/test_shell/test_webview_delegate.h?r1=15113&r2=15112

This CL makes Chrome on par with Firefox in terms of 'GetSuggestedFilename' for file download via context-menu.

For a download initiated with a click on a link in a web page, a webkit-side change is necessary, which will be done later.

Add a field (referrer_charset) to URLRequestContext and DownloadCreateInfo. It's set to the character encoding of a document where the download request originates from when it's known (download initiated via "save as" in the context menu). 

If it's not known (a download initiated by clicking on a download link or typing a url directly to the omnibox), it's initialized to the default character encoding in the user's preference. I guess this is marginally better than leaving it empty (in that case, step 2b below will be skipped and step 2c will be taken) because a user has a better control over how raw 8bit characters in C-D are interpreted (especially on Windows where a reboot is required to change the OS default codepage). 

This is later passed to GetSuggestedFilename and used as one of fallback encodings (1. UTF-8, 2. origin_charset, 3. default OS codepage). With this change, we support the following:

1. RFC 2047
2. Raw-8bit-characters : a. UTF-8, b. origin_charset, c. default os codepage. 
3. %-escaped UTF-8. 

In this CL, for #3, I didn't add a fallback similar to one used for #2. If necessary, it can be added easily. New entries are added to 3 existing tests. What's previously not covered (raw 8bit Content-Disposition header) is now covered in all 3 tests. 

BUG= 1148 
TEST=net unit test: NetUtilTest.GetFileNameFromCD                    
                    NetUtilTest.GetSuggestedFilename         
     unittest : DownloadManagerTest.TestDownloadFilename

Review URL: http://codereview.chromium.org/83002
------------------------------------------------------------------------
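
Two of the filename forms listed in the commit message, RFC 2047 encoded-words and %-escaped UTF-8, can be illustrated with Python's standard library. This is only a sketch of the header forms, not the Chromium implementation (which the log places in net/base/net_util.cc). The function names `decode_rfc2047` and `decode_percent_utf8` are made up for illustration.

```python
from email.header import decode_header
from urllib.parse import unquote


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words such as =?UTF-8?B?...?= in a header value."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words carry their own charset; plain runs default to ASCII.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


def decode_percent_utf8(value: str) -> str:
    """Decode %-escaped UTF-8; raises UnicodeDecodeError if the bytes are not UTF-8."""
    return unquote(value, encoding="utf-8", errors="strict")


rfc2047_name = decode_rfc2047("=?UTF-8?B?5Lit5paHLnR4dA==?=")
escaped_name = decode_percent_utf8("%E4%B8%AD%E6%96%87.txt")
```

Both sample inputs encode the same filename. The raw 8-bit case is different because the header carries no charset label, which is why the commit adds a fallback list for it.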

With r15113, the following will work:

1) Download initiated with 'Save As' context menu will get a correct suggested 
filename in the vast majority of cases

2) As long as the default charset/encoding set in Options | Minor tweak | font & 
lang setting matches the charset emitted by a web server, a download started by 
clicking a link will get a 'correct' filename. Under the same condition, a download 
started by typing a URL directly in the omnibox will also get the correct filename.

For instance, a simplified Chinese speaker with the default charset set to 
GBK/GB18030 would be happy even if he's on English Windows and he doesn't want to set 
the OS default codepage to GBK for compatibility with other old non-Unicode 
applications. 
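
The fallback order from the commit message (UTF-8, then the referrer or default charset, then the OS default codepage) can be sketched in Python for the Simplified Chinese example above. This is a rough illustration under assumptions, not the code from r15113. The function name `decode_raw_filename`, the final replace-on-error step, and the choice of `cp1252` to stand in for an English Windows codepage are all hypothetical.

```python
from typing import Optional


def decode_raw_filename(raw: bytes,
                        referrer_charset: Optional[str],
                        os_codepage: str) -> str:
    """Decode a raw 8-bit filename: UTF-8, then the hint charset, then the OS codepage."""
    candidates = ["utf-8"]
    if referrer_charset:
        candidates.append(referrer_charset)
    candidates.append(os_codepage)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    # Assumption: if nothing matches, substitute U+FFFD rather than fail.
    return raw.decode("utf-8", errors="replace")


raw = "中文.txt".encode("gbk")  # what a server might send in raw GBK
with_hint = decode_raw_filename(raw, "gbk", "cp1252")   # default charset set to GBK
without_hint = decode_raw_filename(raw, None, "cp1252")  # falls through to cp1252
```

With the GBK hint the original name is recovered. Without it, the bytes still decode under cp1252, but to unrelated Latin characters, which is the garbling reported in this issue.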

What's to be done is to 'inject' the referrer charset to a download initiated by 
clicking a download link. This requires a webkit-side change. 

So, if the default charset is GBK, but a download link is clicked on a Big5 page and 
a server emits filename in C-D in raw Big5, the filename will be garbled. The same is 
true of typing a file download url directly in the omnibox. The latter case will not 
be resolved even with a webkit change. 

Making that work would require a very reliable (almost magical) encoding detector 
for a short chunk of text. 
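
To see why detection is hard for short strings, here is a small Python sketch: it lists every candidate encoding under which a byte string decodes without error. The helper name `strict_decodings` and the sample filename are made up for illustration. Whether a given byte sequence is valid in more than one legacy encoding depends on the bytes, so treat this as a demonstration of the ambiguity rather than a detector.

```python
from typing import Dict, List


def strict_decodings(raw: bytes, encodings: List[str]) -> Dict[str, str]:
    """Return {encoding: text} for each encoding that decodes raw without error."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # not valid in this encoding, so it can be ruled out
    return results


raw = "電影".encode("big5")  # a short name as a Big5 server might emit it
candidates = strict_decodings(raw, ["utf-8", "big5", "gbk"])
```

UTF-8 can be ruled out because the bytes are not valid UTF-8. Double-byte encodings such as Big5 and GBK share large parts of their byte ranges, so validity alone often cannot tell them apart.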

I'm keeping this open for further work. I'll also file a WebKit bug to add a field 
to WebKit's RequestContextBase (name?) or Chromium's subclass.






Labels: -jonmoved
Labels: -mstone-2.1 mstone-3
Labels: Mstone-4
Should not block Mstone:3
Labels: Downloads
Comment 23 by js...@chromium.org, Jul 24 2009
Update:

http://codereview.chromium.org/149705 landed recently (in 
http://src.chromium.org/viewvc/chrome?view=rev&revision=20965 ) so that a change to 
the default encoding is reflected immediately (without a restart) in the decoding of 
the filename param in Content-Disposition. 

http://codereview.chromium.org/113069 is an incomplete CL intended to resolve this 
bug completely. Safari tries something similar (actually more extensive, in that it 
passes a list of encodings to try, including UTF-8 and the referrer encoding), but the 
referrer encoding determination does not work correctly (IIRC, apparently because 
clicking the download link starts a new ResourceRequest/Pageload and the referrer 
info is lost in the process). 

Perhaps, we may consider documenting this in the 'Known Issues' page:

1. 'Save As' almost always works correctly, while clicking the download link does not.

2. Clicking the download link works if the default encoding (set in Options | 
Advanced | Fonts&Language) matches the actual encoding used by a web server for the 
Content-Disposition HTTP header, or if the encoding used by the web server is UTF-8. 


Labels: -Downloads Feature-Downloads
Comment 25 by js...@chromium.org, Oct 22 2009
Labels: -Mstone-4 Mstone-5
Punt it for now. There's a WIP patch for the remaining issue, but far from complete. 
Comment 26 by karen@chromium.org, Oct 22 2009
Labels: -Mstone-5 Mstone-X
Comment 27 by oritm@chromium.org, Dec 18 2009
Labels: -Area-BrowserUI Area-UI-Features
Area-UI-Features label replaces Area-BrowserUI label
Labels: Area-UI
Labels: -Area-UI-Features
Labels: -I18N bulkmove Feature-I18N
Cc: asanka@chromium.org
Issue 91249 has been merged into this issue.
Comment 32 by muzui...@gmail.com, Feb 22 2013
> For instance, a simplified Chinese speaker with the default charset set to 
GBK/GB18030 would be happy even if he's on English Windows and he doesn't want to set 
the OS default codepage to GBK for compatibility with other old non-Unicode 
applications. 

Actually, this is what I encounter on Linux with LANG=en_US.UTF-8. Firefox and Chrome both have the same problem. I reported a bug in Firefox's Bugzilla:

https://bugzilla.mozilla.org/show_bug.cgi?id=844038

Hope this helps.
Project Member Comment 33 by bugdroid1@chromium.org, Mar 10 2013
Labels: -Area-WebKit -Feature-Downloads -Area-UI -Feature-I18N Cr-Content Cr-UI Cr-UI-I18N Cr-UI-Browser-Downloads
Project Member Comment 34 by bugdroid1@chromium.org, Mar 20 2013
Labels: -Cr-UI-I18N Cr-UI-Internationalization
Project Member Comment 35 by bugdroid1@chromium.org, Apr 6 2013
Labels: -Cr-Content Cr-Blink
Labels: -Mstone-X -bulkmove
Owner: ----
Status: Fixed
If anybody is still having problems with unicode filenames, please file a new bug at http://crbug.com/new