
Issue 118 link

Starred by 14 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug




Content-Disposition filename parameters are sometimes percent-unescaped

Reported by julian.r...@gmail.com, Sep 2 2008

Issue description

Product Version      : 0.2.149.27 (1583)
URLs (if applicable) :
http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca
http://greenbytes.de/tech/tc2231/#attwithfnrawpctenclong

Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
     Safari 3: OK
    Firefox 3: OK 
         IE 7: FAIL

See URLs.

What is the expected result?

Filename parameter should be used as-is (no percent-decoding/UTF-8 unescaping)

What happens instead?

Filename parameter gets decoded.


Please provide any additional information below. Attach a screenshot if
possible.

See applicable specification, e.g. RFC 2616, the MIME specs, and RFC 2183.

See also test suite at <http://greenbytes.de/tech/tc2231/>
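
To make the expected and actual results concrete, here is a minimal Python sketch. The header value is a hypothetical example of a percent-sequence inside a plain filename parameter, and `extract_filename` is an illustrative helper, not Chromium's parser.

```python
import re
from urllib.parse import unquote

# Hypothetical header value, for illustration only.
HEADER = 'attachment; filename="foo-%41.html"'

def extract_filename(header_value: str) -> str:
    """Return the quoted filename parameter exactly as sent (no decoding)."""
    match = re.search(r'filename="([^"]*)"', header_value)
    if match is None:
        raise ValueError("no quoted filename parameter found")
    return match.group(1)

as_sent = extract_filename(HEADER)  # what the report expects: used as-is
decoded = unquote(as_sent)          # what the report observes: percent-decoded

print(as_sent)
print(decoded)
```

The two printed names differ, which is the whole disagreement in this report: the first keeps the percent sequence literally, the second turns it into the character it encodes.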
 

Comment 1 by js...@chromium.org, Sep 2 2008

Labels: -Area-Unknown Area-I18N
Status: WontFix
Works as intended.

I don't agree that we should not unescape in this case. There are a lot of web sites 
that do what's expected by IE (RFC 2184: http://www.ietf.org/rfc/rfc2184.txt).

Chrome does support RFC 2047 as well (but not RFC 2231, because very few web sites, 
if any, emit RFC 2231 in the C-D header field). 

Note that the HTTP and HTML standards are not clear about how to encode non-ASCII 
characters in the C-D header field. If it's just like emails, RFC 2231 is the one to use, 
but (as I wrote above) very few web sites do that. Instead, 1. raw 8-bit byte 
sequences (in UTF-8 or a legacy encoding), 2. RFC 2047 or 3. RFC 2184 are used. Firefox 
supports #1, #2 and RFC 2231 while IE only supports #3. Opera supports UTF-8 byte 
sequences (raw). 


Comment 2 by js...@chromium.org, Sep 2 2008

I realize that being compatible with IE probably was intended, but that doesn't make it
the right thing to do automatically.

RFC2184 was obsoleted by RFC2231, and does *not* define the encoding IE (and Chrome)
support. Furthermore, that encoding is ambiguous, because it's not clear when to
apply percent-unescaping, and when not to.

I do agree that the specifications are hard to find and not easy to read, but *this*
interpretation clearly is not supported by any reading of the specs. 

That being said, over in the IETF HTTPbis working group we're trying to clarify these
issues, and it would be great if you'd help us in doing so.

Again, note RFC2231 is just an update to RFC2184.

IE does not support RFC2184 -- what it does is something totally proprietary. Maybe
this becomes clearer if you go back to the specs, or to the test suite.
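
The ambiguity mentioned above can be illustrated with a short Python sketch. Both file names and both "servers" are hypothetical; the point is only that two different intentions produce the same parameter on the wire.

```python
from urllib.parse import quote, unquote

# Two different names a server might intend (both hypothetical).
literal_name = "sales%20report.txt"  # the name really contains the characters "%20"
spaced_name = "sales report.txt"     # the name contains a space

# Server A sends its name as-is; server B percent-escapes its name for IE.
sent_by_a = literal_name
sent_by_b = quote(spaced_name)

# Both parameters are identical, so a client cannot tell which name was meant.
print(sent_by_a == sent_by_b)
# A client that always unescapes gets server A's name wrong.
print(unquote(sent_by_a))
```

A client that never unescapes gets server B's name wrong instead, so neither policy can be right for both senders without an explicit marker in the header.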



Comment 4 by js...@chromium.org, Sep 2 2008

I'm very glad that at long last  HTTPbis WG has begun to work on that. I'm more than 
willing to help you with this issue and implement what's come out of the discussion 
(as long as that reflects the current practice(s) of web servers in the wild). 

In the past, I sent a couple of emails to the HTTP WG (then not very 
active) to clarify the spec (at that time, I was all for RFC 2231) when I 
implemented C-D handling for Firefox, but have NOT heard back.

I'm very well aware that RFC 2184 was obsoleted by RFC 2231, but HTTP has never 
clearly defined what to do with the filename parameter in the C-D header field. And we 
cannot simply ignore what numerous web servers out there do (you're right that it's not RFC 
2184 per se) until the spec is clearly in place and is actually 
followed.  
 
As for the encoding, IE7 only supports %-escaped UTF-8 and so does Chrome. Given that 
some HTTP implementations (e.g. WinHTTP on Windows) are not so good with arbitrary 
bytes with MSB set (using raw UTF-8, as is supported by Opera, is out of the question), 
'%-escaped UTF-8' seems to be rather practical because 

1) it's simpler than RFC 2231 (note that even for emails, there are only a handful of 
MUAs supporting RFC 2231 properly. Among them are Thunderbird, Mutt and Alpine). 

2) there's not much need to use RFC 2231 (why bother to allow non-UTF-8 encodings? I 
don't see much need to specify 'lang' in HTTP C-D header field, either. Also, 
multiline-continuation is not really necessary for HTTP C-D, either).

Anyway, can you tell me how to join the discussion in HTTPbis WG on the issue?  

P.S. 
See also my test cases at http://i18nl10n.com/moztest/download.html 








> I'm very glad that at long last  HTTPbis WG has begun to work on that. I'm more
> than willing to help you with this issue and implement what's come out of the
> discussion (as long as that reflects the current practice(s) of web servers in
> the wild).

The problem is that there is no single way to do it that would work with all UAs. So
whatever comes out will require changes in *some* implementations.

I'm personally interested in this issue because I spent a substantial amount of time
a few years ago trying to get it to work across all UAs. In the end, we defaulted to
RFC 2231 encoding (which works in FF and Opera), and built in a special case for IE
(using UA sniffing). Note that even that special case doesn't work reliably in IE, as
the charset being used *can* depend on the UA's locale, and also IE has limitations
for the header length, breaking it on all but the shortest Asian filenames (recall: 1
Unicode code point -> up to 3 octets -> up to 9 bytes after percent-escaping, and
IE's implementation limit is somewhere around ~160 bytes -- see
<http://lists.w3.org/Archives/Public/ietf-http-wg/2008JulSep/0330.html> -- so this is
really useless in practice).
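
The arithmetic above can be checked with a small Python sketch. `ASSUMED_LIMIT` is only the approximate "~160 bytes" figure quoted in this comment, not a measured value, and the sample file name is made up.

```python
from urllib.parse import quote

ASSUMED_LIMIT = 160  # approximate IE limit quoted in the comment, in bytes

def escaped_length(filename: str) -> int:
    """Length in bytes of the percent-escaped UTF-8 form of a file name."""
    return len(quote(filename, safe=""))

name = "日本語.txt"  # hypothetical file name
print(len(name))                 # code points
print(len(name.encode("utf-8"))) # octets before escaping
print(escaped_length(name))      # bytes after percent-escaping

# Worst case for 3-octet characters: 9 bytes each after escaping.
print(ASSUMED_LIMIT // 9)
```

Under that assumed limit, fewer than twenty such characters fit, which is why the comment calls the approach useless for all but the shortest names.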

> In the past, I sent a couple of emails to the HTTP WG (then not very
> active) to clarify the spec (at that time, I was all for RFC 2231) when I
> implemented C-D handling for Firefox, but have NOT heard back.

If you recall when that was, I can at least attach them to the relevant ticket and
re-read them. Note that the HTTPbis WG was formed only last December -- before
that there hadn't been any active working group for a long time.

> I'm very well aware that RFC 2184 was obsoleted by RFC 2231, but HTTP has never
> clearly defined what to do with the filename parameter in the C-D header field. And we
> cannot simply ignore what numerous web servers out there do (you're right that it's not RFC
> 2184 per se) until the spec is clearly in place and is actually
> followed.

I'm ready to agree with you that what RFC2616 says isn't helpful, that's why we want
to get rid of that part and move it into separate specs; one defining how the RFC
2231 encoding applies to HTTP headers, another one defining how to use C-D in HTTP.

See: http://greenbytes.de/tech/webdav/draft-reschke-rfc2231-in-http-latest.html, but
note that this is not an official work item of the WG, although it may end up on the
IETF standards track.

> As for the encoding, IE7 only supports %-escaped UTF-8 and so does Chrome. Given
> that some HTTP implementations (e.g. WinHTTP on Windows) are not so good with
> arbitrary bytes with MSB set (using raw UTF-8, as is supported by Opera, is

That's ok, that's not *supposed* to work. I also just added a test for raw UTF-8 in
Opera, and that doesn't seem to be recognized as UTF-8, but as ISO-8859-1, as it
should: <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain>.

> out of the question), '%-escaped UTF-8' seems to be rather practical because
>
> 1) it's simpler than RFC 2231 (note that even for emails, there are only a handful of
> MUAs supporting RFC 2231 properly. Among them are Thunderbird, Mutt and Alpine).

Well, it doesn't work in these other HTTP UAs, it's ambiguous, and not supported by
any spec. RFC 2231 *is* more complex, and that's exactly why I'm working on defining
a profile that makes sense in HTTP (for instance, no continuations).

> 2) there's not much need to use RFC 2231 (why bother to allow non-UTF-8
> encodings? I don't see much need to specify 'lang' in HTTP C-D header field,
> either. Also, multiline-continuation is not really necessary for HTTP C-D, either).

Yes, No, and Yes :-)

Yes, encodings other than UTF-8 do not make sense, thus I'd like to restrict the
"must-understand" set to UTF-8 (a big deficit in RFC 2231 not to define this).

No, language may be interesting in some edge cases, and the IETF has a policy (2277)
that any human-readable text can be language-tagged; thus I'd prefer to leave it in.

And yes, continuations are useless in HTTP.

You may like what
<http://greenbytes.de/tech/webdav/draft-reschke-rfc2231-in-http-latest.html> proposes.
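
As a rough illustration of the profile described above (UTF-8 as the only must-understand charset, an optional language tag, no continuations), here is a minimal Python sketch of decoding a single extended value of the form charset'language'percent-encoded-text. `decode_ext_value` is a hypothetical helper, and the sketch does not validate the allowed character set the way a real parser would.

```python
from typing import Tuple
from urllib.parse import unquote

def decode_ext_value(ext_value: str) -> Tuple[str, str]:
    """Decode charset'language'pct-encoded-text; return (text, language)."""
    # Exactly two single quotes separate the three parts; language may be empty.
    charset, language, encoded = ext_value.split("'", 2)
    if charset.lower() != "utf-8":
        # Assumption of this sketch: only UTF-8 has to be understood.
        raise ValueError("unsupported charset: " + charset)
    # Strict decoding rejects octet sequences that are not valid UTF-8.
    return unquote(encoded, encoding="utf-8", errors="strict"), language

print(decode_ext_value("UTF-8''foo-%c3%a4.html"))
print(decode_ext_value("utf-8'de'%e2%82%ac%20rates"))
```

Because the charset is stated explicitly and unescaping applies only to this parameter form, the receiver never has to guess whether a percent sign is literal.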

> Anyway, can you tell me how to join the discussion in HTTPbis WG on the issue?

Just subscribe to the mailing list:
<http://lists.w3.org/Archives/Public/ietf-http-wg/> (yes, the new WG is re-using the
"old" mailing list)

> P.S.
> See also my test cases at http://i18nl10n.com/moztest/download.html 

Great; I will try to consolidate those I haven't got yet into my suite.

Note that even though Chromium supports the IE encoding, it may not always get to see those header fields built for IE.

The reason for that is that before Chrome and Safari were released, only IE supported this encoding, but FF and Opera already supported RFC 2231. There's evidence that in several cases, server developers thus decided to UA-sniff, and to send "filename*" (RFC 2231) to everybody except IE.


Comment 7 by abarth@chromium.org, Jan 28 2012

Cc: js...@chromium.org
Labels: Area-Internals Internals-Network-HTTP
Owner: abarth@chromium.org
Status: Assigned
Now that http://tools.ietf.org/html/rfc6266 has been published, we should reconsider this issue.  Apparently only IE and Chrome have this behavior.  (See http://greenbytes.de/tech/tc2231/#attwithfnrawpctenclong)

Also, we now support the filename* parameter, which is a much more sane way to specify non-ASCII file names.
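
For reference, a server-side sketch in Python of sending a filename* parameter alongside a plain fallback. `content_disposition` is a hypothetical helper, and the sketch assumes the caller supplies a fallback containing only plain ASCII with no quotes or backslashes.

```python
from urllib.parse import quote

def content_disposition(filename: str, ascii_fallback: str) -> str:
    """Build an attachment header value with a plain fallback and filename*."""
    # Percent-encode the UTF-8 octets of the real name for the extended parameter.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"

print(content_disposition("na\u00efve.txt", "naive.txt"))
```

Clients that understand filename* can use the exact name, while others fall back to the plain parameter, which avoids UA sniffing on the server.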

Comment 8 by abarth@chromium.org, Jan 28 2012

Cc: asanka@chromium.org
As far as I can tell, this was fixed with change http://code.google.com/p/chromium/source/detail?r=119378
http://greenbytes.de/tech/tc2231/#attwithfnrawpctenclong still lists Chr18 as failing.  That test case references this bug.
Indeed; apparently I updated only one of the results (see also http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca); the other one passes now as well, so I updated the test result accordingly.
Ah, thanks.

In any case, @jshin, I'm happy to discuss this topic if you still think we should match IE's behavior rather than Firefox's behavior.
I just retested with the dev version from 2012-01-31, and http://greenbytes.de/tech/tc2231/#attwithfnrawpctenclong still seems to fail. Sorry for the confusion.

Comment 14 by bugdroid1@chromium.org, Mar 10 2013

Labels: -Area-Internals -Internals-Network-HTTP Cr-Internals-Network-HTTP Cr-Internals
Any progress here?
Cc: abarth@chromium.org
Labels: -Pri-2 Pri-3
Owner: ----
Status: Available

Comment 17 by mef@chromium.org, Nov 17 2015

Labels: Hotlist-Polish
Tested  http://greenbytes.de/tech/tc2231/#attwithfnrawpctenclong and it is still failing in M47. 

Is this something that we would fix at some point, or should it be archived due to the lack of activity?
Labels: Cr-UI-Browser-Downloads

Comment 19 by sheriffbot@chromium.org, Jun 15 2016

Labels: Hotlist-OpenBugWithCL
A change has landed for this issue, but it's been open for over 6 months. Please review and close it if applicable. If this issue should remain open, remove the "Hotlist-OpenBugWithCL" label. If no action is taken, it will be archived in 30 days.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Polish -Area-I18N -Hotlist-OpenBugWithCL
According to UMA, of all Content-Disposition headers we see for downloads, 3% of them have 'filename' attributes that contain percent encoded strings. That's a bit high to remove support for these. Then again I don't think they are going away anytime soon unless we remove support and the effect isn't too terrible.

I'll defer to Julian and Adam on what needs to happen. Given the age of the bug, it's probably time to make a call.
Components: -Internals
May be worth experimenting with other browsers again.  If one of them is doing something particularly sane, we could just copy them, and hopefully reduce the number of different behaviors here to n-1, at least for modern browsers, assuming we're no longer matching Edge's current behavior.
The test is over here: <http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca>; I believe the results are still up-to-date (that is, only Chrome and Microsoft browsers trying to decode).
Hrm... Looking at that chart, the situation seems like a bit of a mess. Sites may be using UA sniffing to send one format to us and another to FF, so it may not be safe to follow their lead in not unescaping here... And worse, we also seem to treat text as UTF-8 when no percents appear.

We may just be stuck keeping both quirks here.
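
The two quirks discussed here can be sketched in Python as follows. This is only an illustration of the described behavior, not Chromium's implementation; `quirky_filename` is a hypothetical name, and the ISO-8859-1 fallback for octets that are not valid UTF-8 is an assumption of the sketch.

```python
from urllib.parse import unquote_to_bytes

def quirky_filename(raw: bytes) -> str:
    """Apply both quirks to the raw octets of a plain filename parameter."""
    # Quirk 1: percent-unescape the parameter value.
    unescaped = unquote_to_bytes(raw)
    try:
        # Quirk 2: interpret the resulting octets as UTF-8.
        return unescaped.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for this sketch only.
        return unescaped.decode("iso-8859-1")

print(quirky_filename(b"foo-%c3%a4.html"))       # percent-escaped UTF-8
print(quirky_filename(b"foo-\xc3\xa4.html"))     # raw UTF-8 octets
```

Both inputs yield the same name, which shows why sites that rely on either form would break if only one quirk were removed.
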
Status: WontFix (was: Available)
There's a surprising number of sites that use non-ASCII characters in 'filename' attributes (around 6% of all Content-Disposition headers seen). That's pretty high. Aside from the ramifications of using non-ASCII octets in HTTP headers, the number alone suggests that we can't deprecate the 'decode as UTF-8' quirk, which those 6% likely rely on.

I'd say we are stuck here :-( Regrettably along with #23, I'm going to mark this as a WontFix.

Components: Internals>Network
Components: -Internals>Network>HTTP
