|Subject:||Re: UA support for Content-Disposition header (filename parameter)|
|From:||Julian Reschke (juli...@gmx.de)|
|Date:||Mar 18, 2008 9:58:01 am|
Brian Smith wrote:
Using Content-Disposition in HTTP is an ad-hoc solution; it isn't standardized
anywhere. The IE encoding (percent-encoded UTF-8) is not locale-sensitive; in
fact, RFC 2231-based encoding is more sensitive to locale because it allows
arbitrary (non-Unicode) encodings.
But RFC2231 is part of Content-Disposition, see RFC2183, which requires RFC2184, which later was obsoleted by RFC2231.
Furthermore, the IE encoding *is* locale-sensitive; if you send percent-encoded UTF-8 to a client that isn't configured for UTF-8 encoded URIs, it doesn't work. At least it didn't when I had to deal with unhappy customers in Asia and opened a support case.
Finally, using percent-escaped UTF-8 breaks all other clients that do not expect any kind of escaping in this place.
Consider a filename that is 8 letters long, in Thai or any African or Asian
language. The 2231-based encoding is something like this:
Content-Disposition: attachment; filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?= filename*1==?UTF-8?Q?8a=8b=8c?=
No, it would be
Content-Disposition: attachment; filename*=utf-8''%1A%1B%1C%2A%2B%2C%3A%3B%3C%4A%4B%4C%5A%5B%5C%6A%6B%6C%7A%7B%7C%8A%8B%8C
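As a rough illustration of that single-parameter form, here is a minimal Python sketch that builds a filename* value with the charset fixed to UTF-8. The function name encode_filename_param and the sample filename are hypothetical and not taken from any UA or server; the sketch only assumes that percent-encoding everything outside letters, digits and "_.-~" is acceptable for this parameter.

```python
from urllib.parse import quote


def encode_filename_param(filename: str) -> str:
    """Build a filename* parameter in RFC 2231 extended notation.

    Hypothetical helper for illustration: the charset is always "utf-8"
    and the language field between the two apostrophes is left empty.
    """
    # safe="" makes quote() percent-encode every character except
    # ASCII letters, digits and "_.-~".
    encoded = quote(filename, safe="", encoding="utf-8")
    return "filename*=utf-8''" + encoded


# One unfolded header line, with no continuation parameters.
header = "Content-Disposition: attachment; " + encode_filename_param("€ rates.txt")
print(header)
```

The result is a single header line; nothing in this form forces folding or splitting into filename*0, filename*1 and so on.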
Notice that the RFC 2231 encoding *requires* the header to be split into
multiple lines (which many implementations do not handle well). Also notice that
it requires two parameters "filename*1" and "filename*2" to be combined together
to get the actual "filename" parameter.
There is no requirement to fold long lines in HTTP headers; after all, it's not MIME.
The right thing to do here would be to mandate just the encoding part of RFC2231; not the line splitting functionality.
The Internet Explorer encoding is this:
The header is more compact, the header can be kept on one line, there is no
header-combining magic going on, and there is no need to deal with any encodings
other than UTF-8.
- there is no need to wrap the filename under RFC2231 either, we're not using MIME
- furthermore, yes, a single encoding is good, so I would recommend to specify exactly that
- the example you give does not work in any browser except IE, and only if it is configured for UTF-8 encoded URIs (which was not the default setting around the world a few years ago).
Also, consider this:
Content-Disposition: attachment; filename*1==?UTF-8?Q?8a=8b=8c?= filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?=
That is RFC2047-style encoding mixed with RFC2231 line folding -- I didn't recommend that. It may even be illegal.
This is valid according to RFC 2231 but Firefox and Thunderbird do *NOT* parse
it correctly; they assume the parts of the filename are listed in order. So,
there are no fully conforming HTTP+Content-Disposition+RFC2231 implementations.
That is probably true, thus it would make sense to specify the profile that UAs are expected to implement, and this is exactly the reason why I came here with this issue.
The profile would be:
- no line folding (continuations)
- use the encoding from <http://greenbytes.de/tech/webdav/rfc2231.html> with the encoding being hardwired to "utf-8".
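A receiver following this profile could be sketched roughly as below. This is only an illustration under stated assumptions: the function name decode_filename_param is hypothetical, the input is assumed to be the already-extracted value of a single filename* parameter, and rejecting any charset other than utf-8 is one possible reading of "hardwired".

```python
from urllib.parse import unquote


def decode_filename_param(value: str) -> str:
    """Decode a filename* value of the form charset'language'percent-encoded.

    Hypothetical helper for illustration: only utf-8 is accepted and
    continuations (filename*0*, filename*1*, ...) are not handled at all.
    """
    charset, first_sep, rest = value.partition("'")
    _language, second_sep, encoded = rest.partition("'")
    if not first_sep or not second_sep:
        raise ValueError("expected charset'language'value")
    if charset.lower() != "utf-8":
        raise ValueError("this profile only allows utf-8")
    # errors="strict" rejects byte sequences that are not valid UTF-8
    # instead of silently replacing them.
    return unquote(encoded, encoding="utf-8", errors="strict")


print(decode_filename_param("utf-8''%E2%82%AC%20rates.txt"))
```

Because the charset is fixed and there is only one parameter to look at, the receiver needs no charset tables and no logic for combining parts.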
Well, Microsoft hasn't implemented RFC2231. What makes you think that they would implement another RFC, when history tells that they just ignore it?
They already implemented the Internet Explorer mechanism in Internet Explorer.
It doesn't work in all configurations.
See. How is this a solution when it works only for a subset of the IE installations?
(Also, look at how unfair that both mechanisms are to users of non-Latin
alphabets. It takes 72 bytes for the Internet Explorer encoding and 113 bytes
for the RFC 2231 encoding, just to encode 8 letters in UTF-8.)
That's only true if you insist on line folding.
Otherwise the overhead is exactly 8 characters compared to what IE allows (not that the users would be really interested in that).
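The arithmetic can be checked with a short Python sketch. The eight-letter Thai string is an arbitrary sample chosen for illustration, and the comparison assumes the IE form is a plain filename= parameter carrying the same percent-encoded UTF-8 payload.

```python
from urllib.parse import quote

# Arbitrary sample: eight Thai letters, each three bytes in UTF-8.
name = "กขคงจฉชซ"
payload = quote(name, safe="", encoding="utf-8")

ie_prefix = "filename="
rfc2231_prefix = "filename*=utf-8''"

# The percent-encoded payload is identical in both forms.
print(len(payload))
# The difference is only the "*" plus the "utf-8''" prefix.
print(len(rfc2231_prefix) - len(ie_prefix))
```

Each three-byte letter becomes nine characters once percent-encoded, so the payload is 72 characters in either form; the unfolded RFC 2231 form adds only the fixed prefix difference.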