roxen.lists.roxen.general

Subject: Re: Decoding bug (and fix) for multipart/form-data ?
Author: Martin Stjernholm <mast[at]roxen[dot]com>
Date: 16-03-2009
"Stephen R. van den Berg" <srb[at]cuci.nl> wrote:

>     Perform proper characterset decoding for multipart/form-data.

This looks like a beehive. :(

Clearly, Roxen doesn't handle charsets correctly. Worse is that at
least Firefox 3 apparently doesn't handle it correctly either:
According to rfc 2388 and html 4.01 each part should have a
Content-Type with a charset, but Firefox doesn't provide any, even if
it sends the data in utf-8.

Furthermore, it doesn't always use utf-8. E.g. if the <form> tag has
an accept-charset attribute with something else then Firefox uses that
charset, without any hint of it in the request.

The rfc says clearly that if a Content-Type header isn't present then
it should default to text/plain. And the default charset if a charset
parameter isn't present in a content type must be us-ascii according
to the mime rfc (2046, section 4.1.2).

So to begin with the fix should obey the charset provided with each
form-data part, if there is any.
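As a minimal sketch of those defaulting rules (in Python rather than
Pike, purely for illustration): `effective_charset` is a hypothetical
helper, not anything from Roxen or the patch. It takes the raw header
block of one form-data part and applies the text/plain and us-ascii
defaults described above.

```python
from email import message_from_bytes
from typing import Optional, Tuple


def effective_charset(part_headers: bytes) -> Tuple[str, Optional[str]]:
    """Return (content type, charset) for one form-data part.

    Hypothetical helper: applies the defaults from the RFCs, i.e.
    text/plain when Content-Type is absent and us-ascii when a text
    type carries no charset parameter.
    """
    msg = message_from_bytes(part_headers)
    # get_content_type() already falls back to "text/plain".
    ctype = msg.get_content_type()
    # get_content_charset() returns None when no charset is declared.
    charset = msg.get_content_charset()
    if charset is None and ctype.startswith("text/"):
        charset = "us-ascii"
    return ctype, charset
```

With the headers Firefox sends for an ordinary field (only a
Content-Disposition line), this yields text/plain with us-ascii, which
is exactly why a strictly conforming decoder would not treat the data
as utf-8.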

But what's a good way to cope with the broken Firefox behavior? Your
patch uses the same approach as url-encoded variables, complete with
the roxen automatic charset variable hack. The problem is that that's
in direct violation of the standards. :(
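One possible way to cope, sketched in Python for illustration only: it
is neither what the patch does nor something the standards sanction.
The sketch obeys a declared charset when there is one and otherwise
guesses, trying strict utf-8 first. `decode_field` and its
`iso-8859-1` fallback are assumptions made up for this example.

```python
from typing import Optional


def decode_field(raw: bytes, declared: Optional[str] = None,
                 fallback: str = "iso-8859-1") -> str:
    """Decode one form field value (hypothetical heuristic).

    A declared charset always wins. Without one, strict utf-8 is
    tried first, since non-utf-8 data rarely happens to be valid
    utf-8; anything else is decoded with the fallback charset.
    """
    if declared:
        # Raises LookupError for an unknown charset name; a real
        # server would have to decide how to handle that.
        return raw.decode(declared)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

The guess can still be wrong, e.g. for a form whose accept-charset
names some other 8-bit charset, so this only narrows the problem.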

I guess the next step is to see how other browsers behave, and see if
the Mozilla folks have something to say about why they still don't
implement a 10-year-old standard correctly.

The fun doesn't stop with the charsets, for that matter. The rfc and
the html standard say that multiple files submitted for one field
should be encoded as multipart/mixed within multipart/form-data.
Firefox doesn't do that either; instead it just sends more
multipart/form-data parts with the same form name. Anyway, in that
case it's not difficult for Roxen to support both (it already handles
the broken Firefox way, but not the correct way).
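Supporting both layouts can be sketched like this, again in Python and
only as an illustration, not as Roxen's actual code. `collect_files`
is a hypothetical helper that leans on the standard `email` parser; it
accepts file parts either repeated at the top level (the Firefox way)
or nested in a multipart/mixed part (the way the standards describe).

```python
from email import message_from_bytes
from typing import Dict, List, Tuple


def collect_files(content_type: str,
                  body: bytes) -> Dict[str, List[Tuple[str, bytes]]]:
    """Map each form field name to its (filename, data) pairs.

    Hypothetical helper. content_type is the request's Content-Type
    header value, including the boundary parameter.
    """
    head = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    msg = message_from_bytes(head + body)
    files: Dict[str, List[Tuple[str, bytes]]] = {}
    if not msg.is_multipart():
        return files
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        if part.get_content_type() == "multipart/mixed":
            # Standard layout: the files are nested one level down.
            subparts = part.get_payload()
        else:
            # Firefox layout: each file is its own form-data part.
            subparts = [part]
        for sub in subparts:
            filename = sub.get_filename()
            if filename is not None:
                data = sub.get_payload(decode=True)
                files.setdefault(name, []).append((filename, data))
    return files
```

Both layouts end up in the same per-field list, so the rest of the
request handling would not need to care which one the browser chose.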


Footnote: This is what my FF 3.0.7 sends in my little test case.
Everything is utf-8 encoded, but there is no content charset spec
anywhere.

"POST /test/charset.html HTTP/1.1\r\n"
"Host: localhost:14741\r\n"
"User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.7) Gecko/2009030423 Ubuntu/8.10 (intrepid) Firefox/3.0.7\r\n"
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
"Accept-Encoding: gzip,deflate\r\n"
"Accept-Charset: UTF-8,*\r\n"
"Keep-Alive: 300\r\n"
"Connection: keep-alive\r\n"
"Referer: http://localhost:14741/test/charset.html\r\n"
"Cookie: RoxenUserID=2db20b201ff3d0aa16377e692847bce1\r\n"
"Content-Type: multipart/form-data; boundary=---------------------------19179964687172196451241852838\r\n"
"Content-Length: 994\r\n"
"\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"barf\"\r\n"
"\r\n"
"\303\245\303\244\303\266\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"m\303\266h\303\244tta\"\r\n"
"\r\n"
"\303\245\303\244\303\266\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"ok\"\r\n"
"\r\n"
"Submit Query\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"file\"; filename=\"foo.pike\"\r\n"
"Content-Type: application/octet-stream\r\n"
"\r\n"
"int main()\n"
"{\n"
"  multiset m = (<1, 2, 3>);\n"
"  foreach (m; mixed y;) {\n"
"    foreach (m; mixed x;) {\n"
"      werror (\"del %O\\n\", x);\n"
"      m[x] = 0;\n"
"    }\n"
"    werror (\"%O\\n\", y);\n"
"  }\n"
"}\n"
"\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"file\"; filename=\"bar.pike\"\r\n"
"Content-Type: application/octet-stream\r\n"
"\r\n"
"void x () {werror (\".\\n\");}\n"
"\n"
"int main()\n"
"{\n"
"  mixed p = x();\n"
"}\n"
"\r\n"
"-----------------------------19179964687172196451241852838--\r\n"