roxen.lists.roxen.general

Subject: Re: Decoding bug (and fix) for multipart/form-data ?
Author: Stephen R. van den Berg <srb[at]cuci[dot]nl>
Date: 16-03-2009
Martin Stjernholm wrote:
>Furthermore, it doesn't always use utf-8. E.g. if the <form> tag has
>an accept-charset attribute with something else, then Firefox uses that
>charset, without any hint of it in the response.

Well, that isn't particularly worse than what we do now, since we don't
know the charset now, and we don't know when it changes either.

>The rfc says clearly that if a Content-Type header isn't present then
>it should default to text/plain. And the default charset if a charset
>parameter isn't present in a content type must be us-ascii according
>to the mime rfc (2046, section 4.1.2).

Well, that's what the RFC says perhaps, but that is not what
Firefox/MSIE/Opera do.

>So to begin with the fix should obey the charset provided with each
>form-data part, if there is any.

I haven't observed a browser actually submitting a charset, though I admit
I didn't look too closely at what Opera provides (and haven't tested Safari
yet, nor MSIE8).  But I could/should support this situation of course.
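
A minimal sketch of what honouring a per-part charset could look like,
written in Python rather than Pike and not taken from the actual patch.
The function name decode_field and the fallback parameter are
hypothetical. Falling back to UTF-8 is an assumption based on the browser
behaviour observed in this thread; the MIME default would be us-ascii.

```python
from email.message import Message
from typing import Optional


def decode_field(raw: bytes, content_type: Optional[str],
                 fallback: str = "utf-8") -> str:
    """Decode one form-data part, preferring the charset it declares."""
    msg = Message()
    if content_type is not None:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when the part has no
    # Content-Type header or the header has no charset parameter.
    charset = msg.get_content_charset() or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back (assumed policy)
        # rather than rejecting the whole submission.
        return raw.decode(fallback, errors="replace")
```

With this, a part labelled "text/plain; charset=iso-8859-1" is decoded as
Latin-1, while an unlabelled part is treated as UTF-8.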

>But what's a good way to cope with the broken Firefox behavior? Your
>patch uses the same approach as url-encoded variables, complete with
>the roxen automatic charset variable hack. The problem is that that's
>in direct violation of the standards. :(

Yes, but since the browser behaviour violates the standards to begin
with, what can you do?

>I guess the next step is to see how other browsers behave, and see if
>the Mozilla folks have something to say about why they still don't
>implement a 10-year-old standard correctly.

I would guess that they would start pointing fingers at Microsoft, and
Microsoft would point back at Mozilla, all in the name of
(bug-)compatibility, and probably laziness (the multipart form stuff is
complicated enough that nobody actually dared to touch the code, I'd wager).

>The fun doesn't stop with the charsets, for that matter. The rfc and
>the html standard say that multiple file responses should be encoded
>as multipart/mixed within multipart/form-data. Firefox doesn't do that
>either, instead it just sends more multipart/form-data for the same
>form name. Anyway, in that case it's not difficult for Roxen to
>support both (it already handles the broken Firefox way, but not the
>correct way).

I didn't test this, since it's a rather silly use case (I think), but it
should be possible to accommodate at the Roxen/Pike end.
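
As a rough illustration of accepting both layouts, here is a Python sketch
(not Roxen/Pike code). The name collect_files is hypothetical, and the
sketch assumes the whole request body is already in memory. It gathers
the files for a field whether they arrive as repeated form-data parts
or as one part wrapping a multipart/mixed body.

```python
from collections import defaultdict
from email.parser import BytesParser


def collect_files(content_type, body):
    """Return {field name: [(filename, data), ...]} for either layout."""
    # Prepend the request's Content-Type so the MIME parser sees the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser().parsebytes(raw)
    if not message.is_multipart():
        return {}

    files = defaultdict(list)
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        if part.is_multipart():
            # Standard layout: one form-data part wrapping multipart/mixed.
            for sub in part.get_payload():
                files[name].append(
                    (sub.get_filename(), sub.get_payload(decode=True)))
        elif part.get_filename() is not None:
            # Browser layout: the field name repeats, one part per file.
            files[name].append(
                (part.get_filename(), part.get_payload(decode=True)))
        # Parts without a filename are ordinary fields and are skipped here.
    return dict(files)
```

Both layouts end up in the same structure, so the code that consumes the
uploaded files does not need to know which one the browser chose.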

>Footnote: This is what my FF 3.0.7 sends in my little test case.
>Everything is utf-8 encoded, but there is no content charset spec
>anywhere.

Quite.  And that is what MSIE6/7 do as well.
-- 
Sincerely,
           Stephen R. van den Berg.
"Real Life?  I've played that game.  The plot stinks but the
 graphics are awesome."