librelist archives

Werkzeug: form parser ignoring Content-Type charset

From:
Simon Zimmermann
Date:
2012-11-02 @ 21:42
A form value like "Kjær", encoded as `windows-1254`, is decoded as
UTF-8 by the Werkzeug form parser. In Python, request.form['name']
returns u'Kj\ufffdr' rather than u'Kj\xe6r'.
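
The mismatch can be reproduced outside Werkzeug. The following is a
minimal sketch in Python 3 using only the standard library; it
illustrates the decoding step and is not Werkzeug's actual code path:

```python
from urllib.parse import unquote_to_bytes

# The percent-encoded value sent in the request body.
raw = unquote_to_bytes("Kj%E6r")  # b'Kj\xe6r'

# Decoding as UTF-8: 0xE6 starts a multi-byte sequence that is never
# completed, so it is replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the charset named in the Content-Type header.
as_1254 = raw.decode("windows-1254")

print(ascii(as_utf8))  # 'Kj\ufffdr'
print(ascii(as_1254))  # 'Kj\xe6r'
```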

The BaseRequest class defines the default charset as UTF-8. However,
even when a charset is specified in the Content-Type header,
Werkzeug uses this default charset when parsing urlencoded form
data.

I will argue that the Werkzeug form parser makes two mistakes:

1) The default charset is set to UTF-8, whereas section 3.7.1 of the
HTTP spec says "When no explicit charset parameter is provided by the
sender, media subtypes of the "text" type are defined to have a
default charset value of "ISO-8859-1" when received via HTTP." [1].

2) The charset parameter is not respected [2].
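
As a rough sketch of what respecting the charset parameter could look
like: read the charset from the Content-Type header and pass it to the
urlencoded parser, falling back to a default only when none is given.
This uses the Python 3 standard library rather than Werkzeug, and
`charset_from_content_type` and `parse_form` are hypothetical helper
names, not part of any library:

```python
from email.message import Message
from urllib.parse import parse_qs


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or default.

    The UTF-8 default is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def parse_form(body, content_type):
    """Parse an urlencoded body using the charset the sender declared."""
    charset = charset_from_content_type(content_type)
    # An unknown charset name would raise LookupError here.
    return parse_qs(body, encoding=charset, errors="replace")


form = parse_form(
    "name=Kj%E6r",
    "application/x-www-form-urlencoded; charset=windows-1254",
)
print(ascii(form["name"][0]))  # 'Kj\xe6r'
```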

I'll open an issue on the Werkzeug issue tracker if I can get
confirmation that the form parser is reading the HTTP request
incorrectly, but I thought I'd ask here first to check that I'm not
misunderstanding something.

curl --data "name=Kj%E6r" \
  -H "Content-Type: application/x-www-form-urlencoded; charset=windows-1254" \
  http://localhost:5000/

[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
[2]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4

Re: [flask] Werkzeug: form parser ignoring Content-Type charset

From:
Markus Unterwaditzer
Date:
2012-11-03 @ 09:52
If it properly encodes non-ASCII characters to UTF-8, I wouldn't
consider it a bug.


Re: [flask] Werkzeug: form parser ignoring Content-Type charset

From:
Markus Unterwaditzer
Date:
2012-11-03 @ 10:03
Never mind, I read your mail too quickly.


Re: [flask] Werkzeug: form parser ignoring Content-Type charset

From:
Simon Zimmermann
Date:
2012-11-03 @ 10:17
On Sat, Nov 3, 2012 at 10:52 AM, Markus Unterwaditzer
<markus@unterwaditzer.net> wrote:
> If it properly encodes non-ASCII characters to UTF-8, i wouldn't
> consider it a bug.

That is the point: it doesn't. It decodes the stream of bytes using
the wrong encoding, even though the correct Content-Type header is
set.

Re: [flask] Werkzeug: form parser ignoring Content-Type charset

From:
Simon Zimmermann
Date:
2012-11-13 @ 17:13
I opened an issue on the issue tracker 6 days ago
(https://github.com/mitsuhiko/werkzeug/issues/233), as there was no
reply here regarding the original question.

Slightly off-topic: It looks like both Kenneth and Armin are busy guys.
Flask (and Werkzeug) could use some attention from someone with
time to go through the issue trackers.

Re: [flask] Werkzeug: form parser ignoring Content-Type charset

From:
Markus Unterwaditzer
Date:
2012-11-13 @ 20:09
I think there are enough active people in the issue tracker, but no one with
the power to act on anything.
