Issue sending a file with unicode filename

From: Malphas Wats
Date: 2012-11-13 @ 09:03

Hi,

  I think this is more of a Werkzeug issue, but I thought I'd ask here
first because I can't think of anywhere else to ask!

I store files in a database, and I have a view that lets users download
them. It looks like this:

@mod.route('/file/<int:file_id>/download')
@login_required
def download(file_id):
    file = database.query(
        """SELECT file, mime_type, filename FROM files WHERE file_id=%s""",
        (file_id,))
    if file:
        r = Response(file[0]['file'], mimetype=file[0]['mime_type'])
        r.headers.add('Content-Disposition',
                      u'attachment; filename="%s"' % file[0]['filename'])
        return r
    else:
        abort(404)


This works OK most of the time. However, if a user has uploaded a file with
a Unicode character in the filename (this seems to happen most with MS
Word, which replaces a hyphen - with a dash), the above function gives the
following error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1518, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1507, in wsgi_app
    return response(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/wrappers.py", line 1082, in __call__
    app_iter, status, headers = self.get_wsgi_response(environ)
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/wrappers.py", line 1072, in get_wsgi_response
    return app_iter, self.status, headers.to_list()
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/datastructures.py", line 1141, in to_list
    for k, v in self]
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 44: ordinal not in range(256)


As you can see, the error is raised in the werkzeug/datastructures.py file
rather than in my code, so I'm at a loss as to how to fix it. My database
connection is definitely UTF-8. If I print the filename to the console, the
dash character gets displayed as an a with a hat (circumflex?), but looking
directly at the data in the database shows the dash properly.
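
Both symptoms described above can be reproduced in isolation. The following is a minimal sketch, written for Python 3 (the traceback comes from Python 2.7), with a made-up filename. It assumes the "a with a hat" comes from a console that decodes UTF-8 bytes as cp1252, which is a guess and not something the post confirms.

```python
# Python 3 sketch; the filename below is hypothetical.
filename = "report \u2013 final.doc"  # contains U+2013 EN DASH

# 1. The traceback shows the header value being encoded as Latin-1.
#    U+2013 has no Latin-1 representation, so the encode step fails.
try:
    filename.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# 2. U+2013 is three bytes in UTF-8 (E2 80 93). Decoding those bytes with
#    a single-byte code page (cp1252 assumed here) yields three characters,
#    the first of which is "a with circumflex" (U+00E2).
mojibake = "\u2013".encode("utf-8").decode("cp1252")

print(latin1_ok)            # whether the filename fits in Latin-1
print(ascii(mojibake[0]))   # code point of the first garbled character
```

If that assumption holds, the console output is a display problem only, and the stored filename is correct; the actual failure is the Latin-1 encoding of the header value.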

Does anyone have any ideas? I really struggle with unicode and Python :(

Thanks
-Mike

Re: [flask] Issue sending a file with unicode filename

From: Audrius Kažukauskas
Date: 2012-11-13 @ 16:31

On Tue, 2012-11-13 at 09:03:14 +0000, Malphas Wats wrote:
> Does anyone have any ideas? I really struggle with unicode and Python :(

This looks like quite a complicated topic, judging from two
Stack Overflow questions[0][1] I stumbled upon while looking into the
problem.

[0] http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http
[1] http://stackoverflow.com/questions/1361604/how-to-encode-utf8-filename-for-http-headers-python-django

The relevant RFCs are 2231, 5987, and 6266.  Here's my quick solution,
which I tested only on Opera 12.10, Firefox 16.0.2 and Chrome 23, and
which should supposedly work in most major browsers:

  from email.utils import encode_rfc2231
  from flask import Flask, Response

  app = Flask(__name__)

  @app.route('/download')
  def download():
      filename = u'foo–bar.txt'     # That is a U+2013 EN DASH.
      filename_ascii = filename.encode('ascii', 'ignore')
      filename_utf8 = encode_rfc2231(filename.encode('utf-8'), 'UTF-8')
      r = Response('foobar', mimetype='text/plain')
      r.headers.add('Content-Disposition',
                    'attachment; filename="%s"; filename*=%s' %
                    (filename_ascii, filename_utf8))
      return r

Somebody should definitely check this with various versions of IE.
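
For readers on Python 3, the same idea (an ASCII fallback in filename plus a percent-encoded UTF-8 value in filename*, as described in RFC 5987 and RFC 6266) can be sketched without email.utils. This is only an illustration: content_disposition is a hypothetical helper name, not part of Flask or Werkzeug, and the browser behaviour has not been re-tested here.

```python
from urllib.parse import quote


def content_disposition(filename, disposition="attachment"):
    """Build a Content-Disposition value (hypothetical helper).

    Emits an ASCII-only ``filename`` fallback and an RFC 5987 style
    ``filename*`` parameter carrying the percent-encoded UTF-8 name.
    """
    # Fallback for clients that ignore filename*: drop non-ASCII characters,
    # as the snippet above does with encode('ascii', 'ignore').
    fallback = filename.encode("ascii", "ignore").decode("ascii")
    # Backslashes and double quotes would break the quoted-string syntax.
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # Percent-encode the UTF-8 bytes; safe="" also escapes "/".
    encoded = quote(filename, safe="", encoding="utf-8")
    return "%s; filename=\"%s\"; filename*=UTF-8''%s" % (
        disposition, fallback, encoded)


header = content_disposition("foo\u2013bar.txt")  # U+2013 EN DASH
print(header)
```

The resulting value is pure ASCII, so it survives the Latin-1 encoding step that failed in the traceback. It could be passed to r.headers.add('Content-Disposition', header) in a view like the one above.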

There's also a library[2] for working with Content-Disposition headers,
but I haven't tried it.

[2] http://pypi.python.org/pypi/rfc6266

-- 
Audrius Kažukauskas
http://neutrino.lt/

Re: [flask] Issue sending a file with unicode filename

From: Ignas Butėnas
Date: 2012-11-13 @ 09:21

Hi,

Not sure which database you are using, but when I had problems with UTF-8
and PostgreSQL this helped:
https://coderwall.com/p/wz5sca?i=1&p=1&q=author%3Abutenas_com&t%5B%5D=butenas_com

Ignas



Re: [flask] Issue sending a file with unicode filename

From: Malphas Wats
Date: 2012-11-13 @ 09:28

Thanks Ignas,

  I'm using MySQL, with pymysql for access. I did have some early
problems with Unicode because my connection wasn't set to use it (the
database, tables and fields all are), but I fixed that a while back. I
don't appear to have Unicode issues anywhere else: the filename displays
properly when I show it on a webpage as a download link.



Re: [flask] Issue sending a file with unicode filename

From: Paul Walsh
Date: 2012-11-13 @ 09:13

I am not directly answering you, but I struggled with a similar problem
recently, using Django and uWSGI. My solution was to go back to gunicorn,
which "just works" in my environment.

However, the thread I started on Stack Overflow about this has a bunch of
information about Python and Unicode filenames that may be useful to you
anyway:


http://stackoverflow.com/questions/13232108/python-overriding-os-path-supports-unicode-filenames-on-ubuntu

I'd also like to see your problem answered by someone in the Flask
community, as I am starting a new Flask project soon, and I regularly use
unicode in filenames.

*Paul Walsh*
0543551144


