Extended Character Filenames with Django and NGINX
January 8, 2012
I had some graphics files with extended characters in their names, in this case VÄrspretten. I wanted to upload these files to a web server running Django on gunicorn behind nginx, and then serve them back to users. The server runs Debian.
This didn't work at first: requesting the uploaded file gave a 404. The file had been recorded in the database correctly, and its name appeared to be encoded correctly in the HTML, yet the server could not find it. Looking at the file on the server, the second character of its name had been encoded wrongly and showed as a ? in a triangle.
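A minimal Python 3 sketch of how both symptoms can arise together. It assumes (a guess, not something verified on the server) that the filename was written to disk using ISO-8859-1, the charset glibc uses for a plain en_GB locale, while the browser requests the URL in UTF-8, as browsers normally do for non-ASCII paths:

```python
from urllib.parse import quote, unquote_to_bytes

name = "VÄrspretten"

# Bytes written to disk if the name is encoded with the locale's charset,
# assumed here to be ISO-8859-1: Ä becomes the single byte 0xC4.
on_disk = name.encode("iso-8859-1")

# Browsers percent-encode non-ASCII URL paths as UTF-8: Ä becomes %C3%84.
url_path = quote(name)

# A static file server such as nginx decodes the escapes back to raw bytes
# and looks for a file whose name has exactly those bytes.
requested = unquote_to_bytes(url_path)

print(on_disk)               # b'V\xc4rspretten'
print(url_path)              # V%C3%84rspretten
print(requested == on_disk)  # False, so the lookup fails with a 404

# A UTF-8 terminal cannot decode the lone 0xC4 byte and shows the Unicode
# replacement character (U+FFFD) in its place.
print(ascii(on_disk.decode("utf-8", errors="replace")))  # 'V\ufffdrspretten'
```

The same name exists on disk as one byte sequence and is requested as another, which explains why the database and HTML looked fine while the file could not be found.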
Running locale on the server gave:

$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=
The problem appeared to be that the server wasn't using a Unicode (UTF-8) locale.
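The locale command prints already-resolved values, but a process such as gunicorn only sees its environment variables, and the character set comes from the LC_CTYPE category, resolved from LC_ALL, then LC_CTYPE, then LANG. A rough Python sketch of that check (the function names are my own, and it only looks at the explicit codeset suffix, so it does not detect a locale that was never generated):

```python
import os
from collections.abc import Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Return the locale name that governs LC_CTYPE (the character set).

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG;
    with none of them set, the "C" locale applies.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


def is_utf8_locale(name: str) -> bool:
    """True if a locale name such as en_GB.UTF-8 or en_GB.utf8 names UTF-8.

    Only the codeset after the dot (and before any @modifier) is checked.
    A bare en_GB gets its default charset, ISO-8859-1, so it is not UTF-8.
    """
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    ctype = effective_ctype(os.environ)
    print(ctype, "is UTF-8" if is_utf8_locale(ctype) else "is NOT UTF-8")
```

With the settings shown above this reports en_GB as not UTF-8; running it from the same environment that starts gunicorn checks what the Django process actually gets.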
To fix this, I ran the following command as root:
# dpkg-reconfigure locales
This brought up a console menu that let me select en_GB.UTF-8 as my locale. The choice was saved and the locales were regenerated.
At this point my database server (postgres) threw a strop because the locales had been regenerated. All was well again after a reboot.
The locale command now shows every setting with the .UTF-8 suffix, and files are named correctly when they are uploaded. Existing files needed to be uploaded again (or renamed).
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
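Renaming the existing files, rather than uploading them again, can be scripted. A minimal Python sketch, assuming the broken names were written in ISO-8859-1 under the old en_GB locale and sit directly in one upload directory; the function name and the dry-run default are hypothetical choices of mine:

```python
import os


def fix_latin1_filenames(directory, dry_run=True):
    """Rename entries whose names are not valid UTF-8, assuming they are Latin-1.

    Works on bytes so that undecodable names come back from os.listdir
    unchanged. Only entries directly inside `directory` are checked.
    Returns the (old, new) name pairs that were, or would be, renamed.
    """
    directory = os.fsencode(directory)
    renames = []
    for old in os.listdir(directory):
        try:
            old.decode("utf-8")  # valid UTF-8 (including plain ASCII): leave it
        except UnicodeDecodeError:
            # Every byte is a valid ISO-8859-1 character, so this cannot fail.
            new = old.decode("iso-8859-1").encode("utf-8")
            target = os.path.join(directory, new)
            if os.path.exists(target):
                continue  # a correctly named copy was already re-uploaded
            renames.append((old, new))
            if not dry_run:
                os.rename(os.path.join(directory, old), target)
    return renames
```

Running it first with the default dry_run=True, e.g. fix_latin1_filenames("/path/to/media/uploads") with the placeholder path pointing at the upload directory under MEDIA_ROOT, lists what would change; passing dry_run=False performs the renames. A Latin-1 name that happens to form valid UTF-8 would be skipped, but that is unlikely for ordinary names.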