Extended Character Filenames with Django and NGINX

January 8, 2012

I had some graphics files with extended characters in their names, in this case VÄrspretten. I wanted to upload these files to a web server running Django on gunicorn behind nginx, and then serve them back to users. The server is running Debian.

This didn't work at first: the uploaded file gave a 404. It was correctly loaded into the database and looked to be encoded correctly in the HTML, yet it was not found. Looking at the file on the server, the second character of the name had been encoded wrongly and was showing as a ? in a triangle.

Running locale on the server gave:

$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=
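The same question can be asked from inside Python, which is closer to what Django and gunicorn actually see. The following is only a minimal sketch: the function name describe_encodings is made up for illustration, and the values it reports depend on the Python version and on the environment the process was started with.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings this Python process uses for text and filenames."""
    return {
        # Encoding Python uses to turn str filenames into bytes on disk
        "filesystem": sys.getfilesystemencoding(),
        # Encoding implied by the current locale settings
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

Running this under the same user and environment as gunicorn is the useful case, since a service started at boot may not inherit the locale of an interactive shell.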

The problem appeared to be that the server wasn't using a Unicode (UTF-8) locale.
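One plausible way this leads to a 404 is a byte mismatch between the name written to disk and the name requested in the URL. The sketch below rests on two assumptions that I did not verify on the server: that the plain en_GB locale meant ISO-8859-1, and that the URL was percent-encoded as UTF-8. The filename and helper names are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical filename, chosen only for illustration
NAME = "V\u00c4rspretten.png"


def bytes_on_disk(name, encoding):
    """Bytes the filename would be stored as under the given encoding."""
    return name.encode(encoding)


def bytes_requested(name):
    """Bytes a server sees after percent-decoding a UTF-8 encoded URL path."""
    return unquote_to_bytes(quote(name))


if __name__ == "__main__":
    print(quote(NAME))
    # Name written under the assumed old locale: does not match the request
    print(bytes_requested(NAME) == bytes_on_disk(NAME, "iso-8859-1"))
    # Name written under a UTF-8 locale: matches the request
    print(bytes_requested(NAME) == bytes_on_disk(NAME, "utf-8"))
```

If that is what happened, the file was on disk all along, just under a name whose bytes differed from the ones being looked up.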

To fix this, I ran the following command as root:

# dpkg-reconfigure locales

This brought up a console dialog that allowed me to select en_GB.UTF-8 as my locale. I saved the selection and the locales were regenerated.

At this point my database server (PostgreSQL) threw a strop because the locales had been regenerated. All was well again after a reboot.

The locale command now shows every setting suffixed with .UTF-8, and files are named correctly when they are uploaded. Existing files needed to be uploaded again (or renamed).

$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
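For the existing files, renaming can be scripted rather than done by hand. This is a rough sketch, not what I ran: it assumes the old names were written as ISO-8859-1, treats any name that is not valid UTF-8 as an old one, and only looks at a single directory. The function names are made up, and dry_run defaults to True so nothing is renamed until the list has been checked.

```python
import os


def fix_name(raw, old_encoding="iso-8859-1"):
    """Return the UTF-8 form of a filename given as bytes.

    Names that are already valid UTF-8 are returned unchanged.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(old_encoding).encode("utf-8")


def rename_legacy_files(directory, old_encoding="iso-8859-1", dry_run=True):
    """Rename files in one directory whose names are not valid UTF-8."""
    directory = os.fsencode(directory)  # work in bytes to avoid decoding errors
    renamed = []
    for raw in os.listdir(directory):
        fixed = fix_name(raw, old_encoding)
        if fixed != raw:
            renamed.append((raw, fixed))
            if not dry_run:
                os.rename(os.path.join(directory, raw),
                          os.path.join(directory, fixed))
    return renamed
```

Calling rename_legacy_files on the upload directory returns the (old, new) pairs it would rename; passing dry_run=False then applies them. Any filenames stored in the database would need to match the new names too.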

Tags: locale unicode