Converting a Postgres Database to UTF8

February 27, 2011

I have a database that is struggling to store some of the data my users are entering. It's a default postgres database, which means it's in LATIN1 not UTF8. This post covers converting a database from LATIN1 to UTF8.

Dump Database

Use pg_dump to get the data from the database into a file.

pg_dump database_name -U owner_username > db.sql

Create new UTF8 Database

Next, we need to create the new database with UTF8 encoding. In my setup, this is done using the superuser login to postgres.

postgres=# create database database_utf8 with encoding 'UTF-8';  
CREATE DATABASE
postgres=# grant all on database database_utf8 to owner_username;
GRANT
postgres=#

Convert File

We now need to convert the database dump from latin1 to utf-8. Here's the command to do that:

iconv --from-code latin1 --to-code utf-8 db.sql > db-utf8.sql

We also need to change the client encoding in the file. Change the client_encoding line near the top from LATIN1 to UTF-8:

SET client_encoding = 'UTF-8';

Finally, connect to our new utf8 database and import the new file.

psql -U user
user=> \c database_utf8
database_utf8=> \i db-utf8.sql