Thread: Encoding and Conversion Question(s)


From: Dave Lazar

Hi,

I have a database that was created with the encoding set to SQL_ASCII.
A lot of data comes with accented characters. When I read this data
with PHP, using UTF-8 as my browser output charset, any accented
characters are displayed as weird symbols. If I wrap the data in the
PHP function utf8_encode(), it all looks fine again.
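For illustration, here is a minimal Python sketch of the byte-level conversion that PHP's utf8_encode() performs (it converts ISO-8859-1 to UTF-8). The helper names is_valid_utf8 and latin1_to_utf8 are hypothetical. The fact that utf8_encode() fixes the display suggests, but does not prove, that the stored bytes are ISO-8859-1 or a close relative such as Windows-1252.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, like PHP's utf8_encode()."""
    # Every byte value is a valid ISO-8859-1 character, so this decode cannot fail.
    return raw.decode("latin-1").encode("utf-8")


# "café" stored as Latin-1: the é is the single byte 0xE9.
stored = b"caf\xe9"
print(is_valid_utf8(stored))   # False: a browser expecting UTF-8 shows junk
print(latin1_to_utf8(stored))  # b'caf\xc3\xa9': é becomes the two bytes C3 A9
```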

So, I have decided to simply change the encoding of my database from
SQL_ASCII to UNICODE so that I no longer need to use utf8_encode() in
PHP.

I did a pg_dump of my database. I then created a blank database with
UNICODE as the encoding. However, pg_restore chokes with a message
about not being able to convert a multibyte character properly.

My server settings are en_US.UTF-8 for lc_collate, and the server
encoding is set to UNICODE.

How can I reload all my data into the UNICODE database I have created?
Do I need to do something to the dump? I hope not!! Any tips on this
would be most appreciated!!

TIA

Dave

Re: Encoding and Conversion Question(s)

From: Tom Lane

Dave Lazar <hunkybill@gmail.com> writes:
> I have a database that was created with the encoding set to SQL_ASCII.
> A lot of data comes with accented characters.

You need to figure out what encoding that data is actually in (hint:
it's not ASCII) and specify that encoding as the client_encoding in
the restore script.  Postgres will then be able to convert the data
to UTF-8 correctly.
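As a rough sketch of that fix, assume a plain-text dump (pg_dump's default format; a custom-format archive can be turned into a script with pg_restore -f) and assume the data turns out to be LATIN1. The dump's own SET client_encoding line can then be relabelled without touching the data bytes, so the server converts them to UTF-8 on load. The function name relabel_dump is hypothetical, and since the exact SET line depends on your pg_dump version, the sketch prepends one if it finds none.

```python
import re

# Matches the client_encoding line pg_dump writes near the top of a
# plain-text dump, e.g.  SET client_encoding = 'SQL_ASCII';
_ENCODING_LINE = re.compile(rb"^SET client_encoding = '[^']*';", re.MULTILINE)


def relabel_dump(dump: bytes, actual_encoding: str) -> bytes:
    """Return dump with its client_encoding set to actual_encoding.

    Only the label changes; the data bytes are left untouched, so on
    restore the server converts them from actual_encoding to UTF-8.
    """
    new_line = b"SET client_encoding = '" + actual_encoding.encode("ascii") + b"';"
    # A function replacement avoids any backslash-escape processing.
    result, count = _ENCODING_LINE.subn(lambda _m: new_line, dump, count=1)
    if count == 0:
        # No SET line found: prepend one so the restore still gets the hint.
        result = new_line + b"\n" + dump
    return result
```

The rewritten file, read and written in binary mode, could then be loaded into the UNICODE database with psql -f.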

If the data is actually all in one encoding, this shouldn't be too
painful.  If it's in a mishmash of different encodings, you are in
for some pain getting things fixed up :-(
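If the data does turn out to be a mix, one common heuristic (not something Postgres does for you) is to repair each value separately: keep bytes that already decode as UTF-8, otherwise fall back to a guessed single-byte encoding. In this Python sketch the fallback order of Windows-1252 then Latin-1 is an assumption, and the function name to_utf8 is hypothetical. Some Latin-1 byte sequences also happen to be valid UTF-8, so the results should be spot-checked.

```python
def to_utf8(raw: bytes, fallbacks=("cp1252", "latin-1")) -> bytes:
    """Return raw as UTF-8, guessing its source encoding.

    Bytes that already decode as UTF-8 are returned unchanged; otherwise
    each fallback encoding is tried in order. latin-1 goes last because
    it accepts every byte value and therefore never fails.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc).encode("utf-8")
        except UnicodeDecodeError:
            # cp1252 leaves a few bytes (e.g. 0x81) undefined; try the next one.
            continue
    raise ValueError("no candidate encoding could decode the value")
```

Applied line by line to a plain-text dump, with the dump's client_encoding then set to UTF8 (UNICODE), this should yield a file the new database accepts, though mis-guessed values may still need manual correction.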

            regards, tom lane