Re: Easy way to convert a database from WIN1252 to UTF8? - Mailing list pgsql-general

From Mike Christensen
Subject Re: Easy way to convert a database from WIN1252 to UTF8?
Msg-id AANLkTinsI5eyfnZG-4JvjLEL9Q1SGEwi7wstsI3GUUwy@mail.gmail.com
In response to Re: Easy way to convert a database from WIN1252 to UTF8?  (Sam Mason <sam@samason.me.uk>)
List pgsql-general
On Thu, Jul 1, 2010 at 10:07 AM, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote:
>> Yup, the problem is line 170 doesn't actually match up to the
>> DB.dbs.out file line 170 (which is a blank line).  I believe it means
>> line 170 from the stdin pipe it was processing for the copy command.
>
> Doh, that's annoying.  It would be nice to know that it's done the right
> thing rather than "some" thing.
>
>> Suffice to say, there was some weird character in my database that PG
>> can't automatically translate from WIN1252 to UTF8, and apparently it
>> will drop that /entire/ COPY command (the entire table doesn't get
>> populated!)..
>
> Yup, this is deliberate.  You can also run psql with "-1" to put the
> whole lot (i.e. every table/view/... creation and data insert) in a
> transaction which will cause the whole restore to be rolled back if
> something doesn't look right as well.
>
>> As to what character was the culprit, I'm not entirely sure how to
>> figure this out.  I guess I could look for that hex value?  However,
>> if I set the encoding in the script itself, everything works
>> perfectly.
>
> PG is doing the right thing, 9D is undefined in Win1252.  I guess you've
> either got other problems or this was just an artifact of converting
> from Win1252 to UTF8 external to PG and then not telling it that you'd
> done that.
>
> --
>  Sam  http://samason.me.uk/

Yeah, looking at the lines in question, I don't really see anything
wrong with them.  Everything is going into the database as UTF8, so
maybe some weird characters got stuck in there somehow under the old
default encoding.  This is the main reason I'm converting to UTF8
now: so the data will be consistent across all layers.  Good to get
these bugs out of the way while the data set is relatively small.
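
For anyone who wants to hunt down the culprit byte by its hex value, here
is a minimal Python sketch, not something that was run in this thread.  It
assumes the dump is read as raw bytes and that the five byte values left
unassigned by the Windows-1252 code page (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
the ones of interest.  The function name find_undefined_bytes is made up
for illustration.  Note that 0x9D also occurs inside valid UTF8 sequences
(for example, the right double quote U+201D is E2 80 9D), so a hit may
simply mean the data was already UTF8, which would fit Sam's guess.

```python
# Byte values with no assigned character in the Windows-1252 code page.
UNDEFINED_WIN1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def find_undefined_bytes(data: bytes):
    """Return (line_number, byte_column, byte_value) for each undefined byte.

    Line numbers and columns are 1-based and count raw bytes, not characters.
    (Hypothetical helper for illustration only.)
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value in UNDEFINED_WIN1252:
                hits.append((line_no, col, value))
    return hits
```

To use it, read the dump in binary mode (open(path, "rb").read()) and pass
the bytes in.  The line numbers are relative to whatever bytes you hand it,
so passing only the data rows of one COPY block should line up better with
the line number in the error message than scanning the whole file.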

If anyone wants me to do any more debugging, I'd be more than happy to,
but I'm satisfied with the results.  Thanks!

Mike
