Thread: Change server encoding after the fact

Change server encoding after the fact

From
Cody Caughlan
Date:
I would like to change my server_encoding which is currently SQL_ASCII to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');

2) Dump my current database

pg_dump -Fc foo > foo.db

3) Drop my current database

drop database foo;

4) recreate it with the proper encoding

create database foo with template = template1 encoding = 'UTF-8';

5) restore from backup

pg_restore -d foo foo.db



Are these the correct steps to perform or is there an easier / in-place way?

Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding
andits going into a UTF-8 database)? 

Thank you

/Cody

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
> I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
>
> I have existing data that I would like to keep.
>
> From my understanding of the steps I need to:
>
> 1) alter the template1 database encoding via
>
> UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');

Just create database using template0 as template and you can skip this step ^^
> Are these the correct steps to perform or is there an easier / in-place way?

> Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII
encodingand its going into a UTF-8 database)? 

You might need to set client encoding when restoring.  Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
> I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
>
> I have existing data that I would like to keep.
>
> From my understanding of the steps I need to:
>
> 1) alter the template1 database encoding via
>
> UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');

Just create database using template0 as template and you can skip this step ^^


Wouldn't this only work if my template0 was UTF8 itself?

=> select datname, pg_encoding_to_char(encoding) from pg_database;
       datname        | pg_encoding_to_char 
----------------------+---------------------
 template1            | SQL_ASCII
 template0            | SQL_ASCII
 postgres             | SQL_ASCII


So it appears both template0 & template1 are SQL_ASCII, so how would creating from a new DB from template0 be any different than template1?

 
> Are these the correct steps to perform or is there an easier / in-place way?

> Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding and its going into a UTF-8 database)?

You might need to set client encoding when restoring.  Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.


Yes, I see this might be the case. From my playing around with iconv I cannot even properly do the conversion:

$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661

Uh oh... I cannot event convert it?

Whats my next step at this point if I cannot even convert my data? I'd be OK with some lossiness.

Thanks again
/Cody


Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Thanks Scott. See below:
>
> On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > I would like to change my server_encoding which is currently SQL_ASCII
>> > to UTF8.
>> >
>> > I have existing data that I would like to keep.
>> >
>> > From my understanding of the steps I need to:
>> >
>> > 1) alter the template1 database encoding via
>> >
>> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> > 'template1');
>>
>> Just create database using template0 as template and you can skip this
>> step ^^
>
>
> Wouldn't this only work if my template0 was UTF8 itself?
> => select datname, pg_encoding_to_char(encoding) from pg_database;
>        datname        | pg_encoding_to_char
> ----------------------+---------------------
>  template1            | SQL_ASCII
>  template0            | SQL_ASCII
>  postgres             | SQL_ASCII
>
> So it appears both template0 & template1 are SQL_ASCII, so how would
> creating from a new DB from template0 be any different than template1?

Well, let's try, shall we?  From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
  datname  | pg_encoding_to_char
-----------+---------------------
 template1 | SQL_ASCII
 template0 | SQL_ASCII
 postgres  | SQL_ASCII
 smarlowe  | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

>> > Are these the correct steps to perform or is there an easier / in-place
>> > way?
>>
>> > Also, when I dump my old DB and restore it, will it be converted
>> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
>> > UTF-8 database)?
>>
>> You might need to set client encoding when restoring.  Or use iconv to
>> convert from one encoding to another, which is what I usually do.
>> Note that it's VERY likely you'll have data in a SQL_ASCII db that
>> won't go into a UTF8 database without some lossiness.
>
>
> Yes, I see this might be the case. From my playing around with iconv I
> cannot even properly do the conversion:
> $ pg_dump -Fp foo > foo.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $ iconv -f utf-8 foo.sql > utf8.sql
> iconv: illegal input sequence at position 2512661

I think you got it backwards, the -f should be somthing other than
utf-8 right?  That's what the -t should be right?  Try iconv without a
-f switch and a -t of utf-8 and see what happens...

Re: Change server encoding after the fact

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> So it appears both template0 & template1 are SQL_ASCII, so how would
>> creating from a new DB from template0 be any different than template1?

> P.s. I'm not sure why it works, I just know that it does. :)

CREATE DATABASE assumes that template0 cannot contain any non-ASCII
data, so it's okay to clone it and then pretend that the result has some
other encoding.  The same assumption cannot be made for template1, since
that's user-modifiable.

            regards, tom lane

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Thanks Scott. See below:
>
> On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > I would like to change my server_encoding which is currently SQL_ASCII
>> > to UTF8.
>> >
>> > I have existing data that I would like to keep.
>> >
>> > From my understanding of the steps I need to:
>> >
>> > 1) alter the template1 database encoding via
>> >
>> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> > 'template1');
>>
>> Just create database using template0 as template and you can skip this
>> step ^^
>
>
> Wouldn't this only work if my template0 was UTF8 itself?
> => select datname, pg_encoding_to_char(encoding) from pg_database;
>        datname        | pg_encoding_to_char
> ----------------------+---------------------
>  template1            | SQL_ASCII
>  template0            | SQL_ASCII
>  postgres             | SQL_ASCII
>
> So it appears both template0 & template1 are SQL_ASCII, so how would
> creating from a new DB from template0 be any different than template1?

Well, let's try, shall we?  From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
 datname  | pg_encoding_to_char
-----------+---------------------
 template1 | SQL_ASCII
 template0 | SQL_ASCII
 postgres  | SQL_ASCII
 smarlowe  | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)


Ok, I see what you mean. This would create a new DB with the proper encoding. Which is "fine", and probably what I will do. I guess I see an ideal scenario being one where we permanently convert the template encoding to UTF8 so going forward I dont have to worry about forgetting to adding the encoding= 'UTF8' for every new DB I create.
 
>> > Are these the correct steps to perform or is there an easier / in-place
>> > way?
>>
>> > Also, when I dump my old DB and restore it, will it be converted
>> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
>> > UTF-8 database)?
>>
>> You might need to set client encoding when restoring.  Or use iconv to
>> convert from one encoding to another, which is what I usually do.
>> Note that it's VERY likely you'll have data in a SQL_ASCII db that
>> won't go into a UTF8 database without some lossiness.
>
>
> Yes, I see this might be the case. From my playing around with iconv I
> cannot even properly do the conversion:
> $ pg_dump -Fp foo > foo.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $ iconv -f utf-8 foo.sql > utf8.sql
> iconv: illegal input sequence at position 2512661

I think you got it backwards, the -f should be somthing other than
utf-8 right?  That's what the -t should be right?  Try iconv without a
-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the same error:

$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Thanks Scott. See below:
>> >
>> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > I would like to change my server_encoding which is currently
>> >> > SQL_ASCII
>> >> > to UTF8.
>> >> >
>> >> > I have existing data that I would like to keep.
>> >> >
>> >> > From my understanding of the steps I need to:
>> >> >
>> >> > 1) alter the template1 database encoding via
>> >> >
>> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> > 'template1');
>> >>
>> >> Just create database using template0 as template and you can skip this
>> >> step ^^
>> >
>> >
>> > Wouldn't this only work if my template0 was UTF8 itself?
>> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >        datname        | pg_encoding_to_char
>> > ----------------------+---------------------
>> >  template1            | SQL_ASCII
>> >  template0            | SQL_ASCII
>> >  postgres             | SQL_ASCII
>> >
>> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> > creating from a new DB from template0 be any different than template1?
>>
>> Well, let's try, shall we?  From a freshly created cluster on my
>> laptop, running 8.4:
>>
>> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
>>  datname  | pg_encoding_to_char
>> -----------+---------------------
>>  template1 | SQL_ASCII
>>  template0 | SQL_ASCII
>>  postgres  | SQL_ASCII
>>  smarlowe  | SQL_ASCII
>> (4 rows)
>>
>> smarlowe=# create database j template template0 encoding 'UTF8';
>> CREATE DATABASE
>>
>> Seems to work.
>>
>> P.s. I'm not sure why it works, I just know that it does. :)
>>
>
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.

Ah ok.  The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

>> I think you got it backwards, the -f should be somthing other than
>> utf-8 right?  That's what the -t should be right?  Try iconv without a
>> -f switch and a -t of utf-8 and see what happens...
>
> You're right, I had -f when I needed -t. I tried it again with the same
> error:
> $ iconv -t utf-8 foo.sql > utf.sql
> iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.

Re: Change server encoding after the fact

From
Tom Lane
Date:
Cody Caughlan <toolbag@gmail.com> writes:
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.

Well, if you're feeling brave you can mark template0 as having utf8
encoding via a manual update to pg_database.  In theory that should
be safe enough.  If you know template1 doesn't, and never has, contained
any non-ASCII data, you could do the same to it ... but it would be a
lot safer to drop it and recreate from template0.  See
http://wiki.postgresql.org/wiki/Adventures_in_PostgreSQL,_Episode_1
for some context.

            regards, tom lane

Re: Change server encoding after the fact

From
Cody Caughlan
Date:


On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Thanks Scott. See below:
>> >
>> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > I would like to change my server_encoding which is currently
>> >> > SQL_ASCII
>> >> > to UTF8.
>> >> >
>> >> > I have existing data that I would like to keep.
>> >> >
>> >> > From my understanding of the steps I need to:
>> >> >
>> >> > 1) alter the template1 database encoding via
>> >> >
>> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> > 'template1');
>> >>
>> >> Just create database using template0 as template and you can skip this
>> >> step ^^
>> >
>> >
>> > Wouldn't this only work if my template0 was UTF8 itself?
>> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >        datname        | pg_encoding_to_char
>> > ----------------------+---------------------
>> >  template1            | SQL_ASCII
>> >  template0            | SQL_ASCII
>> >  postgres             | SQL_ASCII
>> >
>> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> > creating from a new DB from template0 be any different than template1?
>>
>> Well, let's try, shall we?  From a freshly created cluster on my
>> laptop, running 8.4:
>>
>> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
>>  datname  | pg_encoding_to_char
>> -----------+---------------------
>>  template1 | SQL_ASCII
>>  template0 | SQL_ASCII
>>  postgres  | SQL_ASCII
>>  smarlowe  | SQL_ASCII
>> (4 rows)
>>
>> smarlowe=# create database j template template0 encoding 'UTF8';
>> CREATE DATABASE
>>
>> Seems to work.
>>
>> P.s. I'm not sure why it works, I just know that it does. :)
>>
>
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.

Ah ok.  The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

>> I think you got it backwards, the -f should be somthing other than
>> utf-8 right?  That's what the -t should be right?  Try iconv without a
>> -f switch and a -t of utf-8 and see what happens...
>
> You're right, I had -f when I needed -t. I tried it again with the same
> error:
> $ iconv -t utf-8 foo.sql > utf.sql
> iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.


How would I find this? pg_database says my DB is SQL_ASCII. 

"show all" says

client_encoding = SQL_ASCII
server_encoding = SQL_ASCII

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>
>
> On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Please see below.
>> >
>> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > Thanks Scott. See below:
>> >> >
>> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> >> > <scott.marlowe@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> >> wrote:
>> >> >> > I would like to change my server_encoding which is currently
>> >> >> > SQL_ASCII
>> >> >> > to UTF8.
>> >> >> >
>> >> >> > I have existing data that I would like to keep.
>> >> >> >
>> >> >> > From my understanding of the steps I need to:
>> >> >> >
>> >> >> > 1) alter the template1 database encoding via
>> >> >> >
>> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> >> > 'template1');
>> >> >>
>> >> >> Just create database using template0 as template and you can skip
>> >> >> this
>> >> >> step ^^
>> >> >
>> >> >
>> >> > Wouldn't this only work if my template0 was UTF8 itself?
>> >> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >> >        datname        | pg_encoding_to_char
>> >> > ----------------------+---------------------
>> >> >  template1            | SQL_ASCII
>> >> >  template0            | SQL_ASCII
>> >> >  postgres             | SQL_ASCII
>> >> >
>> >> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> >> > creating from a new DB from template0 be any different than
>> >> > template1?
>> >>
>> >> Well, let's try, shall we?  From a freshly created cluster on my
>> >> laptop, running 8.4:
>> >>
>> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from
>> >> pg_database;
>> >>  datname  | pg_encoding_to_char
>> >> -----------+---------------------
>> >>  template1 | SQL_ASCII
>> >>  template0 | SQL_ASCII
>> >>  postgres  | SQL_ASCII
>> >>  smarlowe  | SQL_ASCII
>> >> (4 rows)
>> >>
>> >> smarlowe=# create database j template template0 encoding 'UTF8';
>> >> CREATE DATABASE
>> >>
>> >> Seems to work.
>> >>
>> >> P.s. I'm not sure why it works, I just know that it does. :)
>> >>
>> >
>> > Ok, I see what you mean. This would create a new DB with the proper
>> > encoding. Which is "fine", and probably what I will do. I guess I see an
>> > ideal scenario being one where we permanently convert the template
>> > encoding
>> > to UTF8 so going forward I dont have to worry about forgetting to adding
>> > the
>> > encoding= 'UTF8' for every new DB I create.
>>
>> Ah ok.  The way I fix that is this:
>>
>> update pg_database set datistemplate = false where datname='template1';
>> drop database template1;
>> create database template1 template template0 encoding 'UTF8';
>>
>> But your way would likely work too.
>>
>> >> I think you got it backwards, the -f should be somthing other than
>> >> utf-8 right?  That's what the -t should be right?  Try iconv without a
>> >> -f switch and a -t of utf-8 and see what happens...
>> >
>> > You're right, I had -f when I needed -t. I tried it again with the same
>> > error:
>> > $ iconv -t utf-8 foo.sql > utf.sql
>> > iconv: illegal input sequence at position 2512661
>>
>> Any idea waht the actual encoding of your source database is?
>> SQL_ASCII is basically not really ascii, more like anything goes.
>
>
> How would I find this? pg_database says my DB is SQL_ASCII.
> "show all" says
> client_encoding = SQL_ASCII
> server_encoding = SQL_ASCII

It would have been set by the application accessing postgresql and
inserting the data.  I.e. was it a windows app using a typical windows
encoding?  etc.

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
Its a Rails app and I do have:

  encoding: utf8

Set in my DB configuration.

On Fri, Sep 30, 2011 at 12:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>
>
> On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Please see below.
>> >
>> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > Thanks Scott. See below:
>> >> >
>> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> >> > <scott.marlowe@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> >> wrote:
>> >> >> > I would like to change my server_encoding which is currently
>> >> >> > SQL_ASCII
>> >> >> > to UTF8.
>> >> >> >
>> >> >> > I have existing data that I would like to keep.
>> >> >> >
>> >> >> > From my understanding of the steps I need to:
>> >> >> >
>> >> >> > 1) alter the template1 database encoding via
>> >> >> >
>> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> >> > 'template1');
>> >> >>
>> >> >> Just create database using template0 as template and you can skip
>> >> >> this
>> >> >> step ^^
>> >> >
>> >> >
>> >> > Wouldn't this only work if my template0 was UTF8 itself?
>> >> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >> >        datname        | pg_encoding_to_char
>> >> > ----------------------+---------------------
>> >> >  template1            | SQL_ASCII
>> >> >  template0            | SQL_ASCII
>> >> >  postgres             | SQL_ASCII
>> >> >
>> >> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> >> > creating from a new DB from template0 be any different than
>> >> > template1?
>> >>
>> >> Well, let's try, shall we?  From a freshly created cluster on my
>> >> laptop, running 8.4:
>> >>
>> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from
>> >> pg_database;
>> >>  datname  | pg_encoding_to_char
>> >> -----------+---------------------
>> >>  template1 | SQL_ASCII
>> >>  template0 | SQL_ASCII
>> >>  postgres  | SQL_ASCII
>> >>  smarlowe  | SQL_ASCII
>> >> (4 rows)
>> >>
>> >> smarlowe=# create database j template template0 encoding 'UTF8';
>> >> CREATE DATABASE
>> >>
>> >> Seems to work.
>> >>
>> >> P.s. I'm not sure why it works, I just know that it does. :)
>> >>
>> >
>> > Ok, I see what you mean. This would create a new DB with the proper
>> > encoding. Which is "fine", and probably what I will do. I guess I see an
>> > ideal scenario being one where we permanently convert the template
>> > encoding
>> > to UTF8 so going forward I dont have to worry about forgetting to adding
>> > the
>> > encoding= 'UTF8' for every new DB I create.
>>
>> Ah ok.  The way I fix that is this:
>>
>> update pg_database set datistemplate = false where datname='template1';
>> drop database template1;
>> create database template1 template template0 encoding 'UTF8';
>>
>> But your way would likely work too.
>>
>> >> I think you got it backwards, the -f should be somthing other than
>> >> utf-8 right?  That's what the -t should be right?  Try iconv without a
>> >> -f switch and a -t of utf-8 and see what happens...
>> >
>> > You're right, I had -f when I needed -t. I tried it again with the same
>> > error:
>> > $ iconv -t utf-8 foo.sql > utf.sql
>> > iconv: illegal input sequence at position 2512661
>>
>> Any idea waht the actual encoding of your source database is?
>> SQL_ASCII is basically not really ascii, more like anything goes.
>
>
> How would I find this? pg_database says my DB is SQL_ASCII.
> "show all" says
> client_encoding = SQL_ASCII
> server_encoding = SQL_ASCII

It would have been set by the application accessing postgresql and
inserting the data.  I.e. was it a windows app using a typical windows
encoding?  etc.

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Its a Rails app and I do have:
>   encoding: utf8


Hmmm, if you try this does it work (mostly)?

iconv -f utf-8 -t utf-8 -c < infile > outfile

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> Its a Rails app and I do have:
>>   encoding: utf8
>
>
> Hmmm, if you try this does it work (mostly)?
>
> iconv -f utf-8 -t utf-8 -c < infile > outfile

If that doesn't work try:

iconv -f utf-8 -t utf-8//IGNORE -c < infile > outfile

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
That worked, but "file" shows no difference:

$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii

$file -i utf.sql
utf.sql: text/plain; charset=us-ascii

So iconv didnt actually convert the file OR does is the "file" command just ignorant?

On Fri, Sep 30, 2011 at 12:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Its a Rails app and I do have:
>   encoding: utf8


Hmmm, if you try this does it work (mostly)?

iconv -f utf-8 -t utf-8 -c < infile > outfile

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> That worked, but "file" shows no difference:
> $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $file -i utf.sql
> utf.sql: text/plain; charset=us-ascii
> So iconv didnt actually convert the file OR does is the "file" command just
> ignorant?

Not sure.  try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
Please see below.

On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> That worked, but "file" shows no difference:
> $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $file -i utf.sql
> utf.sql: text/plain; charset=us-ascii
> So iconv didnt actually convert the file OR does is the "file" command just
> ignorant?

Not sure.  try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?


Uh oh.

On the remote machine:

$ pg_dump -Fc -E UTF8 foo > foo.sql

Then I've created a new local DB with UTF8 encoding and I try to restore this dump into it:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:  invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT:  COPY wine_books, line 1147
WARNING: errors ignored on restore: 1

And sure enough the table "wine_books" is empty. Not good.

Re: Change server encoding after the fact

From
Scott Marlowe
Date:
On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > That worked, but "file" shows no difference:
>> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
>> > $ file -i foo.sql
>> > foo.sql: text/plain; charset=us-ascii
>> > $file -i utf.sql
>> > utf.sql: text/plain; charset=us-ascii
>> > So iconv didnt actually convert the file OR does is the "file" command
>> > just
>> > ignorant?
>>
>> Not sure.  try loading the dump into the UTF-8 DB in postgres and see
>> what happens I guess?
>
>
> Uh oh.
> On the remote machine:
> $ pg_dump -Fc -E UTF8 foo > foo.sql
> Then I've created a new local DB with UTF8 encoding and I try to restore
> this dump into it:
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
> wine_books vinosmith
> pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
>  invalid byte sequence for encoding "UTF8": 0xc309
> CONTEXT:  COPY wine_books, line 1147
> WARNING: errors ignored on restore: 1
> And sure enough the table "wine_books" is empty. Not good.

You may have to hunt down that one bad line (1147) and chop it out /
edit it so it works.

Re: Change server encoding after the fact

From
Cody Caughlan
Date:
Thanks y'all for your help on this.

I took this opportunity to upgrade to 9.1.1 which is UTF8 by default and I ended up manually cleaning up the borked data by hand (there wasn't that much).

So all is well now. 

Thanks again.

/Cody



On Fri, Sep 30, 2011 at 3:37 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > That worked, but "file" shows no difference:
>> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
>> > $ file -i foo.sql
>> > foo.sql: text/plain; charset=us-ascii
>> > $file -i utf.sql
>> > utf.sql: text/plain; charset=us-ascii
>> > So iconv didnt actually convert the file OR does is the "file" command
>> > just
>> > ignorant?
>>
>> Not sure.  try loading the dump into the UTF-8 DB in postgres and see
>> what happens I guess?
>
>
> Uh oh.
> On the remote machine:
> $ pg_dump -Fc -E UTF8 foo > foo.sql
> Then I've created a new local DB with UTF8 encoding and I try to restore
> this dump into it:
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
> wine_books vinosmith
> pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
>  invalid byte sequence for encoding "UTF8": 0xc309
> CONTEXT:  COPY wine_books, line 1147
> WARNING: errors ignored on restore: 1
> And sure enough the table "wine_books" is empty. Not good.

You may have to hunt down that one bad line (1147) and chop it out /
edit it so it works.