Thread: Change server encoding after the fact
I would like to change my server_encoding which is currently SQL_ASCII to UTF8. I have existing data that I would like to keep. From my understanding of the steps I need to: 1) alter the template1 database encoding via UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1'); 2) Dump my current database pg_dump -Fc foo > foo.db 3) Drop my current database drop database foo; 4) recreate it with the proper encoding create database foo with template = template1 encoding = 'UTF-8'; 5) restore from backup pg_restore -d foo foo.db Are these the correct steps to perform or is there an easier / in-place way? Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding andits going into a UTF-8 database)? Thank you /Cody
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote: > I would like to change my server_encoding which is currently SQL_ASCII to UTF8. > > I have existing data that I would like to keep. > > From my understanding of the steps I need to: > > 1) alter the template1 database encoding via > > UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1'); Just create database using template0 as template and you can skip this step ^^ > Are these the correct steps to perform or is there an easier / in-place way? > Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encodingand its going into a UTF-8 database)? You might need to set client encoding when restoring. Or use iconv to convert from one encoding to another, which is what I usually do. Note that it's VERY likely you'll have data in a SQL_ASCII db that won't go into a UTF8 database without some lossiness.
Thanks Scott. See below:
On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:Just create database using template0 as template and you can skip this step ^^
> I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
>
> I have existing data that I would like to keep.
>
> From my understanding of the steps I need to:
>
> 1) alter the template1 database encoding via
>
> UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');
Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
So it appears both template0 & template1 are SQL_ASCII, so how would creating from a new DB from template0 be any different than template1?
> Are these the correct steps to perform or is there an easier / in-place way?You might need to set client encoding when restoring. Or use iconv to
> Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding and its going into a UTF-8 database)?
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.
Yes, I see this might be the case. From my playing around with iconv I cannot even properly do the conversion:
$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661
Uh oh... I cannot event convert it?
Whats my next step at this point if I cannot even convert my data? I'd be OK with some lossiness.
Thanks again
/Cody
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Thanks Scott. See below: > > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote: >> > I would like to change my server_encoding which is currently SQL_ASCII >> > to UTF8. >> > >> > I have existing data that I would like to keep. >> > >> > From my understanding of the steps I need to: >> > >> > 1) alter the template1 database encoding via >> > >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0', >> > 'template1'); >> >> Just create database using template0 as template and you can skip this >> step ^^ > > > Wouldn't this only work if my template0 was UTF8 itself? > => select datname, pg_encoding_to_char(encoding) from pg_database; > datname | pg_encoding_to_char > ----------------------+--------------------- > template1 | SQL_ASCII > template0 | SQL_ASCII > postgres | SQL_ASCII > > So it appears both template0 & template1 are SQL_ASCII, so how would > creating from a new DB from template0 be any different than template1? Well, let's try, shall we? From a freshly created cluster on my laptop, running 8.4: smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database; datname | pg_encoding_to_char -----------+--------------------- template1 | SQL_ASCII template0 | SQL_ASCII postgres | SQL_ASCII smarlowe | SQL_ASCII (4 rows) smarlowe=# create database j template template0 encoding 'UTF8'; CREATE DATABASE Seems to work. P.s. I'm not sure why it works, I just know that it does. :) >> > Are these the correct steps to perform or is there an easier / in-place >> > way? >> >> > Also, when I dump my old DB and restore it, will it be converted >> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a >> > UTF-8 database)? >> >> You might need to set client encoding when restoring. Or use iconv to >> convert from one encoding to another, which is what I usually do. >> Note that it's VERY likely you'll have data in a SQL_ASCII db that >> won't go into a UTF8 database without some lossiness. > > > Yes, I see this might be the case. From my playing around with iconv I > cannot even properly do the conversion: > $ pg_dump -Fp foo > foo.sql > $ file -i foo.sql > foo.sql: text/plain; charset=us-ascii > $ iconv -f utf-8 foo.sql > utf8.sql > iconv: illegal input sequence at position 2512661 I think you got it backwards, the -f should be somthing other than utf-8 right? That's what the -t should be right? Try iconv without a -f switch and a -t of utf-8 and see what happens...
Scott Marlowe <scott.marlowe@gmail.com> writes: > On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> So it appears both template0 & template1 are SQL_ASCII, so how would >> creating from a new DB from template0 be any different than template1? > P.s. I'm not sure why it works, I just know that it does. :) CREATE DATABASE assumes that template0 cannot contain any non-ASCII data, so it's okay to clone it and then pretend that the result has some other encoding. The same assumption cannot be made for template1, since that's user-modifiable. regards, tom lane
Please see below.
On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:Well, let's try, shall we? From a freshly created cluster on my
> Thanks Scott. See below:
>
> On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > I would like to change my server_encoding which is currently SQL_ASCII
>> > to UTF8.
>> >
>> > I have existing data that I would like to keep.
>> >
>> > From my understanding of the steps I need to:
>> >
>> > 1) alter the template1 database encoding via
>> >
>> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> > 'template1');
>>
>> Just create database using template0 as template and you can skip this
>> step ^^
>
>
> Wouldn't this only work if my template0 was UTF8 itself?
> => select datname, pg_encoding_to_char(encoding) from pg_database;
> datname | pg_encoding_to_char
> ----------------------+---------------------
> template1 | SQL_ASCII
> template0 | SQL_ASCII
> postgres | SQL_ASCII
>
> So it appears both template0 & template1 are SQL_ASCII, so how would
> creating from a new DB from template0 be any different than template1?
laptop, running 8.4:
smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;datname | pg_encoding_to_charsmarlowe | SQL_ASCII
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
(4 rows)
smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE
Seems to work.
P.s. I'm not sure why it works, I just know that it does. :)
Ok, I see what you mean. This would create a new DB with the proper encoding. Which is "fine", and probably what I will do. I guess I see an ideal scenario being one where we permanently convert the template encoding to UTF8 so going forward I dont have to worry about forgetting to adding the encoding= 'UTF8' for every new DB I create.
>> > Are these the correct steps to perform or is there an easier / in-placeI think you got it backwards, the -f should be somthing other than
>> > way?
>>
>> > Also, when I dump my old DB and restore it, will it be converted
>> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
>> > UTF-8 database)?
>>
>> You might need to set client encoding when restoring. Or use iconv to
>> convert from one encoding to another, which is what I usually do.
>> Note that it's VERY likely you'll have data in a SQL_ASCII db that
>> won't go into a UTF8 database without some lossiness.
>
>
> Yes, I see this might be the case. From my playing around with iconv I
> cannot even properly do the conversion:
> $ pg_dump -Fp foo > foo.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $ iconv -f utf-8 foo.sql > utf8.sql
> iconv: illegal input sequence at position 2512661
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...
You're right, I had -f when I needed -t. I tried it again with the same error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661
On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Please see below. > > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> > Thanks Scott. See below: >> > >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe >> > <scott.marlowe@gmail.com> >> > wrote: >> >> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> >> >> wrote: >> >> > I would like to change my server_encoding which is currently >> >> > SQL_ASCII >> >> > to UTF8. >> >> > >> >> > I have existing data that I would like to keep. >> >> > >> >> > From my understanding of the steps I need to: >> >> > >> >> > 1) alter the template1 database encoding via >> >> > >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0', >> >> > 'template1'); >> >> >> >> Just create database using template0 as template and you can skip this >> >> step ^^ >> > >> > >> > Wouldn't this only work if my template0 was UTF8 itself? >> > => select datname, pg_encoding_to_char(encoding) from pg_database; >> > datname | pg_encoding_to_char >> > ----------------------+--------------------- >> > template1 | SQL_ASCII >> > template0 | SQL_ASCII >> > postgres | SQL_ASCII >> > >> > So it appears both template0 & template1 are SQL_ASCII, so how would >> > creating from a new DB from template0 be any different than template1? >> >> Well, let's try, shall we? From a freshly created cluster on my >> laptop, running 8.4: >> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database; >> datname | pg_encoding_to_char >> -----------+--------------------- >> template1 | SQL_ASCII >> template0 | SQL_ASCII >> postgres | SQL_ASCII >> smarlowe | SQL_ASCII >> (4 rows) >> >> smarlowe=# create database j template template0 encoding 'UTF8'; >> CREATE DATABASE >> >> Seems to work. >> >> P.s. I'm not sure why it works, I just know that it does. :) >> > > Ok, I see what you mean. This would create a new DB with the proper > encoding. Which is "fine", and probably what I will do. I guess I see an > ideal scenario being one where we permanently convert the template encoding > to UTF8 so going forward I dont have to worry about forgetting to adding the > encoding= 'UTF8' for every new DB I create. Ah ok. The way I fix that is this: update pg_database set datistemplate = false where datname='template1'; drop database template1; create database template1 template template0 encoding 'UTF8'; But your way would likely work too. >> I think you got it backwards, the -f should be somthing other than >> utf-8 right? That's what the -t should be right? Try iconv without a >> -f switch and a -t of utf-8 and see what happens... > > You're right, I had -f when I needed -t. I tried it again with the same > error: > $ iconv -t utf-8 foo.sql > utf.sql > iconv: illegal input sequence at position 2512661 Any idea waht the actual encoding of your source database is? SQL_ASCII is basically not really ascii, more like anything goes.
Cody Caughlan <toolbag@gmail.com> writes: > Ok, I see what you mean. This would create a new DB with the proper > encoding. Which is "fine", and probably what I will do. I guess I see an > ideal scenario being one where we permanently convert the template encoding > to UTF8 so going forward I dont have to worry about forgetting to adding the > encoding= 'UTF8' for every new DB I create. Well, if you're feeling brave you can mark template0 as having utf8 encoding via a manual update to pg_database. In theory that should be safe enough. If you know template1 doesn't, and never has, contained any non-ASCII data, you could do the same to it ... but it would be a lot safer to drop it and recreate from template0. See http://wiki.postgresql.org/wiki/Adventures_in_PostgreSQL,_Episode_1 for some context. regards, tom lane
On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Ah ok. The way I fix that is this:On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Thanks Scott. See below:
>> >
>> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > I would like to change my server_encoding which is currently
>> >> > SQL_ASCII
>> >> > to UTF8.
>> >> >
>> >> > I have existing data that I would like to keep.
>> >> >
>> >> > From my understanding of the steps I need to:
>> >> >
>> >> > 1) alter the template1 database encoding via
>> >> >
>> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> > 'template1');
>> >>
>> >> Just create database using template0 as template and you can skip this
>> >> step ^^
>> >
>> >
>> > Wouldn't this only work if my template0 was UTF8 itself?
>> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> > datname | pg_encoding_to_char
>> > ----------------------+---------------------
>> > template1 | SQL_ASCII
>> > template0 | SQL_ASCII
>> > postgres | SQL_ASCII
>> >
>> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> > creating from a new DB from template0 be any different than template1?
>>
>> Well, let's try, shall we? From a freshly created cluster on my
>> laptop, running 8.4:
>>
>> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
>> datname | pg_encoding_to_char
>> -----------+---------------------
>> template1 | SQL_ASCII
>> template0 | SQL_ASCII
>> postgres | SQL_ASCII
>> smarlowe | SQL_ASCII
>> (4 rows)
>>
>> smarlowe=# create database j template template0 encoding 'UTF8';
>> CREATE DATABASE
>>
>> Seems to work.
>>
>> P.s. I'm not sure why it works, I just know that it does. :)
>>
>
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.
update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';
But your way would likely work too.Any idea waht the actual encoding of your source database is?
>> I think you got it backwards, the -f should be somthing other than
>> utf-8 right? That's what the -t should be right? Try iconv without a
>> -f switch and a -t of utf-8 and see what happens...
>
> You're right, I had -f when I needed -t. I tried it again with the same
> error:
> $ iconv -t utf-8 foo.sql > utf.sql
> iconv: illegal input sequence at position 2512661
SQL_ASCII is basically not really ascii, more like anything goes.
How would I find this? pg_database says my DB is SQL_ASCII.
"show all" says
client_encoding = SQL_ASCII
server_encoding = SQL_ASCII
On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote: > > > On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> > Please see below. >> > >> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe >> > <scott.marlowe@gmail.com> >> > wrote: >> >> >> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> >> >> wrote: >> >> > Thanks Scott. See below: >> >> > >> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe >> >> > <scott.marlowe@gmail.com> >> >> > wrote: >> >> >> >> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> >> >> >> wrote: >> >> >> > I would like to change my server_encoding which is currently >> >> >> > SQL_ASCII >> >> >> > to UTF8. >> >> >> > >> >> >> > I have existing data that I would like to keep. >> >> >> > >> >> >> > From my understanding of the steps I need to: >> >> >> > >> >> >> > 1) alter the template1 database encoding via >> >> >> > >> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0', >> >> >> > 'template1'); >> >> >> >> >> >> Just create database using template0 as template and you can skip >> >> >> this >> >> >> step ^^ >> >> > >> >> > >> >> > Wouldn't this only work if my template0 was UTF8 itself? >> >> > => select datname, pg_encoding_to_char(encoding) from pg_database; >> >> > datname | pg_encoding_to_char >> >> > ----------------------+--------------------- >> >> > template1 | SQL_ASCII >> >> > template0 | SQL_ASCII >> >> > postgres | SQL_ASCII >> >> > >> >> > So it appears both template0 & template1 are SQL_ASCII, so how would >> >> > creating from a new DB from template0 be any different than >> >> > template1? >> >> >> >> Well, let's try, shall we? From a freshly created cluster on my >> >> laptop, running 8.4: >> >> >> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from >> >> pg_database; >> >> datname | pg_encoding_to_char >> >> -----------+--------------------- >> >> template1 | SQL_ASCII >> >> template0 | SQL_ASCII >> >> postgres | SQL_ASCII >> >> smarlowe | SQL_ASCII >> >> (4 rows) >> >> >> >> smarlowe=# create database j template template0 encoding 'UTF8'; >> >> CREATE DATABASE >> >> >> >> Seems to work. >> >> >> >> P.s. I'm not sure why it works, I just know that it does. :) >> >> >> > >> > Ok, I see what you mean. This would create a new DB with the proper >> > encoding. Which is "fine", and probably what I will do. I guess I see an >> > ideal scenario being one where we permanently convert the template >> > encoding >> > to UTF8 so going forward I dont have to worry about forgetting to adding >> > the >> > encoding= 'UTF8' for every new DB I create. >> >> Ah ok. The way I fix that is this: >> >> update pg_database set datistemplate = false where datname='template1'; >> drop database template1; >> create database template1 template template0 encoding 'UTF8'; >> >> But your way would likely work too. >> >> >> I think you got it backwards, the -f should be somthing other than >> >> utf-8 right? That's what the -t should be right? Try iconv without a >> >> -f switch and a -t of utf-8 and see what happens... >> > >> > You're right, I had -f when I needed -t. I tried it again with the same >> > error: >> > $ iconv -t utf-8 foo.sql > utf.sql >> > iconv: illegal input sequence at position 2512661 >> >> Any idea waht the actual encoding of your source database is? >> SQL_ASCII is basically not really ascii, more like anything goes. > > > How would I find this? pg_database says my DB is SQL_ASCII. > "show all" says > client_encoding = SQL_ASCII > server_encoding = SQL_ASCII It would have been set by the application accessing postgresql and inserting the data. I.e. was it a windows app using a typical windows encoding? etc.
Its a Rails app and I do have:
encoding: utf8
Set in my DB configuration.
On Fri, Sep 30, 2011 at 12:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
It would have been set by the application accessing postgresql andOn Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>
>
> On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Please see below.
>> >
>> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > Thanks Scott. See below:
>> >> >
>> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> >> > <scott.marlowe@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> >> wrote:
>> >> >> > I would like to change my server_encoding which is currently
>> >> >> > SQL_ASCII
>> >> >> > to UTF8.
>> >> >> >
>> >> >> > I have existing data that I would like to keep.
>> >> >> >
>> >> >> > From my understanding of the steps I need to:
>> >> >> >
>> >> >> > 1) alter the template1 database encoding via
>> >> >> >
>> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> >> > 'template1');
>> >> >>
>> >> >> Just create database using template0 as template and you can skip
>> >> >> this
>> >> >> step ^^
>> >> >
>> >> >
>> >> > Wouldn't this only work if my template0 was UTF8 itself?
>> >> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >> > datname | pg_encoding_to_char
>> >> > ----------------------+---------------------
>> >> > template1 | SQL_ASCII
>> >> > template0 | SQL_ASCII
>> >> > postgres | SQL_ASCII
>> >> >
>> >> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> >> > creating from a new DB from template0 be any different than
>> >> > template1?
>> >>
>> >> Well, let's try, shall we? From a freshly created cluster on my
>> >> laptop, running 8.4:
>> >>
>> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from
>> >> pg_database;
>> >> datname | pg_encoding_to_char
>> >> -----------+---------------------
>> >> template1 | SQL_ASCII
>> >> template0 | SQL_ASCII
>> >> postgres | SQL_ASCII
>> >> smarlowe | SQL_ASCII
>> >> (4 rows)
>> >>
>> >> smarlowe=# create database j template template0 encoding 'UTF8';
>> >> CREATE DATABASE
>> >>
>> >> Seems to work.
>> >>
>> >> P.s. I'm not sure why it works, I just know that it does. :)
>> >>
>> >
>> > Ok, I see what you mean. This would create a new DB with the proper
>> > encoding. Which is "fine", and probably what I will do. I guess I see an
>> > ideal scenario being one where we permanently convert the template
>> > encoding
>> > to UTF8 so going forward I dont have to worry about forgetting to adding
>> > the
>> > encoding= 'UTF8' for every new DB I create.
>>
>> Ah ok. The way I fix that is this:
>>
>> update pg_database set datistemplate = false where datname='template1';
>> drop database template1;
>> create database template1 template template0 encoding 'UTF8';
>>
>> But your way would likely work too.
>>
>> >> I think you got it backwards, the -f should be somthing other than
>> >> utf-8 right? That's what the -t should be right? Try iconv without a
>> >> -f switch and a -t of utf-8 and see what happens...
>> >
>> > You're right, I had -f when I needed -t. I tried it again with the same
>> > error:
>> > $ iconv -t utf-8 foo.sql > utf.sql
>> > iconv: illegal input sequence at position 2512661
>>
>> Any idea waht the actual encoding of your source database is?
>> SQL_ASCII is basically not really ascii, more like anything goes.
>
>
> How would I find this? pg_database says my DB is SQL_ASCII.
> "show all" says
> client_encoding = SQL_ASCII
> server_encoding = SQL_ASCII
inserting the data. I.e. was it a windows app using a typical windows
encoding? etc.
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Its a Rails app and I do have: > encoding: utf8 Hmmm, if you try this does it work (mostly)? iconv -f utf-8 -t utf-8 -c < infile > outfile
On Fri, Sep 30, 2011 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> Its a Rails app and I do have: >> encoding: utf8 > > > Hmmm, if you try this does it work (mostly)? > > iconv -f utf-8 -t utf-8 -c < infile > outfile If that doesn't work try: iconv -f utf-8 -t utf-8//IGNORE -c < infile > outfile
That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command just ignorant?
On Fri, Sep 30, 2011 at 12:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:Hmmm, if you try this does it work (mostly)?
> Its a Rails app and I do have:
> encoding: utf8
iconv -f utf-8 -t utf-8 -c < infile > outfile
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote: > That worked, but "file" shows no difference: > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql > $ file -i foo.sql > foo.sql: text/plain; charset=us-ascii > $file -i utf.sql > utf.sql: text/plain; charset=us-ascii > So iconv didnt actually convert the file OR does is the "file" command just > ignorant? Not sure. try loading the dump into the UTF-8 DB in postgres and see what happens I guess?
Please see below.
On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:Not sure. try loading the dump into the UTF-8 DB in postgres and see
> That worked, but "file" shows no difference:
> $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $file -i utf.sql
> utf.sql: text/plain; charset=us-ascii
> So iconv didnt actually convert the file OR does is the "file" command just
> ignorant?
what happens I guess?
Uh oh.
On the remote machine:
$ pg_dump -Fc -E UTF8 foo > foo.sql
Then I've created a new local DB with UTF8 encoding and I try to restore this dump into it:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR: invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT: COPY wine_books, line 1147
WARNING: errors ignored on restore: 1
And sure enough the table "wine_books" is empty. Not good.
On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Please see below. > > On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> > That worked, but "file" shows no difference: >> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql >> > $ file -i foo.sql >> > foo.sql: text/plain; charset=us-ascii >> > $file -i utf.sql >> > utf.sql: text/plain; charset=us-ascii >> > So iconv didnt actually convert the file OR does is the "file" command >> > just >> > ignorant? >> >> Not sure. try loading the dump into the UTF-8 DB in postgres and see >> what happens I guess? > > > Uh oh. > On the remote machine: > $ pg_dump -Fc -E UTF8 foo > foo.sql > Then I've created a new local DB with UTF8 encoding and I try to restore > this dump into it: > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA > wine_books vinosmith > pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR: > invalid byte sequence for encoding "UTF8": 0xc309 > CONTEXT: COPY wine_books, line 1147 > WARNING: errors ignored on restore: 1 > And sure enough the table "wine_books" is empty. Not good. You may have to hunt down that one bad line (1147) and chop it out / edit it so it works.
Thanks y'all for your help on this.
I took this opportunity to upgrade to 9.1.1 which is UTF8 by default and I ended up manually cleaning up the borked data by hand (there wasn't that much).
So all is well now.
Thanks again.
/Cody
On Fri, Sep 30, 2011 at 3:37 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
You may have to hunt down that one bad line (1147) and chop it out /On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > That worked, but "file" shows no difference:
>> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
>> > $ file -i foo.sql
>> > foo.sql: text/plain; charset=us-ascii
>> > $file -i utf.sql
>> > utf.sql: text/plain; charset=us-ascii
>> > So iconv didnt actually convert the file OR does is the "file" command
>> > just
>> > ignorant?
>>
>> Not sure. try loading the dump into the UTF-8 DB in postgres and see
>> what happens I guess?
>
>
> Uh oh.
> On the remote machine:
> $ pg_dump -Fc -E UTF8 foo > foo.sql
> Then I've created a new local DB with UTF8 encoding and I try to restore
> this dump into it:
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
> wine_books vinosmith
> pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
> invalid byte sequence for encoding "UTF8": 0xc309
> CONTEXT: COPY wine_books, line 1147
> WARNING: errors ignored on restore: 1
> And sure enough the table "wine_books" is empty. Not good.
edit it so it works.