Thread: Dumping/Restoring with constraints?
Hello. I have a database dump. With data and schema, which includes all the constraints and rules. But it seems the pg_dumpall command does not copy data in such a way that foreign keys are satisfied upon restoring. Because tables are inter-related, importing them keep giving errors and eventually no data is imported. Neither pg_dumpall nor pg_restore seems to have a "without constraints" or "delay constraints check" type command. What am I missing? Thanks for any advice.
On Wednesday 27 August 2008 09:19, Phoenix Kiula wrote: > Hello. I have a database dump. With data and schema, which includes > all the constraints and rules. But it seems the pg_dumpall command > does not copy data in such a way that foreign keys are satisfied upon > restoring. Because tables are inter-related, importing them keep > giving errors and eventually no data is imported. Neither pg_dumpall > nor pg_restore seems to have a "without constraints" or "delay > constraints check" type command. What am I missing? Thanks for any > advice. We have all sorts of constraints and foreign keys and we have never had any problem with pg_restore related to dumping such that foreign keys are satisfied. You must have data already in the database that violates the restraints. You can restore in two phases; that is, by restoring the schema, and then the data using --disable-triggers. I'm assuming you are doing a binary dump. See the man page for pg_restore. HTH -- Terry Lee Tucker Turbo's IT Manager Turbo, division of Ozburn-Hessey Logistics 2251 Jesse Jewell Pkwy NE Gainesville, GA 30501 Tel: (336) 372-6812 Fax: (336) 372-6812 Cell: (336) 404-6987 terry@turbocorp.com www.turbocorp.com
On Wed, Aug 27, 2008 at 9:29 PM, Terry Lee Tucker <terry@chosen-ones.org> wrote: > We have all sorts of constraints and foreign keys and we have never had any > problem with pg_restore related to dumping such that foreign keys are > satisfied. You must have data already in the database that violates the > restraints. You can restore in two phases; that is, by restoring the schema, > and then the data using --disable-triggers. I'm assuming you are doing a > binary dump. See the man page for pg_restore. Thanks for this. I don't have any foreign key violations in my existing database. I think the violation is happening because upon restoring the table that is being populated checks in another table that doesn't yet have data. I am not using pg_restore. I am just using "psql --file=FILENAME" syntax. Is that an issue?
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes: > Hello. I have a database dump. With data and schema, which includes > all the constraints and rules. But it seems the pg_dumpall command > does not copy data in such a way that foreign keys are satisfied upon > restoring. Because tables are inter-related, importing them keep > giving errors and eventually no data is imported. This shouldn't be possible in a regular dump, at least not with any remotely modern version of PG. However, if you are trying to restore a data-only dump into an already-created set of tables, it is possible because pg_dump doesn't know how to order the data loads in that case. (The problem may in fact be insoluble, since circular foreign key relationships are allowed.) The recommended solution is to use a regular schema-and-data dump. If you really have to separate schema and data, the --disable-triggers switch might help, though you open yourself to the possibility of loading inconsistent data that way. regards, tom lane
> The recommended solution is to use a regular schema-and-data dump. > If you really have to separate schema and data, the --disable-triggers > switch might help, though you open yourself to the possibility of > loading inconsistent data that way. Thanks Tom. This is the dump command being used on a 8.2.3 database on Linux: $ pg_dumpall > mydb.sql $ umask 077 $ gzip mydb.sql Then I download the mydb.sql.gz file into my local computer (Mac OSX with 8.3.3) and unzip it to mydb.sql. The local database is already created, with all tables and constraints and all. Here is the command I use to restore: $ psql -d mydb -U myuser -h localhost --file=mydb.sql Is this not the recommended method?
On Wednesday 27 August 2008 09:36, Phoenix Kiula wrote: > On Wed, Aug 27, 2008 at 9:29 PM, Terry Lee Tucker <terry@chosen-ones.org> wrote: > > We have all sorts of constraints and foreign keys and we have never had > > any problem with pg_restore related to dumping such that foreign keys are > > satisfied. You must have data already in the database that violates the > > restraints. You can restore in two phases; that is, by restoring the > > schema, and then the data using --disable-triggers. I'm assuming you are > > doing a binary dump. See the man page for pg_restore. > > Thanks for this. I don't have any foreign key violations in my > existing database. I think the violation is happening because upon > restoring the table that is being populated checks in another table > that doesn't yet have data. > > I am not using pg_restore. I am just using "psql --file=FILENAME" > syntax. Is that an issue? The errors you are having, then, must be related to your own trigger code. It sounds like you will need to prevent those triggers from firing and the only way I know how to accomplish that is to do a binary dump and then use pg_restore as I indicated earlier. There is no way to disable triggers in your method referenced above. -- Terry Lee Tucker Turbo's IT Manager Turbo, division of Ozburn-Hessey Logistics 2251 Jesse Jewell Pkwy NE Gainesville, GA 30501 Tel: (336) 372-6812 Fax: (336) 372-6812 Cell: (336) 404-6987 terry@turbocorp.com www.turbocorp.com
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>: > > The recommended solution is to use a regular schema-and-data dump. > > If you really have to separate schema and data, the --disable-triggers > > switch might help, though you open yourself to the possibility of > > loading inconsistent data that way. > > Thanks Tom. > > This is the dump command being used on a 8.2.3 database on Linux: > > $ pg_dumpall > mydb.sql > $ umask 077 > $ gzip mydb.sql > > Then I download the mydb.sql.gz file into my local computer (Mac OSX > with 8.3.3) and unzip it to mydb.sql. > > The local database is already created, with all tables and constraints and all. Don't do that. Do one of the following: *) Allow the dump file to create all tables. *) In the early step dump the data only with pg_dumpall --disable-triggers -a -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Thanks Bill for this recommendation: > *) In the early step dump the data only with pg_dumpall --disable-triggers -a Dumb question. Will this kind of pg_dumpall lead to downtime, I mean is there a database lock during this time? Thanks!
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>: > Thanks Bill for this recommendation: > > > > *) In the early step dump the data only with pg_dumpall --disable-triggers -a > > > Dumb question. Will this kind of pg_dumpall lead to downtime, I mean > is there a database lock during this time? No. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
On Wed, Aug 27, 2008 at 10:34 PM, Bill Moran <wmoran@collaborativefusion.com> wrote: >> >> >> Dumb question. Will this kind of pg_dumpall lead to downtime, I mean >> is there a database lock during this time? > > No. Thanks. But there seems to be a tangible slowdown of DB operations during the time that pg_dump is running. Perhaps some of my postgresql.conf variables are not geared to this. (I vaguely recall setting it up so that the wal_buffers or checkpoint_segments needed to be set upwards for maintenance tasks). My question: is it possible to interactively set up these variables so that pg_dumpall can work very fast? And behind the scenes, without slowing stuff down? Thanks
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes: > This is the dump command being used on a 8.2.3 database on Linux: > $ pg_dumpall > mydb.sql > $ umask 077 > $ gzip mydb.sql > Then I download the mydb.sql.gz file into my local computer (Mac OSX > with 8.3.3) and unzip it to mydb.sql. That's all fine ... > The local database is already created, with all tables and constraints and all. ... but this isn't. A pg_dump script expects to restore into an empty database. For pg_dumpall, you shouldn't even have created the databases, just start from a virgin installation. regards, tom lane
On Wed, Aug 27, 2008 at 10:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > For pg_dumpall, you shouldn't even have created the > databases, just start from a virgin installation. Should I have CREATEd the db at least without the table structure? I dropped the database locally. Entirely. Gone. Then I tried this: $ pg_restore -h localhost mydb.sql This tells me: pg_restore: [archiver] input file does not appear to be a valid archive What gives?
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes: > Then I tried this: > $ pg_restore -h localhost mydb.sql > This tells me: > pg_restore: [archiver] input file does not appear to be a valid archive No, you were right the first time: just feed the dump script to psql. regards, tom lane
On Wed, Aug 27, 2008 at 10:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > No, you were right the first time: just feed the dump script to psql. > Ok. Tried that. Because there's no database, I have to execute the psql command as "postgres" otherwise it doesn't work. So here's my command: $ psql -d postgres -U postgres -h localhost < mydb.sql This seems to be it. This is what I should be executing, except that it spews out many errors, like: ERROR: invalid byte sequence for encoding "UTF8": 0x80 This prevents my main table from being copied - Why can't the dump and the restore just copy the file as-is, including the encoding and such? What am I not doing right? Thanks
On Wed, Aug 27, 2008 at 10:40:41PM +0800, Phoenix Kiula wrote: > Thanks. But there seems to be a tangible slowdown of DB operations > during the time that pg_dump is running. Yes. Pg_dump copies all the data out, so it puts load on your database and disks. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Wed, Aug 27, 2008 at 11:20:55PM +0800, Phoenix Kiula wrote: > This seems to be it. This is what I should be executing, except that > it spews out many errors, like: > > ERROR: invalid byte sequence for encoding "UTF8": 0x80 You have bad data in your database. Apparently, you have an encoding of UTF-8, but you have data in there that's not UTF-8 data. I'll bet your other encoding is SQL_ASCII. > This prevents my main table from being copied - Why can't the dump and > the restore just copy the file as-is, including the encoding and such? > What am I not doing right? My bet is that you did initdb on one system with a locale of C and on another with a locale of utf-8 (somehow). You can use pg_controldata to find out: run it against the data areas on each system. If I'm right, then you probably want to run initidb again on the target system. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>: > On Wed, Aug 27, 2008 at 10:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > No, you were right the first time: just feed the dump script to psql. > > > > Ok. Tried that. Because there's no database, I have to execute the > psql command as "postgres" otherwise it doesn't work. > > So here's my command: > > $ psql -d postgres -U postgres -h localhost < mydb.sql > > This seems to be it. This is what I should be executing, except that > it spews out many errors, like: > > ERROR: invalid byte sequence for encoding "UTF8": 0x80 In psql, do a \l on both the original and the DB you're restoring to. The encoding should be the same, if it's not, then fix that before restoring. If you're using pg_dump (which you don't mention ... it's getting difficult to follow what you're doing in this thread) then you can also use the -C option to have pg_dump add a CREATE DATABASE statement to the dump file for you. If you actually want to switch database encodings, that's an entirely different question thread unto itself. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Ok. Here is what it is. OLD DB: Since 2002. May contain non-UTF8 data. But I thought I had modified it all when I changed it to UTF-8 pgsql database (it was originally Mysql). The database works very well on a very busy website. Everything on that website is now UTF-8. I wish to mirror this database locally on my home machine. Server is linux with 6GB ram and pgsql 8.2.3, home machine is osx with imac of 4GM ram and 8.3.3 (new pg). NEW DB: Just installed fresh. There's nothing in it. I can basically wipe all data out. WHAT I AM DOING: 1. On the server, I am executing "pg_dumpall > mydb.out". Simple. 2. FTP that mydb.out file to local home machine. 3. Here, locally, I do this: "psql -h localhost -d postgres -U postgres -f mydb.out". This is what gives me the error. MY QUESTION: What can I do to: (a) Make sure all data on pgsql on server (OLD DB) is utf-8. Is there a command I can execute to convert all data? I thought I had converted it all to utf-8 using PHP sometime ago, which went through each and every row and column! (b) Once that data is utf8-ed, how can I bring it home and have a mirror of the db. Thanks.
On Wed, Aug 27, 2008 at 11:54:27PM +0800, Phoenix Kiula wrote: > OLD DB: > Since 2002. May contain non-UTF8 data. But I thought I had modified it > all when I changed it to UTF-8 pgsql database (it was originally > Mysql). The database works very well on a very busy website. > Everything on that website is now UTF-8. I wish to mirror this > database locally on my home machine. Server is linux with 6GB ram and > pgsql 8.2.3, home machine is osx with imac of 4GM ram and 8.3.3 (new > pg). Somehow, you're getting non-UTF-8 chars in there, either because your conversion didn't work, or because there's still bugs in your application that send non-UTF-8 data. If your database encoding is not UTF-8, then it is possible to get non-UTF-8 data anyway. That's why people asked about the database encoding. SQL_ASCII, please note, does not enforce that you're in the bottom 7 bits: it'll take anything you put in there. So if someone put (say) ISO 8859-1 in, you'll get in trouble. > WHAT I AM DOING: > 1. On the server, I am executing "pg_dumpall > mydb.out". Simple. > 2. FTP that mydb.out file to local home machine. > 3. Here, locally, I do this: "psql -h localhost -d postgres -U > postgres -f mydb.out". This is what gives me the error. Right. So the file includes data that doesn't match the encoding of the target database. AFAIR -- and my memory's not what it used to be, so check the release notes -- the UTF-8 checking in 8.2 was as good as in 8.3. One good test of this would be to install 8.2 on your home machine, and try restoring that too. If it works, then we know more. > MY QUESTION: > What can I do to: > (a) Make sure all data on pgsql on server (OLD DB) is utf-8. Is there > a command I can execute to convert all data? I thought I had converted > it all to utf-8 using PHP sometime ago, which went through each and > every row and column! The usual advice is to use iconv. Your Mac should have it installed. > (b) Once that data is utf8-ed, how can I bring it home and have a > mirror of the db. If you run iconv on the data dump before you load it, then it should work. This is not a trivial job, however. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Wed, Aug 27, 2008 at 8:40 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote: > Thanks. But there seems to be a tangible slowdown of DB operations > during the time that pg_dump is running. Perhaps some of my > postgresql.conf variables are not geared to this. (I vaguely recall > setting it up so that the wal_buffers or checkpoint_segments needed to > be set upwards for maintenance tasks). > > My question: is it possible to interactively set up these variables so > that pg_dumpall can work very fast? And behind the scenes, without > slowing stuff down? Nope, it's not a software problem, it's a hardware problem. You're drive subsystem can only deliver data so fast. The fix is usually better hardware. Big RAID controller and lots of drives.
Hi, Phoenix Kiula wrote: > Hello. I have a database dump. With data and schema, which includes > all the constraints and rules. But it seems the pg_dumpall command > does not copy data in such a way that foreign keys are satisfied upon > restoring. Because tables are inter-related, importing them keep > giving errors and eventually no data is imported. Neither pg_dumpall > nor pg_restore seems to have a "without constraints" or "delay > constraints check" type command. What am I missing? Thanks for any > advice. Apart from disabling triggers temporarely, if you have enough space in xlog, you could try to add BEGIN work; to the start of the restore SQL and COMMIT; a the end. Regards Tino
Attachment
Is there a mechanism to check exactly which row may have characters that are not UTF8? I am trying with this kind of a command: select id from employee where modify_date between '2008-03-01' and '2008-07-01' and joint_field_of_name_etc <> convert(joint_field_of_name_etc, 'UTF8') This is of course not the best way of doing so. Is there a better system-provided way of checking for rows that may not be UTF8? Or to even have a constraint to this effect, to disallow any non-UTF8 data from getting in there? Thanks
On Thu, Aug 28, 2008 at 07:51:57AM +0800, Phoenix Kiula wrote: > system-provided way of checking for rows that may not be UTF8? Or to > even have a constraint to this effect, to disallow any non-UTF8 data > from getting in there? The latter should be in effect if your database encoding is UTF-8. What's the database encoding? A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On 8/29/08, Andrew Sullivan <ajs@commandprompt.com> wrote: > > The latter should be in effect if your database encoding is UTF-8. > What's the database encoding? My database encoding on BOTH the sides is UTF8. The dumped DB is UTF8. The one I am importing into is also UTF8. So why is it showing me this error? Do I also have to look at stuff like "collation"? Also, how can I ensure that my dumped database on the server has only UTF8 data? What constraint or function can I use? Thanks
On Fri, Aug 29, 2008 at 01:53:28PM +0800, Phoenix Kiula wrote: > My database encoding on BOTH the sides is UTF8. The dumped DB is UTF8. Are you sure? I know people asked up-thread for the output of \l, but I haven't seen it. If your encoding is UTF-8, then I don't understand how you got non-UTF-8 characters in there, unless there's some bug in the 8.2 series of which I'm unaware. > The one I am importing into is also UTF8. So why is it showing me this > error? Do I also have to look at stuff like "collation"? You shouldn't have to. That's a constraint on sorting and such like. It doesn't control what can get into the database. Hmm. I wonder if the problem is the locale you're using for pg_dump vs. what you're using when handing the data back in. If the locale were different, it might be possible that your client_encoding was different. I have no idea if this is the case, but you might want to try it. Check your locale at the command line on each system. > Also, how can I ensure that my dumped database on the server has only > UTF8 data? What constraint or function can I use? As I said before, if the database encoding is UTF-8, then it's supposed to be _impossible_ that you get non-UTF-8 data in there. That's the whole point of having the database encoding. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Thanks Andrew. On the server (the DB to be dumped) everything is "UTF8". On my home server (where I would like to mirror the DB), this is the output: =# \l List of databases Name | Owner | Encoding -----------+-----------------+----------- postgres | postgres | SQL_ASCII pkiula | pkiula_pkiula | UTF8 template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII (4 rows) This is a fresh install as you can see. The database into which I am importing ("pkiula") is in fact listed as UTF8! Is this not enough?
On Fri, Aug 29, 2008 at 10:26:14PM +0800, Phoenix Kiula wrote: > Thanks Andrew. > > On the server (the DB to be dumped) everything is "UTF8". > > On my home server (where I would like to mirror the DB), this is the output: Hrm. Well, what about the locale, as I suggested? I have no idea if it's relevant here, but give it a whack. Other than that, I'm not sure. You might want to troll the release notes to see if there was an encoding bug fixed in the intermediate releases between 8.2.whatever-you-converted-on and 8.2.current. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On 8/29/08, Andrew Sullivan <ajs@commandprompt.com> wrote: > > > Hrm. Well, what about the locale, as I suggested? I have no idea if How should I check for the locale?
On Fri, Aug 29, 2008 at 11:12:35PM +0800, Phoenix Kiula wrote: > > How should I check for the locale? As the user that you issue the pg_dump with on the machine you generated it, and as the user you're trying to use to restore the data, at the command line type $ locale You end up with something like this: stovepipe:~ ajs$ locale LANG="en_CA.UTF-8" LC_COLLATE="en_CA.UTF-8" LC_CTYPE="en_CA.UTF-8" LC_MESSAGES="en_CA.UTF-8" LC_MONETARY="en_CA.UTF-8" LC_NUMERIC="en_CA.UTF-8" LC_TIME="en_CA.UTF-8" LC_ALL= I can't remember whether the client encoding is going to be affected by this during dump and restore, and whether in these versions that is captured in the dump file. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Friday 29 August 2008 8:12:35 am Phoenix Kiula wrote: > On 8/29/08, Andrew Sullivan <ajs@commandprompt.com> wrote: > > Hrm. Well, what about the locale, as I suggested? I have no idea if > > How should I check for the locale? On Linux: aklaver@tucker:~$ locale LANG=en_US.UTF-8 LANGUAGE=en LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= -- Adrian Klaver aklaver@comcast.net
Thanks so much! I think we may be on to something: On the pg_dump machine: LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= On the pg_restore machine: LANG= LC_COLLATE="C" LC_CTYPE="UTF-8" LC_MESSAGES="C" LC_MONETARY="C" LC_NUMERIC="C" LC_TIME="C" LC_ALL= So how can I change the locale on my local OSX machine? This could be it. Thanks!
On Fri, Aug 29, 2008 at 11:29:55PM +0800, Phoenix Kiula wrote: > So how can I change the locale on my local OSX machine? This could be > it. Thanks! Before you start restoring, put a LANG environment variable. So something like LANG=en_US.UTF-8 psql or else LANG=en_US.UTF-8; export LANG [other stuff] A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Friday 29 August 2008 8:29:55 am Phoenix Kiula wrote: > Thanks so much! I think we may be on to something: > > > On the pg_dump machine: > > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" > LC_NUMERIC="en_US.UTF-8" > LC_TIME="en_US.UTF-8" > LC_COLLATE="en_US.UTF-8" > LC_MONETARY="en_US.UTF-8" > LC_MESSAGES="en_US.UTF-8" > LC_PAPER="en_US.UTF-8" > LC_NAME="en_US.UTF-8" > LC_ADDRESS="en_US.UTF-8" > LC_TELEPHONE="en_US.UTF-8" > LC_MEASUREMENT="en_US.UTF-8" > LC_IDENTIFICATION="en_US.UTF-8" > LC_ALL= > > > > On the pg_restore machine: > > > LANG= > LC_COLLATE="C" > LC_CTYPE="UTF-8" > LC_MESSAGES="C" > LC_MONETARY="C" > LC_NUMERIC="C" > LC_TIME="C" > LC_ALL= > > > > So how can I change the locale on my local OSX machine? This could be > it. Thanks! Might also be useful to do SHOW ALL from within the database cluster you have on your restore machine. Will show what choices initdb made. test=# SHOW all; <Snip> lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 lc_messages | en_US.UTF-8 lc_monetary | en_US.UTF-8 lc_numeric | en_US.UTF-8 lc_time | en_US.UTF-8 <Snip> -- Adrian Klaver aklaver@comcast.net
On 8/30/08, Adrian Klaver <aklaver@comcast.net> wrote: > > Might also be useful to do SHOW ALL from within the database cluster you have > on your restore machine. Will show what choices initdb made. > > test=# SHOW all; > <Snip> > lc_collate | en_US.UTF-8 > lc_ctype | en_US.UTF-8 > lc_messages | en_US.UTF-8 > lc_monetary | en_US.UTF-8 > lc_numeric | en_US.UTF-8 > lc_time | en_US.UTF-8 > <Snip> Thanks. I did this and LC_COLLATE etc have the values "C". How can I change this to UTF8?
-------------- Original message ---------------------- From: "Phoenix Kiula" <phoenix.kiula@gmail.com> > On 8/30/08, Adrian Klaver <aklaver@comcast.net> wrote: > > > > Might also be useful to do SHOW ALL from within the database cluster you have > > on your restore machine. Will show what choices initdb made. > > > > test=# SHOW all; > > <Snip> > > lc_collate | en_US.UTF-8 > > lc_ctype | en_US.UTF-8 > > lc_messages | en_US.UTF-8 > > lc_monetary | en_US.UTF-8 > > lc_numeric | en_US.UTF-8 > > lc_time | en_US.UTF-8 > > <Snip> > > > > Thanks. I did this and LC_COLLATE etc have the values "C". How can I > change this to UTF8? > You have to start from scratch. LC_COLLATE is set at the initdb stage and cannot be changed. From the manual- http://www.postgresql.org/docs/8.2/interactive/multibyte.html " Important: Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is notwhat is expected by the locale you have selected. The LC_COLLATE and LC_CTYPE settings imply a particular encoding, andlocale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding. Since these locale settings are frozen by initdb, the apparent flexibility to use different encodings in different databasesof a cluster is more theoretical than real. It is likely that these mechanisms will be revisited in future versionsof PostgreSQL. One way to use multiple encodings safely is to set the locale to C or POSIX during initdb, thus disabling any real localeawareness. " From above the presence of a C locale should not be causing a problem. In fact when I try something similar on my machineI do not have any problems, but I am working on a Linux box. There have been previous postings dealing with locale/encodingissues on OS X. I don't know if a 'answer' was found. You might want to start a new thread 'OS X encoding/locale'to attract the attention of those that have dealt with the problem before. -- Adrian Klaver aklaver@comcast.net
Phoenix Kiula wrote: > Thanks Andrew. > > On the server (the DB to be dumped) everything is "UTF8". > > On my home server (where I would like to mirror the DB), this is the output: > > > =# \l > List of databases > Name | Owner | Encoding > -----------+-----------------+----------- > postgres | postgres | SQL_ASCII > pkiula | pkiula_pkiula | UTF8 > template0 | postgres | SQL_ASCII > template1 | postgres | SQL_ASCII > (4 rows) > > > > This is a fresh install as you can see. The database into which I am > importing ("pkiula") is in fact listed as UTF8! Is this not enough? > You said you're getting these errors: ERROR: invalid byte sequence for encoding "UTF8": 0x80 those 0x80 bytes are inside the mydb.sql file, you may find it easier to look for them there and identify the offending string(s). Try (on the linux machine): zcat mydb.sql.gz | iconv -f utf8 > /dev/null should tell you something like: illegal input sequence at position xxx BTW, 0x80 is usually found in windows encoding, such as windows-1250, where it stands for the EURO symbol: echo -n "€" | iconv -t windows-1250 | hexdump -C 00000000 80 |.| 00000001 FYI, you *can* get non UTF-8 data from an UTF-8 database, if (and only if) your client encoding is something different (either because you explicitly set it so, or because of your client defaults). Likewise, you can insert non UTF-8 data (such as your mydb.sql) into an UTF-8 database, provided you set your client encoding accordingly. PostgreSQL clients handle encoding conversions, but there's no way to guess (reliabily) the encoding of a text file. OTOH, from a SQL_ASCII database you can get all sort of data, even mixed encoding text (which you need to fix somehow). If your mydb.sql contains data from a SQL_ASCII database, you simply know nothing about the encoding. I have seen SQL_ASCII databases containg data inserted from HTTP forms, both in UTF-8 and windows-1250 encoding. Displaying, dumping, restoring that correctly is impossible, you need to fix it somehow before processing it as text. .TM.