Thread: Avoid duplicated rows when restoring data from pg_dumpall ??
Avoid duplicated rows when restoring data from pg_dumpall ??
From
Pablo Alonso-Villaverde Roza
Date:
Hi all,
I'm getting duplicated rows in some of my tables when I try to restore data from a dump file generated with 'pg_dumpall'.
I use:
pg_dumpall -c -U postgres > mybackup.sql
to dump all the data, roles, etc. into a file.
Then I try to restore the data using:
psql -U postgres -f mybackup.sql
The only "way" I have found to solve this problem is deleting my database before restoring the data, so everything is 're-created' in the restore process without generating duplicated rows.
I thought, that the "-c" flag on pg_dumpall would force a DROP of any previous data structures on the server but...it looks like it doesn't work as I expected and ...when I restore data I get duplicated rows.
How can I get a "clean" restore automatically? I mean, getting exactly the same data there was on the server at the moment I made the 'pg_dumpall'
Excuse me for my poor English, by the way.
Thanks in advance,
I'm getting duplicated rows in some of my tables when I try to restore data from a dump file generated with 'pg_dumpall'.
I use:
pg_dumpall -c -U postgres > mybackup.sql
to dump all the data, roles, etc. into a file.
Then I try to restore the data using:
psql -U postgres -f mybackup.sql
The only "way" I have found to solve this problem is deleting my database before restoring the data, so everything is 're-created' in the restore process without generating duplicated rows.
I thought, that the "-c" flag on pg_dumpall would force a DROP of any previous data structures on the server but...it looks like it doesn't work as I expected and ...when I restore data I get duplicated rows.
How can I get a "clean" restore automatically? I mean, getting exactly the same data there was on the server at the moment I made the 'pg_dumpall'
Excuse me for my poor English, by the way.
Thanks in advance,
Pablo Alonso-Villaverde Roza <pavroza@gmail.com> writes: > I'm getting duplicated rows in some of my tables when I try to restore data > from a dump file generated with 'pg_dumpall'. Probably all of them, actually ... > The only "way" I have found to solve this problem is deleting my database > before restoring the data, so everything is 're-created' in the restore > process without generating duplicated rows. A pg_dumpall script expects to be restored into an empty cluster. This is not a bug. > I thought, that the "-c" flag on pg_dumpall would force a DROP of any > previous data structures on the server but...it looks like it doesn't work > as I expected and ...when I restore data I get duplicated rows. The -c flag should cause the script to DROP all your databases first. But that switch has been known to have bugs in the past, and in any case it could fail if there are other sessions connected to those databases. Did you pay attention to whether the restore reported any errors? What PG version is this? regards, tom lane
Re: Avoid duplicated rows when restoring data from pg_dumpall ??
From
Pablo Alonso-Villaverde Roza
Date:
Hi Tom, thanks for your fast reply,
- As regards the duplicated rows, no, I don't get duplicated rows in all the tables stored in the database because
some tables have primary-keys (and/or UNIQUE) constraints. These constraints don't allow the restore process to
duplicate rows. In fact, it is a kind of "solution" I've tried...(add an extra column with a primary-key or unique constraint, to this tables), and it "works". The restore process doesn't generate duplicated rows, because the constraint does not allow the insertion
of new duplicated data. Anyway..it looks like a kind of 'poor solution' :-)
- Ok, thanks for the info, I thought pg_dumpall would work as I desired even on non-empty clusters.
Of course...if there is no previous data, the restore process will never create duplicated rows.
- Yes, the restore process generates errors, because it tries to re-generate data structures that exist in the database server
at that moment. Even if I delete my own databases, some errors will appear (because I cannot delete the internal stuff of
of the server -> the 'postgres' database for example) . Those errors, could be ignored in most cases I think, but perhaps create a kind of "bad feeling" about the result of the restore process, or can "hide" other more important errors when you get a huge ammount of info on the screen either.
- I forgot to mention it. I'm using PostgreSQL 8.3 on Windows xp.
Thanks a lot for your advices,
Regards, Pablo
- As regards the duplicated rows, no, I don't get duplicated rows in all the tables stored in the database because
some tables have primary-keys (and/or UNIQUE) constraints. These constraints don't allow the restore process to
duplicate rows. In fact, it is a kind of "solution" I've tried...(add an extra column with a primary-key or unique constraint, to this tables), and it "works". The restore process doesn't generate duplicated rows, because the constraint does not allow the insertion
of new duplicated data. Anyway..it looks like a kind of 'poor solution' :-)
- Ok, thanks for the info, I thought pg_dumpall would work as I desired even on non-empty clusters.
Of course...if there is no previous data, the restore process will never create duplicated rows.
- Yes, the restore process generates errors, because it tries to re-generate data structures that exist in the database server
at that moment. Even if I delete my own databases, some errors will appear (because I cannot delete the internal stuff of
of the server -> the 'postgres' database for example) . Those errors, could be ignored in most cases I think, but perhaps create a kind of "bad feeling" about the result of the restore process, or can "hide" other more important errors when you get a huge ammount of info on the screen either.
- I forgot to mention it. I'm using PostgreSQL 8.3 on Windows xp.
Thanks a lot for your advices,
Regards, Pablo
2009/8/24 Tom Lane <tgl@sss.pgh.pa.us>
Pablo Alonso-Villaverde Roza <pavroza@gmail.com> writes:Probably all of them, actually ...
> I'm getting duplicated rows in some of my tables when I try to restore data
> from a dump file generated with 'pg_dumpall'.A pg_dumpall script expects to be restored into an empty cluster. This
> The only "way" I have found to solve this problem is deleting my database
> before restoring the data, so everything is 're-created' in the restore
> process without generating duplicated rows.
is not a bug.The -c flag should cause the script to DROP all your databases first.
> I thought, that the "-c" flag on pg_dumpall would force a DROP of any
> previous data structures on the server but...it looks like it doesn't work
> as I expected and ...when I restore data I get duplicated rows.
But that switch has been known to have bugs in the past, and in any case
it could fail if there are other sessions connected to those databases.
Did you pay attention to whether the restore reported any errors?
What PG version is this?
regards, tom lane
Hi Pablo > - As regards the duplicated rows, no, I don't get duplicated rows in > all the tables stored in the database because > some tables have primary-keys (and/or UNIQUE) constraints. These > constraints don't allow the restore process to > duplicate rows. In fact, it is a kind of "solution" I've tried...(add > an extra column with a primary-key or unique constraint, to this > tables), and it "works". The restore process doesn't generate > duplicated rows, because the constraint does not allow the insertion > of new duplicated data. Anyway..it looks like a kind of 'poor > solution' :-) Yeah, that's one solution, only trouble being if the data in the existing table is different to what's in the restore script (for a record with the same ID) it won't be updated. e.g.: in this example your restored database will be inconsistent with the backup. your table: Field1 = 1 (ID), Field2 = A, Field3 = B restore script: Field1 = 1 (ID), Field2 = B, Field3 = B > > - Ok, thanks for the info, I thought pg_dumpall would work as I > desired even on non-empty clusters. > Of course...if there is no previous data, the restore process will > never create duplicated rows. Exactly. If you're looking for some form of replication (i.e. master-to-slave) look at Slony - it fires triggers on the master that insert data into the slave. It has its limitations but AFAIK it's a workable solution. > > - Yes, the restore process generates errors, because it tries to > re-generate data structures that exist in the database server > at that moment. Even if I delete my own databases, some errors will > appear (because I cannot delete the internal stuff of > of the server -> the 'postgres' database for example) . Those errors, > could be ignored in most cases I think, but perhaps create a kind of > "bad feeling" about the result of the restore process, or can "hide" > other more important errors when you get a huge ammount of info on the > screen either. You can delete the "postgres" database - it's an empty database that's created when the server is initialised so you've got something to connect to. It's safe to delete, as long as you have another database you can connect to, but there's no real reason to unless it's in your restore script (e.g. from pg_dumpall.) http://www.postgresql.org/docs/8.3/static/manage-ag-templatedbs.html Regards, Andy
Re: Avoid duplicated rows when restoring data from pg_dumpall ??
From
Pablo Alonso-Villaverde Roza
Date:
Hi Andy,
First of all, thank you very much for your advices,
Ah, ok, I thought it was NOT possible to delete postgres because I was not allowed to do it from pgAdmin.
Thanks again,
Regards Pablo
First of all, thank you very much for your advices,
2009/8/26 Andy Shellam <andy-lists@networkmail.eu>
No, I was not looking for a replication system. Just a "simple" method to backup and restore data for users with very very low knowledge about computers administration and zero knowledge about postgreSQL. I'll not be able to spend much time with the users, so they should be able to make administration tasks without much supervision, and I was looking for a kind of "fire-and-forget" procedure Something they could do without many details to worry about.
Hi PabloYeah, that's one solution, only trouble being if the data in the existing table is different to what's in the restore script (for a record with the same ID) it won't be updated.- As regards the duplicated rows, no, I don't get duplicated rows in all the tables stored in the database because
some tables have primary-keys (and/or UNIQUE) constraints. These constraints don't allow the restore process to
duplicate rows. In fact, it is a kind of "solution" I've tried...(add an extra column with a primary-key or unique constraint, to this tables), and it "works". The restore process doesn't generate duplicated rows, because the constraint does not allow the insertion
of new duplicated data. Anyway..it looks like a kind of 'poor solution' :-)
e.g.: in this example your restored database will be inconsistent with the backup.
your table: Field1 = 1 (ID), Field2 = A, Field3 = B
restore script: Field1 = 1 (ID), Field2 = B, Field3 = B
Yes!, you're right, that could be a serious problem...some modified data will keep the same and not restored to the status it had
when the backup was made. Damn...
I've tought about a "heavy-method" solution.....make an "dumb-proof" utility to backup and restore the 'data' directory of the PostgresSQL server. It is not considered as good as a SQL dump, because it requires to shutdown the server, but that's not
a problem for me.
when the backup was made. Damn...
I've tought about a "heavy-method" solution.....make an "dumb-proof" utility to backup and restore the 'data' directory of the PostgresSQL server. It is not considered as good as a SQL dump, because it requires to shutdown the server, but that's not
a problem for me.
- Ok, thanks for the info, I thought pg_dumpall would work as I desired even on non-empty clusters.
Of course...if there is no previous data, the restore process will never create duplicated rows.
Exactly. If you're looking for some form of replication (i.e. master-to-slave) look at Slony - it fires triggers on the master that insert data into the slave. It has its limitations but AFAIK it's a workable solution.
No, I was not looking for a replication system. Just a "simple" method to backup and restore data for users with very very low knowledge about computers administration and zero knowledge about postgreSQL. I'll not be able to spend much time with the users, so they should be able to make administration tasks without much supervision, and I was looking for a kind of "fire-and-forget" procedure Something they could do without many details to worry about.
- Yes, the restore process generates errors, because it tries to re-generate data structures that exist in the database server
at that moment. Even if I delete my own databases, some errors will appear (because I cannot delete the internal stuff of
of the server -> the 'postgres' database for example) . Those errors, could be ignored in most cases I think, but perhaps create a kind of "bad feeling" about the result of the restore process, or can "hide" other more important errors when you get a huge ammount of info on the screen either.
You can delete the "postgres" database - it's an empty database that's created when the server is initialised so you've got something to connect to. It's safe to delete, as long as you have another database you can connect to, but there's no real reason to unless it's in your restore script (e.g. from pg_dumpall.)
http://www.postgresql.org/docs/8.3/static/manage-ag-templatedbs.html
Regards,
Andy
Ah, ok, I thought it was NOT possible to delete postgres because I was not allowed to do it from pgAdmin.
Thanks again,
Regards Pablo