Thread: Adding an ignore list to pg_restore
Hi PostgreSQL developers! On [1], Stephen and I are currently discussing how to provide seamless automatic version upgrades of PostgreSQL databases with third party modules like PostGIS. The core problem is that we want to not restore objects (mainly tables) in the destination database which already exist. pg_restore currently offers the -L option to selectively restore only particular objects, so our original idea was to automatically create this list based on the output of pg_dump -Fc --schema-only $db | pg_restore -l for the original and target databases. However, there is a fundamental problem with this approach. When using --schema-only, the generated list naturally does not contain TABLE DATA entries. This means that we cannot figure out the catalog id of the DATA object. Of course there are workarounds [2], but both of them are inefficient. So my current idea is to add a new option --ignore-list to pg_restore which specifies a file with objects that should not be restored. This file should not contain catalog IDs, but human readable data: type schema tag e. g. if we know that we don't want to restore the public.foo table, then this file would contain TABLE public foo TABLE DATA public foo I would be willing to provide an implementation, but before I wanted to ask if this feature is something you would generally accept, or regard as totally crackful? Another way would be to add an option which specifically ignores objects that are already present in the target database. This would be more robust and probably easier to implement, but less general than the ignore list. Thank you in advance for any comment, Martin [1] http://bugs.debian.org/351571 [2] (1) run pg_dump without --schema-only twice (once for pg_restore -l, second time for actual restoration), or (2)save pg_dump output into a temporary file -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?
Martin Pitt <mpitt@debian.org> writes: > The core problem is that we want to not restore objects (mainly > tables) in the destination database which already exist. Why is this a problem? It's already the default behavior --- the creation commands fail but pg_restore keeps going. regards, tom lane
Hi Tom! Tom Lane [2006-02-18 13:32 -0500]: > Martin Pitt <mpitt@debian.org> writes: > > The core problem is that we want to not restore objects (mainly > > tables) in the destination database which already exist. > > Why is this a problem? It's already the default behavior --- the > creation commands fail but pg_restore keeps going. The problem is that pg_restore would restore the TABLE DATA object, although we don't want that (the postgis specific tables are pre-populated by PostGIS itself, and should not be altered by the upgrade. (Stephen knows the details of PostGIS). Thanks, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?
Martin Pitt <mpitt@debian.org> writes: > Tom Lane [2006-02-18 13:32 -0500]: >> Martin Pitt <mpitt@debian.org> writes: >>> The core problem is that we want to not restore objects (mainly >>> tables) in the destination database which already exist. >> >> Why is this a problem? It's already the default behavior --- the >> creation commands fail but pg_restore keeps going. > The problem is that pg_restore would restore the TABLE DATA object, > although we don't want that (the postgis specific tables are > pre-populated by PostGIS itself, and should not be altered by the > upgrade. Hm. Rather than a variant of the -L facility (which is hard to use, and I don't see your proposal being much easier), maybe what's wanted is just a flag saying "don't try to restore data into any table whose creation command fails". Maybe that should even be the default ... and you could extend it to indexes and constraints on such tables too, as those would likely end up being duplicated as well. regards, tom lane
Hi Tom! Tom Lane [2006-02-18 14:34 -0500]: > Hm. Rather than a variant of the -L facility (which is hard to use, > and I don't see your proposal being much easier), maybe what's wanted > is just a flag saying "don't try to restore data into any table whose > creation command fails". Maybe that should even be the default ... > and you could extend it to indexes and constraints on such tables too, > as those would likely end up being duplicated as well. This comes close to my alternative proposal, it sounds fine to me. I'll try to come up with a reasonably clean implementation and report back then. Thank you, and have a nice Sunday, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntulinux.org Debian Developer http://www.debian.org
Hi again, Tom Lane [2006-02-18 14:34 -0500]: > >>> The core problem is that we want to not restore objects (mainly > >>> tables) in the destination database which already exist. > >> > >> Why is this a problem? It's already the default behavior --- the > >> creation commands fail but pg_restore keeps going. > > > The problem is that pg_restore would restore the TABLE DATA object, > > although we don't want that (the postgis specific tables are > > pre-populated by PostGIS itself, and should not be altered by the > > upgrade. > > Hm. Rather than a variant of the -L facility (which is hard to use, > and I don't see your proposal being much easier), maybe what's wanted > is just a flag saying "don't try to restore data into any table whose > creation command fails". Maybe that should even be the default ... > and you could extend it to indexes and constraints on such tables too, > as those would likely end up being duplicated as well. My first stab at this is a patch which only does the minimal changes, just to get me going. If the restoration of a TABLE object fails, it marks the corresponding TABLE DATA object as to be ignored. Do you think the current patch is a valid approach? Since this changes the behaviour of pg_restore, this should probably become an option, e. g. -D / --ignore-existing-table-data. I'll do this if you agree to the principle of the current patch. For convenience, I wrote a small test script which demonstrates the behaviour. The table 'userdata' should be restored, while the table 'auxdata' is already present in the destination db, and its contents should not be modified. Output with pg_restore from 8.1.3: ------------------- snip ------------------------ $ LC_ALL=C sudo -u postgres ./test-pg_restore-existing.sh === create empty databases === === populating old database === === pre-creating auxdata in new database === === restoring old to new === pg_restore: connecting to database for restore pg_restore: creating SCHEMA public pg_restore: creating COMMENT SCHEMA public pg_restore: creating TABLE auxdata pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 1183; 1259 17184 TABLE auxdata postgres pg_restore: [archiver (db)] could not execute query: FEHLER: Relation »auxdata« existiert bereits Command was: CREATE TABLE auxdata ( x integer ); pg_restore: creating TABLE userdata pg_restore: restoring data for table "auxdata" pg_restore: restoring data for table "userdata" pg_restore: setting owner and privileges for SCHEMA public pg_restore: setting owner and privileges for COMMENT SCHEMA public pg_restore: setting owner and privileges for ACL public pg_restore: setting owner and privileges for TABLE auxdata pg_restore: setting owner and privileges for TABLE userdata WARNING: errors ignored on restore: 1 pg_restore failed with 1 === new/userdata: === 42 256 === new/auxdata: === -1 -2 1 2 ------------------- snip ------------------------ Output with patched pg_restore: ------------------- snip ------------------------ $ LC_ALL=C sudo -u postgres ./test-pg_restore-existing.sh === create empty databases === === populating old database === === pre-creating auxdata in new database === === restoring old to new === pg_restore: connecting to database for restore pg_restore: creating SCHEMA public pg_restore: creating COMMENT SCHEMA public pg_restore: creating TABLE auxdata pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 1183; 1259 17194 TABLE auxdata postgres pg_restore: [archiver (db)] could not execute query: FEHLER: Relation »auxdata« existiert bereits Command was: CREATE TABLE auxdata ( x integer ); pg_restore: table auxdata could not be created, will not restore its data pg_restore: creating TABLE userdata pg_restore: restoring data for table "userdata" pg_restore: setting owner and privileges for SCHEMA public pg_restore: setting owner and privileges for COMMENT SCHEMA public pg_restore: setting owner and privileges for ACL public pg_restore: setting owner and privileges for TABLE auxdata pg_restore: setting owner and privileges for TABLE userdata WARNING: errors ignored on restore: 1 pg_restore failed with 1 === new/userdata: === 42 256 === new/auxdata: === -1 -2 ------------------- snip ------------------------ Thus, with the patch, auxdata is not restored (which produced the additional entries '1' and '2'). Thanks, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?
Attachment
Hi again, Meh, the list server didn't like the attached test script, so I put it here: http://people.debian.org/~mpitt/test-pg_restore-existing.sh Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?
Hi again, Martin Pitt [2006-02-19 14:39 +0100]: > Since this changes the behaviour of pg_restore, this should probably > become an option, e. g. -D / --ignore-existing-table-data. I'll do > this if you agree to the principle of the current patch. I improved the patch now to only ignore TABLE DATA for existing tables if '-X ignore-existing-tables' is specified. I also updated the documentation. Since this doesn't change the default behaviour now any more, I would like to put this patch into the Debian packages to provide automatic upgrades for PostGIS-enabled databases (see [1]). Does anyone object to this? Do you consider to adopt this upstream? Thanks in advance, and have a nice weekend! Martin [1] http://bugs.debian.org/351571 -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?
Attachment
Martin Pitt wrote: > Hi again, > > Martin Pitt [2006-02-19 14:39 +0100]: > > Since this changes the behaviour of pg_restore, this should probably > > become an option, e. g. -D / --ignore-existing-table-data. I'll do > > this if you agree to the principle of the current patch. > > I improved the patch now to only ignore TABLE DATA for existing tables > if '-X ignore-existing-tables' is specified. I also updated the > documentation. Is this really an appropiate description for the behavior? What happens if the table is not created for some other reason? Consider for example a table using a datatype that couldn't be created. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Martin Pitt <mpitt@debian.org> writes: > Martin Pitt [2006-02-19 14:39 +0100]: >> Since this changes the behaviour of pg_restore, this should probably >> become an option, e. g. -D / --ignore-existing-table-data. I'll do >> this if you agree to the principle of the current patch. > I improved the patch now to only ignore TABLE DATA for existing tables > if '-X ignore-existing-tables' is specified. I also updated the > documentation. This patch is unbelievably ugly and probably vulnerable to coredumps. Please use a cleaner way of disabling the subsequent load than tromping all over the TOC datastructure, ie, not this: > + strcpy (tes->desc, "IGNOREDATA"); BTW, I'm pretty sure it fails for tables with same names in different schemas, too. regards, tom lane
I will clean it up before applying. Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Martin Pitt wrote: -- Start of PGP signed section. > Hi again, > > Martin Pitt [2006-02-19 14:39 +0100]: > > Since this changes the behaviour of pg_restore, this should probably > > become an option, e. g. -D / --ignore-existing-table-data. I'll do > > this if you agree to the principle of the current patch. > > I improved the patch now to only ignore TABLE DATA for existing tables > if '-X ignore-existing-tables' is specified. I also updated the > documentation. > > Since this doesn't change the default behaviour now any more, I would > like to put this patch into the Debian packages to provide automatic > upgrades for PostGIS-enabled databases (see [1]). Does anyone object > to this? > > Do you consider to adopt this upstream? > > Thanks in advance, and have a nice weekend! > > Martin > > [1] http://bugs.debian.org/351571 > > -- > Martin Pitt http://www.piware.de > Ubuntu Developer http://www.ubuntu.com > Debian Developer http://www.debian.org > > In a world without walls and fences, who needs Windows and Gates? [ Attachment, skipping... ] -- End of PGP section, PGP failed! -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. +
Hi all, thanks for the feedback. I updated the patch now. Alvaro Herrera [2006-02-25 13:47 -0300]: > > I improved the patch now to only ignore TABLE DATA for existing tables > > if '-X ignore-existing-tables' is specified. I also updated the > > documentation. > > Is this really an appropiate description for the behavior? What happens > if the table is not created for some other reason? Consider for example > a table using a datatype that couldn't be created. Right. However, if the table is not present at all, then it makes even less sense to attempt to restore its data. Therefore I consider this mainly a documentation issue. I changed the option to -X no-data-for-failed-tables and described it as By default, table data objects are restored even if the associated table could not be successfully created (e. g. because it already exists). [...] Tom Lane [2006-02-25 12:18 -0500]: > Martin Pitt <mpitt@debian.org> writes: > > Martin Pitt [2006-02-19 14:39 +0100]: > >> Since this changes the behaviour of pg_restore, this should probably > >> become an option, e. g. -D / --ignore-existing-table-data. I'll do > >> this if you agree to the principle of the current patch. > > > I improved the patch now to only ignore TABLE DATA for existing tables > > if '-X ignore-existing-tables' is specified. I also updated the > > documentation. > > This patch is unbelievably ugly and probably vulnerable to coredumps. > Please use a cleaner way of disabling the subsequent load than tromping > all over the TOC datastructure, ie, not this: > > > + strcpy (tes->desc, "IGNOREDATA"); It should not segfault, but I agree that this is a bit hackish. The updated patch completely removes the TABLE DATA node from the linked list. It does not free its memory, though; I did not find a free_tocentry() or similar function. However, pg_restore is no daemon, and without the new option the memory would be allocated, too, so it does not make much difference. Can anyone give me a hint how to properly free the struct? > BTW, I'm pretty sure it fails for tables with same names in different > schemas, too. Right, sorry for that. I fixed that, too. Bruce Momjian [2006-02-28 19:54 -0500]: > I will clean it up before applying. Thank you. I hope the updated patch makes that a little bit easier. > Your patch has been added to the PostgreSQL unapplied patches list at: > > http://momjian.postgresql.org/cgi-bin/pgpatches > > It will be applied as soon as one of the PostgreSQL committers reviews > and approves it. Great, thanks! Martin P.S. I also updated the test script to create two namespaces with identidal table names. http://people.debian.org/~mpitt/test-pg_restore-existing.sh -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntu.com Debian Developer http://www.debian.org In a world without walls and fences, who needs Windows and Gates?