Thread: Adding an ignore list to pg_restore

Adding an ignore list to pg_restore

From
Martin Pitt
Date:
Hi PostgreSQL developers!

On [1], Stephen and I are currently discussing how to provide seamless
automatic version upgrades of PostgreSQL databases with third party
modules like PostGIS.

The core problem is that we want to not restore objects (mainly
tables) in the destination database which already exist. pg_restore
currently offers the -L option to selectively restore only particular
objects, so our original idea was to automatically create this list
based on the output of
 pg_dump -Fc --schema-only $db | pg_restore -l

for the original and target databases.

However, there is a fundamental problem with this approach. When using
--schema-only, the generated list naturally does not contain TABLE
DATA entries. This means that we cannot figure out the catalog id of
the DATA object. Of course there are workarounds [2], but both of them
are inefficient.

So my current idea is to add a new option --ignore-list to pg_restore
which specifies a file with objects that should not be restored. This
file should not contain catalog IDs, but human readable data:
type schema tag

e. g. if we know that we don't want to restore the public.foo table,
then this file would contain
 TABLE public foo TABLE DATA public foo

I would be willing to provide an implementation, but before I wanted
to ask if this feature is something you would generally accept, or
regard as totally crackful?

Another way would be to add an option which specifically ignores
objects that are already present in the target database. This would be
more robust and probably easier to implement, but less general than
the ignore list.

Thank you in advance for any comment,

Martin


[1] http://bugs.debian.org/351571

[2] (1) run pg_dump without --schema-only twice (once for pg_restore   -l, second time for actual restoration), or
(2)save pg_dump output into a temporary file 

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Re: Adding an ignore list to pg_restore

From
Tom Lane
Date:
Martin Pitt <mpitt@debian.org> writes:
> The core problem is that we want to not restore objects (mainly
> tables) in the destination database which already exist.

Why is this a problem?  It's already the default behavior --- the
creation commands fail but pg_restore keeps going.
        regards, tom lane


Re: Adding an ignore list to pg_restore

From
Martin Pitt
Date:
Hi Tom!

Tom Lane [2006-02-18 13:32 -0500]:
> Martin Pitt <mpitt@debian.org> writes:
> > The core problem is that we want to not restore objects (mainly
> > tables) in the destination database which already exist.
>
> Why is this a problem?  It's already the default behavior --- the
> creation commands fail but pg_restore keeps going.

The problem is that pg_restore would restore the TABLE DATA object,
although we don't want that (the postgis specific tables are
pre-populated by PostGIS itself, and should not be altered by the
upgrade.

(Stephen knows the details of PostGIS).

Thanks,

Martin
--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Re: Adding an ignore list to pg_restore

From
Tom Lane
Date:
Martin Pitt <mpitt@debian.org> writes:
> Tom Lane [2006-02-18 13:32 -0500]:
>> Martin Pitt <mpitt@debian.org> writes:
>>> The core problem is that we want to not restore objects (mainly
>>> tables) in the destination database which already exist.
>>
>> Why is this a problem?  It's already the default behavior --- the
>> creation commands fail but pg_restore keeps going.

> The problem is that pg_restore would restore the TABLE DATA object,
> although we don't want that (the postgis specific tables are
> pre-populated by PostGIS itself, and should not be altered by the
> upgrade.

Hm.  Rather than a variant of the -L facility (which is hard to use,
and I don't see your proposal being much easier), maybe what's wanted
is just a flag saying "don't try to restore data into any table whose
creation command fails".  Maybe that should even be the default ...
and you could extend it to indexes and constraints on such tables too,
as those would likely end up being duplicated as well.
        regards, tom lane


Re: Adding an ignore list to pg_restore

From
Martin Pitt
Date:
Hi Tom!

Tom Lane [2006-02-18 14:34 -0500]:
> Hm.  Rather than a variant of the -L facility (which is hard to use,
> and I don't see your proposal being much easier), maybe what's wanted
> is just a flag saying "don't try to restore data into any table whose
> creation command fails".  Maybe that should even be the default ...
> and you could extend it to indexes and constraints on such tables too,
> as those would likely end up being duplicated as well.

This comes close to my alternative proposal, it sounds fine to me.
I'll try to come up with a reasonably clean implementation and report
back then.

Thank you, and have a nice Sunday,

Martin

--
Martin Pitt              http://www.piware.de
Ubuntu Developer   http://www.ubuntulinux.org
Debian Developer        http://www.debian.org

Re: Adding an ignore list to pg_restore, prototype patch #1

From
Martin Pitt
Date:
Hi again,

Tom Lane [2006-02-18 14:34 -0500]:
> >>> The core problem is that we want to not restore objects (mainly
> >>> tables) in the destination database which already exist.
> >>
> >> Why is this a problem?  It's already the default behavior --- the
> >> creation commands fail but pg_restore keeps going.
>
> > The problem is that pg_restore would restore the TABLE DATA object,
> > although we don't want that (the postgis specific tables are
> > pre-populated by PostGIS itself, and should not be altered by the
> > upgrade.
>
> Hm.  Rather than a variant of the -L facility (which is hard to use,
> and I don't see your proposal being much easier), maybe what's wanted
> is just a flag saying "don't try to restore data into any table whose
> creation command fails".  Maybe that should even be the default ...
> and you could extend it to indexes and constraints on such tables too,
> as those would likely end up being duplicated as well.

My first stab at this is a patch which only does the minimal changes,
just to get me going. If the restoration of a TABLE object fails, it
marks the corresponding TABLE DATA object as to be ignored. Do you
think the current patch is a valid approach?

Since this changes the behaviour of pg_restore, this should probably
become an option, e. g. -D / --ignore-existing-table-data. I'll do
this if you agree to the principle of the current patch.

For convenience, I wrote a small test script which demonstrates the
behaviour. The table 'userdata' should be restored, while the table
'auxdata' is already present in the destination db, and its contents
should not be modified.

Output with pg_restore from 8.1.3:
------------------- snip ------------------------
$ LC_ALL=C sudo -u postgres ./test-pg_restore-existing.sh
=== create empty databases ===
=== populating old database ===
=== pre-creating auxdata in new database ===
=== restoring old to new ===
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA public
pg_restore: creating COMMENT SCHEMA public
pg_restore: creating TABLE auxdata
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1183; 1259 17184 TABLE auxdata postgres
pg_restore: [archiver (db)] could not execute query: FEHLER:  Relation »auxdata« existiert bereits
    Command was: CREATE TABLE auxdata (
    x integer
);
pg_restore: creating TABLE userdata
pg_restore: restoring data for table "auxdata"
pg_restore: restoring data for table "userdata"
pg_restore: setting owner and privileges for SCHEMA public
pg_restore: setting owner and privileges for COMMENT SCHEMA public
pg_restore: setting owner and privileges for ACL public
pg_restore: setting owner and privileges for TABLE auxdata
pg_restore: setting owner and privileges for TABLE userdata
WARNING: errors ignored on restore: 1
pg_restore failed with 1
=== new/userdata: ===
42
256
=== new/auxdata: ===
-1
-2
1
2
------------------- snip ------------------------

Output with patched pg_restore:
------------------- snip ------------------------
$ LC_ALL=C sudo -u postgres ./test-pg_restore-existing.sh
=== create empty databases ===
=== populating old database ===
=== pre-creating auxdata in new database ===
=== restoring old to new ===
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA public
pg_restore: creating COMMENT SCHEMA public
pg_restore: creating TABLE auxdata
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1183; 1259 17194 TABLE auxdata postgres
pg_restore: [archiver (db)] could not execute query: FEHLER:  Relation »auxdata« existiert bereits
    Command was: CREATE TABLE auxdata (
    x integer
);
pg_restore: table auxdata could not be created, will not restore its data
pg_restore: creating TABLE userdata
pg_restore: restoring data for table "userdata"
pg_restore: setting owner and privileges for SCHEMA public
pg_restore: setting owner and privileges for COMMENT SCHEMA public
pg_restore: setting owner and privileges for ACL public
pg_restore: setting owner and privileges for TABLE auxdata
pg_restore: setting owner and privileges for TABLE userdata
WARNING: errors ignored on restore: 1
pg_restore failed with 1
=== new/userdata: ===
42
256
=== new/auxdata: ===
-1
-2
------------------- snip ------------------------

Thus, with the patch, auxdata is not restored (which produced the
additional entries '1' and '2').

Thanks,

Martin

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Attachment

Re: Adding an ignore list to pg_restore, prototype patch #1

From
Martin Pitt
Date:
Hi again,

Meh, the list server didn't like the attached test script, so I put it
here:
 http://people.debian.org/~mpitt/test-pg_restore-existing.sh

Martin
--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Re: Adding an ignore list to pg_restore, prototype patch #1

From
Martin Pitt
Date:
Hi again,

Martin Pitt [2006-02-19 14:39 +0100]:
> Since this changes the behaviour of pg_restore, this should probably
> become an option, e. g. -D / --ignore-existing-table-data. I'll do
> this if you agree to the principle of the current patch.

I improved the patch now to only ignore TABLE DATA for existing tables
if '-X ignore-existing-tables' is specified. I also updated the
documentation.

Since this doesn't change the default behaviour now any more, I would
like to put this patch into the Debian packages to provide automatic
upgrades for PostGIS-enabled databases (see [1]). Does anyone object
to this?

Do you consider to adopt this upstream?

Thanks in advance, and have a nice weekend!

Martin

[1] http://bugs.debian.org/351571

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Attachment

Re: Adding an ignore list to pg_restore, prototype p.tch #1

From
Alvaro Herrera
Date:
Martin Pitt wrote:
> Hi again,
> 
> Martin Pitt [2006-02-19 14:39 +0100]:
> > Since this changes the behaviour of pg_restore, this should probably
> > become an option, e. g. -D / --ignore-existing-table-data. I'll do
> > this if you agree to the principle of the current patch.
> 
> I improved the patch now to only ignore TABLE DATA for existing tables
> if '-X ignore-existing-tables' is specified. I also updated the
> documentation.

Is this really an appropiate description for the behavior?  What happens
if the table is not created for some other reason?  Consider for example
a table using a datatype that couldn't be created.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Adding an ignore list to pg_restore, prototype patch #1

From
Tom Lane
Date:
Martin Pitt <mpitt@debian.org> writes:
> Martin Pitt [2006-02-19 14:39 +0100]:
>> Since this changes the behaviour of pg_restore, this should probably
>> become an option, e. g. -D / --ignore-existing-table-data. I'll do
>> this if you agree to the principle of the current patch.

> I improved the patch now to only ignore TABLE DATA for existing tables
> if '-X ignore-existing-tables' is specified. I also updated the
> documentation.

This patch is unbelievably ugly and probably vulnerable to coredumps.
Please use a cleaner way of disabling the subsequent load than tromping
all over the TOC datastructure, ie, not this:

> +                                            strcpy (tes->desc, "IGNOREDATA");

BTW, I'm pretty sure it fails for tables with same names in different
schemas, too.
        regards, tom lane


Re: Adding an ignore list to pg_restore, prototype patch #1

From
Bruce Momjian
Date:
I will clean it up before applying.

Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Martin Pitt wrote:
-- Start of PGP signed section.
> Hi again,
> 
> Martin Pitt [2006-02-19 14:39 +0100]:
> > Since this changes the behaviour of pg_restore, this should probably
> > become an option, e. g. -D / --ignore-existing-table-data. I'll do
> > this if you agree to the principle of the current patch.
> 
> I improved the patch now to only ignore TABLE DATA for existing tables
> if '-X ignore-existing-tables' is specified. I also updated the
> documentation.
> 
> Since this doesn't change the default behaviour now any more, I would
> like to put this patch into the Debian packages to provide automatic
> upgrades for PostGIS-enabled databases (see [1]). Does anyone object
> to this? 
> 
> Do you consider to adopt this upstream?
> 
> Thanks in advance, and have a nice weekend!
> 
> Martin
> 
> [1] http://bugs.debian.org/351571
> 
> -- 
> Martin Pitt        http://www.piware.de
> Ubuntu Developer   http://www.ubuntu.com
> Debian Developer   http://www.debian.org
> 
> In a world without walls and fences, who needs Windows and Gates?

[ Attachment, skipping... ]
-- End of PGP section, PGP failed!

--  Bruce Momjian   http://candle.pha.pa.us SRA OSS, Inc.   http://www.sraoss.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Adding an ignore list to pg_restore, patch take #3

From
Martin Pitt
Date:
Hi all,

thanks for the feedback. I updated the patch now.

Alvaro Herrera [2006-02-25 13:47 -0300]:
> > I improved the patch now to only ignore TABLE DATA for existing tables
> > if '-X ignore-existing-tables' is specified. I also updated the
> > documentation.
>
> Is this really an appropiate description for the behavior?  What happens
> if the table is not created for some other reason?  Consider for example
> a table using a datatype that couldn't be created.

Right. However, if the table is not present at all, then it makes even
less sense to attempt to restore its data. Therefore I consider this
mainly a documentation issue. I changed the option to
-X no-data-for-failed-tables and described it as

    By default, table data objects are restored even if the associated
    table could not be successfully created (e. g.  because it already
    exists). [...]

Tom Lane [2006-02-25 12:18 -0500]:
> Martin Pitt <mpitt@debian.org> writes:
> > Martin Pitt [2006-02-19 14:39 +0100]:
> >> Since this changes the behaviour of pg_restore, this should probably
> >> become an option, e. g. -D / --ignore-existing-table-data. I'll do
> >> this if you agree to the principle of the current patch.
>
> > I improved the patch now to only ignore TABLE DATA for existing tables
> > if '-X ignore-existing-tables' is specified. I also updated the
> > documentation.
>
> This patch is unbelievably ugly and probably vulnerable to coredumps.
> Please use a cleaner way of disabling the subsequent load than tromping
> all over the TOC datastructure, ie, not this:
>
> > +                                            strcpy (tes->desc, "IGNOREDATA");

It should not segfault, but I agree that this is a bit hackish. The
updated patch completely removes the TABLE DATA node from the linked
list. It does not free its memory, though; I did not find a
free_tocentry() or similar function. However, pg_restore is no daemon,
and without the new option the memory would be allocated, too, so it
does not make much difference. Can anyone give me a hint how to
properly free the struct?

> BTW, I'm pretty sure it fails for tables with same names in different
> schemas, too.

Right, sorry for that. I fixed that, too.

Bruce Momjian [2006-02-28 19:54 -0500]:
> I will clean it up before applying.

Thank you. I hope the updated patch makes that a little bit easier.

> Your patch has been added to the PostgreSQL unapplied patches list at:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches
>
> It will be applied as soon as one of the PostgreSQL committers reviews
> and approves it.

Great, thanks!

Martin

P.S. I also updated the test script to create two namespaces with
identidal table names.
http://people.debian.org/~mpitt/test-pg_restore-existing.sh
--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Attachment