Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Stephen Frost
Subject Re: pg_dump additional options for performance
Msg-id 20080726151110.GF16005@tamriel.snowman.net
In response to Re: pg_dump additional options for performance  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
* Simon Riggs (simon@2ndquadrant.com) wrote:
> The key capability here is being able to split the dump into multiple
> pieces. The equivalent capability on restore is *not* required, because
> once the dump has been split the restore never needs to be. It might
> seem that the patch should be symmetrical with respect to pg_dump and
> pg_restore, but I see no use case for the pg_restore case.

I'm inclined to agree with this.  It might have been nice to provide a
way to split out already-created dumps, but I suspect that people who
need that have probably already figured out a way to do it (I know I
have).  We should probably ensure that pg_restore doesn't *break* when
fed a partial dump.

> > The patch as submitted enforces what seem largely arbitrary restrictions
> > on combining these switches.
>
> I had it both ways at various points in development. I'm happy with what
> you propose.

I agree with removing the restrictions.  I don't see much in the way of
use cases, but it reduces code and doesn't cause problems.

> > Another issue is that the rules for deciding which objects are "before
> > data" and which are "after data" are wrong.  In particular ACLs are after
> > data not before data, which is relatively easy to fix.
>
> OK

This is partly why I was asking for documentation, and a policy for
that matter, that goes into more detail about why X is before or after
the data.  I agree that ACLs are after the data today, but I don't see
any particular reason why they should be one or the other.  If we
adopted a policy like the one I proposed (--schema-post-data is
essentially whatever uses the data and is faster done in bulk), then
ACLs would be before, and I tend to feel it makes more sense that way.
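
To make that rule concrete, here is a rough sketch in Python, purely for
illustration.  The object-type names, the contents of the set, and the
function name are all assumptions made up for this example; this is not
how pg_dump classifies objects today.

```python
# Hypothetical sketch of the proposed policy, NOT pg_dump's actual logic.
# The type names and set membership below are illustrative assumptions.

# Objects that read the table data and are cheaper to build in bulk
# once the data has already been loaded.
USES_DATA_AND_FASTER_IN_BULK = {"INDEX", "CONSTRAINT", "FK CONSTRAINT"}


def section_for(object_type: str) -> str:
    """Return 'post-data' for objects that use the data, else 'pre-data'."""
    if object_type in USES_DATA_AND_FASTER_IN_BULK:
        return "post-data"
    return "pre-data"


if __name__ == "__main__":
    for obj in ("TABLE", "ACL", "INDEX", "FK CONSTRAINT"):
        print(obj, "->", section_for(obj))
```

Under a rule like this, an ACL never touches the table contents, so it
falls out as pre-data without needing a special case.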

> > Not so easy to fix
> > is that COMMENTs might be either before or after data depending on what
> > kind of object they are attached to.
>
> Is there anything to fix? Comments are added by calls to dumpComment,
> which are always made in conjunction with the dump of an object. So if
> you dump the object you dump the comment. As long as objects are
> correctly split out then comments will be also.

I agree with this, and it follows for BLOB comments too: in any case,
they go with the object being dumped at the time that object gets
dumped.  Comments make sense as an extension of the object, not as a
separate set of objects to be explicitly placed before or after the
data.

> All of the above makes me certain I want to remove these options from
> pg_restore.

I'm in agreement with this.

> > BTW, another incomplete item is that pg_dumpall should probably be taught
> > to accept and pass down --schema-before-data and --schema-after-data
> > switches.
>
> OK

I could go either way on this.

> Can we prune down to the base use case to avoid this overhead? i.e. have
> these options on pg_dump only?

Makes sense to me.

    Thanks,

        Stephen
