Thread: pg_dump exclusion switches and functions/types

pg_dump exclusion switches and functions/types

From

Kris Jurka

Date:

05 October 2006, 19:25:02

Testing out the new pg_dump exclusion switches I've found that excluding a 
table means that no functions or types will be dumped.  Excluding one 
table shouldn't exclude these objects.  My real use case for this 
functionality is that I have a database that has 99% of its space used by 
one big table and I'd like to be able to dump the rest of the database 
quickly.  If I lose function/type information it's useless.

Kris Jurka

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

05 October 2006, 20:51:49

Kris Jurka <books@ejurka.com> writes:
> Testing out the new pg_dump exclusion switches I've found that excluding a 
> table means that no functions or types will be dumped.  Excluding one 
> table shouldn't exclude these objects.

I tend to agree ... will see if I can make it happen.  (I never did get
around to reviewing that patch, anyway ...)

One issue is what to do with procedural languages and large objects,
which don't have any associated schema.  If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared.  Is that OK?
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Markus Schaber

Date:

06 October 2006, 05:34:40

Hi, Tom,

Tom Lane wrote:

> One issue is what to do with procedural languages and large objects,
> which don't have any associated schema.  If we treat them as being
> outside all schemas, we'd have semantics like this: dump the PLs and
> blobs unless one or more --schema switches appeared.  Is that OK?

Sounds fine.

Is there a possibility to dump only those objects? Maybe --large-objects
and --languages?

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: pg_dump exclusion switches and functions/types

From

Richard Huxton

Date:

06 October 2006, 08:19:32

Tom Lane wrote:
> Kris Jurka <books@ejurka.com> writes:
>> Testing out the new pg_dump exclusion switches I've found that excluding a 
>> table means that no functions or types will be dumped.  Excluding one 
>> table shouldn't exclude these objects.
> 
> I tend to agree ... will see if I can make it happen.  (I never did get
> around to reviewing that patch, anyway ...)
> 
> One issue is what to do with procedural languages and large objects,
> which don't have any associated schema.  If we treat them as being
> outside all schemas, we'd have semantics like this: dump the PLs and
> blobs unless one or more --schema switches appeared.  Is that OK?

Is there a reason why pg_dump can't do the --list/--use-list flags like 
pg_restore, or is it just a matter of round tuits?

--   Richard Huxton  Archonet Ltd

Re: pg_dump exclusion switches and functions/types

From

Andrew Dunstan

Date:

06 October 2006, 09:41:40

Richard Huxton wrote:
>
> Is there a reason why pg_dump can't do the --list/--use-list flags 
> like pg_restore, or is it just a matter of round tuits?
>

The major reason for having those features as I understand it was to 
help overcome dependency difficulties in dumps, which are now largely a 
thing of the past.

However, ISTM that a similar facility for fine grained control could 
fairly easily be built into pg_dump.

cheers

andrew

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

06 October 2006, 11:30:29

Andrew Dunstan <andrew@dunslane.net> writes:
> However, ISTM that a similar facility for fine grained control could 
> fairly easily be built into pg_dump.

Yeah ... later.

The way I envision it is that the schema-related switches are fine for
selecting things at the level of whole schemas, and the table-related
switches are fine for selecting individual tables, and what we lack are
inclusion/exclusion switches that operate on other kinds of individual
objects.  Somebody can design and implement those later, if the itch
strikes.  What we have to do today is make sure that the interaction of
schema and table switches is such that an extension in that direction
will fit in naturally.
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Andrew Dunstan

Date:

06 October 2006, 11:48:33

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> However, ISTM that a similar facility for fine grained control could 
>> fairly easily be built into pg_dump.
>>     
>
> Yeah ... later.
>
> The way I envision it is that the schema-related switches are fine for
> selecting things at the level of whole schemas, and the table-related
> switches are fine for selecting individual tables, and what we lack are
> inclusion/exclusion switches that operate on other kinds of individual
> objects.  Somebody can design and implement those later, if the itch
> strikes.  What we have to do today is make sure that the interaction of
> schema and table switches is such that an extension in that direction
> will fit in naturally.
>
>     
>   

totally agree.

cheers

andrew

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

06 October 2006, 12:55:14

Kris Jurka <books@ejurka.com> writes:
> Testing out the new pg_dump exclusion switches I've found that excluding a 
> table means that no functions or types will be dumped.  Excluding one 
> table shouldn't exclude these objects.

I've been chewing on this a bit and find that the existing patch has
several behaviors that seem surprising.  Considering just one type of
inclusion/exclusion switches at a time (we'll get to the interaction of
schema and table switches below), I think we can all agree without too
much argument on these statements:

* With no inclusion/exclusion switches, all objects except system objects should be dumped.
* With only exclusion switches given, all objects except system objects and those matching at least one pattern should
bedumped.

* With only inclusion switches given, only those objects matching at least one pattern should be dumped (whether they
aresystem objects or not).

That last proviso might be debatable but on balance I think it's OK
(for instance, "pg_dump --schema=information_schema" could be useful
for debugging or documentation purposes).  Where things get interesting
is when you have both inclusion and exclusion switches given.  The
existing patch's behavior is that "the rightmost switch wins", ie,
if an object's name matches more than one pattern then it is included or
excluded according to the rightmost switch it matches.  This is, erm,
poorly documented, but it seems like useful behavior so I don't have
an objection myself.  The real question is what should happen to objects
that don't match any of the switch patterns?  (This is relevant to Kris'
complaint because non-table objects should be treated the same as tables
that don't match any table name inclusion/exclusion switches.)

What I find in the existing code is that if an inclusion switch appears
first:
pg_dump -n 's.*' -N 'ss.*' ...

then only schemas matching an inclusion switch (and not matching any
later exclusion switch) are dumped.  While if an exclusion switch
appears first:
pg_dump -N 's.*' -n 'ss.*' ...

then all schemas are dumped except system schemas and those matching an
exclusion switch (and not matching any later inclusion switch).  So the
"default" behavior for unmatched objects flips depending on switch
order.  This doesn't seem to satisfy the principle of least surprise,
and it's certainly not adequately documented.  It might be the most
useful behavior though.  I thought about the alternative rule that
"if any inclusion switches appear at all, the default is not to dump"
--- that is, an object must match at least one inclusion switch (and not
match any later exclusion switch) to be dumped.  But with that rule,
exclusion switches before the first inclusion switch are actually
useless.  Has anyone got a better idea?

Returning to the point about schema versus table selection switches,
what we've got is that they are independent filters: to be dumped,
a table must be in a schema selected by the schema inclusion/exclusion
switches (if any), and it must have a name selected by the table
inclusion/exclusion switches (if any).  I think this is OK but it leads
to the property that the order of -n/-N switches is relevant, and the
order of -t/-T switches is relevant, but their order relative to each
other is not relevant.  This could be surprising.

If you're still with me, the payoff is here: what are the rules for
dumping non-table objects, given that there are no inclusion/exclusion
switches for them (but we might want to add such later)?  If only schema
inclusion/exclusion switches are present, then it's relatively easy to
say "dump objects that are in selected schemas" --- but what about
objects that don't have a schema, such as PLs?  And what about the case
where table inclusion/exclusion switches are present?  I said above that
non-table objects should be treated the same way as unmatched tables,
which I think is a necessary rule if we want to extend the set of switch
types later.  But that leads to the conclusion that "a non-table object
is dumped unless a -t switch appears before any -T switches".  Which
strikes me as a mighty surprising behavior.  I'm not sure what to do
differently though.

Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity.  For instance, "-t x" is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people.  An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too.  Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff.  None of these are especially attractive :-(

Comments?
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Csaba Nagy

Date:

06 October 2006, 13:18:50

[Snip explanations]
> Comments?

Would it be reasonable to include one more switch: 'include
dependencies' ?

That would work like this:

* first consider all to be included objects (possibly limited by the
include switches);
* if dependencies are included, add all dependent objects, plus
non-schema objects (which arguably can be considered as dependencies for
the whole data base);
* remove all objects targeted by exclude switches;

This way you won't have any dependency on the ordering, and you could
consider all non-schema objects as dependencies, so they will only be
included if dependencies are to be included. 

Excluding dependencies would be the default if any switches are
specified, including otherwise (not sure how much is this of the
principle of least surprise, but it would be backwards compatible).

The scenario I most care about is to be able to make a complete data
base dump (including non-schema objects) while excluding only a few
tables. If I understood your explanations correctly, this would not be
easily possible with the current implementation. Note that I have a
patch (kludge ?) on the 8.1 pg_dump which does exactly this, it would be
a pity if I would need to patch the 8.2 one again to do that...

Cheers,
Csaba.

Re: pg_dump exclusion switches and functions/types

From

Andrew Dunstan

Date:

06 October 2006, 13:26:53

Tom Lane wrote:
> Lastly, as long as we're questioning the premises of this patch,
> I wonder about the choice to use regex pattern matching rules.
> The problem with regex is that to be upward-compatible with the old
> exact-match switch definitions, a switch value that doesn't contain
> any regex special characters is treated as an equality condition not
> a pattern, which makes for a discontinuity.  For instance, "-t x" is
> treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
> That's going to burn people.  An alternative we could consider is to
> use LIKE patterns instead, but since underscore is a wildcard in LIKE,
> it's easy to imagine people getting burnt by that too.  Or we could
> import the rather ad-hoc shell-wildcard-like rules used by psql's \d
> stuff.  None of these are especially attractive :-(
>
> Comments?
>
>     


1. regexes, please.
2. I'd rather remove backwards compatibility than have the 
discontinuity. After all, users can anchor the expression pretty easily. 
If not, then let's use an alternate switch for the regexes, (I know we 
are running out of option space).

cheers

andrew

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

06 October 2006, 14:23:57

Csaba Nagy <nagy@ecircle-ag.com> writes:
> Would it be reasonable to include one more switch: 'include
> dependencies' ?

We are two months past feature freeze ... adding entirely new features
to pg_dump is *not* on the table for 8.2.  What we need to do at the
moment is make sure that the features we've got work sanely and won't
create headaches for likely future extensions; but not actually
implement those extensions.

> The scenario I most care about is to be able to make a complete data
> base dump (including non-schema objects) while excluding only a few
> tables.

Isn't this the same as Kris' complaint?  Why do you need additional
dependency analysis to do the above?
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

06 October 2006, 14:36:59

Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> ... Or we could
>> import the rather ad-hoc shell-wildcard-like rules used by psql's \d
>> stuff.  None of these are especially attractive :-(

> 1. regexes, please.

One argument that occurs to me for importing the psql code is that it's
solved the problem of including a schema name in the pattern.  It would
be a lot nicer to say "-t schema.table" than to have to say "-t table -n
schema".  In particular this allows one to dump s1.foo and s2.bar
without also getting s1.bar and s2.foo; a problem that is insoluble if
we have only independent schema and table filters.  I think that ideally
one would only use the schema switches if one actually wanted a
schema-by-schema dump, not as a wart on the side of the
specific-object-selection switches.

The psql code does allow you to get at most of the functionality of
regexes...
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Bruno Wolff III

Date:

06 October 2006, 18:07:21

On Fri, Oct 06, 2006 at 11:54:51 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> The problem with regex is that to be upward-compatible with the old
> exact-match switch definitions, a switch value that doesn't contain
> any regex special characters is treated as an equality condition not
> a pattern, which makes for a discontinuity.  For instance, "-t x" is
> treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
> That's going to burn people.  An alternative we could consider is to
> use LIKE patterns instead, but since underscore is a wildcard in LIKE,
> it's easy to imagine people getting burnt by that too.  Or we could
> import the rather ad-hoc shell-wildcard-like rules used by psql's \d
> stuff.  None of these are especially attractive :-(
> 
> Comments?

How about making the regex's anchored by default? People who want unanchored
ones can add .* at the beginning and/or end. Since only whether or not
the pattern matches is important (not the string it matched), this keeps
all of the same power, but matches the old behavior in simple cases.

Re: pg_dump exclusion switches and functions/types

From

Gregory Stark

Date:

06 October 2006, 23:28:42

Tom Lane <tgl@sss.pgh.pa.us> writes:

> The existing patch's behavior is that "the rightmost switch wins", ie, if an
> object's name matches more than one pattern then it is included or excluded
> according to the rightmost switch it matches. This is, erm, poorly
> documented, but it seems like useful behavior so I don't have an objection
> myself.

I don't know, it sounds like it's the source of the confusion you identify
later.

My first thought is that the rule should be to apply all the inclusion
switches (implicitly including everything if there are none), then apply all
the exclusion switches.

That leads to including non-schema objects only if there are no schema
inclusion switches. Which seems pretty logical since if you're explicitly
including objects then you'll only expect objects explicitly included to be
dumped and you'll quickly realize there's no switch to bring in those
non-schema objects. Maybe there should be a switch to include them just for
completeness.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

07 October 2006, 18:10:52

Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> The existing patch's behavior is that "the rightmost switch wins", ie, if an
>> object's name matches more than one pattern then it is included or excluded
>> according to the rightmost switch it matches.

> My first thought is that the rule should be to apply all the inclusion
> switches (implicitly including everything if there are none), then apply all
> the exclusion switches.

I kinda like that, because it makes the behavior completely independent
of switch ordering, which seems like a good property to preserve.
Anyone else have an opinion pro or con?

> That leads to including non-schema objects only if there are no schema
> inclusion switches. Which seems pretty logical since if you're explicitly
> including objects then you'll only expect objects explicitly included to be
> dumped and you'll quickly realize there's no switch to bring in those
> non-schema objects. Maybe there should be a switch to include them just for
> completeness.

Well, pg_dump already has a --blobs switch, which has been a no-op
(because now the default) since 8.1, but it's still in the switch
parser.  It wouldn't take much to revive it for the purpose of causing
blobs to be dumped even when there's an inclusion switch.  As for PLs,
I'm not really too worried about dumping them per se (since it's usually
easy enough to create the ones you're using).  The functionality we're
really lacking there is the "--include-dependencies" switch that was
discussed upthread ... which I think is a fine idea but should wait for
8.3.
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Josh Berkus

Date:

07 October 2006, 18:20:56

Tom,

> I kinda like that, because it makes the behavior completely independent
> of switch ordering, which seems like a good property to preserve.
> Anyone else have an opinion pro or con?

The only "con" argument I can think of is that "tar" and "rsync", whose syntax 
is familiar to a lot of sysadmins, apply switches left-to-right.  

However, I don't feel that that is a compelling argument.  The include/exclude 
switch order processing is something I've always *hated* about tar and has 
messed me up more times than I can count.  Also, Windows users could care 
less if we behave like tar.

So +1 to go with orderless switching.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

07 October 2006, 18:29:12

I wrote:
> One argument that occurs to me for importing the psql code is that it's
> solved the problem of including a schema name in the pattern.  It would
> be a lot nicer to say "-t schema.table" than to have to say "-t table -n
> schema".

The more I think about this, the more I think the above is a killer
argument.  We really should have had the ability to say "-t schema.table"
ever since schemas were added in 7.3, but no one got around to making it
happen.  If we go over to interpreting the arguments as standard regexes
then we'll never be able to do that, because we'll have foreclosed the
meaning of dot.  The psql pattern code was specifically designed as a
compromise notation adapted to SQL needs, and IMHO it's served pretty
well --- so I think we should adopt that into pg_dump rather than pure
regex notation.

> The psql code does allow you to get at most of the functionality of
> regexes...

Actually, it lets you get at all of it, though perhaps a bit awkwardly.
The transformations it makes are
.    =>    schema vs name separator*    =>    .*?    =>    .

So the only regex patterns you can't write directly are dot, R* and R?
for which you can use these locutions:
.    =>    ?R*    =>    (R+|)R?    =>    (R|)

(Perhaps this should be documented somewhere...)

So I propose that we should revise the patch to use psql's \d code to
determine which objects match a pattern.  I think that together with
Greg's idea of processing all inclusions before all exclusions should
answer the concerns I've got about the patch.
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

David Fetter

Date:

07 October 2006, 22:22:31

On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > The existing patch's behavior is that "the rightmost switch wins",
> > ie, if an object's name matches more than one pattern then it is
> > included or excluded according to the rightmost switch it matches.
> > This is, erm, poorly documented, but it seems like useful behavior
> > so I don't have an objection myself.
> 
> I don't know, it sounds like it's the source of the confusion you
> identify later.
> 
> My first thought is that the rule should be to apply all the
> inclusion switches (implicitly including everything if there are
> none), then apply all the exclusion switches.

+1 :)

Order-dependent switches are a giant foot gun.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!

Re: pg_dump exclusion switches and functions/types

From

Csaba Nagy

Date:

09 October 2006, 05:33:12

> We are two months past feature freeze ... adding entirely new features
> to pg_dump is *not* on the table for 8.2. 

Ok, clear.

> > The scenario I most care about is to be able to make a complete data
> > base dump (including non-schema objects) while excluding only a few
> > tables.
> 
> Isn't this the same as Kris' complaint?  Why do you need additional
> dependency analysis to do the above?

Well, I obviously didn't understand well the complete feature as it is
implemented. Now, is what I want (see above) possible with the new
feature, or if I exclude some tables I implicitly loose some other
things too from the dump which normally would be there ? This is my only
concern...

Cheers,
Csaba.

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

09 October 2006, 11:25:55

Csaba Nagy <nagy@ecircle-ag.com> writes:
>> Isn't this the same as Kris' complaint?  Why do you need additional
>> dependency analysis to do the above?

> Well, I obviously didn't understand well the complete feature as it is
> implemented. Now, is what I want (see above) possible with the new
> feature, or if I exclude some tables I implicitly loose some other
> things too from the dump which normally would be there ? This is my only
> concern...

I think we've agreed that if you use some exclusion switches, but not
any inclusion switches, then only the specific objects matching your
switches are excluded.  CVS HEAD gets this wrong, but I'm going to work
on it today.
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

Csaba Nagy

Date:

09 October 2006, 11:30:35

On Mon, 2006-10-09 at 16:24, Tom Lane wrote:
> I think we've agreed that if you use some exclusion switches, but not
> any inclusion switches, then only the specific objects matching your
> switches are excluded.  CVS HEAD gets this wrong, but I'm going to work
> on it today.

Cool, that makes it cover my use case and some more.

Thanks,
Csaba.

Re: pg_dump exclusion switches and functions/types

From

"Jim C. Nasby"

Date:

09 October 2006, 13:54:14

On Sat, Oct 07, 2006 at 05:29:03PM -0400, Tom Lane wrote:
> So the only regex patterns you can't write directly are dot, R* and R?
> for which you can use these locutions:
> 
>     .    =>    ?
>     R*    =>    (R+|)
>     R?    =>    (R|)
> 
> (Perhaps this should be documented somewhere...)

Yes, please.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: pg_dump exclusion switches and functions/types

From

"Jim C. Nasby"

Date:

09 October 2006, 14:07:44

On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
> On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> > 
> > > The existing patch's behavior is that "the rightmost switch wins",
> > > ie, if an object's name matches more than one pattern then it is
> > > included or excluded according to the rightmost switch it matches.
> > > This is, erm, poorly documented, but it seems like useful behavior
> > > so I don't have an objection myself.
> > 
> > I don't know, it sounds like it's the source of the confusion you
> > identify later.
> > 
> > My first thought is that the rule should be to apply all the
> > inclusion switches (implicitly including everything if there are
> > none), then apply all the exclusion switches.
> 
> +1 :)
> 
> Order-dependent switches are a giant foot gun.

They're also very powerful, as anyone who's ever used them in a
non-trivial rsync (or rdiff-backup) scenareo can tell you. What if you
want to exclude all of a schema except for a few objects (granted, right
now we're limited to just tables...)?

Here's a real example, from my rdiff-backup exclude files list:

+ /opt/local/var/db/dports
+ /opt/local/var/log
+ /opt/local/etc
/opt

Note that rdiffbackup applies the first option that matches a
file/directory, not the last. So this says to exclude all of /opt,
except for /opt/local/var/db/dports, etc. If this was done as 'last
operator takes priority', you'd just reverse the order of the list:

pg_dump --exclude-schema /opt --include-table opt.local_var_db_dports
--include-table opt.local_var_log --include-table opt.local_etc

If we processed all includes first, you lose this ability. For simple
cases, it might not matter, but think about wanting to back up an entire
database, except for schema opt, but you do want a few objects from
within opt. Without following some kind of ordering on includes and
excludes, that's next to impossible.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: pg_dump exclusion switches and functions/types

From

David Fetter

Date:

09 October 2006, 14:17:56

On Mon, Oct 09, 2006 at 12:07:29PM -0500, Jim C. Nasby wrote:
> On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
> > On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
> > > Tom Lane <tgl@sss.pgh.pa.us> writes:
> > > 
> > > > The existing patch's behavior is that "the rightmost switch wins",
> > > > ie, if an object's name matches more than one pattern then it is
> > > > included or excluded according to the rightmost switch it matches.
> > > > This is, erm, poorly documented, but it seems like useful behavior
> > > > so I don't have an objection myself.
> > > 
> > > I don't know, it sounds like it's the source of the confusion you
> > > identify later.
> > > 
> > > My first thought is that the rule should be to apply all the
> > > inclusion switches (implicitly including everything if there are
> > > none), then apply all the exclusion switches.
> > 
> > +1 :)
> > 
> > Order-dependent switches are a giant foot gun.
> 
> They're also very powerful, as anyone who's ever used them in a
> non-trivial rsync (or rdiff-backup) scenareo can tell you. What if
> you want to exclude all of a schema except for a few objects
> (granted, right now we're limited to just tables...)?

You make an important distinction here, and thanks for doing that. :)

IMHO, order-dependent switches are appropriate for a configuration
file and inappropriate for the command line.  The pg_hba.conf file is
a great example of a place where order dependence is a good idea.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

09 October 2006, 14:59:40

"Jim C. Nasby" <jim@nasby.net> writes:
> On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
>> On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
>>> My first thought is that the rule should be to apply all the
>>> inclusion switches (implicitly including everything if there are
>>> none), then apply all the exclusion switches.
>> 
>> +1 :)
>> Order-dependent switches are a giant foot gun.

> They're also very powerful, as anyone who's ever used them in a
> non-trivial rsync (or rdiff-backup) scenareo can tell you.

Sure, but the question is whether that incremental gain in capability
is worth the extra logical complexity.  I'm inclined to think that many
more users would get burned by the complexity than would have use for it.
Considering that we've gotten along this long with only the most
primitive selection capabilities in pg_dump, it doesn't seem like
there's an enormous demand for highly refined capabilities.

(And I agree with David's comment that it might be better to reserve
such behavior for a configuration file than to put it on the command
line.)
        regards, tom lane

Re: pg_dump exclusion switches and functions/types

From

"Jim C. Nasby"

Date:

09 October 2006, 16:34:17

On Mon, Oct 09, 2006 at 01:59:18PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
> >> On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
> >>> My first thought is that the rule should be to apply all the
> >>> inclusion switches (implicitly including everything if there are
> >>> none), then apply all the exclusion switches.
> >> 
> >> +1 :)
> >> Order-dependent switches are a giant foot gun.
> 
> > They're also very powerful, as anyone who's ever used them in a
> > non-trivial rsync (or rdiff-backup) scenareo can tell you.
> 
> Sure, but the question is whether that incremental gain in capability
> is worth the extra logical complexity.  I'm inclined to think that many
> more users would get burned by the complexity than would have use for it.
> Considering that we've gotten along this long with only the most
> primitive selection capabilities in pg_dump, it doesn't seem like
> there's an enormous demand for highly refined capabilities.
> 
> (And I agree with David's comment that it might be better to reserve
> such behavior for a configuration file than to put it on the command
> line.)

I can certainly see the logic in putting the more advanced capability in
a config file of some kind (though, I think a simple include/exclude
file is best for this...)

The question becomes: do we want incompatible behavior between the
config file and the command line? And which over-rides what?
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: pg_dump exclusion switches and functions/types

From

David Fetter

Date:

09 October 2006, 16:47:37

On Mon, Oct 09, 2006 at 02:34:09PM -0500, Jim C. Nasby wrote:
> On Mon, Oct 09, 2006 at 01:59:18PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <jim@nasby.net> writes:
> > > On Sat, Oct 07, 2006 at 06:22:19PM -0700, David Fetter wrote:
> > >> On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
> > >>> My first thought is that the rule should be to apply all the
> > >>> inclusion switches (implicitly including everything if there
> > >>> are none), then apply all the exclusion switches.
> > >> 
> > >> +1 :) Order-dependent switches are a giant foot gun.
> > 
> > > They're also very powerful, as anyone who's ever used them in a
> > > non-trivial rsync (or rdiff-backup) scenareo can tell you.
> > 
> > Sure, but the question is whether that incremental gain in
> > capability is worth the extra logical complexity.  I'm inclined to
> > think that many more users would get burned by the complexity than
> > would have use for it.  Considering that we've gotten along this
> > long with only the most primitive selection capabilities in
> > pg_dump, it doesn't seem like there's an enormous demand for
> > highly refined capabilities.
> > 
> > (And I agree with David's comment that it might be better to
> > reserve such behavior for a configuration file than to put it on
> > the command line.)
> 
> I can certainly see the logic in putting the more advanced
> capability in a config file of some kind (though, I think a simple
> include/exclude file is best for this...)
> 
> The question becomes: do we want incompatible behavior between the
> config file and the command line? And which over-rides what?

The way I've cut this Gordian knot in the past is simply to make
command-line and file-based options for a given thing (e.g.
exclusion/inclusion) mutually exclusive and throw an error if somebody
attempts to mix them.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!

Re: pg_dump exclusion switches and functions/types

From

"Greg Sabino Mullane"

Date:

09 October 2006, 21:06:17

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Sure, but the question is whether that incremental gain in capability
> is worth the extra logical complexity.  I'm inclined to think that many
> more users would get burned by the complexity than would have use for it.
> Considering that we've gotten along this long with only the most
> primitive selection capabilities in pg_dump, it doesn't seem like
> there's an enormous demand for highly refined capabilities.

I disagree - we lose a lot of flexibility by taking out the ordering, and,
as was pointed out to me when I first started this patch a while ago,
we might as well front-load all the complexity and changes now rather
than adding them in release by release. I'm also not sure why the regex
should be changed to something even more non-standard than the current
POSIX ones. Finally, I'm surprised at the apparent willingness at this
point to shatter backwards-compatibility with previous -t scripts, as
this was an option I raised early on but met strong resistance, thus
the current compromise of allowing existing scripts to run unaltered,
while adding in the ability to do some regular expressions.

The regex stuff was discussed in January, and the patch submitted in
July, so it seems a little rushed to be changing the underlying behavior
so quickly right now (that behavior being the ability to control which
tables and schemas to dump). I think the original post about the request
to exclude a single table and still dump other objects is a fair one,
but I think we've morphed far beyond solving that problem.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200610092003
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFKuPFvJuQZxSWSsgRAjAxAJ9oY5HCM4KxmpLEU56eCMJauHBhFgCfcyDt
R5yf5SKKBeBHJ2gdRlE1Pqs=
=rIxZ
-----END PGP SIGNATURE-----

Re: pg_dump exclusion switches and functions/types

From

Tom Lane

Date:

10 October 2006, 20:02:14

"Greg Sabino Mullane" <greg@turnstep.com> writes:
>> Sure, but the question is whether that incremental gain in capability
>> is worth the extra logical complexity.  I'm inclined to think that many
>> more users would get burned by the complexity than would have use for
>> it.

> I disagree - we lose a lot of flexibility by taking out the ordering,

We lose some flexibility, but it's not clear to me that it's so
essential as all that.  Even the restricted patch is tremendously
more flexible than pg_dump has ever been, and I just don't see the
argument that there's a market demand for doing more at the cost
of clarity.

> I'm also not sure why the regex
> should be changed to something even more non-standard than the current
> POSIX ones. Finally, I'm surprised at the apparent willingness at this
> point to shatter backwards-compatibility with previous -t scripts, as
> this was an option I raised early on but met strong resistance, thus
> the current compromise of allowing existing scripts to run unaltered,
> while adding in the ability to do some regular expressions.

That's a fair point, but the way that the patch was preserving exact
backward compatibility was by making it a discontinuous corner case,
which is a decision I think we'd regret in the long run.  Andrew was
already suggesting upthread that we drop the anchoring (and lose
compatibility to a much greater extent than what this does) in order
to make the behavior more self-consistent.  Also, insisting on straight
regexps amounts to failing to learn from experience: before 7.3 the psql
\d commands used patterns that *were* straight regexps, and that just
did not work all that conveniently.

> The regex stuff was discussed in January, and the patch submitted in
> July, so it seems a little rushed to be changing the underlying behavior
> so quickly right now

Well, the problem is that once we ship 8.2 we'll be stuck with whatever
behavior we've defined --- it's unlikely that it'd be worth the pain of
another round of incompatibility in order to make small adjustments.
So we'd better get it right the first time.  I do apologize for not
having reviewed this patch more closely earlier, but I've been a tad
busy...
        regards, tom lane