Re: pg_dump exclusion switches and functions/types - Mailing list pgsql-hackers

Kris Jurka <books@ejurka.com> writes:
> Testing out the new pg_dump exclusion switches I've found that excluding a 
> table means that no functions or types will be dumped.  Excluding one 
> table shouldn't exclude these objects.

I've been chewing on this a bit and find that the existing patch has
several behaviors that seem surprising.  Considering just one type of
inclusion/exclusion switches at a time (we'll get to the interaction of
schema and table switches below), I think we can all agree without too
much argument on these statements:

* With no inclusion/exclusion switches, all objects except system objects should be dumped.
* With only exclusion switches given, all objects except system objects and those matching at least one pattern should
bedumped.
 
* With only inclusion switches given, only those objects matching at least one pattern should be dumped (whether they
aresystem objects or not).
 

That last proviso might be debatable but on balance I think it's OK
(for instance, "pg_dump --schema=information_schema" could be useful
for debugging or documentation purposes).  Where things get interesting
is when you have both inclusion and exclusion switches given.  The
existing patch's behavior is that "the rightmost switch wins", ie,
if an object's name matches more than one pattern then it is included or
excluded according to the rightmost switch it matches.  This is, erm,
poorly documented, but it seems like useful behavior so I don't have
an objection myself.  The real question is what should happen to objects
that don't match any of the switch patterns?  (This is relevant to Kris'
complaint because non-table objects should be treated the same as tables
that don't match any table name inclusion/exclusion switches.)

What I find in the existing code is that if an inclusion switch appears
first:
pg_dump -n 's.*' -N 'ss.*' ...

then only schemas matching an inclusion switch (and not matching any
later exclusion switch) are dumped.  While if an exclusion switch
appears first:
pg_dump -N 's.*' -n 'ss.*' ...

then all schemas are dumped except system schemas and those matching an
exclusion switch (and not matching any later inclusion switch).  So the
"default" behavior for unmatched objects flips depending on switch
order.  This doesn't seem to satisfy the principle of least surprise,
and it's certainly not adequately documented.  It might be the most
useful behavior though.  I thought about the alternative rule that
"if any inclusion switches appear at all, the default is not to dump"
--- that is, an object must match at least one inclusion switch (and not
match any later exclusion switch) to be dumped.  But with that rule,
exclusion switches before the first inclusion switch are actually
useless.  Has anyone got a better idea?

Returning to the point about schema versus table selection switches,
what we've got is that they are independent filters: to be dumped,
a table must be in a schema selected by the schema inclusion/exclusion
switches (if any), and it must have a name selected by the table
inclusion/exclusion switches (if any).  I think this is OK but it leads
to the property that the order of -n/-N switches is relevant, and the
order of -t/-T switches is relevant, but their order relative to each
other is not relevant.  This could be surprising.

If you're still with me, the payoff is here: what are the rules for
dumping non-table objects, given that there are no inclusion/exclusion
switches for them (but we might want to add such later)?  If only schema
inclusion/exclusion switches are present, then it's relatively easy to
say "dump objects that are in selected schemas" --- but what about
objects that don't have a schema, such as PLs?  And what about the case
where table inclusion/exclusion switches are present?  I said above that
non-table objects should be treated the same way as unmatched tables,
which I think is a necessary rule if we want to extend the set of switch
types later.  But that leads to the conclusion that "a non-table object
is dumped unless a -t switch appears before any -T switches".  Which
strikes me as a mighty surprising behavior.  I'm not sure what to do
differently though.

Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity.  For instance, "-t x" is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people.  An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too.  Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff.  None of these are especially attractive :-(

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: PL/pgSQL Todo, better information in errcontext from plpgsql
Next
From: "Rocco Altier"
Date:
Subject: Re: Win XP SP2 SMP locking (8.1.4)