Re: pg_dump exclusion switches and functions/types - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: pg_dump exclusion switches and functions/types |
Date | |
Msg-id | 21342.1160150091@sss.pgh.pa.us Whole thread Raw |
In response to | pg_dump exclusion switches and functions/types (Kris Jurka <books@ejurka.com>) |
Responses |
Re: pg_dump exclusion switches and functions/types
Re: pg_dump exclusion switches and functions/types Re: pg_dump exclusion switches and functions/types Re: pg_dump exclusion switches and functions/types |
List | pgsql-hackers |
Kris Jurka <books@ejurka.com> writes: > Testing out the new pg_dump exclusion switches I've found that excluding a > table means that no functions or types will be dumped. Excluding one > table shouldn't exclude these objects. I've been chewing on this a bit and find that the existing patch has several behaviors that seem surprising. Considering just one type of inclusion/exclusion switches at a time (we'll get to the interaction of schema and table switches below), I think we can all agree without too much argument on these statements: * With no inclusion/exclusion switches, all objects except system objects should be dumped. * With only exclusion switches given, all objects except system objects and those matching at least one pattern should bedumped. * With only inclusion switches given, only those objects matching at least one pattern should be dumped (whether they aresystem objects or not). That last proviso might be debatable but on balance I think it's OK (for instance, "pg_dump --schema=information_schema" could be useful for debugging or documentation purposes). Where things get interesting is when you have both inclusion and exclusion switches given. The existing patch's behavior is that "the rightmost switch wins", ie, if an object's name matches more than one pattern then it is included or excluded according to the rightmost switch it matches. This is, erm, poorly documented, but it seems like useful behavior so I don't have an objection myself. The real question is what should happen to objects that don't match any of the switch patterns? (This is relevant to Kris' complaint because non-table objects should be treated the same as tables that don't match any table name inclusion/exclusion switches.) What I find in the existing code is that if an inclusion switch appears first: pg_dump -n 's.*' -N 'ss.*' ... then only schemas matching an inclusion switch (and not matching any later exclusion switch) are dumped. While if an exclusion switch appears first: pg_dump -N 's.*' -n 'ss.*' ... then all schemas are dumped except system schemas and those matching an exclusion switch (and not matching any later inclusion switch). So the "default" behavior for unmatched objects flips depending on switch order. This doesn't seem to satisfy the principle of least surprise, and it's certainly not adequately documented. It might be the most useful behavior though. I thought about the alternative rule that "if any inclusion switches appear at all, the default is not to dump" --- that is, an object must match at least one inclusion switch (and not match any later exclusion switch) to be dumped. But with that rule, exclusion switches before the first inclusion switch are actually useless. Has anyone got a better idea? Returning to the point about schema versus table selection switches, what we've got is that they are independent filters: to be dumped, a table must be in a schema selected by the schema inclusion/exclusion switches (if any), and it must have a name selected by the table inclusion/exclusion switches (if any). I think this is OK but it leads to the property that the order of -n/-N switches is relevant, and the order of -t/-T switches is relevant, but their order relative to each other is not relevant. This could be surprising. If you're still with me, the payoff is here: what are the rules for dumping non-table objects, given that there are no inclusion/exclusion switches for them (but we might want to add such later)? If only schema inclusion/exclusion switches are present, then it's relatively easy to say "dump objects that are in selected schemas" --- but what about objects that don't have a schema, such as PLs? And what about the case where table inclusion/exclusion switches are present? I said above that non-table objects should be treated the same way as unmatched tables, which I think is a necessary rule if we want to extend the set of switch types later. But that leads to the conclusion that "a non-table object is dumped unless a -t switch appears before any -T switches". Which strikes me as a mighty surprising behavior. I'm not sure what to do differently though. Lastly, as long as we're questioning the premises of this patch, I wonder about the choice to use regex pattern matching rules. The problem with regex is that to be upward-compatible with the old exact-match switch definitions, a switch value that doesn't contain any regex special characters is treated as an equality condition not a pattern, which makes for a discontinuity. For instance, "-t x" is treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added. That's going to burn people. An alternative we could consider is to use LIKE patterns instead, but since underscore is a wildcard in LIKE, it's easy to imagine people getting burnt by that too. Or we could import the rather ad-hoc shell-wildcard-like rules used by psql's \d stuff. None of these are especially attractive :-( Comments? regards, tom lane
pgsql-hackers by date: