Thread: pg_dump DROP commands and implicit search paths
I'm working on cleaning up loose ends in pg_dump, and in particular trying to ensure that objects in user schemas can be named the same as system objects without conflicts.  Most of this works now, thanks to Peter's idea about explicitly setting the search path to include just the current target schema.

But there is a problem with pg_dump's option to issue explicit DROP commands.  Right now, with that option pg_dump will produce output like

    set search_path = my_schema;
    drop table my_table;
    create table my_table (...);

This works fine unless the object name duplicates a system object; in that case, since the effective search path is really "pg_catalog, my_schema", the DROP will find and try to drop the system object.

I can think of two workable solutions to this:

1. Explicitly qualify target-object names in the DROP commands, ie, we'd emit

    set search_path = my_schema;
    drop table my_schema.my_table;
    create table my_table (...);

2. Modify the backend so that DROP has a different behavior from other commands: it only searches the explicitly named search path elements (and the TEMP table schema, if any).  If pg_catalog is being searched implicitly then DROP does not look there.

Choice #1 is logically cleaner but would clutter the dump script with many more explicit schema references than I'd like to have.

Choice #2 is awfully ugly at first glance but might prove a good idea in the long run.  It'd certainly reduce the odds of mistakenly dropping a predefined object.

Not sure which way to go.  Comments anyone?

			regards, tom lane
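For concreteness, a minimal sketch of the failure mode described above, assuming a user table deliberately named after a system catalog (pg_class here is just a hypothetical example):

    create schema my_schema;
    create table my_schema.pg_class (id integer);
    set search_path = my_schema;
    -- the effective path is really "pg_catalog, my_schema", so this
    -- finds the system pg_class, not my_schema.pg_class:
    drop table pg_class;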
> set search_path = my_schema;

> This works fine unless the object name duplicates a system object;
> in that case, since the effective search path is really "pg_catalog,
> my_schema", the DROP will find and try to drop the system object.

I must have missed that day.  Why is that exactly?

Clients like psql should probably explicitly specify the pg_catalog path anyway.  After all, if I create a my_schema.pg_class table (for whatever reason) and set my search path to my_schema, I'd expect my own table to be hit by my queries.  Likewise, PostgreSQL internals should specify the specific namespace -- which they generally do through knowledge of the pg_class OID.

Is this a temporary thing to tide clients over for a release without breaking too much?

> 1. Explicitly qualify target-object names in the DROP commands,
> ie, we'd emit

Anyway, to the question at hand.  How about a modification of #1: if the table name begins with 'pg_', explicitly name it my_schema.pg_????.  If users are creating stuff in pg_catalog they should be ready for weird things -- especially given the 'overriding' state it takes in the search path.
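A sketch of the behavior Rod describes, assuming the client names pg_catalog in the path explicitly so that the user schema is searched first:

    set search_path = my_schema, pg_catalog;
    -- my_schema is now searched before pg_catalog, so the user's
    -- my_schema.pg_class shadows the system catalog:
    select * from pg_class;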
On Mon, May 13, 2002 at 02:58:08PM -0400, Tom Lane wrote:
> 1. Explicitly qualify target-object names in the DROP commands,
> 2. Modify the backend so that DROP has a different behavior from
> other commands
> Choice #1 is logically cleaner but would clutter the dump script with
> many more explicit schema references than I'd like to have.

I'd prefer this method -- IMHO the readability of dump scripts isn't a top priority (or if it is, we're not doing very well in that regard anyway).  I think dump scripts should be as verbose as is necessary to ensure that they can't be misinterpreted.

> Choice #2 is awfully ugly at first glance but might prove a good
> idea in the long run.

It's certainly ugly, and I'm skeptical as to its long-term benefits (I would think that the cleaner solution would be more maintainable in the long run).  Am I missing something?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
"Rod Taylor" <rbt@zort.ca> writes: > Afterall, if you create a my_schema.pg_class table (for whatever > reason), and used my search path as my_schema, I'd expect my own to be > hit with my queries. If you want that behavior, you can set the search path as "my_schema, pg_catalog". This does not solve pg_dump's DROP problem, however, since an unqualified reference to say pg_class might still be taken as an attempt to drop pg_catalog.pg_class. There's no guarantee that my_schema.pg_class exists beforehand. > Is this a temporary thing to tide clients over for a release without > breaking too much? No, it's a necessary thing to comply with the SQL standard. The standard thinks all the predefined names are keywords and should override user names. Therefore there *must* be a mode wherein pg_catalog is searched first (but is not the target for create operations, so path = "pg_catalog, my_schema" is not right either). > Anyway, question at hand. How about a modification of #1. If the > table begins in 'pg_' explicitly name it my_schema.pg_????. Tables are not really the problem. Think about types, functions, operators. There's no handy rule to know which names conflict (or, even more to the point, might conflict a release or two from now). I am currently thinking that explicitly setting path = my_schema, pg_catalog might solve some of the corner cases for pg_dump ... but it does not fix the DROP problem. regards, tom lane
nconway@klamath.dyndns.org (Neil Conway) writes:
> I'd prefer this method -- IMHO the readability of dump scripts isn't
> a top priority (or if it is, we're not doing very well in that regard
> anyway).  I think dump scripts should be as verbose as is necessary
> to ensure that they can't be misinterpreted.

Perhaps instead of "readability" I should have said "editability".  The thought that is lurking behind this is that you might want to retarget a dump script to be reloaded in some other schema.  If the dump is cluttered with umpteen thousand copies of the schema name that's going to be difficult.

Ideally I'd like the dumped object definitions to contain *no* explicit references to their containing schema.  This would allow, for example, a pg_restore mode that loads the objects into a different schema.

			regards, tom lane
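Roughly the shape of dump script being asked for, with hypothetical object names: only the search_path line mentions the schema, so retargeting means editing a single line.

    set search_path = my_schema;
    create table my_table (id integer);
    create view my_view as select id from my_table;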
> No, it's a necessary thing to comply with the SQL standard.
> The standard thinks all the predefined names are keywords and
> should override user names.  Therefore there *must* be a mode

Hmm.. I'm not fond of this part of the standard in this case -- though it's got to be there for good reason.

I think I understand the problem better now, and it may have an easy solution, based on the assumption that a DROP SCHEMA statement will also be issued.

If pg_dump issues a DROP of all user objects at the top, as per user request, does it really need to issue a DROP for each individual object?  If you DROP the schema, all of the objects contained within it will go with it.  So technically you don't need to drop types, tables, and functions which belong to a given schema; you just drop that schema.

So we're left with public and pg_catalog.  How about using a qualified name in all cases of DROP, but only issuing per-object drops (rather than DROP SCHEMA) for public and pg_catalog contents?

Perhaps public could be treated like any other schema as well -- which really only leaves pg_catalog, and that's no problem since that's what will be hit by default.
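A sketch of Rod's suggestion, assuming the CASCADE form of DROP SCHEMA: one statement replaces the per-object drops.

    -- instead of:
    --   drop table my_schema.my_table;
    --   drop function my_schema.my_func();
    drop schema my_schema cascade;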
"Rod Taylor" <rbt@zort.ca> writes: > ... Based on the assumption a DROP SCHEMA statement will also > be issued. Doesn't seem very workable for the public schema. I suspect pg_dump has to special-case public anyway, to some extent, but this doesn't really get us around the DROP problem for individual objects AFAICS. I agree that if we issue a drop for the schema there's no need to drop the individual objects ... but we aren't going to be issuing any drops for public IMHO ... so we still need a solution that supports dropping individual objects. If we assume that schema retargeting is something that should be done by a pg_restore option, then it'd probably be workable for pg_restore to modify the qualified DROP commands as it issues them. The main thing is to keep the explicit schema references out of the CREATE commands, and that part I think is doable. regards, tom lane
> Doesn't seem very workable for the public schema.  I suspect pg_dump
> has to special-case public anyway, to some extent, but this doesn't
> really get us around the DROP problem for individual objects AFAICS.

Most people in the know will probably never use public, due to portability issues with other databases.  A dump without "create public" won't work anywhere but PostgreSQL -- which is fine.  So fully qualifying public entries isn't so bad -- especially if it's only public entries.  After a few more releases (7.[56]), public may be able to be treated as a standard user-created schema, with its creation prepended to dumps from before 7.3.

How did you intend to deal with the case where a user removes public or otherwise changes the permissions / ownership on it?  Is the assumption going to be made that it always exists and is world writable?  Obviously restoring dumps from 7.2 or earlier will require public in that state, but will 7.3 and later require it as well where there are objects stored in it?
"Rod Taylor" <rbt@zort.ca> writes: > How did you intend on dealing with the case where a user removes > public or otherwise changes the permissions / ownership on it? I have that as one of my "to think about" items. The best idea I have at the moment is to assume it exists, but include GRANT/REVOKE commands to change its permissions if we see they're not at factory default settings. In the case where it's not there in the source database, should pg_dump have special-case logic to detect that fact and issue a DROP in the script? I'm leaning against, but an argument could be made that we should do that. > Is the assumption going to be made that it always exists and is world > writable? Obviously restoring dumps from 7.2 or earlier will require > public in that state, but will 7.3 and later require it as well where > there are objects stored in it? There are some philosophical issues here about what exactly pg_dump is supposed to do. Is it supposed to try to cause the target database to look exactly like the source, regardless of the initial state of the target? I don't think so; for example, we've never expected it to drop objects that exist in the target but not in the source. I think it is reasonable to assume that loading a pg_dump script into a factory-default empty database will reproduce the source, modulo necessary version-to-version differences. If the target is not in a factory-default condition then we probably ought to be thinking in terms of merging multiple sets of objects, and so gratuituous DROPs don't seem like a good idea. But these considerations give conflicting answers as to whether to DROP PUBLIC if it's not there in the source. As for whether we will deprecate PUBLIC a few releases out --- I can't see that far ahead. I can imagine that if we do, there'll come a time when the release notes may say "if you still have any objects in PUBLIC in your old database, you'll need to manually CREATE SCHEMA PUBLIC before reloading your dump script". regards, tom lane
On Tue, 2002-05-14 at 01:42, Tom Lane wrote:
> nconway@klamath.dyndns.org (Neil Conway) writes:
> > I'd prefer this method -- IMHO the readability of dump scripts isn't
> > a top priority (or if it is, we're not doing very well in that regard
> > anyway).  I think dump scripts should be as verbose as is necessary
> > to ensure that they can't be misinterpreted.

I agree with Neil on this.

> Perhaps instead of "readability" I should have said "editability".
> The thought that is lurking behind this is that you might want to
> retarget a dump script to be reloaded in some other schema.  If the
> dump is cluttered with umpteen thousand copies of the schema name
> that's going to be difficult.

sed -e 's/ old_schema\./ new_schema./g'

I don't think you should allow the dump to be ambiguous for the sake of making rarely used actions slightly more convenient.

> Ideally I'd like the dumped object definitions to contain *no* explicit
> references to their containing schema.  This would allow, for example,
> a pg_restore mode that loads the objects into a different schema.

Provide a command line option to pg_restore to do an automatic edit of the schema name (-E old_schema,new_schema).  People using "psql < dump" would have to edit the dump.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                                 http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
     "Yea, though I walk through the valley of the shadow of death,
      I will fear no evil, for thou art with me; thy rod and thy
      staff they comfort me."        Psalms 23:4
Oliver Elphick <olly@lfix.co.uk> writes:
>> Perhaps instead of "readability" I should have said "editability".
>> The thought that is lurking behind this is that you might want to
>> retarget a dump script to be reloaded in some other schema.  If the
>> dump is cluttered with umpteen thousand copies of the schema name
>> that's going to be difficult.

> sed -e 's/ old_schema\./ new_schema./g'

> I don't think you should allow the dump to be ambiguous for the sake of
> making rarely used actions slightly more convenient.

You have no fear that that "sed" will substitute some places it shouldn't have?  Also, what makes you think this'll be a "rarely used" feature?  I'd guess that people load dumps every day into databases that have different names than the ones they dumped from.  Don't see why the same is not likely to be true at the schema level.

No, I'm not going to "allow the dump to be ambiguous".  But I'm hoping for a solution that doesn't get in the way of retargeting, either.

			regards, tom lane
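A contrived example of the collateral damage being worried about here: the sed pattern can also match inside data values.

    -- a data line in the dump whose contents happen to match:
    insert into notes (body) values ('moved from old_schema. see docs');
    -- s/ old_schema\./ new_schema./g rewrites the string literal too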
On Tue, 2002-05-14 at 07:08, Tom Lane wrote:
> You have no fear that that "sed" will substitute some places it
> shouldn't have?  Also, what makes you think this'll be a "rarely
> used" feature?  I'd guess that people load dumps every day into
> databases that have different names than the ones they dumped from.
> Don't see why the same is not likely to be true at the schema level.

A pg_restore option would presumably be more reliable than sed.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                                 http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
     "Yea, though I walk through the valley of the shadow of death,
      I will fear no evil, for thou art with me; thy rod and thy
      staff they comfort me."        Psalms 23:4
At 06:58 AM 5/14/02 +0100, Oliver Elphick wrote:
>> retarget a dump script to be reloaded in some other schema.  If the
>> dump is cluttered with umpteen thousand copies of the schema name
>> that's going to be difficult.

> sed -e 's/ old_schema\./ new_schema./g'

> I don't think you should allow the dump to be ambiguous for the sake of
> making rarely used actions slightly more convenient.

Erm, from what I see on this list, people regularly dump and reload, often for performance reasons.  There's also dev|backup <-> production|live.  So I don't think dumping and reloading into another schema would be that rare, nor should it be difficult.

sed can screw up the data.  I suppose we could do schema and data dumps separately, but :(.  Would that actually work tho?  Might come in handy one not so fine day ;)...

Regards, Link.