Thread: pg_dump DROP commands and implicit search paths
I'm working on cleaning up loose ends in pg_dump, and in particular trying to ensure that objects in user schemas can be named the same as system objects without conflicts.  Most of this works now, thanks to Peter's idea about explicitly setting the search path to include just the current target schema.

But there is a problem with pg_dump's option to issue explicit DROP commands.  Right now, with that option pg_dump will produce output like

    set search_path = my_schema;
    drop table my_table;
    create table my_table (...);

This works fine unless the object name duplicates a system object; in that case, since the effective search path is really "pg_catalog, my_schema", the DROP will find and try to drop the system object.

I can think of two workable solutions to this:

1. Explicitly qualify target-object names in the DROP commands, ie, we'd emit

    set search_path = my_schema;
    drop table my_schema.my_table;
    create table my_table (...);

2. Modify the backend so that DROP has a different behavior from other commands: it only searches the explicitly named search path elements (and the TEMP table schema, if any).  If pg_catalog is being searched implicitly then DROP does not look there.

Choice #1 is logically cleaner but would clutter the dump script with many more explicit schema references than I'd like to have.

Choice #2 is awfully ugly at first glance but might prove a good idea in the long run.  It'd certainly reduce the odds of mistakenly dropping a predefined object.

Not sure which way to go.  Comments anyone?

			regards, tom lane
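For concreteness, a minimal sketch of the failure mode described above, assuming a user table deliberately named after a system catalog (pg_class here is just a hypothetical example):

    create schema my_schema;
    create table my_schema.pg_class (id integer);
    set search_path = my_schema;
    -- the effective path is really "pg_catalog, my_schema", so this
    -- finds the system pg_class, not my_schema.pg_class:
    drop table pg_class;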
> set search_path = my_schema;

> This works fine unless the object name duplicates a system object;
> in that case, since the effective search path is really "pg_catalog,
> my_schema", the DROP will find and try to drop the system object.

I must have missed that day.  Why is that exactly?

Clients like psql should probably explicitly specify the pg_catalog path anyway.  After all, if I create a my_schema.pg_class table (for whatever reason) and set my search path to my_schema, I'd expect my own table to be hit by my queries.  Likewise, PostgreSQL internals should specify the specific namespace -- which they generally do through knowledge of the pg_class OID.

Is this a temporary thing to tide clients over for a release without breaking too much?

> 1. Explicitly qualify target-object names in the DROP commands,
> ie, we'd emit

Anyway, to the question at hand.  How about a modification of #1: if the table name begins with 'pg_', explicitly name it my_schema.pg_????.  If users are creating stuff in pg_catalog they should be ready for weird things -- especially given the 'overriding' state it takes in the search path.
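A sketch of the behavior Rod describes, assuming the client names pg_catalog in the path explicitly so that the user schema is searched first:

    set search_path = my_schema, pg_catalog;
    -- my_schema is now searched before pg_catalog, so the user's
    -- my_schema.pg_class shadows the system catalog:
    select * from pg_class;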
On Mon, May 13, 2002 at 02:58:08PM -0400, Tom Lane wrote:
> 1. Explicitly qualify target-object names in the DROP commands,
> 2. Modify the backend so that DROP has a different behavior from
> other commands
> Choice #1 is logically cleaner but would clutter the dump script with
> many more explicit schema references than I'd like to have.

I'd prefer this method -- IMHO the readability of dump scripts isn't a top priority (or if it is, we're not doing very well in that regard anyway).  I think dump scripts should be as verbose as is necessary to ensure that they can't be misinterpreted.

> Choice #2 is awfully ugly at first glance but might prove a good
> idea in the long run.

It's certainly ugly, and I'm skeptical as to its long-term benefits (I would think that the cleaner solution would be more maintainable in the long run).  Am I missing something?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
"Rod Taylor" <rbt@zort.ca> writes: > Afterall, if you create a my_schema.pg_class table (for whatever > reason), and used my search path as my_schema, I'd expect my own to be > hit with my queries. If you want that behavior, you can set the search path as "my_schema, pg_catalog". This does not solve pg_dump's DROP problem, however, since an unqualified reference to say pg_class might still be taken as an attempt to drop pg_catalog.pg_class. There's no guarantee that my_schema.pg_class exists beforehand. > Is this a temporary thing to tide clients over for a release without > breaking too much? No, it's a necessary thing to comply with the SQL standard. The standard thinks all the predefined names are keywords and should override user names. Therefore there *must* be a mode wherein pg_catalog is searched first (but is not the target for create operations, so path = "pg_catalog, my_schema" is not right either). > Anyway, question at hand. How about a modification of #1. If the > table begins in 'pg_' explicitly name it my_schema.pg_????. Tables are not really the problem. Think about types, functions, operators. There's no handy rule to know which names conflict (or, even more to the point, might conflict a release or two from now). I am currently thinking that explicitly setting path = my_schema, pg_catalog might solve some of the corner cases for pg_dump ... but it does not fix the DROP problem. regards, tom lane
nconway@klamath.dyndns.org (Neil Conway) writes:
> I'd prefer this method -- IMHO the readability of dump scripts isn't
> a top priority (or if it is, we're not doing very well in that regard
> anyway).  I think dump scripts should be as verbose as is necessary
> to ensure that they can't be misinterpreted.

Perhaps instead of "readability" I should have said "editability".  The thought that is lurking behind this is that you might want to retarget a dump script to be reloaded in some other schema.  If the dump is cluttered with umpteen thousand copies of the schema name that's going to be difficult.

Ideally I'd like the dumped object definitions to contain *no* explicit references to their containing schema.  This would allow, for example, a pg_restore mode that loads the objects into a different schema.

			regards, tom lane
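Roughly the shape of dump script being asked for, with hypothetical object names: only the search_path line mentions the schema, so retargeting means editing a single line.

    set search_path = my_schema;
    create table my_table (id integer);
    create view my_view as select id from my_table;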
> No, it's a necessary thing to comply with the SQL standard.
> The standard thinks all the predefined names are keywords and
> should override user names.  Therefore there *must* be a mode

Hmm.. I'm not fond of this part of the standard in this case -- though it's got to be there for good reason.

I think I understand the problem better now, and it may have an easy solution, based on the assumption that a DROP SCHEMA statement will also be issued.

If pg_dump issues a DROP of all user objects at the top, as per user request, does it really need to issue a DROP for each individual object?  If you DROP the schema, all of the objects contained within it will go with it.  So technically you don't need to drop types, tables, and functions which belong to a given schema; you just drop that schema.

So we're left with public and pg_catalog.  How about using a qualified name in all cases of DROP, but only issuing per-object drops (rather than DROP SCHEMA) for public and pg_catalog contents?

Perhaps public could be treated like any other schema as well -- which really only leaves pg_catalog, and that's no problem since that's what will be hit by default.
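A sketch of Rod's suggestion, assuming the CASCADE form of DROP SCHEMA: one statement replaces the per-object drops.

    -- instead of:
    --   drop table my_schema.my_table;
    --   drop function my_schema.my_func();
    drop schema my_schema cascade;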
"Rod Taylor" <rbt@zort.ca> writes: > ... Based on the assumption a DROP SCHEMA statement will also > be issued. Doesn't seem very workable for the public schema. I suspect pg_dump has to special-case public anyway, to some extent, but this doesn't really get us around the DROP problem for individual objects AFAICS. I agree that if we issue a drop for the schema there's no need to drop the individual objects ... but we aren't going to be issuing any drops for public IMHO ... so we still need a solution that supports dropping individual objects. If we assume that schema retargeting is something that should be done by a pg_restore option, then it'd probably be workable for pg_restore to modify the qualified DROP commands as it issues them. The main thing is to keep the explicit schema references out of the CREATE commands, and that part I think is doable. regards, tom lane
> Doesn't seem very workable for the public schema.  I suspect pg_dump
> has to special-case public anyway, to some extent, but this doesn't
> really get us around the DROP problem for individual objects AFAICS.

Most people in the know will probably never use public, due to portability issues with other databases.  A dump without "create public" won't work anywhere but PostgreSQL -- which is fine.  So fully qualifying public entries isn't so bad -- especially if it's only public entries.  After a few more releases (7.[56]), public may be able to be treated as a standard user-created schema, with its creation prepended to dumps from before 7.3.

How did you intend to deal with the case where a user removes public or otherwise changes the permissions / ownership on it?  Is the assumption going to be made that it always exists and is world writable?  Obviously restoring dumps from 7.2 or earlier will require public in that state, but will 7.3 and later require it as well where there are objects stored in it?
"Rod Taylor" <rbt@zort.ca> writes: > How did you intend on dealing with the case where a user removes > public or otherwise changes the permissions / ownership on it? I have that as one of my "to think about" items. The best idea I have at the moment is to assume it exists, but include GRANT/REVOKE commands to change its permissions if we see they're not at factory default settings. In the case where it's not there in the source database, should pg_dump have special-case logic to detect that fact and issue a DROP in the script? I'm leaning against, but an argument could be made that we should do that. > Is the assumption going to be made that it always exists and is world > writable? Obviously restoring dumps from 7.2 or earlier will require > public in that state, but will 7.3 and later require it as well where > there are objects stored in it? There are some philosophical issues here about what exactly pg_dump is supposed to do. Is it supposed to try to cause the target database to look exactly like the source, regardless of the initial state of the target? I don't think so; for example, we've never expected it to drop objects that exist in the target but not in the source. I think it is reasonable to assume that loading a pg_dump script into a factory-default empty database will reproduce the source, modulo necessary version-to-version differences. If the target is not in a factory-default condition then we probably ought to be thinking in terms of merging multiple sets of objects, and so gratuituous DROPs don't seem like a good idea. But these considerations give conflicting answers as to whether to DROP PUBLIC if it's not there in the source. As for whether we will deprecate PUBLIC a few releases out --- I can't see that far ahead. I can imagine that if we do, there'll come a time when the release notes may say "if you still have any objects in PUBLIC in your old database, you'll need to manually CREATE SCHEMA PUBLIC before reloading your dump script". regards, tom lane
On Tue, 2002-05-14 at 01:42, Tom Lane wrote:
> nconway@klamath.dyndns.org (Neil Conway) writes:
> > I'd prefer this method -- IMHO the readability of dump scripts isn't
> > a top priority (or if it is, we're not doing very well in that regard
> > anyway).  I think dump scripts should be as verbose as is necessary
> > to ensure that they can't be misinterpreted.

I agree with Neil on this.

> Perhaps instead of "readability" I should have said "editability".
> The thought that is lurking behind this is that you might want to
> retarget a dump script to be reloaded in some other schema.  If the
> dump is cluttered with umpteen thousand copies of the schema name
> that's going to be difficult.

sed -e 's/ old_schema\./ new_schema./g'

I don't think you should allow the dump to be ambiguous for the sake of making rarely used actions slightly more convenient.

> Ideally I'd like the dumped object definitions to contain *no* explicit
> references to their containing schema.  This would allow, for example,
> a pg_restore mode that loads the objects into a different schema.

Provide a command line option to pg_restore to do an automatic edit of the schema name (-E old_schema,new_schema).  People using "psql < dump" would have to edit the dump.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                                 http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
     "Yea, though I walk through the valley of the shadow of death,
      I will fear no evil, for thou art with me; thy rod and thy
      staff they comfort me."        Psalms 23:4
Oliver Elphick <olly@lfix.co.uk> writes:
>> Perhaps instead of "readability" I should have said "editability".
>> The thought that is lurking behind this is that you might want to
>> retarget a dump script to be reloaded in some other schema.  If the
>> dump is cluttered with umpteen thousand copies of the schema name
>> that's going to be difficult.

> sed -e 's/ old_schema\./ new_schema./g'

> I don't think you should allow the dump to be ambiguous for the sake of
> making rarely used actions slightly more convenient.

You have no fear that that "sed" will substitute some places it shouldn't have?  Also, what makes you think this'll be a "rarely used" feature?  I'd guess that people load dumps every day into databases that have different names than the ones they dumped from.  Don't see why the same is not likely to be true at the schema level.

No, I'm not going to "allow the dump to be ambiguous".  But I'm hoping for a solution that doesn't get in the way of retargeting, either.

			regards, tom lane
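A contrived example of the collateral damage being worried about here: the sed pattern can also match inside data values.

    -- a data line in the dump whose contents happen to match:
    insert into notes (body) values ('moved from old_schema. see docs');
    -- s/ old_schema\./ new_schema./g rewrites the string literal too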
On Tue, 2002-05-14 at 07:08, Tom Lane wrote:
> You have no fear that that "sed" will substitute some places it
> shouldn't have?  Also, what makes you think this'll be a "rarely
> used" feature?  I'd guess that people load dumps every day into
> databases that have different names than the ones they dumped from.
> Don't see why the same is not likely to be true at the schema level.

A pg_restore option would presumably be more reliable than sed.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                                 http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
     "Yea, though I walk through the valley of the shadow of death,
      I will fear no evil, for thou art with me; thy rod and thy
      staff they comfort me."        Psalms 23:4
At 06:58 AM 5/14/02 +0100, Oliver Elphick wrote:
>> retarget a dump script to be reloaded in some other schema.  If the
>> dump is cluttered with umpteen thousand copies of the schema name
>> that's going to be difficult.

> sed -e 's/ old_schema\./ new_schema./g'

> I don't think you should allow the dump to be ambiguous for the sake of
> making rarely used actions slightly more convenient.

Erm, from what I see on this list, people regularly dump and reload, often for performance reasons.  There's also dev|backup <-> production|live.  So I don't think dumping and reloading into another schema would be that rare, nor should it be difficult.

sed can screw up the data.  I suppose we could do schema and data dumps separately, but :(.  Would that actually work tho?  Might come in handy one not so fine day ;)...

Regards, Link.