Thread: set search_path in dump output considered harmful

set search_path in dump output considered harmful

From
Phil Frost
Date:
I've recently migrated one of my databases to using veil. This involved
creating a 'private' schema and moving all tables to it. Functions
remain in public, and secured views are created there which can be
accessed by normal users.

In doing so, I found to my extreme displeasure that although the
database continues to function flawlessly, I can no longer restore dumps
produced by pg_dump even after hours of manual tweaking. In all cases,
this is due to search_path being frobbed during the restore.

CASE 1: serial column not in the same schema as associated table

create table a(i serial primary key);
create schema notpublic;
alter SEQUENCE a_i_seq set schema notpublic;

Attempting to restore the output of pg_dump on a database in which the
above has been executed will result in the error:

SET search_path = notpublic, pg_catalog;
SET
--
-- Name: a_i_seq; Type: SEQUENCE SET; Schema: notpublic; Owner: pfrost
--
SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('a', 'i'), 1, false);
ERROR:  relation "a" does not exist


CASE 2: default set to the serial sequence of another table

create schema private;
create table private.t(i serial primary key);
alter sequence private.t_i_seq set schema public;
create table public.t(i integer primary key default nextval('t_i_seq'));

This is similar to case 1, and will encounter the same error first.
However, if that error is manually corrected, restoring the dump will
yield:

SET search_path = public, pg_catalog;
SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('private.t', 'i'), 1, false);        -- manually fixed
CREATE TABLE t (   i integer DEFAULT nextval('t_i_seq'::regclass) NOT NULL
);
ERROR:  relation "t_i_seq" does not exist

The problem here seems to be that although the sequence t_i_seq is in
schema public in the dumped database, restoring the dump places it in
schema private.


CASE 3: functions containing unqualified function references

create schema private;
create function private.a(text) returns text language sql immutable as $$ select $1 || 'a'; $$;
set search_path = public, private;
create function public.b(text) returns text language sql immutable as $$ select a($1); $$;
create table foo(t text);
insert into foo values ('foo');
create index foo_idx on foo ((b(t)));

Restoring the dump of this database yields:

...
CREATE INDEX foo_idx ON foo USING btree (b(t));
ERROR:  function a(text) does not exist
HINT:  No function matches the given name and argument types. You may need to add explicit type casts.
CONTEXT:  SQL function "b" during startup

The way I encountered this actually has little to do with veil. The
function involved in my case takes as parameters some values from
columns of a table and returns a tsvector to be indexed by tsearch2. I
suspect this would be common practice if the tsearch2 documentation did
not store the vector in an additional column.


CASE 4: functions using extension operators

Essentially the same as above, but the body of a function contains a
reference to an operator without specifying the schema with the
operator(schema.name) syntax. Again, contrib modules like tsearch2 are a
great way to encounter this problem.


Re: set search_path in dump output considered harmful

From
Jim Nasby
Date:
ISTM that pg_dump needs to produce output that includes schema names,  
though I'm not sure what side-effects that would have. I know one  
issue is that it'd make it next to impossible to move things to a  
different schema just be editing the dump.

On Jul 5, 2006, at 9:47 AM, Phil Frost wrote:

> I've recently migrated one of my databases to using veil. This  
> involved
> creating a 'private' schema and moving all tables to it. Functions
> remain in public, and secured views are created there which can be
> accessed by normal users.
>
> In doing so, I found to my extreme displeasure that although the
> database continues to function flawlessly, I can no longer restore  
> dumps
> produced by pg_dump even after hours of manual tweaking. In all cases,
> this is due to search_path being frobbed during the restore.
>
> CASE 1: serial column not in the same schema as associated table
>
> create table a(i serial primary key);
> create schema notpublic;
> alter SEQUENCE a_i_seq set schema notpublic;
>
> Attempting to restore the output of pg_dump on a database in which the
> above has been executed will result in the error:
>
> SET search_path = notpublic, pg_catalog;
> SET
> --
> -- Name: a_i_seq; Type: SEQUENCE SET; Schema: notpublic; Owner: pfrost
> --
> SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('a',  
> 'i'), 1, false);
> ERROR:  relation "a" does not exist
>
>
> CASE 2: default set to the serial sequence of another table
>
> create schema private;
> create table private.t(i serial primary key);
> alter sequence private.t_i_seq set schema public;
> create table public.t(i integer primary key default nextval 
> ('t_i_seq'));
>
> This is similar to case 1, and will encounter the same error first.
> However, if that error is manually corrected, restoring the dump will
> yield:
>
> SET search_path = public, pg_catalog;
> SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence 
> ('private.t', 'i'), 1, false);        -- manually fixed
> CREATE TABLE t (
>     i integer DEFAULT nextval('t_i_seq'::regclass) NOT NULL
> );
> ERROR:  relation "t_i_seq" does not exist
>
> The problem here seems to be that although the sequence t_i_seq is in
> schema public in the dumped database, restoring the dump places it in
> schema private.
>
>
> CASE 3: functions containing unqualified function references
>
> create schema private;
> create function private.a(text) returns text language sql immutable  
> as $$ select $1 || 'a'; $$;
> set search_path = public, private;
> create function public.b(text) returns text language sql immutable  
> as $$ select a($1); $$;
> create table foo(t text);
> insert into foo values ('foo');
> create index foo_idx on foo ((b(t)));
>
> Restoring the dump of this database yields:
>
> ...
> CREATE INDEX foo_idx ON foo USING btree (b(t));
> ERROR:  function a(text) does not exist
> HINT:  No function matches the given name and argument types. You  
> may need to add explicit type casts.
> CONTEXT:  SQL function "b" during startup
>
> The way I encountered this actually has little to do with veil. The
> function involved in my case takes as parameters some values from
> columns of a table and returns a tsvector to be indexed by tsearch2. I
> suspect this would be common practice if the tsearch2 documentation  
> did
> not store the vector in an additional column.
>
>
> CASE 4: functions using extension operators
>
> Essentially the same as above, but the body of a function contains a
> reference to an operator without specifying the schema with the
> operator(schema.name) syntax. Again, contrib modules like tsearch2  
> are a
> great way to encounter this problem.
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>



Re: set search_path in dump output considered harmful

From
Tom Lane
Date:
Phil Frost <indigo@bitglue.com> writes:
> I've recently migrated one of my databases to using veil. This involved
> creating a 'private' schema and moving all tables to it.
> ...
> In doing so, I found to my extreme displeasure that although the
> database continues to function flawlessly, I can no longer restore dumps
> produced by pg_dump even after hours of manual tweaking. In all cases,
> this is due to search_path being frobbed during the restore.

No, not one of these things can be blamed on pg_dump.

> CASE 1: serial column not in the same schema as associated table

This is not supported.

> CASE 2: default set to the serial sequence of another table

This is actually the same thing as #1, because you did
> alter sequence private.t_i_seq set schema public;

> CASE 3: functions containing unqualified function references

This is a buggy function definition, plain and simple.  It will fail
just as much in the original database as in the dumped/restored one;
just invoke it with a search_path other than what it's expecting.

There has been talk of attaching a search_path setting to each function
so that it's independent of the caller's search_path, but the
performance hit seems a bit daunting.  In any case it's not pg_dump's
fault that this feature doesn't exist...
        regards, tom lane


Re: set search_path in dump output considered harmful

From
"Marko Kreen"
Date:
On 7/14/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ problems with missing schema in dump ]

> No, not one of these things can be blamed on pg_dump.

Ok, its not exactly bug but still a big annoyance that
instead dumping fully qualified names it juggles with
search path.  And I'm annoyed as a user looking at the dump:

* To see what schema the function/table/type is in, I need to
carefully look several pages back until there is schema change.

* I cannot copy-paste from dump.

* The function contents must still contain fully-qualified
names, so it cannot be said that the schema is not part of
function definition in some way.  Same goes for other objects.

* The name 'search_path' for picking storage is sick :)
I understand the backwards-compatibility angle, but the over-usage
in dumps makes it even more sick...

-- 
marko


Re: set search_path in dump output considered harmful

From
Phil Frost
Date:
On Thu, Jul 13, 2006 at 07:17:31PM -0400, Tom Lane wrote:
> Phil Frost <indigo@bitglue.com> writes:
> > I've recently migrated one of my databases to using veil. This involved
> > creating a 'private' schema and moving all tables to it.
> > ...
> > In doing so, I found to my extreme displeasure that although the
> > database continues to function flawlessly, I can no longer restore dumps
> > produced by pg_dump even after hours of manual tweaking. In all cases,
> > this is due to search_path being frobbed during the restore.
> 
> No, not one of these things can be blamed on pg_dump.
> 
> > CASE 1: serial column not in the same schema as associated table
> 
> This is not supported.

This fact is also not documented, and allowed without warning. Things
that people can do to get the database into an unsupported state are
most usually called "bugs".

> > CASE 2: default set to the serial sequence of another table
> 
> This is actually the same thing as #1, because you did
> > alter sequence private.t_i_seq set schema public;

Agreed.

> > CASE 3: functions containing unqualified function references
> 
> This is a buggy function definition, plain and simple.  It will fail
> just as much in the original database as in the dumped/restored one;
> just invoke it with a search_path other than what it's expecting.
> 
> There has been talk of attaching a search_path setting to each function
> so that it's independent of the caller's search_path, but the
> performance hit seems a bit daunting.  In any case it's not pg_dump's
> fault that this feature doesn't exist...

I don't see anything in the documentation that says I must schema
qualify every name reference in functions. Yes, I could break my
function by changing the search_path but I bet this is true of most
functions out there. Do you schema qualifiy all your operator
references? Does this mean most functions are buggy? If this is the
case, perhaps a warning should be issued! The fact is that search_path
is an important setting that should not be frobbed lightly. You wouldn't
arbitrarily change the header file's you have included in a C program,
or clobber your OS's dynamic linker search path, would you?

If pg_dump would only output qualified names always instead of setting
search_path arbitrarily, this would not be a problem. I could run an
"alter database" prior to restoring the dump as I have done on my
production database and the dump could restore normally. Given that
pg_dump already does this for many of the statements it outputs, I don't
think it would be impossible to do it for all of them.

I do not see any good arguments for the current behaviour. One was made
that it allows one to change the schema of a collection of objects by
editing a single line, but in fact this isn't true because half of the
output statements contain a schema qualification dispite that
search_path has been set. If there are any others that I am missing,
please tell me.

Other people have complained that the lack of schema qualified names
makes the dump harder to read. Yesterday on IRC another user was looking
for help because his backup wouldn't restore, and it seems the problem
is related to search_path.


Re: set search_path in dump output considered harmful

From
"Jim C. Nasby"
Date:
On Thu, Jul 13, 2006 at 07:17:31PM -0400, Tom Lane wrote:
> There has been talk of attaching a search_path setting to each function
> so that it's independent of the caller's search_path, but the
> performance hit seems a bit daunting.  In any case it's not pg_dump's
> fault that this feature doesn't exist...

Since most (all?) the stuff in functions ends up as prepared statements,
you'd only have to set search_path the first time the function was run,
no? That doesn't seem nearly as bad.

And FWIW, I agree that it would be better if pg_dump quit using
search_path.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461