Thread: set search_path in dump output considered harmful
I've recently migrated one of my databases to using veil. This involved creating a 'private' schema and moving all tables to it. Functions remain in public, and secured views are created there which can be accessed by normal users. In doing so, I found to my extreme displeasure that although the database continues to function flawlessly, I can no longer restore dumps produced by pg_dump even after hours of manual tweaking. In all cases, this is due to search_path being frobbed during the restore. CASE 1: serial column not in the same schema as associated table create table a(i serial primary key); create schema notpublic; alter SEQUENCE a_i_seq set schema notpublic; Attempting to restore the output of pg_dump on a database in which the above has been executed will result in the error: SET search_path = notpublic, pg_catalog; SET -- -- Name: a_i_seq; Type: SEQUENCE SET; Schema: notpublic; Owner: pfrost -- SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('a', 'i'), 1, false); ERROR: relation "a" does not exist CASE 2: default set to the serial sequence of another table create schema private; create table private.t(i serial primary key); alter sequence private.t_i_seq set schema public; create table public.t(i integer primary key default nextval('t_i_seq')); This is similar to case 1, and will encounter the same error first. However, if that error is manually corrected, restoring the dump will yield: SET search_path = public, pg_catalog; SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('private.t', 'i'), 1, false); -- manually fixed CREATE TABLE t ( i integer DEFAULT nextval('t_i_seq'::regclass) NOT NULL ); ERROR: relation "t_i_seq" does not exist The problem here seems to be that although the sequence t_i_seq is in schema public in the dumped database, restoring the dump places it in schema private. CASE 3: functions containing unqualified function references create schema private; create function private.a(text) returns text language sql immutable as $$ select $1 || 'a'; $$; set search_path = public, private; create function public.b(text) returns text language sql immutable as $$ select a($1); $$; create table foo(t text); insert into foo values ('foo'); create index foo_idx on foo ((b(t))); Restoring the dump of this database yields: ... CREATE INDEX foo_idx ON foo USING btree (b(t)); ERROR: function a(text) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts. CONTEXT: SQL function "b" during startup The way I encountered this actually has little to do with veil. The function involved in my case takes as parameters some values from columns of a table and returns a tsvector to be indexed by tsearch2. I suspect this would be common practice if the tsearch2 documentation did not store the vector in an additional column. CASE 4: functions using extension operators Essentially the same as above, but the body of a function contains a reference to an operator without specifying the schema with the operator(schema.name) syntax. Again, contrib modules like tsearch2 are a great way to encounter this problem.
ISTM that pg_dump needs to produce output that includes schema names, though I'm not sure what side-effects that would have. I know one issue is that it'd make it next to impossible to move things to a different schema just be editing the dump. On Jul 5, 2006, at 9:47 AM, Phil Frost wrote: > I've recently migrated one of my databases to using veil. This > involved > creating a 'private' schema and moving all tables to it. Functions > remain in public, and secured views are created there which can be > accessed by normal users. > > In doing so, I found to my extreme displeasure that although the > database continues to function flawlessly, I can no longer restore > dumps > produced by pg_dump even after hours of manual tweaking. In all cases, > this is due to search_path being frobbed during the restore. > > CASE 1: serial column not in the same schema as associated table > > create table a(i serial primary key); > create schema notpublic; > alter SEQUENCE a_i_seq set schema notpublic; > > Attempting to restore the output of pg_dump on a database in which the > above has been executed will result in the error: > > SET search_path = notpublic, pg_catalog; > SET > -- > -- Name: a_i_seq; Type: SEQUENCE SET; Schema: notpublic; Owner: pfrost > -- > SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('a', > 'i'), 1, false); > ERROR: relation "a" does not exist > > > CASE 2: default set to the serial sequence of another table > > create schema private; > create table private.t(i serial primary key); > alter sequence private.t_i_seq set schema public; > create table public.t(i integer primary key default nextval > ('t_i_seq')); > > This is similar to case 1, and will encounter the same error first. > However, if that error is manually corrected, restoring the dump will > yield: > > SET search_path = public, pg_catalog; > SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence > ('private.t', 'i'), 1, false); -- manually fixed > CREATE TABLE t ( > i integer DEFAULT nextval('t_i_seq'::regclass) NOT NULL > ); > ERROR: relation "t_i_seq" does not exist > > The problem here seems to be that although the sequence t_i_seq is in > schema public in the dumped database, restoring the dump places it in > schema private. > > > CASE 3: functions containing unqualified function references > > create schema private; > create function private.a(text) returns text language sql immutable > as $$ select $1 || 'a'; $$; > set search_path = public, private; > create function public.b(text) returns text language sql immutable > as $$ select a($1); $$; > create table foo(t text); > insert into foo values ('foo'); > create index foo_idx on foo ((b(t))); > > Restoring the dump of this database yields: > > ... > CREATE INDEX foo_idx ON foo USING btree (b(t)); > ERROR: function a(text) does not exist > HINT: No function matches the given name and argument types. You > may need to add explicit type casts. > CONTEXT: SQL function "b" during startup > > The way I encountered this actually has little to do with veil. The > function involved in my case takes as parameters some values from > columns of a table and returns a tsvector to be indexed by tsearch2. I > suspect this would be common practice if the tsearch2 documentation > did > not store the vector in an additional column. > > > CASE 4: functions using extension operators > > Essentially the same as above, but the body of a function contains a > reference to an operator without specifying the schema with the > operator(schema.name) syntax. Again, contrib modules like tsearch2 > are a > great way to encounter this problem. > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Phil Frost <indigo@bitglue.com> writes: > I've recently migrated one of my databases to using veil. This involved > creating a 'private' schema and moving all tables to it. > ... > In doing so, I found to my extreme displeasure that although the > database continues to function flawlessly, I can no longer restore dumps > produced by pg_dump even after hours of manual tweaking. In all cases, > this is due to search_path being frobbed during the restore. No, not one of these things can be blamed on pg_dump. > CASE 1: serial column not in the same schema as associated table This is not supported. > CASE 2: default set to the serial sequence of another table This is actually the same thing as #1, because you did > alter sequence private.t_i_seq set schema public; > CASE 3: functions containing unqualified function references This is a buggy function definition, plain and simple. It will fail just as much in the original database as in the dumped/restored one; just invoke it with a search_path other than what it's expecting. There has been talk of attaching a search_path setting to each function so that it's independent of the caller's search_path, but the performance hit seems a bit daunting. In any case it's not pg_dump's fault that this feature doesn't exist... regards, tom lane
On 7/14/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: [ problems with missing schema in dump ] > No, not one of these things can be blamed on pg_dump. Ok, its not exactly bug but still a big annoyance that instead dumping fully qualified names it juggles with search path. And I'm annoyed as a user looking at the dump: * To see what schema the function/table/type is in, I need to carefully look several pages back until there is schema change. * I cannot copy-paste from dump. * The function contents must still contain fully-qualified names, so it cannot be said that the schema is not part of function definition in some way. Same goes for other objects. * The name 'search_path' for picking storage is sick :) I understand the backwards-compatibility angle, but the over-usage in dumps makes it even more sick... -- marko
On Thu, Jul 13, 2006 at 07:17:31PM -0400, Tom Lane wrote: > Phil Frost <indigo@bitglue.com> writes: > > I've recently migrated one of my databases to using veil. This involved > > creating a 'private' schema and moving all tables to it. > > ... > > In doing so, I found to my extreme displeasure that although the > > database continues to function flawlessly, I can no longer restore dumps > > produced by pg_dump even after hours of manual tweaking. In all cases, > > this is due to search_path being frobbed during the restore. > > No, not one of these things can be blamed on pg_dump. > > > CASE 1: serial column not in the same schema as associated table > > This is not supported. This fact is also not documented, and allowed without warning. Things that people can do to get the database into an unsupported state are most usually called "bugs". > > CASE 2: default set to the serial sequence of another table > > This is actually the same thing as #1, because you did > > alter sequence private.t_i_seq set schema public; Agreed. > > CASE 3: functions containing unqualified function references > > This is a buggy function definition, plain and simple. It will fail > just as much in the original database as in the dumped/restored one; > just invoke it with a search_path other than what it's expecting. > > There has been talk of attaching a search_path setting to each function > so that it's independent of the caller's search_path, but the > performance hit seems a bit daunting. In any case it's not pg_dump's > fault that this feature doesn't exist... I don't see anything in the documentation that says I must schema qualify every name reference in functions. Yes, I could break my function by changing the search_path but I bet this is true of most functions out there. Do you schema qualifiy all your operator references? Does this mean most functions are buggy? If this is the case, perhaps a warning should be issued! The fact is that search_path is an important setting that should not be frobbed lightly. You wouldn't arbitrarily change the header file's you have included in a C program, or clobber your OS's dynamic linker search path, would you? If pg_dump would only output qualified names always instead of setting search_path arbitrarily, this would not be a problem. I could run an "alter database" prior to restoring the dump as I have done on my production database and the dump could restore normally. Given that pg_dump already does this for many of the statements it outputs, I don't think it would be impossible to do it for all of them. I do not see any good arguments for the current behaviour. One was made that it allows one to change the schema of a collection of objects by editing a single line, but in fact this isn't true because half of the output statements contain a schema qualification dispite that search_path has been set. If there are any others that I am missing, please tell me. Other people have complained that the lack of schema qualified names makes the dump harder to read. Yesterday on IRC another user was looking for help because his backup wouldn't restore, and it seems the problem is related to search_path.
On Thu, Jul 13, 2006 at 07:17:31PM -0400, Tom Lane wrote: > There has been talk of attaching a search_path setting to each function > so that it's independent of the caller's search_path, but the > performance hit seems a bit daunting. In any case it's not pg_dump's > fault that this feature doesn't exist... Since most (all?) the stuff in functions ends up as prepared statements, you'd only have to set search_path the first time the function was run, no? That doesn't seem nearly as bad. And FWIW, I agree that it would be better if pg_dump quit using search_path. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461