Thread: CREATE FUNCTION .. SET vs. pg_dump

CREATE FUNCTION .. SET vs. pg_dump

From
Stefan Kaltenbrunner
Date:
Hi all!


While working on upgrading the database of the search system on
postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
that system are actually invalid and cannot be reloaded without being
hacked on manually...

Simple way to reproduce is using the following:



CREATE TEXT SEARCH CONFIGURATION pg (   PARSER = pg_catalog."default" );

CREATE FUNCTION test() RETURNS INTEGER
LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
SELECT 1;
$$;


once you dump that you will end up with an invalid dump because the
function will be dumped before the actual text search configuration is
(re)created. I have not checked in any more detail but I suspect that
this problem is not only affecting default_text_search_config.


Stefan



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> While working on upgrading the database of the search system on
> postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
> that system are actually invalid and cannot be reloaded without being
> hacked on manually...

> CREATE TEXT SEARCH CONFIGURATION pg (
>     PARSER = pg_catalog."default" );

> CREATE FUNCTION test() RETURNS INTEGER
> LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
> SELECT 1;
> $$;

> once you dump that you will end up with an invalid dump because the
> function will be dumped before the actual text search configuration is
> (re)created.

I don't think it will work to try to fix this by reordering the dump;
it's too easy to imagine scenarios where that would lead to circular
ordering constraints.  What seems like a more workable answer is for
CREATE FUNCTION to not attempt to validate SET clauses when
check_function_bodies is off, or at least not throw a hard error when
the validation fails.  (I see for instance that if you try   ALTER ROLE joe SET default_text_search_config TO
nonesuch;
you just get a notice and not an error.)

However, I don't recall if there are any places where we assume the
SET info was pre-validated by CREATE/ALTER FUNCTION.
        regards, tom lane



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stefan Kaltenbrunner
Date:
On 08/18/2013 05:40 PM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> While working on upgrading the database of the search system on
>> postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
>> that system are actually invalid and cannot be reloaded without being
>> hacked on manually...
> 
>> CREATE TEXT SEARCH CONFIGURATION pg (
>>     PARSER = pg_catalog."default" );
> 
>> CREATE FUNCTION test() RETURNS INTEGER
>> LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
>> SELECT 1;
>> $$;
> 
>> once you dump that you will end up with an invalid dump because the
>> function will be dumped before the actual text search configuration is
>> (re)created.
> 
> I don't think it will work to try to fix this by reordering the dump;
> it's too easy to imagine scenarios where that would lead to circular
> ordering constraints.  What seems like a more workable answer is for
> CREATE FUNCTION to not attempt to validate SET clauses when
> check_function_bodies is off, or at least not throw a hard error when
> the validation fails.  (I see for instance that if you try
>     ALTER ROLE joe SET default_text_search_config TO nonesuch;
> you just get a notice and not an error.)

hmm yeah - just throwing a NOTICE with check_function_bodies=off seems
like reasonable workaround to this problem area.
Not sure it would be required to turn it into a NOTICE in general,
though alter role/alter database seems like an established precedence
for this.



Stefan



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stefan Kaltenbrunner
Date:
On 08/18/2013 05:40 PM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> While working on upgrading the database of the search system on
>> postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
>> that system are actually invalid and cannot be reloaded without being
>> hacked on manually...
> 
>> CREATE TEXT SEARCH CONFIGURATION pg (
>>     PARSER = pg_catalog."default" );
> 
>> CREATE FUNCTION test() RETURNS INTEGER
>> LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
>> SELECT 1;
>> $$;
> 
>> once you dump that you will end up with an invalid dump because the
>> function will be dumped before the actual text search configuration is
>> (re)created.
> 
> I don't think it will work to try to fix this by reordering the dump;
> it's too easy to imagine scenarios where that would lead to circular
> ordering constraints.  What seems like a more workable answer is for
> CREATE FUNCTION to not attempt to validate SET clauses when
> check_function_bodies is off, or at least not throw a hard error when
> the validation fails.  (I see for instance that if you try
>     ALTER ROLE joe SET default_text_search_config TO nonesuch;
> you just get a notice and not an error.)
> 
> However, I don't recall if there are any places where we assume the
> SET info was pre-validated by CREATE/ALTER FUNCTION.

any further insights into that issue? - seems a bit silly to have an
open bug that actually prevents us from taking (restorable) backups of
the search system on our own website...



Stefan



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stephen Frost
Date:
* Stefan Kaltenbrunner (stefan@kaltenbrunner.cc) wrote:
> On 08/18/2013 05:40 PM, Tom Lane wrote:
> > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> >> While working on upgrading the database of the search system on
> >> postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
> >> that system are actually invalid and cannot be reloaded without being
> >> hacked on manually...
> >
> >> CREATE TEXT SEARCH CONFIGURATION pg (
> >>     PARSER = pg_catalog."default" );
> >
> >> CREATE FUNCTION test() RETURNS INTEGER
> >> LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
> >> SELECT 1;
> >> $$;
> >
> >> once you dump that you will end up with an invalid dump because the
> >> function will be dumped before the actual text search configuration is
> >> (re)created.
> >
> > I don't think it will work to try to fix this by reordering the dump;
> > it's too easy to imagine scenarios where that would lead to circular
> > ordering constraints.  What seems like a more workable answer is for
> > CREATE FUNCTION to not attempt to validate SET clauses when
> > check_function_bodies is off, or at least not throw a hard error when
> > the validation fails.  (I see for instance that if you try
> >     ALTER ROLE joe SET default_text_search_config TO nonesuch;
> > you just get a notice and not an error.)
> >
> > However, I don't recall if there are any places where we assume the
> > SET info was pre-validated by CREATE/ALTER FUNCTION.
>
> any further insights into that issue? - seems a bit silly to have an
> open bug that actually prevents us from taking (restorable) backups of
> the search system on our own website...

It would seem that a simple solution would be to add an elevel argument
to ProcessGUCArray and then call it with NOTICE in the case that
check_function_bodies is true.  None of the contrib modules call
ProcessGUCArray, but should we worry that some external module does?

This doesn't address Tom's concern that we may trust in the SET to
ensure that the value stored is valid.  That seems like it'd be pretty
odd given how we typically handle GUCs, but I've not done a
comprehensive review to be sure.

Like Stefan, I'd really like to see this fixed, and sooner rather than
later, so we can continue the process of upgrading our systems to 9.2..
Thanks,
    Stephen

Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stefan Kaltenbrunner
Date:
On 09/01/2013 12:53 AM, Stephen Frost wrote:
> * Stefan Kaltenbrunner (stefan@kaltenbrunner.cc) wrote:
>> On 08/18/2013 05:40 PM, Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>>> While working on upgrading the database of the search system on
>>>> postgresql.org to 9.2 I noticed that the dumps that pg_dump generates on
>>>> that system are actually invalid and cannot be reloaded without being
>>>> hacked on manually...
>>>
>>>> CREATE TEXT SEARCH CONFIGURATION pg (
>>>>     PARSER = pg_catalog."default" );
>>>
>>>> CREATE FUNCTION test() RETURNS INTEGER
>>>> LANGUAGE sql SET default_text_search_config TO 'public.pg' AS $$
>>>> SELECT 1;
>>>> $$;
>>>
>>>> once you dump that you will end up with an invalid dump because the
>>>> function will be dumped before the actual text search configuration is
>>>> (re)created.
>>>
>>> I don't think it will work to try to fix this by reordering the dump;
>>> it's too easy to imagine scenarios where that would lead to circular
>>> ordering constraints.  What seems like a more workable answer is for
>>> CREATE FUNCTION to not attempt to validate SET clauses when
>>> check_function_bodies is off, or at least not throw a hard error when
>>> the validation fails.  (I see for instance that if you try
>>>     ALTER ROLE joe SET default_text_search_config TO nonesuch;
>>> you just get a notice and not an error.)
>>>
>>> However, I don't recall if there are any places where we assume the
>>> SET info was pre-validated by CREATE/ALTER FUNCTION.
>>
>> any further insights into that issue? - seems a bit silly to have an
>> open bug that actually prevents us from taking (restorable) backups of
>> the search system on our own website...
>
> It would seem that a simple solution would be to add an elevel argument
> to ProcessGUCArray and then call it with NOTICE in the case that
> check_function_bodies is true.  None of the contrib modules call
> ProcessGUCArray, but should we worry that some external module does?

attached is a rough patch that does exactly that, if we are worried
about an api change we could simple do a ProcessGUCArrayNotice() in the
backbranches if that approach is actually sane.

>
> This doesn't address Tom's concern that we may trust in the SET to
> ensure that the value stored is valid.  That seems like it'd be pretty
> odd given how we typically handle GUCs, but I've not done a
> comprehensive review to be sure.

well the whole per-database/per-user GUC handling is already pretty
weird/inconsistent, if you for example alter a database with an invalid
 default_text_search_config you get a NOTICE about it, every time you
connect to that database later on you get a WARNING.


mastermind=# alter database mastermind set default_text_search_config to
'foo';
NOTICE:  text search configuration "foo" does not exist
ALTER DATABASE
mastermind=# \q
mastermind@powerbrain:~$ psql
WARNING:  invalid value for parameter "default_text_search_config": "foo"
psql (9.1.9)
Type "help" for help.

>
> Like Stefan, I'd really like to see this fixed, and sooner rather than
> later, so we can continue the process of upgrading our systems to 9.2..

well - we can certainly work around it but others might not...


Stefan

Attachment

Re: CREATE FUNCTION .. SET vs. pg_dump

From
Robert Haas
Date:
On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
>> It would seem that a simple solution would be to add an elevel argument
>> to ProcessGUCArray and then call it with NOTICE in the case that
>> check_function_bodies is true.  None of the contrib modules call
>> ProcessGUCArray, but should we worry that some external module does?
>
> attached is a rough patch that does exactly that, if we are worried
> about an api change we could simple do a ProcessGUCArrayNotice() in the
> backbranches if that approach is actually sane.

This patch has some definite coding-style issues, but those should be
easy to fix.  The bigger thing I worry about is whether distributing
the decision as to what elevel ought to be used here all over the code
base is indeed sane.  Perhaps that ship has already sailed, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stefan Kaltenbrunner
Date:
On 09/03/2013 06:15 PM, Robert Haas wrote:
> On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>>> It would seem that a simple solution would be to add an elevel argument
>>> to ProcessGUCArray and then call it with NOTICE in the case that
>>> check_function_bodies is true.  None of the contrib modules call
>>> ProcessGUCArray, but should we worry that some external module does?
>>
>> attached is a rough patch that does exactly that, if we are worried
>> about an api change we could simple do a ProcessGUCArrayNotice() in the
>> backbranches if that approach is actually sane.
> 
> This patch has some definite coding-style issues, but those should be
> easy to fix.  The bigger thing I worry about is whether distributing
> the decision as to what elevel ought to be used here all over the code
> base is indeed sane.  Perhaps that ship has already sailed, though.

I can certainly fix the coding style thing up - but it was declared as
"rough" mostly because I'm not entirely sure the way this is going is
actually the right way to attack this...

This whole stuff seems to be a bit messy and bolted on in some ways.
There is ProcessGUCArray(), but also set_config_option() and its
external "wrapper" SetConfigOption() - the division of labour between
the caller deciding on what it wants vs what the function does with some
combination of elevel and source internally is not very consistent at best.

I also note that a lot of places are actually calling
set_config_option() directly, so maybe there is an opportunity to unify
here as well.


Stefan



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>> attached is a rough patch that does exactly that, if we are worried
>> about an api change we could simple do a ProcessGUCArrayNotice() in the
>> backbranches if that approach is actually sane.

> This patch has some definite coding-style issues, but those should be
> easy to fix.  The bigger thing I worry about is whether distributing
> the decision as to what elevel ought to be used here all over the code
> base is indeed sane.  Perhaps that ship has already sailed, though.

I don't particularly care for this approach, for that reason and others.
One such issue is that it would result in duplicate messages, because
functioncmds.c will already have issued warnings thanks to the GUCArrayAdd
calls it makes.  (Those calls are made with context PGC_S_TEST, which is
why they only throw warnings not errors, cf validate_option_array_item.)

But here's the big-picture consideration: the reason that we call
ProcessGUCArray in this place in ProcedureCreate is that we are trying to
set up the correct GUC environment for the function validator.  As an
example, if the SET clause for a SQL-language function includes a setting
for search_path, we had better adopt that search path while checking the
function body, or we will come to incorrect conclusions.

If we use this patch, we'd be saying that we're okay with failing to
apply the SET clause before calling the validator, any time the requested
GUC value happens to be invalid.  That's not terribly comfortable: in
general we'd be assuming that validators' results don't depend on the GUC
environment, which is clearly false.  Now we can probably get away with
that in the context of check_function_bodies = false, because the
validator shouldn't be doing much of anything then (although a few of them
check some things anyway...).  But if we're going to assume that the
validator doesn't depend on GUC environment when check_function_bodies =
false, why apply the SET clause at all?

So I think a more reasonable fix is just to skip the ProcessGUCArray
call altogether when check_function_bodies = false, and to document
that validators mustn't assume anything about the GUC environment
when check_function_bodies = false.

If no objections, I'll go fix it that way.

BTW, I notice the various comments about PGC_S_TEST context are out of
date, because they don't mention that it is used for CREATE FUNCTION SET
cases ...
        regards, tom lane



Re: CREATE FUNCTION .. SET vs. pg_dump

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> So I think a more reasonable fix is just to skip the ProcessGUCArray
> call altogether when check_function_bodies = false, and to document
> that validators mustn't assume anything about the GUC environment
> when check_function_bodies = false.
>
> If no objections, I'll go fix it that way.

Works for me.
Thanks!
    Stephen