Thread: Controlling changes in plpgsql variable resolution

Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query.  The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier.  However, what
should we do when a name could represent either a plpgsql variable
or a column of the query?  Historically we've resolved it as the
plpgsql variable, but we've sure heard a lot of complaints about that.
Oracle's PL/SQL has the precedence the other way around: resolve first
as the query column, and only failing that as a PL variable.  The Oracle
behavior is arguably less surprising because the query-provided names
belong to the nearer enclosing scope.  I believe that we ought to move
to the Oracle behavior over time, but how do we get there from here?
Changing it is almost surely going to break a lot of people's functions,
and in rather subtle ways.

I think there are basically three behaviors that we could offer:

1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)

(Another possibility is to throw a warning but proceed anyway.  It would
be easy to do that if we proceed with the Oracle behavior, but *not*
easy if we proceed with the historical PG behavior.  The reason is that
the code invoked by transformColumnRef may have already made some
side-effects on the query tree.  We discussed the implicit-RTE behavior
yesterday, but there are other effects of a successful name lookup,
such as marking columns for privilege checking.)

What I'm wondering about at the moment is which behaviors to offer and
how to control them.  The obvious answer is "use a GUC" but that answer
scares me because of the ease with which switching between #1 and #2
would break plpgsql functions.  It's not out of the question that that
could even amount to a security problem.  I could see using a GUC to
turn the error behavior (#3) on and off, but not to switch between #1
and #2.

Another possibility is to control it on a per-function basis by adding
some special syntax to plpgsql function bodies to say which behavior
to use.  We could for instance extend the never-documented "#option"
syntax.  This is pretty ugly and would be inconvenient to use too
--- if people have to go and add "#option something" to a function,
they might as well just fix whatever name conflicts it has instead.

I'm not seeing any choice that seems likely to make everybody happy.
Any comments or ideas?
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Simon Riggs
Date:
On Sun, 2009-10-18 at 13:25 -0400, Tom Lane wrote:

> As most of you will recall, plpgsql currently acts as though identifiers
> in SQL queries should be resolved first as plpgsql variable names, and
> only failing that do they get processed as names of the query.  The
> plpgsql parser rewrite that I'm working on will fix that for the
> obviously-silly cases where a plpgsql variable is substituted for a
> table name or some other non-scalar-variable identifier.  However, what
> should we do when a name could represent either a plpgsql variable
> or a column of the query?  Historically we've resolved it as the
> plpgsql variable, but we've sure heard a lot of complaints about that.
> Oracle's PL/SQL has the precedence the other way around: resolve first
> as the query column, and only failing that as a PL variable.  The Oracle
> behavior is arguably less surprising because the query-provided names
> belong to the nearer enclosing scope.  I believe that we ought to move
> to the Oracle behavior over time, but how do we get there from here?
> Changing it is almost surely going to break a lot of people's functions,
> and in rather subtle ways.
> 
> I think there are basically three behaviors that we could offer:
> 
> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
> 2. Resolve ambiguous names as query column (Oracle behavior)
> 3. Throw error if name is ambiguous (useful for finding problems)
> 
> (Another possibility is to throw a warning but proceed anyway.  It would
> be easy to do that if we proceed with the Oracle behavior, but *not*
> easy if we proceed with the historical PG behavior.  The reason is that
> the code invoked by transformColumnRef may have already made some
> side-effects on the query tree.  We discussed the implicit-RTE behavior
> yesterday, but there are other effects of a successful name lookup,
> such as marking columns for privilege checking.)
> 
> What I'm wondering about at the moment is which behaviors to offer and
> how to control them.  The obvious answer is "use a GUC" but that answer
> scares me because of the ease with which switching between #1 and #2
> would break plpgsql functions.  It's not out of the question that that
> could even amount to a security problem.  I could see using a GUC to
> turn the error behavior (#3) on and off, but not to switch between #1
> and #2.
> 
> Another possibility is to control it on a per-function basis by adding
> some special syntax to plpgsql function bodies to say which behavior
> to use.  We could for instance extend the never-documented "#option"
> syntax.  This is pretty ugly and would be inconvenient to use too
> --- if people have to go and add "#option something" to a function,
> they might as well just fix whatever name conflicts it has instead.

I'd suggest two options, one for name resolution (#1 or #2) and one for
error level of ambiguity (none or ERROR).

GUCs are fine, now we have GUC settings per-function.

-- Simon Riggs           www.2ndQuadrant.com



Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Sun, Oct 18, 2009 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As most of you will recall, plpgsql currently acts as though identifiers
> in SQL queries should be resolved first as plpgsql variable names, and
> only failing that do they get processed as names of the query.  The
> plpgsql parser rewrite that I'm working on will fix that for the
> obviously-silly cases where a plpgsql variable is substituted for a
> table name or some other non-scalar-variable identifier.  However, what
> should we do when a name could represent either a plpgsql variable
> or a column of the query?  Historically we've resolved it as the
> plpgsql variable, but we've sure heard a lot of complaints about that.
> Oracle's PL/SQL has the precedence the other way around: resolve first
> as the query column, and only failing that as a PL variable.  The Oracle
> behavior is arguably less surprising because the query-provided names
> belong to the nearer enclosing scope.  I believe that we ought to move
> to the Oracle behavior over time, but how do we get there from here?
> Changing it is almost surely going to break a lot of people's functions,
> and in rather subtle ways.
>
> I think there are basically three behaviors that we could offer:
>
> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
> 2. Resolve ambiguous names as query column (Oracle behavior)
> 3. Throw error if name is ambiguous (useful for finding problems)
>
> (Another possibility is to throw a warning but proceed anyway.  It would
> be easy to do that if we proceed with the Oracle behavior, but *not*
> easy if we proceed with the historical PG behavior.  The reason is that
> the code invoked by transformColumnRef may have already made some
> side-effects on the query tree.  We discussed the implicit-RTE behavior
> yesterday, but there are other effects of a successful name lookup,
> such as marking columns for privilege checking.)
>
> What I'm wondering about at the moment is which behaviors to offer and
> how to control them.  The obvious answer is "use a GUC" but that answer
> scares me because of the ease with which switching between #1 and #2
> would break plpgsql functions.  It's not out of the question that that
> could even amount to a security problem.  I could see using a GUC to
> turn the error behavior (#3) on and off, but not to switch between #1
> and #2.
>
> Another possibility is to control it on a per-function basis by adding
> some special syntax to plpgsql function bodies to say which behavior
> to use.  We could for instance extend the never-documented "#option"
> syntax.  This is pretty ugly and would be inconvenient to use too
> --- if people have to go and add "#option something" to a function,
> they might as well just fix whatever name conflicts it has instead.
>
> I'm not seeing any choice that seems likely to make everybody happy.
> Any comments or ideas?

If we just change the default behavior from #1 to #2, it's going to be
insanely easy to dump a database using pg_dump for 8.4, restore into
an 8.5 database, and end up with a function that does something
different and broken.  So I'm opposed to that plan, but amenable to
any of the other options in varying degrees.

I think it would make a fair amount of sense to make #3 the default behavior.

If possible, I think we should try to engineer things so that using
pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
database produces a function with identical semantics.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If possible, I think we should try to engineer things so that using
> pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
> database produces a function with identical semantics.

Hmm ... actually, we could have pg_dump stick either a #option line
or a GUC SET parameter onto every plpgsql function it pulls from an
old database.  So if you're willing to assume that people do their
upgrades that way, it could be made reasonably safe, even if the
default behavior changes.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Sun, Oct 18, 2009 at 4:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If possible, I think we should try to engineer things so that using
>> pg_dump 8.5 on an 8.4 database and restoring the result into an 8.5
>> database produces a function with identical semantics.
>
> Hmm ... actually, we could have pg_dump stick either a #option line
> or a GUC SET parameter onto every plpgsql function it pulls from an
> old database.  So if you're willing to assume that people do their
> upgrades that way, it could be made reasonably safe, even if the
> default behavior changes.

I'm not completely willing to assume that.  I know we recommend it,
but I'm sure that people don't always do it.  And this is pretty
subtle: someone might screw it up and get a non-working function
without realizing it.  The advantage of making the default behavior
"throw an error" is that it should be pretty obvious if you've broken
something.

And this isn't just about pg_dump, either.  I have a script that gets
run regularly on one of my production databases that goes an updates
the definitions of a whole bunch of functions to the latest version
from the source code repository.  Even if pg_dump DTRT, the next run
of a script of that type might subtly break a bunch of stuff.

The current behavior is a trap for the unwary, so I have no problem
with changing it.  But I think we should try really hard to avoid
creating a more subtle trap in the process.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think there are basically three behaviors that we could offer:
>
> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
> 2. Resolve ambiguous names as query column (Oracle behavior)
> 3. Throw error if name is ambiguous (useful for finding problems)

4. Resolve ambiguous names as query column, but throw warning

#4 would be my vote, followed by #3.  To be perfectly honest, I'd be a
whole lot happier with a pl/pgsql that let me prefix variable names with
a '$' or similar to get away from this whole nonsense.  I've been very
tempted to tell everyone I work with to start prefixing their variables
names with '_' except that it ends up looking just plain ugly.
Thanks,
    Stephen

Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Mon, Oct 19, 2009 at 10:54 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think there are basically three behaviors that we could offer:
>>
>> 1. Resolve ambiguous names as plpgsql (historical PG behavior)
>> 2. Resolve ambiguous names as query column (Oracle behavior)
>> 3. Throw error if name is ambiguous (useful for finding problems)
>
> 4. Resolve ambiguous names as query column, but throw warning
>
> #4 would be my vote, followed by #3.  To be perfectly honest, I'd be a
> whole lot happier with a pl/pgsql that let me prefix variable names with
> a '$' or similar to get away from this whole nonsense.  I've been very
> tempted to tell everyone I work with to start prefixing their variables
> names with '_' except that it ends up looking just plain ugly.

I think warnings are too easy to miss, but I agree your other
suggestion.  I know you can write function_name.variable_name, but
that's often massively long-winded.  We either need a short, fixed
prefix, or some kind of sigil.  I previously suggested ? to parallel
ECPG, but Tom didn't like it.  I still do.  :-)

...Robert


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 7:54 AM, Stephen Frost wrote:

> 4. Resolve ambiguous names as query column, but throw warning
>
> #4 would be my vote, followed by #3.  To be perfectly honest, I'd be a
> whole lot happier with a pl/pgsql that let me prefix variable names  
> with
> a '$' or similar to get away from this whole nonsense.  I've been very
> tempted to tell everyone I work with to start prefixing their  
> variables
> names with '_' except that it ends up looking just plain ugly.

+1, just what I was thinking.

Best,

David


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 8:36 AM, Robert Haas wrote:

> I think warnings are too easy to miss, but I agree your other
> suggestion.  I know you can write function_name.variable_name, but
> that's often massively long-winded.  We either need a short, fixed
> prefix, or some kind of sigil.  I previously suggested ? to parallel
> ECPG, but Tom didn't like it.  I still do.  :-)

I suppose that $ would interfere with dollar quoting. What about @ or
@@ (sorry, I did mess with MSSQL back in the 90s).

Hrm…PostgreSQL is starting to have the same problem as Perl: running
out of characters because they're used for operators. :var would be
perfect, if it wasn't for psql. ?var is okay, I guess, if a bit…
questionable. Are {braces} used for anything?

Best,

David

Re: Controlling changes in plpgsql variable resolution

From
Stephen Frost
Date:
* David E. Wheeler (david@kineticode.com) wrote:
> On Oct 19, 2009, at 8:36 AM, Robert Haas wrote:
>
>> I think warnings are too easy to miss, but I agree your other
>> suggestion.  I know you can write function_name.variable_name, but
>> that's often massively long-winded.  We either need a short, fixed
>> prefix, or some kind of sigil.  I previously suggested ? to parallel
>> ECPG, but Tom didn't like it.  I still do.  :-)
>
> I suppose that $ would interfere with dollar quoting. What about @ or @@
> (sorry, I did mess with MSSQL back in the 90s).

Uh, what dollar quoting?  $_$ is what I typically use, so I wouldn't
expect a $ prefix to cause a problem.  I think it'd be more of an issue
because pl/pgsql still uses $1 and whatnot internally (doesn't it?).
Stephen

Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 9:29 AM, Stephen Frost wrote:

> Uh, what dollar quoting?  $_$ is what I typically use, so I wouldn't
> expect a $ prefix to cause a problem.  I think it'd be more of an  
> issue
> because pl/pgsql still uses $1 and whatnot internally (doesn't it?).

Yes, but that's no more an issue than it is in Perl, where the same $n  
variables are globals. The issue with dollar quoting is that you can  
put anything between the dollar signs. So if you have two $variables,  
they can get in the way. Potentially. But perhaps the lexer and/or  
Parser won't be confused by that, Tom?

I'd sure love $, as it's like shell, Perl, and other stuff.

Best,

David


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> I'd sure love $, as it's like shell, Perl, and other stuff.

This discussion has gotten utterly off track.  The problem I am trying
to solve is a non-Oracle-compatible behavior in plpgsql.  I have got
substantially less than zero interest in proposals that "solve" the
problem by introducing notations that don't even pretend to be
compatible.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 9:49 AM, Tom Lane wrote:

>> I'd sure love $, as it's like shell, Perl, and other stuff.
>
> This discussion has gotten utterly off track.  The problem I am trying
> to solve is a non-Oracle-compatible behavior in plpgsql.  I have got
> substantially less than zero interest in proposals that "solve" the
> problem by introducing notations that don't even pretend to be
> compatible.

Party pooper.

I'd be in favor of a GUC that I could turn on to throw an error when  
there's an ambiguity. As for which way it should go, I have no dog in  
that pony hunt. Or something.

Best,

David


Re: Controlling changes in plpgsql variable resolution

From
"Kevin Grittner"
Date:
"David E. Wheeler" <david@kineticode.com> wrote:
> I'd be in favor of a GUC that I could turn on to throw an error
> when there's an ambiguity.
I would consider hiding one definition with another very bad form, so
I would prefer to have plpgsql throw an error when that happens.  I
don't particularly care whether that is the only supported behavior or
whether there's a GUC to control it, or what its default is, if
present.
-Kevin


Re: Controlling changes in plpgsql variable resolution

From
Pavel Stehule
Date:
2009/10/19 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> "David E. Wheeler" <david@kineticode.com> wrote:
>
>> I'd be in favor of a GUC that I could turn on to throw an error
>> when there's an ambiguity.
>
> I would consider hiding one definition with another very bad form, so
> I would prefer to have plpgsql throw an error when that happens.  I
> don't particularly care whether that is the only supported behavior or
> whether there's a GUC to control it, or what its default is, if
> present.
>

ambiguous identifiers is probably the top reason of some plpgsql's
mysterious errors. More times I found wrong code - sometime really
important (some security checks). I never found good code with
ambiguous identifiers - so for me, exception is good. But - there will
be lot of working applications that contains this hidden bug - and
works "well". So it could be a problem. GUC should be a solution.

Pavel


> -Kevin
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Controlling changes in plpgsql variable resolution

From
Merlin Moncure
Date:
On Mon, Oct 19, 2009 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> I'd sure love $, as it's like shell, Perl, and other stuff.
>
> This discussion has gotten utterly off track.  The problem I am trying
> to solve is a non-Oracle-compatible behavior in plpgsql.  I have got
> substantially less than zero interest in proposals that "solve" the
> problem by introducing notations that don't even pretend to be
> compatible.

Personally, I'd vote against a GUC option. I just plain don't like the
idea that a function could do different things depending on server
configuration.   TBH, I'm not very happy with #option either.   That
said, I agree that Oracle method is far better.

Maybe invent a new language handler?  plpgsql2 or shorten to pgsql?
Now you can mess around all you want (and maybe fix some other
compatibility warts at the same time).

merlin


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Mon, Oct 19, 2009 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> I'd sure love $, as it's like shell, Perl, and other stuff.
>
> This discussion has gotten utterly off track.  The problem I am trying
> to solve is a non-Oracle-compatible behavior in plpgsql.  I have got
> substantially less than zero interest in proposals that "solve" the
> problem by introducing notations that don't even pretend to be
> compatible.

OK.  In that case, it seems like we should offer options #2 and #3
with a GUC or #option to switch between them.  Nobody has made an
argument in favor of keeping #1 around.  I'm still strongly of the
opinion that #3 (error) should be the default behavior to avoid silent
failures.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> ambiguous identifiers is probably the top reason of some plpgsql's
> mysterious errors. More times I found wrong code - sometime really
> important (some security checks). I never found good code with
> ambiguous identifiers - so for me, exception is good. But - there will
> be lot of working applications that contains this hidden bug - and
> works "well". So it could be a problem. GUC should be a solution.

So the conclusions so far are:

(a) Nobody but me is afraid of the consequences of treating this as
a GUC.  (I still think you're all wrong, but so be it.)

(b) Everybody agrees that a "throw error" setting would be helpful.

I am not sure there's any consensus on what the default setting should
be, though.  Can we get away with making the default be "throw error"?
What are the probabilities that the OpenACSes of the world will just
set the value to "backward compatible" instead of touching their code?
Do we need/want a hack in pg_dump to attach a SET to functions dumped
from old DBs?
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Mon, Oct 19, 2009 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> ambiguous identifiers is probably the top reason of some plpgsql's
>> mysterious errors. More times I found wrong code - sometime really
>> important (some security checks). I never found good code with
>> ambiguous identifiers - so for me, exception is good. But - there will
>> be lot of working applications that contains this hidden bug - and
>> works "well". So it could be a problem. GUC should be a solution.
>
> So the conclusions so far are:
>
> (a) Nobody but me is afraid of the consequences of treating this as
> a GUC.  (I still think you're all wrong, but so be it.)

I'm afraid of it, I'm just not sure I have a better idea.  It wouldn't
bother me a bit if we made the only available behavior "throw an
error", but I'm afraid it will bother someone else.

Is there a chance we could make this a GUC, but only allow it to be
changed at the function level, with no way to override the server
default?  It seems to me that the chances of blowing up the world
would be a lot lower that way, though possibly still not low enough.

> (b) Everybody agrees that a "throw error" setting would be helpful.
>
> I am not sure there's any consensus on what the default setting should
> be, though.  Can we get away with making the default be "throw error"?
> What are the probabilities that the OpenACSes of the world will just
> set the value to "backward compatible" instead of touching their code?
> Do we need/want a hack in pg_dump to attach a SET to functions dumped
> from old DBs?

I've already commented on most of these (recap: yes, very high, yes)
so I'll refrain from beating a dead horse.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 19, 2009 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (a) Nobody but me is afraid of the consequences of treating this as
>> a GUC. �(I still think you're all wrong, but so be it.)

> I'm afraid of it, I'm just not sure I have a better idea.  It wouldn't
> bother me a bit if we made the only available behavior "throw an
> error", but I'm afraid it will bother someone else.

> Is there a chance we could make this a GUC, but only allow it to be
> changed at the function level, with no way to override the server
> default?  It seems to me that the chances of blowing up the world
> would be a lot lower that way, though possibly still not low enough.

I don't particularly care to invent a new GUC class just for this,
but if we think the issue is important enough, we could

(a) make the GUC superuser-only

(b) invent a #option or similar syntax to override the GUC per-function.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (a) Nobody but me is afraid of the consequences of treating this as
> a GUC.
Well, it seems dangerous to me, but I'm confident we can cover this
within our shop, so I'm reluctant to take a position on it.  I guess
the main question is whether we want to allow an Oracle-compatibility
mode, knowing it's a foot-gun.  Without it we'd likely make extra work
for someone converting from Oracle to PostgreSQL, although they would
be likely to fix bugs during the cleanup work.  Based on previous
decisions I've seen here, I would have expected people to just go with
an error, period; especially since it would simplify the code.
> (b) Everybody agrees that a "throw error" setting would be helpful.
That's the only setting I would use on any of our databases, if it
were a GUC.
-Kevin


Re: Controlling changes in plpgsql variable resolution

From
Andrew Dunstan
Date:

Tom Lane wrote:
> (a) Nobody but me is afraid of the consequences of treating this as
> a GUC.  (I still think you're all wrong, but so be it.)
>   


I can't say I'm happy about it. For one thing, the granularity seems all 
wrong.  I'd rather be able to keep backwards compatibility on a function 
by function basis. Or would the value of the GUC at the time the 
function was created stick?

> What are the probabilities that the OpenACSes of the world will just
> set the value to "backward compatible" instead of touching their code?
>   

Quite high, I should say.


cheers

andrew


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> Maybe invent a new language handler?  plpgsql2 or shorten to pgsql?
> Now you can mess around all you want (and maybe fix some other
> compatibility warts at the same time).

Well, pl/psm is out there, and might even make it into core someday.
I don't find a lot of attraction in inventing a new language type that's
only marginally different from plpgsql --- that approach doesn't scale
up to handling multiple compatibility issues, at least not unless you
fix them all at the same time.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> (a) Nobody but me is afraid of the consequences of treating this as
>> a GUC.  (I still think you're all wrong, but so be it.)

> I can't say I'm happy about it. For one thing, the granularity seems all 
> wrong.  I'd rather be able to keep backwards compatibility on a function 
> by function basis. Or would the value of the GUC at the time the 
> function was created stick?

Again, I can't see making a GUC that works fundamentally differently
from the rest of them.

Given this round of feedback, I make the following proposal:

1. Invent a GUC that has the settings backwards-compatible,
oracle-compatible, throw-error (exact spellings TBD).  Factory default,
at least for a few releases, will be throw-error.  Make it SUSET so that
unprivileged users can't break things by twiddling it; but it's still
possible for the DBA to set it per-database or per-user.

2. Also invent a #option syntax that allows the GUC to be overridden
per-function.  (Since the main GUC is SUSET, we can't just use a
per-function SET to override it.  There are other ways we could do this
but none seem less ugly than #option...)

Given that the global default will be throw-error, I don't feel a need
to kluge up pg_dump to insert #option in old function definitions;
that's ugly and there are too many cases it would not cover.  But that
could be added to this proposal if folks feel strongly enough.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 11:47 AM, Tom Lane wrote:

> 1. Invent a GUC that has the settings backwards-compatible,
> oracle-compatible, throw-error (exact spellings TBD).  Factory  
> default,
> at least for a few releases, will be throw-error.  Make it SUSET so  
> that
> unprivileged users can't break things by twiddling it; but it's still
> possible for the DBA to set it per-database or per-user.
>
> 2. Also invent a #option syntax that allows the GUC to be overridden
> per-function.  (Since the main GUC is SUSET, we can't just use a
> per-function SET to override it.  There are other ways we could do  
> this
> but none seem less ugly than #option...)

What about adopting the modifier syntax you're adding to COPY?

David


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Oct 19, 2009, at 11:47 AM, Tom Lane wrote:
>> 2. Also invent a #option syntax that allows the GUC to be overridden
>> per-function.  (Since the main GUC is SUSET, we can't just use a
>> per-function SET to override it.  There are other ways we could do  
>> this but none seem less ugly than #option...)

> What about adopting the modifier syntax you're adding to COPY?

Where exactly would you put the modifier, and why is that better than
the existing #option convention?
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
"Eric B. Ridge"
Date:
On Oct 19, 2009, at 2:47 PM, Tom Lane wrote:

> 1. Invent a GUC that has the settings backwards-compatible,
> oracle-compatible, throw-error (exact spellings TBD).  Factory  
> default,
> at least for a few releases, will be throw-error.

Sorry if this is obvious to everyone else, but *when* will the error  
throw?  During CREATE FUNCTION or during runtime?  I'm secretly hoping  
that it'll throw during CREATE FUNCTION.  I'd rather have my entire  
schema creation transaction abort so I can fix the problems up-front,  
rather than at "random" while the application is running.

eric





Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 12:05 PM, Tom Lane wrote:

>> What about adopting the modifier syntax you're adding to COPY?
>
> Where exactly would you put the modifier, and why is that better than
> the existing #option convention?

CREATE OR REPLACE FUNCTION foo()
RETURNS BOOLEAN
LANGUAGE plpgsql WITH opt1, opt2
AS $$...$$;

That is, the specification of options is made outside of the language  
in question. It might only effect a particular language (plpgsql in  
this case) and be ignored otherwise (or trigger an exception), but  
it's clean and very much like what you have elsewhere.

Best,

David


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
I wrote:
> Where exactly would you put the modifier, and why is that better than
> the existing #option convention?

BTW, it occurs to me that since that's undocumented, not everyone might
know what I'm talking about.  There's some code in plpgsql that allows
you to write#option dump
at the very beginning of a plpgsql function body, and get a dump of the
plpgsql syntax tree.  Since this was never intended for anything except
low-level debugging, it never got documented --- but the obvious
intention was that other sorts of options might come along later.
Now we have a case where a per-function option seems like just the
ticket.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Oct 19, 2009, at 12:05 PM, Tom Lane wrote:
>> Where exactly would you put the modifier, and why is that better than
>> the existing #option convention?

> CREATE OR REPLACE FUNCTION foo()
> RETURNS BOOLEAN
> LANGUAGE plpgsql WITH opt1, opt2
> AS $$...$$;

> That is, the specification of options is made outside of the language  
> in question.

I don't think I particularly care for this.  It's inventing a global
mechanism to cover a problem that we currently have one instance of
in one PL.  That's a mighty thin justification.  Also, I tend to think
that you should have several instances of a problem before you venture
to design a global solution --- else your one-size-fits-all solution
might turn out to be a lot less general than you thought.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> On Oct 19, 2009, at 2:47 PM, Tom Lane wrote:
>> 1. Invent a GUC that has the settings backwards-compatible,
>> oracle-compatible, throw-error (exact spellings TBD).  Factory  
>> default,
>> at least for a few releases, will be throw-error.

> Sorry if this is obvious to everyone else, but *when* will the error  
> throw?

Whenever we do semantic analysis of the particular query or expression.

> During CREATE FUNCTION or during runtime?  I'm secretly hoping  
> that it'll throw during CREATE FUNCTION.

Be careful what you ask for, you might get it ;-)

The problem with doing more than minimal syntax checking during CREATE
FUNCTION, or even at the start of function execution, is that people
are far too accustomed to being able to do things like
CREATE TEMP TABLE foo ( ... );INSERT INTO foo ... ;

and of course the second command will fail outright if foo doesn't exist
--- or even if we made that not fail, how will we do any meaningful
semantic checking of later SELECTs against foo?  Another example is a
fairly common pattern in trigger functions:
if tg_op = 'insert' then    ... do something with new.* ...else if tg_op = 'delete' then    ... do something with old.*
......etc ...
 

where semantic checking on the non-executed parts of the function would
certainly throw error.

I would love to offer an option that "fully" checks plpgsql functions
but I think it would break so much code that no one could really use it.

In any case this is pretty much unrelated to the current patch...
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 19, 2009, at 12:23 PM, Tom Lane wrote:

>> That is, the specification of options is made outside of the language
>> in question.
>
> I don't think I particularly care for this.  It's inventing a global
> mechanism to cover a problem that we currently have one instance of
> in one PL.  That's a mighty thin justification.  Also, I tend to think
> that you should have several instances of a problem before you venture
> to design a global solution --- else your one-size-fits-all solution
> might turn out to be a lot less general than you thought.

Sure, just an idea to keep in mind for when you do have a second and a
third option to add…

Best,

David

Re: Controlling changes in plpgsql variable resolution

From
"Eric B. Ridge"
Date:
On Oct 19, 2009, at 3:46 PM, Tom Lane wrote:

>> Sorry if this is obvious to everyone else, but *when* will the error
>> throw?
>
> Whenever we do semantic analysis of the particular query or  
> expression.

That's what I figured.

>> During CREATE FUNCTION or during runtime?  I'm secretly hoping
>> that it'll throw during CREATE FUNCTION.
>
> Be careful what you ask for, you might get it ;-)

<snip really good reasons>

Yeah, and we've got at least one function that does the CREATE TEMP  
TABLE foo (...) pattern.  So I understand.

We want to our schema to keep pace with whatever the default settings  
are for stuff like this, so it'd be great if we could find and resolve  
the issues sooner rather than later.  We implemented better coding  
practices later on in the project to help us disambiguate between  
variables and columns, but there's still a bunch of legacy stuff  
that's going to be broken.

eric


Tom Lane wrote:
> What are the probabilities that the OpenACSes of the world will just
> set the value to "backward compatible" instead of touching their code?

Would postgres get considerably cleaner if a hypothetical 9.0 release
skipped backward compatibility and removed anything that's only
maintained for historical reasons?

I notice the docs are filled with passages like the quotes
below - which suggest that there's a fair amount of stuff
that might be done differently if it weren't for backward
compatibility.




"For historical reasons (i.e., this is clearly wrong but it's fartoo late to change it), subscripting of fixed-length
arraytypesstarts from zero, rather than from one as for variable-length arrays. "
 

"Most of the alternative names listed in the "Aliases" column arethe names used internally by PostgreSQL for historical
reasons.Inaddition, some internally used or deprecated types are available,but are not listed here. "
 

"Note:  The name "oid2name" is historical, and is actuallyrather misleading"

"Note:  Native Kerberos authentication has been deprecatedand should be used only for backward compatibility."

"Old-style functions are now deprecated because of portabilityproblems and lack of functionality, but they are
stillsupportedfor compatibility reasons. "
 

"Although they still work, they are deprecated due to poorerror handling, inconvenient methods of detecting
end-of-data,andlack of support for binary or nonblocking transfers."
 

"The PostgreSQL usage of SELECT INTO to represent tablecreation is historical. It is best to use CREATE TABLE ASfor
thispurpose in new code. "
 

"regular expression metasyntax ...option...m: historical synonym for n"

"Such comments are more a historical artifact than a useful facility,and their use is deprecated; use the expanded
syntaxinstead."
 

"The CAST syntax conforms to SQL; the syntax with :: is historicalPostgreSQL  usage."

"timeofday() is a historical PostgreSQL function."

"(This does not match non-slice behavior and is done forhistorical reasons.)"

"The SQL standard requires the use of the ISO 8601 format. The name ofthe "SQL" output format is a historical
accident."

"attribute ... The historical way to specify optional pieces ofinformation about the function. "
Caution

"Caution: If the configuration parameter standard_conforming_stringsis off, then PostgreSQL recognizes backslash
escapesin both regularand escape string constants. This is for backward compatibilitywith the historical"
 

"historical alias for stddev_samp ... historical alias for var_samp"

"For historical reasons, this variable contains two independent components"

"For historical reasons, the same function doesn't just return aboolean result; instead it has to store the flag at the
locationindicatedby the third argument. "
 

"For historical reasons, there are two levels of notice handling,"

"Note that subscripting is 1-based, whereas for historicalreasons proargtypes is subscripted from 0 "

"The term attribute is equivalent to column and is usedfor historical reasons. "

"For historical reasons, ALTER TABLE can be used withsequences too; but the only variants of ALTER TABLEthat are
allowedwith sequences are...."
 

"While this still works, it is deprecated and the special
meaning of \. can be expected to be removed in a future release."

"Use of this parameter is deprecated as of PostgreSQL 8.3;"



Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Would postgres get considerably cleaner if a hypothetical 9.0 release
> skipped backward compatibility and removed anything that's only
> maintained for historical reasons?

Yeah, and our user community would get a lot smaller too :-(

Actually, I think any attempt to do that would result in a fork,
and a consequent splintering of the community.  We can get away
with occasionally cleaning up individual problematic behaviors
(example: implicit casts to text), but any sort of all-at-once
breakage would result in a lot of people Just Saying No.
        regards, tom lane


* Tom Lane <tgl@sss.pgh.pa.us> [091019 18:45]:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> > Would postgres get considerably cleaner if a hypothetical 9.0 release
> > skipped backward compatibility and removed anything that's only
> > maintained for historical reasons?
> 
> Yeah, and our user community would get a lot smaller too :-(
> 
> Actually, I think any attempt to do that would result in a fork,
> and a consequent splintering of the community.  We can get away
> with occasionally cleaning up individual problematic behaviors
> (example: implicit casts to text), but any sort of all-at-once
> breakage would result in a lot of people Just Saying No.

I don't know... What if this hypothetical "baggage-free" version came
with configurable syncrhonous master-slave replication, writable CTEs,
and everything ;-)

Couple it with a libpq/protocol increase that allows fixing of the
various warts of the current connection (like encoding, etc), and you
still have a *very* attractive platform...

And then just do the rename official to Postgres, and people can support
both PostgreSQL, warts and all, or Postgres, the super-duper
database-to-rule-them-all...

;-)

/me crawls back into his hole

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Could postgres be much cleaner if a future release skipped backward compatibility?

From
"Marc G. Fournier"
Date:
On Mon, 19 Oct 2009, Tom Lane wrote:

> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Would postgres get considerably cleaner if a hypothetical 9.0 release
>> skipped backward compatibility and removed anything that's only
>> maintained for historical reasons?
>
> Yeah, and our user community would get a lot smaller too :-(
>
> Actually, I think any attempt to do that would result in a fork,
> and a consequent splintering of the community.  We can get away
> with occasionally cleaning up individual problematic behaviors
> (example: implicit casts to text), but any sort of all-at-once
> breakage would result in a lot of people Just Saying No.

Just curious, but with that thought in mind, are we doing any code 
cleanups as far as EOL releases?  Ie. is there any code in our tree right 
now that is for 'backward compatibility' for 7.3.x versions that could be 
cleaned out?

I realize that this might not make a huge difference, but it would be 
easier to do a 'gradual clean up', then an 'all-at-once' scenario, no?



----
Marc G. Fournier                        Hub.Org Hosting Solutions S.A.
scrappy@hub.org                                     http://www.hub.org

Yahoo:yscrappy    Skype: hub.org    ICQ:7615664    MSN:scrappy@hub.org


"Marc G. Fournier" <scrappy@hub.org> writes:
> Just curious, but with that thought in mind, are we doing any code 
> cleanups as far as EOL releases?  Ie. is there any code in our tree right 
> now that is for 'backward compatibility' for 7.3.x versions that could be 
> cleaned out?

Well, we were just trying to pull out add_missing_from, and look how far
that's gotten ...

Something that has been suggested is to rip out pg_dump's support for
pre-7.3 servers, which would probably be easier to get through because
it wouldn't create any user-visible feature loss.  It wouldn't be a
monstrous savings but it'd certainly simplify matters somewhat.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Bruce Momjian
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Tom Lane wrote:
> >> (a) Nobody but me is afraid of the consequences of treating this as
> >> a GUC.  (I still think you're all wrong, but so be it.)
> 
> > I can't say I'm happy about it. For one thing, the granularity seems all 
> > wrong.  I'd rather be able to keep backwards compatibility on a function 
> > by function basis. Or would the value of the GUC at the time the 
> > function was created stick?
> 
> Again, I can't see making a GUC that works fundamentally differently
> from the rest of them.
> 
> Given this round of feedback, I make the following proposal:
> 
> 1. Invent a GUC that has the settings backwards-compatible,
> oracle-compatible, throw-error (exact spellings TBD).  Factory default,
> at least for a few releases, will be throw-error.  Make it SUSET so that
> unprivileged users can't break things by twiddling it; but it's still
> possible for the DBA to set it per-database or per-user.
> 
> 2. Also invent a #option syntax that allows the GUC to be overridden
> per-function.  (Since the main GUC is SUSET, we can't just use a
> per-function SET to override it.  There are other ways we could do this
> but none seem less ugly than #option...)

I don't see the logic to making the setting SUSET.  The user wrote the
function;  what logic is there to say the resolution rules are not under
their control?

Also, I think to GUC that throws an error or not is a lot safer than one
that changes resolution semantics.  Changing resolution semantics sounds
like the autocommit GUC to me.  :-O

Also, I am not really keen on the "keep it for a few releases" --- we
often don't come back to finally change it, so maybe just error/no error
and using Oracle semantics is the way to go, with 'error' as the
default.  Our change in casting for 8.3 seemed much more major than
this.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


On Mon, 2009-10-19 at 14:08 -0700, Ron Mayer wrote:
> Tom Lane wrote:
> > What are the probabilities that the OpenACSes of the world will just
> > set the value to "backward compatible" instead of touching their code?
> 
> Would postgres get considerably cleaner if a hypothetical 9.0 release
> skipped backward compatibility and removed anything that's only
> maintained for historical reasons?

Probably not.  Most of the examples you cite of documented deprecated or
historical behavior would be one-line changes to get rid of.




Re: Controlling changes in plpgsql variable resolution

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > 1. Invent a GUC that has the settings backwards-compatible,
> > oracle-compatible, throw-error (exact spellings TBD).  Factory default,
> > at least for a few releases, will be throw-error.  Make it SUSET so that
> > unprivileged users can't break things by twiddling it; but it's still
> > possible for the DBA to set it per-database or per-user.
> > 
> > 2. Also invent a #option syntax that allows the GUC to be overridden
> > per-function.  (Since the main GUC is SUSET, we can't just use a
> > per-function SET to override it.  There are other ways we could do this
> > but none seem less ugly than #option...)
> 
> I don't see the logic to making the setting SUSET.  The user wrote the
> function;  what logic is there to say the resolution rules are not under
> their control?
> 
> Also, I think to GUC that throws an error or not is a lot safer than one
> that changes resolution semantics.  Changing resolution semantics sounds
> like the autocommit GUC to me.  :-O
> 
> Also, I am not really keen on the "keep it for a few releases" --- we
> often don't come back to finally change it, so maybe just error/no error
> and using Oracle semantics is the way to go, with 'error' as the
> default.  Our change in casting for 8.3 seemed much more major than
> this.

Oh, two more things.  First, with the Oracle resolution rules, I think
it is possible to change the behavior of a function by adding or
renaming a column that wasn't referenced in the function because a
new/renamed column might mask a function variable --- that is not nice.

Second, I can see the value of having the GUC be SUSET if changing the
setting could possible break the function or cause insecure behavior,
but I wasn't clear that was possible.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Could postgres be much cleaner if a future release skipped backward compatibility?

From
Heikki Linnakangas
Date:
Aidan Van Dyk wrote:
> * Tom Lane <tgl@sss.pgh.pa.us> [091019 18:45]:
>> Actually, I think any attempt to do that would result in a fork,
>> and a consequent splintering of the community.  We can get away
>> with occasionally cleaning up individual problematic behaviors
>> (example: implicit casts to text), but any sort of all-at-once
>> breakage would result in a lot of people Just Saying No.
> 
> I don't know... What if this hypothetical "baggage-free" version came
> with configurable syncrhonous master-slave replication, writable CTEs,
> and everything ;-)

I know your joking, but what would actually happen is that the people
maintaining the backwards-compatible fork would simply port over those
features as well, or do the fork after those features have went in. None
of the historical features or warts we maintain for
backwards-compatibility are in the way of those cool new features.

It makes sense to remove old warts whenever they do get in the way. Like
add_missing_from that's been discussed at the moment. Other than that,
there's very little harm in keeping them.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Could postgres be much cleaner if a future release skipped backward compatibility?

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----                              
Hash: RIPEMD160                                                 

Tom Lane replied to Ron Mayer:
>> Would postgres get considerably cleaner if a hypothetical 9.0 release
>> skipped backward compatibility and removed anything that's only      
>> maintained for historical reasons?
>
> Yeah, and our user community would get a lot smaller too :-(
>
> Actually, I think any attempt to do that would result in a fork,
> and a consequent splintering of the community.  We can get away
> with occasionally cleaning up individual problematic behaviors
> (example: implicit casts to text), but any sort of all-at-once
> breakage would result in a lot of people Just Saying No.

That particular example is a very poor one for illustrating your
point. You severely underestimate "get away with" for the implicit
cast changes in 8.3. This was a really big deal for many, many users
of Postgres, and continues to cause many problems to this day.

I'm sure the casting changes broke more applications and prevented more
people from upgrading than every thing on Ron's list for a clean 9.0 would.
Not that I'm necessarily promoting his idea, but 8.3 was already a
"all-at-once breakage" release.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200910201005
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkrdxBEACgkQvJuQZxSWSsgDigCfcEXFz/+4GvcNstAEYh05rkYR
RJcAoICN46WCy1jLI9umMfGn5j9taqEt
=9Iq7
-----END PGP SIGNATURE-----




On Tue, Oct 20, 2009 at 10:07 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Tom Lane replied to Ron Mayer:
>>> Would postgres get considerably cleaner if a hypothetical 9.0 release
>>> skipped backward compatibility and removed anything that's only
>>> maintained for historical reasons?
>>
>> Yeah, and our user community would get a lot smaller too :-(
>>
>> Actually, I think any attempt to do that would result in a fork,
>> and a consequent splintering of the community.  We can get away
>> with occasionally cleaning up individual problematic behaviors
>> (example: implicit casts to text), but any sort of all-at-once
>> breakage would result in a lot of people Just Saying No.
>
> That particular example is a very poor one for illustrating your
> point. You severely underestimate "get away with" for the implicit
> cast changes in 8.3. This was a really big deal for many, many users
> of Postgres, and continues to cause many problems to this day.
>
> I'm sure the casting changes broke more applications and prevented more
> people from upgrading than every thing on Ron's list for a clean 9.0 would.
> Not that I'm necessarily promoting his idea, but 8.3 was already a
> "all-at-once breakage" release.

This is a fair point.  I think the real issue, though, is that answer
to Ron's original question is "No".  When backward compatibility gets
in the way of cool new features, that's worth considering.  But
removing backward compatibility just for the sake of removing backward
compatibility doesn't really buy us anything.  It's basically doing
extra work for no benefit and some possible harm.

...Robert


"Greg Sabino Mullane" <greg@turnstep.com> writes:
> I'm sure the casting changes broke more applications and prevented more
> people from upgrading than every thing on Ron's list for a clean 9.0 would.
> Not that I'm necessarily promoting his idea, but 8.3 was already a
> "all-at-once breakage" release.

I'm having to support 8.2 because that's the most recent upgrade we can
afford on some project here, for this very problem. I don't think
removing support for add_missing_from would stop us from upgrading to
8.5 (or call it 9.0) from 8.3+. Or more to the point, that the cost to
migrate from pre-8.3 to any 8.[345] releases would change much.

As far as breaking historical compatibility where it matters for a next
release, I think it boils down to older releases mainteance. The way I
see it, if 8.5 or 9.0 breaks a lot of warts but 8.0 (or 8.1) to 8.4 are
still maintained the way they are now, this will be just an usual
PostgreSQL major upgrade headache (and revenue source for contractants).

It seems such a little game changer that I won't care voting for or
against it. Please continue considering code maintenance as a higher
priority target than upgrade easyness, that seems to open the road for
more stability and features in the long run.

Regards,
-- 
dim


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> 1. Invent a GUC that has the settings backwards-compatible,
>> oracle-compatible, throw-error (exact spellings TBD).  Factory default,
>> at least for a few releases, will be throw-error.  Make it SUSET so that
>> unprivileged users can't break things by twiddling it; but it's still
>> possible for the DBA to set it per-database or per-user.

> I don't see the logic to making the setting SUSET.  The user wrote the
> function;  what logic is there to say the resolution rules are not under
> their control?

That's only sane if you are 100% certain that there could not be a
security issue arising from the change of behavior.  Otherwise someone
could for instance subvert a security-definer function by running it
under the setting it wasn't written for.  Personally I am not 100%
certain of that.

> Also, I think to GUC that throws an error or not is a lot safer than one
> that changes resolution semantics.  Changing resolution semantics sounds
> like the autocommit GUC to me.  :-O

Yeah, that's another reason to not allow it to be changed too easily.

> Also, I am not really keen on the "keep it for a few releases"

Well, I'm not necessarily saying we would ever change it.  Maybe the
default could always stay at "error".

> ... maybe just error/no error
> and using Oracle semantics is the way to go, with 'error' as the
> default.

I'd personally be entirely happy with that, but people with large
plpgsql code bases will not be.  They're going to want a
backward-compatible setting so that this doesn't become a show stopper
for migration to 8.5.  Any time you can allow someone to deal with a
migration issue later instead of right away, it becomes easier for them
to migrate.
        regards, tom lane



Robert Haas wrote:
>  I think the real issue, though, is that answer
> to Ron's original question is "No".  When backward compatibility gets
> in the way of cool new features, that's worth considering.  But
> removing backward compatibility just for the sake of removing backward
> compatibility doesn't really buy us anything.  It's basically doing
> extra work for no benefit and some possible harm.
>
>
>   

Well said.

I am singularly unimpressed by arguments for removing backwards 
compatibility features to satisfy someone's passion for neatness, or to 
force people to conform to how they think their software should be 
managed. I occasionally shake my head in amazement at the willingness of 
some people to throw other users under the bus.

Upgrading a database installation is hard enough without us gratuitously 
making it harder, and we positively don't want to make people stay on 
older releases if they don't have to, I should have thought.

cheers

andrew


"Greg Sabino Mullane" <greg@turnstep.com> writes:
> That particular example is a very poor one for illustrating your
> point. You severely underestimate "get away with" for the implicit
> cast changes in 8.3. This was a really big deal for many, many users
> of Postgres, and continues to cause many problems to this day.

Actually, that was *exactly* my point.  What's being blithely discussed
here is to make tens of changes that would be equally as painful as that
one, all at once.  You really think that would fly?
        regards, tom lane


On Tue, Oct 20, 2009 at 10:46:10AM -0400, Andrew Dunstan wrote:
> Robert Haas wrote:
>> I think the real issue, though, is that answer to Ron's original
>> question is "No".  When backward compatibility gets in the way of
>> cool new features, that's worth considering.  But removing backward
>> compatibility just for the sake of removing backward compatibility
>> doesn't really buy us anything.  It's basically doing extra work
>> for no benefit and some possible harm.
>
> Well said.
>
> I am singularly unimpressed by arguments for removing backwards
> compatibility features to satisfy someone's passion for neatness, or
> to  force people to conform to how they think their software should
> be  managed. I occasionally shake my head in amazement at the
> willingness of  some people to throw other users under the bus.

This isn't about a "passion for neatness."  It's about recognizing
that some experiments have failed and weeding out the failures.  The
RULE system, for example, was a ground-breaking innovation in the
sense of being a truly new idea.  Evidence over the decades since has
shown that it was a *bad* idea, and I like to think we're going with
an evidence-based approach.  Things like add_missing_from and
regex_flavor, to name two examples, are just bletcherous hacks
invented to solve no-longer-extant problems.

The burden of keeping things like this in the code base is large and
increasing.  There's even a name for it: technical debt.

http://en.wikipedia.org/wiki/Technical_debt

If we're to remain a successful project, the vast majority of our
community is that of future users, not present or past, so worshipping
backward compatibility is rejecting the needs of the many in favor of
the few.  Let's at least think a bit before we make a decision to do
such a thing.

> Upgrading a database installation is hard enough without us
> gratuitously  making it harder, and we positively don't want to make
> people stay on  older releases if they don't have to, I should have
> thought.

Upgrading a database installation should be made easier in a lot of
different ways, not least that a postmaster should be able to point to
an older database instance, possibly with some kind of prep procedure,
and do the right thing.  What we have now (Slony, pg_upgrade) is a bit
better than, "fire up pg_dump and take the down time," but it needs a
lot more love.  This is for the benefit of our entire community, not
just the present one. :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Controlling changes in plpgsql variable resolution

From
Merlin Moncure
Date:
On Tue, Oct 20, 2009 at 10:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Tom Lane wrote:
>>> 1. Invent a GUC that has the settings backwards-compatible,
>>> oracle-compatible, throw-error (exact spellings TBD).  Factory default,
>>> at least for a few releases, will be throw-error.  Make it SUSET so that
>>> unprivileged users can't break things by twiddling it; but it's still
>>> possible for the DBA to set it per-database or per-user.
>
>> I don't see the logic to making the setting SUSET.  The user wrote the
>> function;  what logic is there to say the resolution rules are not under
>> their control?
>
> That's only sane if you are 100% certain that there could not be a
> security issue arising from the change of behavior.  Otherwise someone
> could for instance subvert a security-definer function by running it
> under the setting it wasn't written for.  Personally I am not 100%
> certain of that.
>
>> Also, I think to GUC that throws an error or not is a lot safer than one
>> that changes resolution semantics.  Changing resolution semantics sounds
>> like the autocommit GUC to me.  :-O
>
> Yeah, that's another reason to not allow it to be changed too easily.
>
>> Also, I am not really keen on the "keep it for a few releases"
>
> Well, I'm not necessarily saying we would ever change it.  Maybe the
> default could always stay at "error".
>
>> ... maybe just error/no error
>> and using Oracle semantics is the way to go, with 'error' as the
>> default.
>
> I'd personally be entirely happy with that, but people with large
> plpgsql code bases will not be.  They're going to want a
> backward-compatible setting so that this doesn't become a show stopper
> for migration to 8.5.  Any time you can allow someone to deal with a
> migration issue later instead of right away, it becomes easier for them
> to migrate.

How about warning for release before making the big switch?  The text
cast change, while ultimately good, maybe could have been stretched
out for a release or two...it was painful.  I do though absolutely
think that it was good in the end to not support a compatibility
option in core.

Didn't we have a long discussion on big compatibility changes with the
consensus that we were to going give a transition release before we
dropped a backwards compatibility bomb?  I can't help feeling that we
are about to jump off into the abyss...

merlin


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> How about warning for release before making the big switch?  The text
> cast change, while ultimately good, maybe could have been stretched
> out for a release or two...it was painful.  I do though absolutely
> think that it was good in the end to not support a compatibility
> option in core.

As long as we provide a backwards-compatible setting, I don't believe
this change will be a huge deal.  The problem with the implicit-casts
business was that there was no reasonable way to provide a control
switch --- the catalog entries were either there or not :-(.  So far
as I can tell at the moment, there won't be any large technical cost
to providing a backwards-compatible option for this plpgsql change.
        regards, tom lane


On Tue, Oct 20, 2009 at 11:48 AM, David Fetter <david@fetter.org> wrote:
> On Tue, Oct 20, 2009 at 10:46:10AM -0400, Andrew Dunstan wrote:
>> Robert Haas wrote:
>>> I think the real issue, though, is that answer to Ron's original
>>> question is "No".  When backward compatibility gets in the way of
>>> cool new features, that's worth considering.  But removing backward
>>> compatibility just for the sake of removing backward compatibility
>>> doesn't really buy us anything.  It's basically doing extra work
>>> for no benefit and some possible harm.
>>
>> Well said.
>>
>> I am singularly unimpressed by arguments for removing backwards
>> compatibility features to satisfy someone's passion for neatness, or
>> to  force people to conform to how they think their software should
>> be  managed. I occasionally shake my head in amazement at the
>> willingness of  some people to throw other users under the bus.
>
> This isn't about a "passion for neatness."  It's about recognizing
> that some experiments have failed and weeding out the failures.  The
> RULE system, for example, was a ground-breaking innovation in the
> sense of being a truly new idea.  Evidence over the decades since has
> shown that it was a *bad* idea, and I like to think we're going with
> an evidence-based approach.  Things like add_missing_from and
> regex_flavor, to name two examples, are just bletcherous hacks
> invented to solve no-longer-extant problems.
>
> The burden of keeping things like this in the code base is large and
> increasing.

I don't think that's true.  That's just an assertion on your part, and
I don't think there's any evidence to back it up.

> There's even a name for it: technical debt.
>
> http://en.wikipedia.org/wiki/Technical_debt
>
> If we're to remain a successful project, the vast majority of our
> community is that of future users, not present or past, so worshipping
> backward compatibility is rejecting the needs of the many in favor of
> the few.  Let's at least think a bit before we make a decision to do
> such a thing.

I think we usually think pretty carefully about these decisions on a
case-by-case basis.  Making a blanket policy, as proposed here, would
NOT be thinking carefully.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Pavel Stehule
Date:
2009/10/20 Tom Lane <tgl@sss.pgh.pa.us>:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> How about warning for release before making the big switch?  The text
>> cast change, while ultimately good, maybe could have been stretched
>> out for a release or two...it was painful.  I do though absolutely
>> think that it was good in the end to not support a compatibility
>> option in core.
>
> As long as we provide a backwards-compatible setting, I don't believe
> this change will be a huge deal.  The problem with the implicit-casts
> business was that there was no reasonable way to provide a control
> switch --- the catalog entries were either there or not :-(.  So far
> as I can tell at the moment, there won't be any large technical cost
> to providing a backwards-compatible option for this plpgsql change.

I don't thing, so drop some implicit-casts was huge problem. Somebody
could to use Peter's patch, that recreate missing casts.

regards
Pavel
>
>                        regards, tom lane
>


David Fetter <david@fetter.org> writes:
> This isn't about a "passion for neatness."  It's about recognizing
> that some experiments have failed and weeding out the failures.  The
> RULE system, for example, was a ground-breaking innovation in the
> sense of being a truly new idea.  Evidence over the decades since has
> shown that it was a *bad* idea, and I like to think we're going with
> an evidence-based approach.  Things like add_missing_from and
> regex_flavor, to name two examples, are just bletcherous hacks
> invented to solve no-longer-extant problems.

The above examples seem to me to show that your argument is nonsense.
regex_flavor, in particular, is not about "failed experiments",
it's about backwards compatibility with a previous version that simply
worked differently.  It might be that we can have a sunset provision for
backwards compatibility, but to argue that it's not important pretty
much flies in the face of most of the discussions we've had on the
topic.

I agree that the RULE system is a failed experiment and we need to find
something better, but we're unlikely to rip that out either until the
replacement has been in use for a few releases.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Bruce Momjian
Date:
Pavel Stehule wrote:
> 2009/10/20 Tom Lane <tgl@sss.pgh.pa.us>:
> > Merlin Moncure <mmoncure@gmail.com> writes:
> >> How about warning for release before making the big switch? ?The text
> >> cast change, while ultimately good, maybe could have been stretched
> >> out for a release or two...it was painful. ?I do though absolutely
> >> think that it was good in the end to not support a compatibility
> >> option in core.
> >
> > As long as we provide a backwards-compatible setting, I don't believe
> > this change will be a huge deal. ?The problem with the implicit-casts
> > business was that there was no reasonable way to provide a control
> > switch --- the catalog entries were either there or not :-(. ?So far
> > as I can tell at the moment, there won't be any large technical cost
> > to providing a backwards-compatible option for this plpgsql change.
> 
> I don't thing, so drop some implicit-casts was huge problem. Somebody
> could to use Peter's patch, that recreate missing casts.

True, but we should have had those compatibility pathes (Peter's patch)
ready before we released, and advertised them in the release notes.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Controlling changes in plpgsql variable resolution

From
Pavel Stehule
Date:
>>
>> I don't thing, so drop some implicit-casts was huge problem. Somebody
>> could to use Peter's patch, that recreate missing casts.
>
> True, but we should have had those compatibility pathes (Peter's patch)
> ready before we released, and advertised them in the release notes.

sure

Maybe we have to have a compatibility group. I don't support 100%
compatibility - mainly with things that should be source of bugs -
like these implicit casts. I migrate two applications (not own) and I
really found bugs there. But change of every feature should be well
supported.

Pavel


>
> --
>  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  + If your life is a hard drive, Christ can be your backup. +
>


Re: Controlling changes in plpgsql variable resolution

From
Josh Berkus
Date:
Tom,

> 1. Invent a GUC that has the settings backwards-compatible,
> oracle-compatible, throw-error (exact spellings TBD).  Factory default,
> at least for a few releases, will be throw-error.  Make it SUSET so that
> unprivileged users can't break things by twiddling it; but it's still
> possible for the DBA to set it per-database or per-user.
> 
> 2. Also invent a #option syntax that allows the GUC to be overridden
> per-function.  (Since the main GUC is SUSET, we can't just use a
> per-function SET to override it.  There are other ways we could do this
> but none seem less ugly than #option...)

Hmmmm.  I don't see any reason why this couldn't be set by any user at
runtime, really.  From a security standpoint, it's less of a risk than
search_path, and we allow anyone to mess with that.  Then we'd have the
simple factor of setting it in postgresql.conf or setting it in the
function definitions via WITH.

--Josh Berkus


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Wed, Oct 21, 2009 at 1:59 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Tom,
>
>> 1. Invent a GUC that has the settings backwards-compatible,
>> oracle-compatible, throw-error (exact spellings TBD).  Factory default,
>> at least for a few releases, will be throw-error.  Make it SUSET so that
>> unprivileged users can't break things by twiddling it; but it's still
>> possible for the DBA to set it per-database or per-user.
>>
>> 2. Also invent a #option syntax that allows the GUC to be overridden
>> per-function.  (Since the main GUC is SUSET, we can't just use a
>> per-function SET to override it.  There are other ways we could do this
>> but none seem less ugly than #option...)
>
> Hmmmm.  I don't see any reason why this couldn't be set by any user at
> runtime, really.  From a security standpoint, it's less of a risk than
> search_path, and we allow anyone to mess with that.

That's like saying that it's less of a risk than a group of rabid
tyrannosaurs in a kindergarten classroom.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
"David E. Wheeler"
Date:
On Oct 21, 2009, at 11:37 AM, Robert Haas wrote:

> That's like saying that it's less of a risk than a group of rabid
> tyrannosaurs in a kindergarten classroom.

I'm not sure, but I kind of doubt that tyrannosaurs can get rabies. I  
mean, if they were even around anymore. Which, you know, they're not.

Best,

David


Re: Controlling changes in plpgsql variable resolution

From
Josh Berkus
Date:
Robert,

>> Hmmmm.  I don't see any reason why this couldn't be set by any user at
>> runtime, really.  From a security standpoint, it's less of a risk than
>> search_path, and we allow anyone to mess with that.
> 
> That's like saying that it's less of a risk than a group of rabid
> tyrannosaurs in a kindergarten classroom.

No, it's like saying "I don't see a point in putting a burglar alarm on
the windows when the door isn't even locked."

Making this GUC suset would make it far less useful to users trying to
gradually upgrade their infrastructures, and make it more likely that
many/most of our users would just set it to backwards-compatible in
their postgresql.conf and not fix anything.  In fact, I'd go so far as
to say that if it's suset, we might as well make it sighup because
postgresql.conf is the *only* place anyone will touch it.

In a secure database setup, none of the functions belong to the
superuser.  Yet an suset would mean that the function owner couldn't set
the parameter for their own functions, which is the most obvious place
to set it.

Tom has proposed some kind of odd special "options" syntax to get around
this, but I think that's unnecessary.  So far on this thread, I haven't
seen anyone engineer an actual function exploit by using this setting; I
personally can't come up with one off the top of my head which doesn't
require the attacker to be the table owner, the function owner, or the
superuser themselves.  Also keep in mind what we're patching here is an
unmaintanable and insecure practice anyway, which is the ambiguous use
of variable names which match column names.

So, I'm saying: make it a userset.

--Josh Berkus




Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Making this GUC suset would make it far less useful to users trying to
> gradually upgrade their infrastructures, and make it more likely that
> many/most of our users would just set it to backwards-compatible in
> their postgresql.conf and not fix anything.  In fact, I'd go so far as
> to say that if it's suset, we might as well make it sighup because
> postgresql.conf is the *only* place anyone will touch it.

No, they might reasonably also set it via ALTER DATABASE or ALTER USER.

> In a secure database setup, none of the functions belong to the
> superuser.  Yet an suset would mean that the function owner couldn't set
> the parameter for their own functions, which is the most obvious place
> to set it.

That's what the #option alternative is for.  Yes, it's a bit ugly, but
it's perfectly functional, and secure too.

> Tom has proposed some kind of odd special "options" syntax to get around
> this, but I think that's unnecessary.  So far on this thread, I haven't
> seen anyone engineer an actual function exploit by using this setting;

It would take a coincidence of names to make anything happen, so the
details would depend a lot on the exact contents of the function you
were hoping to break.  But since you insist:
create function foo(id int) ......select * into rec from tab where tab.id = id;if found then do-something else
do-something-elseend if;
 

Under old-style semantics this will do what the programmer thought.
Under Oracle semantics it will return the first table row.  If
do-something is security critical then this is enough to call it
an exploit.  The reverse direction (code meant for Oracle behavior
breaks under old-style) is not difficult to cons up either; I think
you can find some examples in pgsql-bugs archives from people trying
to port Oracle code to PG.

Given that people are very prone to intentionally naming things as above
(for a particularly egregious example try
http://archives.postgresql.org/pgsql-bugs/2009-10/msg00054.php)
I think it's entirely foolish to imagine this isn't a real hazard.
If name collisions were improbable we'd not have so many complaints
about the current behavior in our archives, and probably wouldn't be
thinking about changing the behavior at all.

As for the analogy to search_path, I think if we were doing that over
knowing what we know today, we'd have locked it down a lot more from the
beginning.  We might yet decide to lock it down more.  It's not a good
example to cite when arguing that this setting doesn't need any
protection.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Josh Berkus
Date:
> That's what the #option alternative is for.  Yes, it's a bit ugly, but
> it's perfectly functional, and secure too.

I still don't see why it's needed.  If the function owner simply sets
the option in the function definitions (as a userset), it doesn't matter
what the calling user sets, does it?

All that's needed is a scripting example to set this in all function
definitions as a regular setting.

--Josh Berkus




Re: Controlling changes in plpgsql variable resolution

From
Josh Berkus
Date:
On 10/21/09 1:02 PM, Josh Berkus wrote:
>> That's what the #option alternative is for.  Yes, it's a bit ugly, but
>> it's perfectly functional, and secure too.
> 
> I still don't see why it's needed.  If the function owner simply sets
> the option in the function definitions (as a userset), it doesn't matter
> what the calling user sets, does it?
> 
> All that's needed is a scripting example to set this in all function
> definitions as a regular setting.

Actually, to follow this up: there would be *huge* utility in ALTER
FUNCTION name SET setting=value statement.  This would help people do
all the "lock down" things we want to do with function definitions.

There'd also be utility in a default set of guc settings for new
functions, but maybe we should put that off ...

--Josh



Re: Controlling changes in plpgsql variable resolution

From
Merlin Moncure
Date:
On Wed, Oct 21, 2009 at 3:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Tom has proposed some kind of odd special "options" syntax to get around
> this, but I think that's unnecessary.  So far on this thread, I haven't
> seen anyone engineer an actual function exploit by using this setting; I
> personally can't come up with one off the top of my head which doesn't
> require the attacker to be the table owner, the function owner, or the
> superuser themselves.  Also keep in mind what we're patching here is an
> unmaintanable and insecure practice anyway, which is the ambiguous use
> of variable names which match column names.
>
> So, I'm saying: make it a userset.

I couldn't disagree more strongly.  .conf entries that adjust how
plpgsql funtions operate in a very central way will 'fork'  plpgsql
develoeprs so that you have one group of people using method 'a' and
another using method 'b'.  Nobody bothers to fix legacy code and now
we have a first class mess.  All code intended to run on servers you
don't control (like library code) now needs to be decorated with 'set
local...' which defeats the whole purpose.  IMO, guc settings that
control how sql behaves should be avoided at all costs.  You should be
able to look at a piece of code and explicitly determine what it does.At least with #option, knowing the server version
andthe function 
body is enough.  if you want to support multiple behaviors but don't
like #option, i think the only alternative is to version the plpgsql
language somehow and decorate 'create function' with the version.  Tom
didn't like that idea,  but it still beats GUC imo.

merlin


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Wed, Oct 21, 2009 at 4:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Oct 21, 2009 at 3:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Tom has proposed some kind of odd special "options" syntax to get around
>> this, but I think that's unnecessary.  So far on this thread, I haven't
>> seen anyone engineer an actual function exploit by using this setting; I
>> personally can't come up with one off the top of my head which doesn't
>> require the attacker to be the table owner, the function owner, or the
>> superuser themselves.  Also keep in mind what we're patching here is an
>> unmaintanable and insecure practice anyway, which is the ambiguous use
>> of variable names which match column names.
>>
>> So, I'm saying: make it a userset.
>
> I couldn't disagree more strongly.  .conf entries that adjust how
> plpgsql funtions operate in a very central way will 'fork'  plpgsql
> develoeprs so that you have one group of people using method 'a' and
> another using method 'b'.  Nobody bothers to fix legacy code and now
> we have a first class mess.  All code intended to run on servers you
> don't control (like library code) now needs to be decorated with 'set
> local...' which defeats the whole purpose.  IMO, guc settings that
> control how sql behaves should be avoided at all costs.  You should be
> able to look at a piece of code and explicitly determine what it does.
>  At least with #option, knowing the server version and the function
> body is enough.  if you want to support multiple behaviors but don't
> like #option, i think the only alternative is to version the plpgsql
> language somehow and decorate 'create function' with the version.  Tom
> didn't like that idea,  but it still beats GUC imo.

I agree.  We already have one case (search_path) where you potentially
can't accurately predict the semantics of the function without knowing
something about the calling environment.  That means every
security-definer function, to be secure, has to explicitly control the
search path.  That's bad.

I actually think that we should not have a GUC for this at all.  We
should have a compiled-in default, and it should be error.  If you
want some other behavior, decorate your functions with #option.  The
only way this is not a train wreck is if the semantics are set *from
within the function*.  Having any portion of the semantics, no matter
how seemingly trivial, imported from the outside is just asking to get
whacked on the head with a stream of difficult-to-diagnose
misbehavior.  Even if we assume, with a heavy dose of unjustified
optimism, that no security vulnerabilities will result, it's not going
to be fun or pleasant.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I actually think that we should not have a GUC for this at all.  We
> should have a compiled-in default, and it should be error.  If you
> want some other behavior, decorate your functions with #option.

We've agreed that the factory default should be "error", but I don't
think we can go much further than that in the direction of breaking
peoples' code.  We need to provide a backwards-compatible option,
IMHO, so that this isn't seen as a roadblock to updating to 8.5.
(You can argue all you want about whether it really is one, but it'll
be seen as one.)  And the Oracle-compatible option will be attractive
to people coming in from that side.  Reviewing megabytes of pl/sql
code for this kind of gotcha is not fun, and the "error" default would
only help a bit.

One advantage of making the GUC be SUSET is that those who want to take
a paranoid approach here have an easy answer: don't allow it to be
changed from "error".  We are allowed to assume that those who do change
it are responsible adults.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> That's what the #option alternative is for.  Yes, it's a bit ugly, but
>> it's perfectly functional, and secure too.

> I still don't see why it's needed.  If the function owner simply sets
> the option in the function definitions (as a userset), it doesn't matter
> what the calling user sets, does it?

If we do it that way, it is safe only if *every* *single* plpgsql
function has an attached SET option for this.  Otherwise a function's
own setting will propagate to its callees.  This is error-prone and will
be pretty bad for performance too --- the per-function SET mechanism
isn't especially cheap and was never meant to be used by every last
function.
        regards, tom lane


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Wed, Oct 21, 2009 at 5:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I actually think that we should not have a GUC for this at all.  We
>> should have a compiled-in default, and it should be error.  If you
>> want some other behavior, decorate your functions with #option.
>
> We've agreed that the factory default should be "error", but I don't
> think we can go much further than that in the direction of breaking
> peoples' code.  We need to provide a backwards-compatible option,
> IMHO, so that this isn't seen as a roadblock to updating to 8.5.
> (You can argue all you want about whether it really is one, but it'll
> be seen as one.)

Hmm... I wouldn't see inserting a line at the beginning of every
function as a roadblock, but I don't rule out the possibility that I'm
thick-skinned.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Pavel Stehule
Date:
2009/10/21 Tom Lane <tgl@sss.pgh.pa.us>:
> Josh Berkus <josh@agliodbs.com> writes:
>> Making this GUC suset would make it far less useful to users trying to
>> gradually upgrade their infrastructures, and make it more likely that
>> many/most of our users would just set it to backwards-compatible in
>> their postgresql.conf and not fix anything.  In fact, I'd go so far as
>> to say that if it's suset, we might as well make it sighup because
>> postgresql.conf is the *only* place anyone will touch it.
>
> No, they might reasonably also set it via ALTER DATABASE or ALTER USER.
>
>> In a secure database setup, none of the functions belong to the
>> superuser.  Yet an suset would mean that the function owner couldn't set
>> the parameter for their own functions, which is the most obvious place
>> to set it.
>
> That's what the #option alternative is for.  Yes, it's a bit ugly, but
> it's perfectly functional, and secure too.
>
>> Tom has proposed some kind of odd special "options" syntax to get around
>> this, but I think that's unnecessary.  So far on this thread, I haven't
>> seen anyone engineer an actual function exploit by using this setting;
>
> It would take a coincidence of names to make anything happen, so the
> details would depend a lot on the exact contents of the function you
> were hoping to break.  But since you insist:
>
>        create function foo(id int) ...
>        ...
>        select * into rec from tab where tab.id = id;
>        if found then do-something else do-something-else end if;
>
> Under old-style semantics this will do what the programmer thought.
> Under Oracle semantics it will return the first table row.  If
> do-something is security critical then this is enough to call it
> an exploit.  The reverse direction (code meant for Oracle behavior
> breaks under old-style) is not difficult to cons up either; I think
> you can find some examples in pgsql-bugs archives from people trying
> to port Oracle code to PG.
>
> Given that people are very prone to intentionally naming things as above
> (for a particularly egregious example try
> http://archives.postgresql.org/pgsql-bugs/2009-10/msg00054.php)
> I think it's entirely foolish to imagine this isn't a real hazard.
> If name collisions were improbable we'd not have so many complaints
> about the current behavior in our archives, and probably wouldn't be
> thinking about changing the behavior at all.
>
> As for the analogy to search_path, I think if we were doing that over
> knowing what we know today, we'd have locked it down a lot more from the
> beginning.  We might yet decide to lock it down more.  It's not a good
> example to cite when arguing that this setting doesn't need any
> protection.
>
>                        regards, tom lane
>

Although I am one who like to see Oracle behave in Pg I have to say,
so Oracle's behave is a significant risk. PL/pgSQL is more dynamic
then PL/SQL and it should be very strange, when somebody add new table
and for some functions will be broken. What I know, PL/SQL programmers
knows this problem and protects self.

In light of this I see more important default to raise error. Oracle
mode is good for stable environments, for migration from Oracle, but
for others is similar risk like current plpgsql, only with different
risks.

Personally I prefer #option. This option isn't possible to change
inside run of function, and this is exactly what we need. It's local
only plpgsql problem - it isn't related to postgres, it is related
only to plpgsql.

regards
Pavel Stehule


Re: Controlling changes in plpgsql variable resolution

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> be seen as one.)  And the Oracle-compatible option will be attractive
> to people coming in from that side.  Reviewing megabytes of pl/sql
> code for this kind of gotcha is not fun, and the "error" default would
> only help a bit.

What about having a new pl language called plsql (or mabe plosql) where
it behaves like Oracle. The handler could maybe set the environment then
call the plpgsql interpreter. Is it technically sound?

If it is, it's just another way to spell this unfriendly #option syntax
that people do not like. Yet another idea would be to keep the #option
mecanism but spell it differently, in a more plpgsql like way:
 create funcion ... language plpgsql as $$ OPTIONS  lexer_priority = 'table, variable'; DECLARE  v_foo integer; BEGIN
END;$$;
 

I know I don't like #option because it looks and feels "foreign", so t
might just boils down to syntax issue for others too.

Regards,
-- 
dim


Re: Controlling changes in plpgsql variable resolution

From
Pavel Stehule
Date:
2009/10/22 Dimitri Fontaine <dfontaine@hi-media.com>:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> be seen as one.)  And the Oracle-compatible option will be attractive
>> to people coming in from that side.  Reviewing megabytes of pl/sql
>> code for this kind of gotcha is not fun, and the "error" default would
>> only help a bit.
>
> What about having a new pl language called plsql (or mabe plosql) where
> it behaves like Oracle. The handler could maybe set the environment then
> call the plpgsql interpreter. Is it technically sound?

-1

without significant refactoring you will be far to plsql. And you
don't solve problem of plpgsql. Minimally plpgsql needs better
solution of ambiguous identifiers, and have to have some back
compatibility possibility.

I am thinking about new language based on SQL/PSM syntax - but I am
sure, so I don't use current plpgsql interpret. I thing, so there are
some possibilities for simplification - but it should to have some
incompatibilities (so I would not to back port it to plpgsql).

>
> If it is, it's just another way to spell this unfriendly #option syntax
> that people do not like. Yet another idea would be to keep the #option
> mecanism but spell it differently, in a more plpgsql like way:
>
>  create funcion ... language plpgsql
>  as $$
>  OPTIONS
>   lexer_priority = 'table, variable';
>  DECLARE
>   v_foo integer;
>  BEGIN
>  END;
>  $$;
>
> I know I don't like #option because it looks and feels "foreign", so t
> might just boils down to syntax issue for others too.
>

sorry I don't see it cleaner then just #option

CREATE FUNCTION foo()
RETURNS int AS $$
#option sqlprecedence
DECLARE ...
..

This mechanism is available now, and don't need to add some new code.

Regards
Pavel

> Regards,
> --
> dim
>


Re: Controlling changes in plpgsql variable resolution

From
Andrew Dunstan
Date:

Dimitri Fontaine wrote:
> I know I don't like #option because it looks and feels "foreign", so t
> might just boils down to syntax issue for others too.
>
>
>   

I don't see why it feels any more foreign than, say, #pragma in C.

And it's something we already have, albeit undocumented.

Let's not get too hung up on syntax.

cheers

andrew


Re: Controlling changes in plpgsql variable resolution

From
Robert Haas
Date:
On Thu, Oct 22, 2009 at 10:12 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Let's not get too hung up on syntax.

+1.

...Robert


Re: Controlling changes in plpgsql variable resolution

From
Dimitri Fontaine
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I don't see why it feels any more foreign than, say, #pragma in C.
>
> And it's something we already have, albeit undocumented.
>
> Let's not get too hung up on syntax.

Ok just wanted to have this syntax part explicitely talked about, I
don't have strong opinions about it. I sure don't find it nice, but that
doesn't change the fact that it's there and cheap to use ;)

Regards,
-- 
dim


Re: Controlling changes in plpgsql variable resolution

From
Peter Eisentraut
Date:
On Tue, 2009-10-20 at 10:32 -0400, Tom Lane wrote:
> That's only sane if you are 100% certain that there could not be a
> security issue arising from the change of behavior.  Otherwise someone
> could for instance subvert a security-definer function by running it
> under the setting it wasn't written for.  Personally I am not 100%
> certain of that.

This is pretty much equivalent to the issue of setting search_path for
security definer functions.  Such settings that affect the semantics of
code should be overridden to a know safe value by such functions.

Still, making the setting SUSET will be very impractical.  The superuser
will have to ensure that all current and future functions on the
installation conform to the setting that is chosen.  This is pretty much
impossible.  The other choice is to uglify every function with an
explicit setting.  So in practice the only convenient and robust choice
is to leave it as is.

The correct solution, IMO, is that security definer functions need to
save the set of deemed-security-critical parameters automatically at
function creation time.  Since we don't do that currently, I'd guess
that 50% of security definer functions out there are still unsafe about
search_path.




Tom Lane wrote:
> Actually, I think any attempt to do that would result in a fork,
> and a consequent splintering of the community.  We can get away
>   
Perhaps the answer might be a pre-emptive simplifying fork to postgres-NG,
perhaps taking a lead from MySQL and Samba.

I'm not sure that you would necessarily lose too much: if the 
postgres-NG system
implements a functional subset (perhaps on a subset of platforms, eg 
ones with C++
and threading support) with some implementation advantages, then it 
might allow
users who are interested in the potential advantages of -NG to check 
that they are
using the subset while still remaining PostgreSQL users for serious use.

Suppose, for example, that you severely limit the ability of individual 
connections
to load extensions, and make it a dbo task - and have an architecture 
that is not
process-per-connection but process-per-db. This might allow some changes 
in the
cache and read-ahead/write-behind that use large memory on modern 
systems more
easily - perhaps in the way that the falcon store for MySQL planned to. 
The core
query planner and execution engine can remain common, even if the process
structure that hosts it and the underlying cache page management is 
quite different.

James



Re: Controlling changes in plpgsql variable resolution

From
Bruce Momjian
Date:
Tom Lane wrote:
> Under old-style semantics this will do what the programmer thought.
> Under Oracle semantics it will return the first table row.  If
> do-something is security critical then this is enough to call it
> an exploit.  The reverse direction (code meant for Oracle behavior
> breaks under old-style) is not difficult to cons up either; I think
> you can find some examples in pgsql-bugs archives from people trying
> to port Oracle code to PG.
> 
> Given that people are very prone to intentionally naming things as above
> (for a particularly egregious example try
> http://archives.postgresql.org/pgsql-bugs/2009-10/msg00054.php)
> I think it's entirely foolish to imagine this isn't a real hazard.
> If name collisions were improbable we'd not have so many complaints
> about the current behavior in our archives, and probably wouldn't be
> thinking about changing the behavior at all.

Sorry for the late reply:

Stepping back a bit, is there something we can do to reduce the chances
of variable-name collision?  If you are selecting a column called
"first_name", it is logical to put it into a variable called
"first_name", and hence the conflict if that variable is used in a
query.

I know some Oracle people use a 'v_' prefix for variables, but that
always looked ugly to me.  Is there something else we could use to
qualify variables used in queries to avoid conflicts?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +