Home > mailing lists

Thread: Function Column Expansion Causes Inserts To Fail

Function Column Expansion Causes Inserts To Fail

From

"David Johnston"

Date:

30 May 2011, 18:12:17

PostgreSQL 9.0.4

The following script fails even though the “pkonlytest” table is empty since we just created it…

>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN SCRIPT

CREATE TABLE pkonlytest (

pkid text PRIMARY KEY

);

CREATE OR REPLACE FUNCTION createpkrecord(INOUT pkvalue text, OUT col1 boolean, OUT col2 boolean)

RETURNS record

AS $$

BEGIN

INSERT INTO pkonlytest (pkid) VALUES (pkvalue);

col1 = true;

col2 = false;

END;

LANGUAGE 'plpgsql';

SELECT ( createpkrecord('1') ).*;

SQL Error: ERROR: duplicate key value violates unique constraint "pkonlytest_pkey"

DETAIL: Key (pkid)=(1) already exists.

CONTEXT: SQL statement "INSERT INTO pkonlytest (pkid) VALUES (pkvalue)"

PL/pgSQL function "createpkrecord" line 2 at SQL statement

>>>>>>>>>>>>>>>>>>END SCRIPT

If you call the function without the column expansion (and required parentheses) it work just fine.

SELECT createpkrecord(‘1’);

There is a workaround…

SELECT (func.result).* FROM (

SELECT createpkrecord('4') as result ) func

David J.

Re: Function Column Expansion Causes Inserts To Fail

From

Tom Lane

Date:

31 May 2011, 00:09:55

"David Johnston" <polobo@yahoo.com> writes:
> SELECT (   createpkrecord('1')    ).*;
> [ results in function being called more than once ]

Yeah.  Don't do that.  Better style is

SELECT * FROM createpkrecord('1');

            regards, tom lane

Re: Function Column Expansion Causes Inserts To Fail

From

"David Johnston"

Date:

31 May 2011, 11:40:07

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Tom Lane
> Sent: Monday, May 30, 2011 11:10 PM
> To: David Johnston
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Function Column Expansion Causes Inserts To Fail
>
> "David Johnston" <polobo@yahoo.com> writes:
> > SELECT (   createpkrecord('1')    ).*;
> > [ results in function being called more than once ]
>
> Yeah.  Don't do that.  Better style is
>
> SELECT * FROM createpkrecord('1');
>
>             regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

From syntax works fine for literals but how would you then get table.column
values into the function call - where you want to evaluate multiple rows
from the source table?  In order to feed rows to a function you need the
function in the SELECT column-list so that it can see the columns in
question.

 If using the described syntax results in the odd situation where a function
is called twice I would consider that a bug.  Either it needs to be fixed or
the system should disallow that syntax from being used.  I haven't tried
serial pk creation or other side-effects that would not result in such an
obvious error but it is reasonable to believe that if the duplicate key
exception is being thrown then other bad - but not catchable things - could
occur as well.  Even an expensive SELECT statement inside the function would
make this behavior undesirable - though I am guessing it would otherwise be
invisible since the SELECT is not a side-effect and thus the engine would
only return one set of results - though I haven't tested this theory either.

The fact that: SELECT createpkrecord('1') works - returning a "row" - leads
me to think that decomposing that row should be (but is not) independent of
the source of that "row".

The work around I described (converting the SELECT function() statement to a
sub-query and expanding the results in the parent) is fine but if that is
the only safe way to do it then the alternate method should fail since it is
unsafe.  Now, back to my first question, are there other alternatives that
I've overlooked when you want to use the result of a SELECT statement as the
source of values for a function call?

That is, how would you re-write this to place "createpkrecord(sub)" in a
FROM clause instead of the SELECT list?

SELECT createpkrecord(sub)
FROM (SELECT sub FROM generate_series(1, 10) sub ) src;

David J.

Re: Function Column Expansion Causes Inserts To Fail

From

Merlin Moncure

Date:

31 May 2011, 13:37:44

On Tue, May 31, 2011 at 9:24 AM, David Johnston <polobo@yahoo.com> wrote:
> From syntax works fine for literals but how would you then get table.column
> values into the function call - where you want to evaluate multiple rows
> from the source table?  In order to feed rows to a function you need the
> function in the SELECT column-list so that it can see the columns in
> question.
>
>  If using the described syntax results in the odd situation where a function
> is called twice I would consider that a bug.  Either it needs to be fixed or
> the system should disallow that syntax from being used.  I haven't tried
> serial pk creation or other side-effects that would not result in such an
> obvious error but it is reasonable to believe that if the duplicate key
> exception is being thrown then other bad - but not catchable things - could
> occur as well.  Even an expensive SELECT statement inside the function would
> make this behavior undesirable - though I am guessing it would otherwise be
> invisible since the SELECT is not a side-effect and thus the engine would
> only return one set of results - though I haven't tested this theory either.
>
> The fact that: SELECT createpkrecord('1') works - returning a "row" - leads
> me to think that decomposing that row should be (but is not) independent of
> the source of that "row".
>
> The work around I described (converting the SELECT function() statement to a
> sub-query and expanding the results in the parent) is fine but if that is
> the only safe way to do it then the alternate method should fail since it is
> unsafe.  Now, back to my first question, are there other alternatives that
> I've overlooked when you want to use the result of a SELECT statement as the
> source of values for a function call?
>
> That is, how would you re-write this to place "createpkrecord(sub)" in a
> FROM clause instead of the SELECT list?
>
> SELECT createpkrecord(sub)
> FROM (SELECT sub FROM generate_series(1, 10) sub ) src;

The basic issue is that:
select (func()).*, if the return type has fields, 'a', 'b', gets expanded to:

select (func()).a, (func()).b;

This is a *huge* gotcha with type returning functions -- in many cases
people only notice the problem indirectly through slow performance.
I've griped about this many times but it's not clear if there's a
solution other than to document and advise workarounds.  Typically the
safest way to deal with this is through use of CTE:

> SELECT createpkrecord(sub)
> FROM (SELECT sub FROM generate_series(1, 10) sub ) src;

becomes

with list as (SELECT createpkrecord(sub) as c FROM generate_series(1, 10) sub )
select (c).* from list;

merlin

Re: Function Column Expansion Causes Inserts To Fail

From

"David Johnston"

Date:

31 May 2011, 13:59:51

See my thoughts below.  Other user's opinions (or a pointer to where this
topic has been previously discussed) are greatly welcomed.

> -----Original Message-----
> From: Merlin Moncure [mailto:mmoncure@gmail.com]
> Sent: Tuesday, May 31, 2011 11:56 AM
> To: David Johnston
> Cc: Tom Lane; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Function Column Expansion Causes Inserts To Fail
>
> The basic issue is that:
> select (func()).*, if the return type has fields, 'a', 'b', gets expanded
to:
>
> select (func()).a, (func()).b;
>
> This is a *huge* gotcha with type returning functions -- in many cases
people
> only notice the problem indirectly through slow performance.
> I've griped about this many times but it's not clear if there's a solution
other
> than to document and advise workarounds.  Typically the safest way to deal
> with this is through use of CTE:
>
> > SELECT createpkrecord(sub)
> > FROM (SELECT sub FROM generate_series(1, 10) sub ) src;
>
> becomes
>
> with list as (SELECT createpkrecord(sub) as c FROM generate_series(1, 10)
> sub ) select (c).* from list;
>
> merlin

Thank you for the technical detail on how  ().* gets expanded by the engine.
I still believe it would make sense to disallow VOLATILE functions to have
the duplicate behavior performed by default - with probably an override
during function creation.  Backwards compatibility could introduce notices
and have server configurations to restore prior behavior.  The rewriter
should know that the composite/record type in the select list is a function
as opposed to an actual type.  Even if  you document the behavior unless you
make the runtime engine comply as well this is very subtle (and invisible)
behavior for the end-user.

I may be overreacting here but calling a function multiple times when the
query only says to call it once is unexpected behavior and arguably results
in an incompatible query relative to what was expected.  I can see how such
behavior can be desirable and benign but the user should have to explicitly
request/allow such behavior as opposed to having it given to them by
default.  That way, at least during the decision making process of turning
on the feature the user can be reasonable expected to view the relevant
documentation to learn why such explicit permission is required.

Now, for actual types this is obviously not an issue.  If you could output
the function result into an actual type and the simply duplicate the type
with the relevant column specification that would obviously avoid the entire
problem.  I am guessing, from your response, that this is not that easy of a
solution to implement.  Now, if the rewriter could generate something like
the following:

SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub

[REWRITE] (remove the ().* construct and add AS resultfunction1; make the
resultant query a sub-query and copy all the non-function columns to the
parent and also expand functionresult1.* as necessary

SELECT aux1, aux2, functionresult1.col1, functionresult1.col2
FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM
generate_series(1,10) sub)

That would be semantically and functionally equivalent; and obviously the
resultant query works since that it the function workaround and is also what
happens when you use a CTE.

Again, I don't mind using the more verbose syntax but I'd rather have the
system disallow the broken syntax outright since it changing the declared
behavior of the query.  It's hard for me to make a risk/cost/benefit
analysis on the issue but from a pure theory stand-point IMHO  the behavior
is contrary to the promise of the database engine to not change the meaning
of the query that it is given.  Making it work as expected would be nice but
otherwise steps should be taken to stop multiple function calls unless the
function says it is safe to do so.

Documentation should not be a substitute for bug fixing.  If you really want
to say "don't do that" it is better if you do so through the database engine
itself and not the mailing list.

David J.

Re: Function Column Expansion Causes Inserts To Fail

From

Merlin Moncure

Date:

31 May 2011, 16:18:08

On Tue, May 31, 2011 at 11:57 AM, David Johnston <polobo@yahoo.com> wrote:
> See my thoughts below.  Other user's opinions (or a pointer to where this
> topic has been previously discussed) are greatly welcomed.
> Thank you for the technical detail on how  ().* gets expanded by the engine.
> I still believe it would make sense to disallow VOLATILE functions to have
> the duplicate behavior performed by default - with probably an override
> during function creation.  Backwards compatibility could introduce notices
> and have server configurations to restore prior behavior.  The rewriter
> should know that the composite/record type in the select list is a function
> as opposed to an actual type.  Even if  you document the behavior unless you
> make the runtime engine comply as well this is very subtle (and invisible)
> behavior for the end-user.

There have been multiple complaints about this in the archives.  In
the old days, you would have to rewrite your query to use the 'select
* from func()' form (which isn't always so easy) or use a subquery and
the 'offset 0' hack.  Running in to this problem has actually become
more common as our type system has gotten fancier and plpgsql got the
ability to be called with the column list, aka select func(), syntax.

The community has had to endure multiple sanctimonious rants about
this by yours truly.  Unfortunately complaints are cheap relative to
the hard work and consensus building it would require to fix this
problem.  Here's a kinda sorta related thread (read the whole thread)
about it where I was trying to work a solution in somehow:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg00333.php

> I may be overreacting here but calling a function multiple times when the
> query only says to call it once is unexpected behavior and arguably results
> in an incompatible query relative to what was expected.  I can see how such
> behavior can be desirable and benign but the user should have to explicitly
> request/allow such behavior as opposed to having it given to them by
> default.  That way, at least during the decision making process of turning
> on the feature the user can be reasonable expected to view the relevant
> documentation to learn why such explicit permission is required.

The idea that you should be able to control how the function 'hooks
in' to the outer query in terms of inputs and outputs is almost
certainly not going to fly.  In this particular case I think you're
best off trying to prove that the current behavior is not
intentionally relied on by anybody and should be changed.   If you
could do that, as noted in the email above, you might be able to argue
that (func()).* is a special case and takes on a unique meaning, as
would, possibly, select (foo).* from foo;

> Now, for actual types this is obviously not an issue.  If you could output
> the function result into an actual type and the simply duplicate the type
> with the relevant column specification that would obviously avoid the entire
> problem.  I am guessing, from your response, that this is not that easy of a
> solution to implement.  Now, if the rewriter could generate something like
> the following:
>
> SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub
>
> [REWRITE] (remove the ().* construct and add AS resultfunction1; make the
> resultant query a sub-query and copy all the non-function columns to the
> parent and also expand functionresult1.* as necessary

IMO, That's not gonna fly either -- you are oversimplifying.  Suppose
you wrapped the above query into a create view statement, what would
it look like?

> SELECT aux1, aux2, functionresult1.col1, functionresult1.col2
> FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM
> generate_series(1,10) sub)

This is also a problem: postgresql can (and does) inline the subquery
(unless you use offset 0), defeating the intended purpose of what you
are trying to do.  So this is not the same as a CTE, which has a rigid
mechanism of order of execution (although I wonder even relying on
that is future-proof).

> Documentation should not be a substitute for bug fixing.  If you really want
> to say "don't do that" it is better if you do so through the database engine
> itself and not the mailing list.

Well, surprising behaviors != bug.  The current behavior actually
works very well for all other contexts than functions (for example
when creating views).   In lieu of a sneaky fix like I mentioned
above, maybe you could get some buy-in on a warning though.

merlin

Re: Function Column Expansion Causes Inserts To Fail

From

Tom Lane

Date:

31 May 2011, 17:28:28

Merlin Moncure <mmoncure@gmail.com> writes:
> There have been multiple complaints about this in the archives.  In
> the old days, you would have to rewrite your query to use the 'select
> * from func()' form (which isn't always so easy) or use a subquery and
> the 'offset 0' hack.  Running in to this problem has actually become
> more common as our type system has gotten fancier and plpgsql got the
> ability to be called with the column list, aka select func(), syntax.

> The community has had to endure multiple sanctimonious rants about
> this by yours truly.  Unfortunately complaints are cheap relative to
> the hard work and consensus building it would require to fix this
> problem.

FWIW, the SQL-standard LATERAL construct would fix the problem
reasonably well, and that is on the roadmap already.

            regards, tom lane

Re: Function Column Expansion Causes Inserts To Fail

From

Merlin Moncure

Date:

31 May 2011, 18:42:48

On Tue, May 31, 2011 at 3:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> There have been multiple complaints about this in the archives.  In
>> the old days, you would have to rewrite your query to use the 'select
>> * from func()' form (which isn't always so easy) or use a subquery and
>> the 'offset 0' hack.  Running in to this problem has actually become
>> more common as our type system has gotten fancier and plpgsql got the
>> ability to be called with the column list, aka select func(), syntax.
>
>> The community has had to endure multiple sanctimonious rants about
>> this by yours truly.  Unfortunately complaints are cheap relative to
>> the hard work and consensus building it would require to fix this
>> problem.
>
> FWIW, the SQL-standard LATERAL construct would fix the problem
> reasonably well, and that is on the roadmap already.

right -- it looks like you could write the OP's query:
SELECT createpkrecord(sub) FROM (SELECT sub FROM generate_series(1,
10) sub ) src;

like this:
SELECT s.* from generate_series(1,10) sub, lateral(createpkrecord(sub)) AS s;

That doesn't really speak though to the OP's point, which I obviously
agree with, that the current behavior is pretty awful and that the
dangers of relying on it should be advertised more loudly.  Maybe a
warning plus a hint to use lateral might be helpful if/when that
feature comes in, or a documentation fix.

I've never taken the time to really get my head around 'lateral'
enough to say for sure if it provides clean workarounds for all the
cases that get people into hot water.  The case that used to get me a
lot is (the unfortunately generally under utilized) custom aggregates.

problem:
select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id;

solution with lateral?

merlin

Re: Function Column Expansion Causes Inserts To Fail

From

Tom Lane

Date:

31 May 2011, 19:14:05

Merlin Moncure <mmoncure@gmail.com> writes:
> I've never taken the time to really get my head around 'lateral'
> enough to say for sure if it provides clean workarounds for all the
> cases that get people into hot water.  The case that used to get me a
> lot is (the unfortunately generally under utilized) custom aggregates.

> problem:
> select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id;

Hm, really?  I'd expect that nodeAgg's attempts to collect identical
aggregate calls into one would keep you out of trouble there.  That
hack unfortunately doesn't generalize to ordinary functions ...

            regards, tom lane

Re: Function Column Expansion Causes Inserts To Fail

From

Merlin Moncure

Date:

31 May 2011, 19:16:29

On Tue, May 31, 2011 at 5:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I've never taken the time to really get my head around 'lateral'
>> enough to say for sure if it provides clean workarounds for all the
>> cases that get people into hot water.  The case that used to get me a
>> lot is (the unfortunately generally under utilized) custom aggregates.
>
>> problem:
>> select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id;
>
> Hm, really?  I'd expect that nodeAgg's attempts to collect identical
> aggregate calls into one would keep you out of trouble there.  That
> hack unfortunately doesn't generalize to ordinary functions ...

you appear to be right -- memory failing here.

merlin