Thread: Function Column Expansion Causes Inserts To Fail
PostgreSQL 9.0.4
The following script fails even though the “pkonlytest” table is empty since we just created it…
>>>>>>>>>>>>>>>>>>>>>>>>>> BEGIN SCRIPT
CREATE TABLE pkonlytest (
pkid text PRIMARY KEY
);
CREATE OR REPLACE FUNCTION createpkrecord(INOUT pkvalue text, OUT col1 boolean, OUT col2 boolean)
RETURNS record
AS $$
BEGIN
INSERT INTO pkonlytest (pkid) VALUES (pkvalue);
col1 = true;
col2 = false;
END;
$$
LANGUAGE 'plpgsql';
SELECT ( createpkrecord('1') ).*;
SQL Error: ERROR: duplicate key value violates unique constraint "pkonlytest_pkey"
DETAIL: Key (pkid)=(1) already exists.
CONTEXT: SQL statement "INSERT INTO pkonlytest (pkid) VALUES (pkvalue)"
PL/pgSQL function "createpkrecord" line 2 at SQL statement
>>>>>>>>>>>>>>>>>>END SCRIPT
If you call the function without the column expansion (and required parentheses) it work just fine.
SELECT createpkrecord(‘1’);
There is a workaround…
SELECT (func.result).* FROM (
SELECT createpkrecord('4') as result ) func
David J.
"David Johnston" <polobo@yahoo.com> writes: > SELECT ( createpkrecord('1') ).*; > [ results in function being called more than once ] Yeah. Don't do that. Better style is SELECT * FROM createpkrecord('1'); regards, tom lane
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Tom Lane > Sent: Monday, May 30, 2011 11:10 PM > To: David Johnston > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Function Column Expansion Causes Inserts To Fail > > "David Johnston" <polobo@yahoo.com> writes: > > SELECT ( createpkrecord('1') ).*; > > [ results in function being called more than once ] > > Yeah. Don't do that. Better style is > > SELECT * FROM createpkrecord('1'); > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general From syntax works fine for literals but how would you then get table.column values into the function call - where you want to evaluate multiple rows from the source table? In order to feed rows to a function you need the function in the SELECT column-list so that it can see the columns in question. If using the described syntax results in the odd situation where a function is called twice I would consider that a bug. Either it needs to be fixed or the system should disallow that syntax from being used. I haven't tried serial pk creation or other side-effects that would not result in such an obvious error but it is reasonable to believe that if the duplicate key exception is being thrown then other bad - but not catchable things - could occur as well. Even an expensive SELECT statement inside the function would make this behavior undesirable - though I am guessing it would otherwise be invisible since the SELECT is not a side-effect and thus the engine would only return one set of results - though I haven't tested this theory either. The fact that: SELECT createpkrecord('1') works - returning a "row" - leads me to think that decomposing that row should be (but is not) independent of the source of that "row". The work around I described (converting the SELECT function() statement to a sub-query and expanding the results in the parent) is fine but if that is the only safe way to do it then the alternate method should fail since it is unsafe. Now, back to my first question, are there other alternatives that I've overlooked when you want to use the result of a SELECT statement as the source of values for a function call? That is, how would you re-write this to place "createpkrecord(sub)" in a FROM clause instead of the SELECT list? SELECT createpkrecord(sub) FROM (SELECT sub FROM generate_series(1, 10) sub ) src; David J.
On Tue, May 31, 2011 at 9:24 AM, David Johnston <polobo@yahoo.com> wrote: > From syntax works fine for literals but how would you then get table.column > values into the function call - where you want to evaluate multiple rows > from the source table? In order to feed rows to a function you need the > function in the SELECT column-list so that it can see the columns in > question. > > If using the described syntax results in the odd situation where a function > is called twice I would consider that a bug. Either it needs to be fixed or > the system should disallow that syntax from being used. I haven't tried > serial pk creation or other side-effects that would not result in such an > obvious error but it is reasonable to believe that if the duplicate key > exception is being thrown then other bad - but not catchable things - could > occur as well. Even an expensive SELECT statement inside the function would > make this behavior undesirable - though I am guessing it would otherwise be > invisible since the SELECT is not a side-effect and thus the engine would > only return one set of results - though I haven't tested this theory either. > > The fact that: SELECT createpkrecord('1') works - returning a "row" - leads > me to think that decomposing that row should be (but is not) independent of > the source of that "row". > > The work around I described (converting the SELECT function() statement to a > sub-query and expanding the results in the parent) is fine but if that is > the only safe way to do it then the alternate method should fail since it is > unsafe. Now, back to my first question, are there other alternatives that > I've overlooked when you want to use the result of a SELECT statement as the > source of values for a function call? > > That is, how would you re-write this to place "createpkrecord(sub)" in a > FROM clause instead of the SELECT list? > > SELECT createpkrecord(sub) > FROM (SELECT sub FROM generate_series(1, 10) sub ) src; The basic issue is that: select (func()).*, if the return type has fields, 'a', 'b', gets expanded to: select (func()).a, (func()).b; This is a *huge* gotcha with type returning functions -- in many cases people only notice the problem indirectly through slow performance. I've griped about this many times but it's not clear if there's a solution other than to document and advise workarounds. Typically the safest way to deal with this is through use of CTE: > SELECT createpkrecord(sub) > FROM (SELECT sub FROM generate_series(1, 10) sub ) src; becomes with list as (SELECT createpkrecord(sub) as c FROM generate_series(1, 10) sub ) select (c).* from list; merlin
See my thoughts below. Other user's opinions (or a pointer to where this topic has been previously discussed) are greatly welcomed. > -----Original Message----- > From: Merlin Moncure [mailto:mmoncure@gmail.com] > Sent: Tuesday, May 31, 2011 11:56 AM > To: David Johnston > Cc: Tom Lane; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Function Column Expansion Causes Inserts To Fail > > The basic issue is that: > select (func()).*, if the return type has fields, 'a', 'b', gets expanded to: > > select (func()).a, (func()).b; > > This is a *huge* gotcha with type returning functions -- in many cases people > only notice the problem indirectly through slow performance. > I've griped about this many times but it's not clear if there's a solution other > than to document and advise workarounds. Typically the safest way to deal > with this is through use of CTE: > > > SELECT createpkrecord(sub) > > FROM (SELECT sub FROM generate_series(1, 10) sub ) src; > > becomes > > with list as (SELECT createpkrecord(sub) as c FROM generate_series(1, 10) > sub ) select (c).* from list; > > merlin Thank you for the technical detail on how ().* gets expanded by the engine. I still believe it would make sense to disallow VOLATILE functions to have the duplicate behavior performed by default - with probably an override during function creation. Backwards compatibility could introduce notices and have server configurations to restore prior behavior. The rewriter should know that the composite/record type in the select list is a function as opposed to an actual type. Even if you document the behavior unless you make the runtime engine comply as well this is very subtle (and invisible) behavior for the end-user. I may be overreacting here but calling a function multiple times when the query only says to call it once is unexpected behavior and arguably results in an incompatible query relative to what was expected. I can see how such behavior can be desirable and benign but the user should have to explicitly request/allow such behavior as opposed to having it given to them by default. That way, at least during the decision making process of turning on the feature the user can be reasonable expected to view the relevant documentation to learn why such explicit permission is required. Now, for actual types this is obviously not an issue. If you could output the function result into an actual type and the simply duplicate the type with the relevant column specification that would obviously avoid the entire problem. I am guessing, from your response, that this is not that easy of a solution to implement. Now, if the rewriter could generate something like the following: SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub [REWRITE] (remove the ().* construct and add AS resultfunction1; make the resultant query a sub-query and copy all the non-function columns to the parent and also expand functionresult1.* as necessary SELECT aux1, aux2, functionresult1.col1, functionresult1.col2 FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM generate_series(1,10) sub) That would be semantically and functionally equivalent; and obviously the resultant query works since that it the function workaround and is also what happens when you use a CTE. Again, I don't mind using the more verbose syntax but I'd rather have the system disallow the broken syntax outright since it changing the declared behavior of the query. It's hard for me to make a risk/cost/benefit analysis on the issue but from a pure theory stand-point IMHO the behavior is contrary to the promise of the database engine to not change the meaning of the query that it is given. Making it work as expected would be nice but otherwise steps should be taken to stop multiple function calls unless the function says it is safe to do so. Documentation should not be a substitute for bug fixing. If you really want to say "don't do that" it is better if you do so through the database engine itself and not the mailing list. David J.
On Tue, May 31, 2011 at 11:57 AM, David Johnston <polobo@yahoo.com> wrote: > See my thoughts below. Other user's opinions (or a pointer to where this > topic has been previously discussed) are greatly welcomed. > Thank you for the technical detail on how ().* gets expanded by the engine. > I still believe it would make sense to disallow VOLATILE functions to have > the duplicate behavior performed by default - with probably an override > during function creation. Backwards compatibility could introduce notices > and have server configurations to restore prior behavior. The rewriter > should know that the composite/record type in the select list is a function > as opposed to an actual type. Even if you document the behavior unless you > make the runtime engine comply as well this is very subtle (and invisible) > behavior for the end-user. There have been multiple complaints about this in the archives. In the old days, you would have to rewrite your query to use the 'select * from func()' form (which isn't always so easy) or use a subquery and the 'offset 0' hack. Running in to this problem has actually become more common as our type system has gotten fancier and plpgsql got the ability to be called with the column list, aka select func(), syntax. The community has had to endure multiple sanctimonious rants about this by yours truly. Unfortunately complaints are cheap relative to the hard work and consensus building it would require to fix this problem. Here's a kinda sorta related thread (read the whole thread) about it where I was trying to work a solution in somehow: http://archives.postgresql.org/pgsql-hackers/2010-05/msg00333.php > I may be overreacting here but calling a function multiple times when the > query only says to call it once is unexpected behavior and arguably results > in an incompatible query relative to what was expected. I can see how such > behavior can be desirable and benign but the user should have to explicitly > request/allow such behavior as opposed to having it given to them by > default. That way, at least during the decision making process of turning > on the feature the user can be reasonable expected to view the relevant > documentation to learn why such explicit permission is required. The idea that you should be able to control how the function 'hooks in' to the outer query in terms of inputs and outputs is almost certainly not going to fly. In this particular case I think you're best off trying to prove that the current behavior is not intentionally relied on by anybody and should be changed. If you could do that, as noted in the email above, you might be able to argue that (func()).* is a special case and takes on a unique meaning, as would, possibly, select (foo).* from foo; > Now, for actual types this is obviously not an issue. If you could output > the function result into an actual type and the simply duplicate the type > with the relevant column specification that would obviously avoid the entire > problem. I am guessing, from your response, that this is not that easy of a > solution to implement. Now, if the rewriter could generate something like > the following: > > SELECT aux1, aux2, (createpkrecord(sub)).* FROM generate_series(1,10) sub > > [REWRITE] (remove the ().* construct and add AS resultfunction1; make the > resultant query a sub-query and copy all the non-function columns to the > parent and also expand functionresult1.* as necessary IMO, That's not gonna fly either -- you are oversimplifying. Suppose you wrapped the above query into a create view statement, what would it look like? > SELECT aux1, aux2, functionresult1.col1, functionresult1.col2 > FROM (SELECT aux1, aux2, createpkrecord(sub) AS functionresult1 FROM > generate_series(1,10) sub) This is also a problem: postgresql can (and does) inline the subquery (unless you use offset 0), defeating the intended purpose of what you are trying to do. So this is not the same as a CTE, which has a rigid mechanism of order of execution (although I wonder even relying on that is future-proof). > Documentation should not be a substitute for bug fixing. If you really want > to say "don't do that" it is better if you do so through the database engine > itself and not the mailing list. Well, surprising behaviors != bug. The current behavior actually works very well for all other contexts than functions (for example when creating views). In lieu of a sneaky fix like I mentioned above, maybe you could get some buy-in on a warning though. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > There have been multiple complaints about this in the archives. In > the old days, you would have to rewrite your query to use the 'select > * from func()' form (which isn't always so easy) or use a subquery and > the 'offset 0' hack. Running in to this problem has actually become > more common as our type system has gotten fancier and plpgsql got the > ability to be called with the column list, aka select func(), syntax. > The community has had to endure multiple sanctimonious rants about > this by yours truly. Unfortunately complaints are cheap relative to > the hard work and consensus building it would require to fix this > problem. FWIW, the SQL-standard LATERAL construct would fix the problem reasonably well, and that is on the roadmap already. regards, tom lane
On Tue, May 31, 2011 at 3:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> There have been multiple complaints about this in the archives. In >> the old days, you would have to rewrite your query to use the 'select >> * from func()' form (which isn't always so easy) or use a subquery and >> the 'offset 0' hack. Running in to this problem has actually become >> more common as our type system has gotten fancier and plpgsql got the >> ability to be called with the column list, aka select func(), syntax. > >> The community has had to endure multiple sanctimonious rants about >> this by yours truly. Unfortunately complaints are cheap relative to >> the hard work and consensus building it would require to fix this >> problem. > > FWIW, the SQL-standard LATERAL construct would fix the problem > reasonably well, and that is on the roadmap already. right -- it looks like you could write the OP's query: SELECT createpkrecord(sub) FROM (SELECT sub FROM generate_series(1, 10) sub ) src; like this: SELECT s.* from generate_series(1,10) sub, lateral(createpkrecord(sub)) AS s; That doesn't really speak though to the OP's point, which I obviously agree with, that the current behavior is pretty awful and that the dangers of relying on it should be advertised more loudly. Maybe a warning plus a hint to use lateral might be helpful if/when that feature comes in, or a documentation fix. I've never taken the time to really get my head around 'lateral' enough to say for sure if it provides clean workarounds for all the cases that get people into hot water. The case that used to get me a lot is (the unfortunately generally under utilized) custom aggregates. problem: select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id; solution with lateral? merlin
Merlin Moncure <mmoncure@gmail.com> writes: > I've never taken the time to really get my head around 'lateral' > enough to say for sure if it provides clean workarounds for all the > cases that get people into hot water. The case that used to get me a > lot is (the unfortunately generally under utilized) custom aggregates. > problem: > select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id; Hm, really? I'd expect that nodeAgg's attempts to collect identical aggregate calls into one would keep you out of trouble there. That hack unfortunately doesn't generalize to ordinary functions ... regards, tom lane
On Tue, May 31, 2011 at 5:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> I've never taken the time to really get my head around 'lateral' >> enough to say for sure if it provides clean workarounds for all the >> cases that get people into hot water. The case that used to get me a >> lot is (the unfortunately generally under utilized) custom aggregates. > >> problem: >> select bar_id, (some_agg(foo)).* from foo join bar ... group by bar_id; > > Hm, really? I'd expect that nodeAgg's attempts to collect identical > aggregate calls into one would keep you out of trouble there. That > hack unfortunately doesn't generalize to ordinary functions ... you appear to be right -- memory failing here. merlin