Thread: Bug? Function with side effects not evaluated in CTE
In a new database "test", I create the following table and function:
create table tb_item ( item integer );

create or replace function fn_new_item( in_item integer )
returns integer language plpgsql as $_$
begin
    insert into tb_item values( in_item );
    return in_item;
end
$_$;
Then, I execute the following query and see DELETE 0 as you would expect:
test=# with tt_created as
( select fn_new_item(1) as item
)
delete from tb_item where item not in (select item from tt_created);
DELETE 0
However, if everything is correct, we should see one row in the table, where item = 1. But we do not:
test=# table tb_item;
 item
------
(0 rows)
I tried to outsmart Postgres with the following query, but it failed in the same way:
with tt_created as
( select fn_new_item(1) as item
),
tt_to_delete as
( select i.item from tb_item i left join tt_created c on i.item = c.item where c.item is null
)
delete from tb_item i using tt_to_delete td where td.item = i.item;
However, it behaves as one would expect if the first CTE is built with INSERT ... RETURNING.
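That variant presumably looks something like this (a sketch, with the insert inlined directly into the CTE):

with tt_created as
( insert into tb_item values (1) returning item
)
delete from tb_item where item not in (select item from tt_created);
-- the inserted row survives here, because data-modifying CTEs are always executed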
It also behaves as one would expect if the table starts out non-empty.
This seems like a bug. Would someone please look into this?
Thanks.
Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339
“Quality is not an act, it is a habit.” — Aristotle
Moshe Jacobson <moshe@neadwerx.com> writes:
> However, it behaves as one would expect if the first CTE is built with
> INSERT ... RETURNING.

CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed
exactly once. CTEs containing SELECTs are guaranteed to be executed at
most once (the documentation phrases that as "execution can stop early
if the outer query doesn't read all the rows" --- in this case, it read
none of them, since the outer query never had to evaluate the NOT IN).

I see no bug here.

            regards, tom lane
On Wed, Oct 16, 2013 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moshe Jacobson <moshe@neadwerx.com> writes:
>> However, it behaves as one would expect if the first CTE is built with
>> INSERT ... RETURNING.
>
> CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed exactly
> once. CTEs containing SELECTs are guaranteed to be executed at most once
> (the documentation phrases that as "execution can stop early if the outer
> query doesn't read all the rows" --- in this case, it read none of them,
> since the outer query never had to evaluate the NOT IN).
>
> I see no bug here.

You can force the CTE to be read a couple of different ways; it isn't
very difficult to do: just tweak the final query so that it *must*
touch the SELECT result somehow (e.g. AND (SELECT COUNT(*) FROM
tt_created) > 0).

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

merlin
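Applied to the original query, Merlin's tweak would look something like this (an untested sketch):

with tt_created as
( select fn_new_item(1) as item
)
delete from tb_item
where item not in (select item from tt_created)
  and (select count(*) from tt_created) > 0;  -- per Merlin's suggestion, forces the CTE to be read in full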
On 17/10/2013 00:06, Merlin Moncure wrote:
> That being said, I do think it might be better behavior (and still
> technically correct per the documentation) if volatile query
> expressions were force-evaluated.
This sounds reasonable for a "yes or no" case like this, but wouldn't it raise the question of how many times the function should be evaluated?
What if the query looked more like this:
with tt_created as
(
select fn_new_item(foo) as item
from some_huge_table
)
select item
from tt_created
limit 10
Should the CTE be calculated in its entirety, running the function for every row in some_huge_table? Or should it run at most 10 times?
Which is desired would depend on the situation, but there's no real way to indicate that in the syntax.
-- Rowan Collins [IMSoP]
ajelinek@gmail.com wrote
> but if I insert one row before I run the sql the CTE is
> executed and I get a new row in the table. I was hoping that I would see a
> difference in the explain, but the explain with an empty table where the
> CTE is *not* executed is identical to the explain where there is one row in
> the table already and the CTE *is* executed resulting in a new row.

Would help to include the explain(s). Did you ANALYZE after the insert? If
not, the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus needed
to run the CTE.

Since the executor cannot fully trust the statistics, and a full scan of an
empty table would be very fast, scanning the table to delete would be a
necessary first step before running the CTE for the secondary conditions
(where clause). An implicit first-condition/result is that a DELETE on an
empty table is effectively a no-op. The only reason to override that no-op
would be if a CTE needs to be run by policy, as Tom noted.

> I thought maybe Postgres was not executing the CTE because it knows that
> there are no rows in the table for it to delete, however if I change the
> CTE to be an insert returning instead of a function I get different
> results. Even when the table is empty I get a new row created.

Like Tom said, if you don't hide the INSERT inside a function the CTE will
always be executed.

David J.
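A quick way to test that hypothesis against the original example (an untested sketch):

insert into tb_item values (42);  -- the table is now non-empty
analyze tb_item;                  -- refresh statistics so the plan reflects that

explain analyze
with tt_created as
( select fn_new_item(1) as item
)
delete from tb_item where item not in (select item from tt_created);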
On Wed, Oct 16, 2013 at 7:14 PM, Rowan Collins <rowan.collins@gmail.com> wrote:
> On 17/10/2013 00:06, Merlin Moncure wrote:
>> That being said, I do think it might be better behavior (and still
>> technically correct per the documentation) if volatile query
>> expressions were force-evaluated.
>
> This sounds reasonable for a "yes or no" case like this, but wouldn't it
> raise the question of how many times the function should be evaluated?
>
> What if the query looked more like this:
>
> with tt_created as
> (
>     select fn_new_item(foo) as item
>     from some_huge_table
> )
> select item
> from tt_created
> limit 10
>
> Should the CTE be calculated in its entirety, running the function for
> every row in some_huge_table? Or should it run at most 10 times?
>
> Which is desired would depend on the situation, but there's no real way to
> indicate that in the syntax.

ISTM the answer is clearly "in its entirety". The premise is that the
optimization of non-evaluation of CTE queries is not dependent on
mechanics further down the chain if the CTE has volatile expressions.
If you wanted to structure the query so that the function was run only
10 times, that could be done trivially by moving the limit inside the
CTE.

merlin
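That restructuring of Rowan's example would look like this (a sketch):

with tt_created as
(
    select fn_new_item(foo) as item
    from some_huge_table
    limit 10  -- the function now runs at most 10 times
)
select item
from tt_created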
Merlin Moncure-2 wrote
> If you wanted to structure the query so that the function was run only
> 10 times, that could be done trivially by moving the limit inside the
> CTE.

It is not trivial if you want to wrap the CTE expression into a VIEW and
the caller of the view only wishes to see/evaluate a subset of the data -
even if just to check the structure with some sample contents. A plain
LIMIT is limited though, and usually an ORDER BY would be added for some
kind of top/bottom-N listing. In such a case the presence of the ORDER BY
would trigger the full scan and operate before the LIMIT.

And why is volatile so special here? A stable function seems just as good
a candidate for this behavior, and even an immutable one. You are taking
the presence of "volatile" to mean that the function may be wrapping an
INSERT/UPDATE/DELETE and so should be executed in case that is true. The
examples are focused on avoiding expensive and needless SELECT
evaluations. Since INSERT/UPDATE/DELETE cannot occur in a non-volatile
function, and expensive table-processing/generating functions can be
implemented as stable, this would seem to work at the cost of further
leveraging the volatility modifier on the function - which defaults to
VOLATILE.

Not sure if this is a good or bad decision to make, but this thread hasn't
really provided a compelling use-case for making a change: the example
provided is too contrived. And while the current situation is somewhat
difficult to learn, attaching yet more logic to the function volatility is
going to be difficult to learn too.

Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
common, and in those cases where it is used it is most often a standalone
API and not something that is going to be reasonably used in a CTE.
Putting it into a CTE is an attempt to avoid procedural logic and/or SQL
scripting; but those tools are there to solve just this kind of problem.
The separation of data insertion from data querying is just as desirable
as separating data from presentation.

David J.
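For instance, the view scenario might look like this (a hypothetical sketch reusing Rowan's names):

-- the view author cannot know what LIMIT a caller will apply
create view v_created_items as
with tt_created as
( select fn_new_item(foo) as item
  from some_huge_table
)
select item from tt_created;

-- the caller's ORDER BY forces the full scan and operates before the LIMIT
select item from v_created_items order by item limit 10;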
On Fri, Oct 18, 2013 at 4:08 PM, David Johnston <polobo@yahoo.com> wrote:
> And why is volatile so special here? A stable function seems just as good a
> candidate for this behavior, and even an immutable one.

Absolutely disagree with this. Stable operations do not have side
effects and volatile operations do (or at least they can). Point being:
after the query runs, the database is in the same state regardless of
whether a stable function is run or not. So, for stable operations
(which covers most queries), the current behavior is good. But it's a
completely different story in the case of volatile expressions, and
that's why I believe the optimization to skip them is plainly a POLA
violation. The OP is not the first to bring it up, by the way.

> Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
> common and in those cases where it is used it is most often a standalone API

I'm not buying this either. It may not be 'common' but people are
clearly exercising the behavior, and 'data modifying with' is IMNSHO a
much better coding paradigm than functions wrapping same. Regardless,
the point at hand is whether specific plan semantics down the chain can
control whether or not volatile expressions should run. Clearly, at
least to me, they should not.

merlin
Merlin Moncure-2 wrote
> Regardless, the point at hand is whether specific plan semantics down
> the chain can control whether or not volatile expressions should run.
> Clearly, at least to me, they should not.

I don't personally see any solid reason to reject the "always evaluate
CTEs with volatile expressions" option, but at the same time am curious
why, if it keeps coming up and constitutes a POLA violation, it hasn't
been done (e.g., insufficient demand for the cost, or are there technical
concerns?). How severe is the POLA violation, and are there compelling
use-cases for and/or against implementing this solution?

Put differently, ideally this should be put either on the todo list or
the "we do not want" list, and further inquiries can then go to building
up enough popular demand to convince someone to implement it. Maybe there
should be a "convince us" list? Not a todo yet, but something still to be
considered.

Keeping in mind that there are likely volatile queries relying on the
current optimization that would become non-optimized and possibly very
poorly performing. Since the "fix" for these would be a simple ALTER
FUNCTION to make them stable, the cost-benefit seems worthwhile - since
those functions are probably mis-identified in the first place due to the
use of the default.

David J.
David Johnston wrote:
> this thread hasn't really provided a compelling use-case for making a
> change: the example provided is too contrived.
I am of the belief that if the function in a CTE is volatile, it should be executed unconditionally.
Moshe Jacobson
"Quality is not an act, it is a habit." -- Aristotle
On Fri, Oct 18, 2013 at 8:37 PM, David Johnston <polobo@yahoo.com> wrote:
> Merlin Moncure-2 wrote
>> Regardless, the point at hand is whether specific plan semantics down
>> the chain can control whether or not volatile expressions should run.
>> Clearly, at least to me, they should not.
>
> Put differently, ideally this should be put either on the todo list or
> the "we do not want" list, and further inquiries can then go to building
> up enough popular demand to convince someone to implement it. Maybe there
> should be a "convince us" list? Not a todo yet, but something still to be
> considered.

I would vote it as a todo.

> Keeping in mind that there are likely volatile queries relying on the
> current optimization that would become non-optimized and possibly very
> poorly performing. Since the "fix" for these would be a simple ALTER
> FUNCTION to make them stable, the cost-benefit seems worthwhile - since
> those functions are probably mis-identified in the first place due to the
> use of the default.

Any reliance on that behavior would be wrong because it's expressly
contraindicated by the documentation. TBH, if it were not specifically
documented that way, I would be considering this to be bugged behavior.
As for function mis-identification, fair point, but that's a general
problem and not specifically related to CTE evaluation.

merlin
Merlin Moncure-2 wrote
> Any reliance on that behavior would be wrong because it's expressly
> contraindicated by the documentation.

That makes no practical difference, since the decision to make the
function volatile is not conscious, due to it being the default; and the
current behavior hides the fact that what they are doing is unsupported,
since they do nothing special to invoke the optimization. The people
likely to be hit by this are those with the least experience, and so in
making the change, which I do support, communication of the behavior
difference needs to be done in such a way as to reasonably reach and gain
understanding from these people.

That aside, I'm not coming up with any standard idioms that would benefit
from this optimization, so the scope of the problem may very well be small
enough to just bite the bullet and deal with regression complaints as they
are voiced.

David J.
Moshe Jacobson <moshe@neadwerx.com> writes:
> I am of the belief that if the function in a CTE is volatile, it should
> be executed unconditionally.

[ shrug... ] Consider

    SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed? If your answer
is not "10", how is this different from the CTE case? This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

You could of course argue that our definition of LIMIT is wrong too,
but that's going to raise the bar for convincing people even higher,
because of the number of existing applications that such a redefinition
would break.

            regards, tom lane
On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Consider
>
> SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;
>
> How many times should the volatile function get executed? If your answer
> is not "10", how is this different from the CTE case? This LIMIT clause
> is restricting the number of times the function executes in pretty much
> the same way that our definition of CTE evaluation does, AFAICS.

I don't think your example above is analogous, because in your example,
you are asking *how many times* to execute the function, whereas in my
example, the question is *whether* to execute the query at all. If the
outer statement of the CTE doesn't need to use the contents of the CTE,
and there is no volatile function in there, then I agree that it's fine
not to execute it.

Moshe Jacobson
"Quality is not an act, it is a habit." -- Aristotle
Moshe Jacobson <moshe@neadwerx.com> writes:
> On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Consider
>>
>> SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;
>>
>> How many times should the volatile function get executed? If your answer
>> is not "10", how is this different from the CTE case? This LIMIT clause
>> is restricting the number of times the function executes in pretty much
>> the same way that our definition of CTE evaluation does, AFAICS.

> I don't think your example above is analogous, because in your example,
> you are asking *how many times* to execute the function, whereas in my
> example, the question is *whether* to execute the query at all.

No, it's about how many times to execute it, ie, how many rows to pull
from the CTE. In particular, the optimization you're complaining about
is an early-termination rule that's not fundamentally different IMO from
what LIMIT does.

More generally, what you're arguing for is that the executor's behavior
should change depending on whether a query contains a volatile function.
That's a direction I think we shouldn't go in. Up to now, the presence of
volatile functions has been something that can disable particular planner
optimizations, but it's not of concern to the executor.

            regards, tom lane
Tom Lane-2 wrote
> Moshe Jacobson <moshe@neadwerx.com> writes:
>> I am of the belief that if the function in a CTE is volatile, it should
>> be executed unconditionally.
>
> [ shrug... ] Consider
>
> SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;
>
> How many times should the volatile function get executed? If your answer
> is not "10", how is this different from the CTE case? This LIMIT clause
> is restricting the number of times the function executes in pretty much
> the same way that our definition of CTE evaluation does, AFAICS.
>
> You could of course argue that our definition of LIMIT is wrong too,
> but that's going to raise the bar for convincing people even higher,
> because of the number of existing applications that such a redefinition
> would break.

The CTE would functionally replace the generate_series() call, as opposed
to the select-list evaluation. Since the CTE establishes an optimization
boundary, the parts of the query below the main (pulling) FROM clause
should not (or need not) influence the evaluation of the CTE.

The two comparable queries are:

A) WITH vf AS ( SELECT volatile_function(x) FROM generate_series(1,10) gs (x) )
   SELECT * FROM vf LIMIT 1

B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

In (A) the relation "vf" - which is a 10-row table with the result of
volatile_function as the only column - is limited to a single record, and
that whole row is output as-is (because of the "*").

In (B) the relation "gs" - which is 10 rows having the result of
generate_series as the only column - is limited to a single row, and then
the select-list projection occurs against that single row (the
volatile_function).

This is my naive, not technically informed, opinion of how these two
constructs differ. This makes the optimization boundary characteristic of
CTEs much stronger, so making such a boundary dependent upon whether the
CTE contains any volatile functions seems desirable. That way if
"volatile_function" is instead made stable, its evaluation 10 times can be
avoided; though in this case that would depend on whether pushing down the
LIMIT is even valid.

As commented by Moshe, the "number of times" is less an issue than the
"yes/no" determination, but I guess that any simple implementation would
have to handle both cases identically, so it's impossible to ignore the
implication on the "number of times" queries in solving the "yes/no"
problem.

David J.
David Johnston <polobo@yahoo.com> writes:
> The two comparable queries are:

> A) WITH vf AS ( SELECT volatile_function(x) FROM generate_series(1,10) gs (x) )
>    SELECT * FROM vf LIMIT 1

> B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

> In (A) the relation "vf" - which is a 10-row table with the result of
> volatile_function as the only column - is limited to a single record, and
> that whole row is output as-is (because of the "*").

> In (B) the relation "gs" - which is 10 rows having the result of
> generate_series as the only column - is limited to a single row, and then
> the select-list projection occurs against that single row (the
> volatile_function).

Just for the record, your interpretation of (B) is wrong. LIMIT acts
after select-list evaluation --- try a set-returning function in the
select list to see that this is true.

            regards, tom lane
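A quick way to see this, per Tom's suggestion (a sketch; the set-returning function expands each input row into three output rows):

SELECT generate_series(1, 3) FROM generate_series(1, 2) AS gs(x) LIMIT 1;
-- returns exactly 1 row; if LIMIT were applied to "gs" before the select
-- list were evaluated, the single surviving input row would still expand
-- to 3 output rows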
On Mon, Oct 21, 2013 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moshe Jacobson <moshe@neadwerx.com> writes:
>> On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> How many times should the volatile function get executed? If your answer
>>> is not "10", how is this different from the CTE case? This LIMIT clause
>>> is restricting the number of times the function executes in pretty much
>>> the same way that our definition of CTE evaluation does, AFAICS.
>
>> I don't think your example above is analogous, because in your example,
>> you are asking *how many times* to execute the function, whereas in my
>> example, the question is *whether* to execute the query at all.
>
> No, it's about how many times to execute it, ie, how many rows to pull
> from the CTE. In particular, the optimization you're complaining about
> is an early-termination rule that's not fundamentally different IMO from
> what LIMIT does.

Well, I think it *is* different precisely because it is in a CTE;
that's really the whole point. LIMIT is expressly declared and has
obvious (well, mostly) user-visible semantics.

> More generally, what you're arguing for is that the executor's behavior
> should change depending on whether a query contains a volatile function.
> That's a direction I think we shouldn't go in. Up to now, the presence of
> volatile functions has been something that can disable particular planner
> optimizations, but it's not of concern to the executor.

Ah -- rats. Well, it (arguably) should be concerned, but I see your point.

merlin
Tom Lane-2 wrote
> try a set-returning function in the
> select list to see that this is true.

Random thoughts...

Noted - though there appear to be various optimizations at play here...

[somewhat dated 9.0.x version]

SELECT x, generate_series(x, 5) AS y
FROM generate_series(1,3) gs (x)
LIMIT 6

QUERY PLAN
Limit (cost=0.00..0.08 rows=6 width=4) (actual time=0.011..0.015 rows=6 loops=1)
  -> Function Scan on generate_series gs (cost=0.00..12.50 rows=1000 width=4) (actual time=0.011..0.015 rows=6 loops=1)
Total runtime: 0.031 ms

...unfortunately EXPLAIN doesn't provide much help in understanding where.

Related to other postings: can the planner generate an instruction for the
executor that in effect forces (complete?) execution of a specific node by
the executor? The executor doesn't care that it is being told to force
execution due to a volatile function declaration or any other reason; it
just knows it has to do it.

The original case is:

WITH cte AS ( volatile_function )
DELETE FROM empty_table

How about a compromise position making the above work but without altering
existing mechanics? I would suppose that would require new syntax, and the
ability for the planner to force the executor to evaluate a CTE expression
unconditionally, but it would make the above query idiom work correctly.

WITH cte (ALWAYS EXECUTE) AS ( SELECT volatile_function(1) )
DELETE FROM empty_table

I guess, ignoring implementation considerations, this idiom - a means of
executing an UPSERT, as noted by the OP - is something that is likely to
become common given the currently provided CTE functionality. This ability
has only been recently added to PostgreSQL, and now we are seeing it in
action and coming across limitations. In addition to evaluating the
technical limitations on why something wasn't implemented initially, it is
good to get viewpoints on whether these unexpected uses of this feature
are something that should be made possible/easier with further
enhancements to the system. If the enhancements are desirable then at
minimum a ToDo item should be created wherein implementation factors can
be discussed.

Ideally, the above discussion of whether the current implementation of
LIMIT is consistent with the current implementation of CTEs would be
tabled in favor of discussing whether there is a demonstrable need/desire
for having some functions always evaluated when inside a CTE. If so, maybe
altering the function definition, instead of the CTE, would be a valid
solution. But if one is focused on why the CTE behaves the way it does,
one may tend to lose sight of trying to solve the larger problem in
preference to trying to justify maintaining the status quo.

David J.
Moshe Jacobson wrote:
> I don't think your example above is analogous, because in your example,
> you are asking how many times to execute the function, whereas in my
> example, the question is whether to execute the query at all. If the
> outer statement of the CTE doesn't need to use the contents of the CTE,
> and there is no volatile function in there, then I agree that it's fine
> not to execute it.
Surely the point is that "not executing at all" is just a particular case of "executing N times" where N is calculated as zero?
Your original example was this:
with tt_created as
( select fn_new_item(1) as item
)
That CTE has a maximum of 1 row, so the choices are only 0 or 1. But how is it fundamentally different from this:

with tt_created as
( select fn_new_item(i) as item from generate_series(1, 10) as i
)
This CTE has up to 10 rows, but the current optimization (as I understand it) might mean that only 5 of those are calculated. It might also decide that 0 of them are calculated, but that's not a qualitatively different decision, it's just a different value for the "how many rows do you need" parameter.

Personally, I'm definitely in the "this is surprising behaviour" camp, although "surprising" and "wrong" aren't *necessarily* the same thing, so I'll leave it to greater minds to decide on the latter...
-- Rowan Collins [IMSoP]
In my opinion, the simplest and most correct way to handle this is to document that there are no guarantees about what will happen with volatile functions in these strange cases. PostgreSQL shouldn't have to make guarantees about whether functions are evaluated in CTEs or what have you; it should have the freedom to optimize those things away or not.
On 10/21/2013 3:52 PM, BladeOfLight16 wrote:
> I've only skimmed this thread, but clearly, this is why using
> functions with side effects in the middle of complex queries is a bad
> idea. =) Something like SELECT func_with_side_effect(1); is probably
> fine, but beyond that, put the function in the middle of a DO block or
> something and actually code what you want to happen.
>
> In terms of "expected" or "surprising" behavior, I don't think you can
> say ANY behavior could be expected. SQL is designed to be declarative.
> When it comes to retrieval (which is the issue originally raised since
> this involves a SELECT before the modification), you tell it what you
> want, and some engine figures out the best way to retrieve it. The
> engine is allowed to make whatever optimizations it chooses as long as
> the result set is correct. So if you really want to modify something,
> be explicit and don't drop a function with side effects in the middle
> of a complex query like this. God only knows what the engine will do
> with that.

Indeed. I ran into someone who was calling pg_stop_backup() deep in a
heavily nested query + function call mess; it totally blew my mind why
anyone would think that was a good idea.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast
BladeOfLight16 wrote
> Regarding UPSERT in particular, are you working with a single row or a set
> of rows? If a single row, is there a reason you can't perform a SELECT
> beforehand to see if the PK is already there and then INSERT or UPDATE
> accordingly? If multiple rows, is there a reason you can't UPDATE ...
> SELECT * FROM rows WHERE pk IN ... and then INSERT ... SELECT * FROM rows
> WHERE pk NOT IN ...? It seems to me that would be more
> readable/maintainable than relying on a particular CTE behavior.

Saw this in my inbox but not Nabble, so I'll copy it here:

> From Moshe, the OP:
>
> We use stored procs to provide us the functionality of an UPSERT, which
> PostgreSQL lacks.
> We are using this in the first CTE to create new entries in a table, and
> we are using the DELETE to delete the entries that already existed that
> we didn't just create.
> I am of the belief that if the function in a CTE is volatile, it should
> be executed unconditionally.

This seems like a terrible idea. To rephrase to make sure I understand:
you "always" insert a new record, then, if an existing record was found,
you DELETE that existing record. Thus at no point do you actually issue an
UPDATE.

How are you dealing with primary and foreign keys? I guess you could defer
their evaluation so you can have a duplicate PK on the table before the
deletion occurs... though whether an FK will allow this kind of behavior
I do not know.

It would help if Moshe would post a minimal viable working example of the
entire use-case so that its desirability can be assessed and a potential
short-term alternative provided, since even if desired this could not be
released until 9.4, as it constitutes a behavior change (I don't think
anyone is going to accept this as being a bug fix no matter what solution
is offered).

David J.
On Mon, Oct 21, 2013 at 7:52 PM, David Johnston <polobo@yahoo.com> wrote:
> It would help if Moshe would post a minimal viable working example of the
> entire use-case so that its desirability can be assessed and a potential
> short-term alternative provided, since even if desired this could not be
> released until 9.4, as it constitutes a behavior change (I don't think
> anyone is going to accept this as being a bug fix no matter what solution
> is offered).
Here is the full code. It is not “minimal”, but actually what we are using.
fn_get_create_or_update_space_sku() will create a non-existent row, or update it with the passed-in data if it already exists.
You’ll notice that in this version I don’t use NOT IN( ) but rather another CTE with a left join.
It behaves the same way.
I’ve put $varname in certain places to indicate that a value is going to go in there. Some of these are actually bound with placeholders, but I left it like this for clarity.
with tt_space_sku_data as
(
    select unnest(array[$sku_array])      as sku,
           unnest(array[$quantity_array]) as quantity,
           unnest(array[$primary_array])  as primary,
           unnest(array[$position_array]) as position
),
tt_space_skus as
(
    select fn_get_create_or_update_space_sku
           ( $pk_space, tt.sku, tt.quantity, tt.primary, tt.position, TRUE ) as space_sku
      from tt_space_sku_data tt
),
tt_space_skus_to_delete as
(
    select ss.space_sku
      from tb_space_sku ss
      left join tt_space_skus tt on tt.space_sku = ss.space_sku
     where tt.space_sku is null
       and ss.space = $pk_space
)
delete from tb_space_sku ss
using tt_space_skus_to_delete tt
where ss.space = $pk_space
  and ss.space_sku = tt.space_sku
Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339
“Quality is not an act, it is a habit.” — Aristotle
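The UPDATE-then-INSERT alternative BladeOfLight16 suggested earlier might look something like this (an untested sketch; the tb_space_sku column names beyond space, space_sku, and sku are guesses based on the function's parameters, with "primary" renamed to is_primary to avoid the reserved word):

-- unpack the parallel arrays once
create temporary table tt_data as
select unnest(array[$sku_array])      as sku,
       unnest(array[$quantity_array]) as quantity,
       unnest(array[$primary_array])  as is_primary,
       unnest(array[$position_array]) as position;

-- update rows whose sku already exists for this space
update tb_space_sku ss
   set quantity = d.quantity, is_primary = d.is_primary, position = d.position
  from tt_data d
 where ss.space = $pk_space and ss.sku = d.sku;

-- insert the rest
insert into tb_space_sku ( space, sku, quantity, is_primary, position )
select $pk_space, d.sku, d.quantity, d.is_primary, d.position
  from tt_data d
 where not exists ( select 1 from tb_space_sku ss
                     where ss.space = $pk_space and ss.sku = d.sku );

-- delete pre-existing rows for this space that are not in the new set
delete from tb_space_sku ss
 where ss.space = $pk_space
   and not exists ( select 1 from tt_data d where d.sku = ss.sku );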
This doesn't depend on CTE behavior, and I find it simpler and easier to interpret (and therefore more maintainable). Does that suit your needs?