Thread: Bug? Function with side effects not evaluated in CTE

Bug? Function with side effects not evaluated in CTE

From
Moshe Jacobson
Date:

In a new database test, I create the following table and function:

create table tb_item ( item integer );

create or replace function fn_new_item( in_item integer )
returns integer language plpgsql as$_$
begin   insert into tb_item values( in_item );   return in_item;
end$_$;

Then, I execute the following query and see DELETE 0 as you would expect:

test=# with tt_created as
(   select fn_new_item(1) as item
)
delete from tb_itemwhere item not in (select item from tt_created);
DELETE 0

However, if everything is correct, we should see one row in the table, where item = 1. But we do not:

test=# table tb_item;item
------
(0 rows)

I tried to outsmart Postgres with the following query, but it failed in the same way:

with tt_created as
(   select fn_new_item(1) as item
),
tt_to_delete as
(   select i.item     from tb_item ileft join tt_created c       on i.item = c.item    where c.item is null
)
delete from tb_item iusing tt_to_delete tdwhere td.item = i.item;

However, It behaves as one would expect if the first CTE is built with INSERT ... RETURNING.
It also behaves as one would expect if the table starts out non-empty.

This seems like a bug. Would someone please look into this?

Thanks.

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

“Quality is not an act, it is a habit.” — Aristotle

Re: Bug? Function with side effects not evaluated in CTE

From
Moshe Jacobson
Date:
By the way, I am running Postres 9.1.9.

Re: Bug? Function with side effects not evaluated in CTE

From
Tom Lane
Date:
Moshe Jacobson <moshe@neadwerx.com> writes:
> However, It behaves as one would expect if the first CTE is built with INSERT
> ... RETURNING.

CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed exactly
once.  CTEs containing SELECTs are guaranteed to be executed at most once
(the documentation phrases that as "execution can stop early if the outer
query doesn't read all the rows" --- in this case, it read none of them,
since the outer query never had to evaluate the NOT IN).

I see no bug here.

            regards, tom lane


Re: Bug? Function with side effects not evaluated in CTE

From
Merlin Moncure
Date:
On Wed, Oct 16, 2013 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moshe Jacobson <moshe@neadwerx.com> writes:
>> However, It behaves as one would expect if the first CTE is built with INSERT
>> ... RETURNING.
>
> CTEs containing INSERT/UPDATE/DELETE are guaranteed to be executed exactly
> once.  CTEs containing SELECTs are guaranteed to be executed at most once
> (the documentation phrases that as "execution can stop early if the outer
> query doesn't read all the rows" --- in this case, it read none of them,
> since the outer query never had to evaluate the NOT IN).
>
> I see no bug here.

You can force the CTE to be read a couple of different ways; it isn't
very difficult to do: just tweak the final query so that it *must*
touch the SELECT result somehow (e.g. AND (SELECT COUNT(*) FROM
tt_created) > 0).

That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

merlin


Re: Bug? Function with side effects not evaluated in CTE

From
Rowan Collins
Date:
On 17/10/2013 00:06, Merlin Moncure wrote:
That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

This sounds reasonable for a "yes or no" case like this, but wouldn't it raise the question of how many times the function should be evaluated?

What if the query looked more like this:

with tt_created as
(
    select fn_new_item(foo) as item
    from some_huge_table
)
select item
from tt_created
limit 10


Should the CTE be calculated in its entirety, running the function for every row in some_huge_table? Or should it run at most 10 times?

Which is desired would depend on the situation, but there's no real way to indicate in the syntax.
-- 
Rowan Collins
[IMSoP]

Re: Bug? Function with side effects not evaluated in CTE

From
Adam Jelinek
Date:
I thought this was interesting, and wanted to make sure I understood what is going on, but the more tests I run the more confused I get. 

if I take the exact set up outlined by Mosche I get the same results in 9.3 (as expected) , but if I insert one row before I run the sql the CTE is executed and I get a new row in the table.  I was hoping that I would see a difference in the explain, but the explain with an empty table where the CTE is not executed is identical to the explain where there is one row in the table already and the CTE is executed resulting in a new row.  I thought maybe Postgres was not executing the CTE because it knows that there are no rows in the table for it to delete, however if I change the CTE to be an insert returning instead of a function I get different results.  Even when the table is empty I get new row created.

I would really like to know "why' it is working like this so something similar does not come back and bite me in the future.

Thanks


On Wed, Oct 16, 2013 at 7:14 PM, Rowan Collins <rowan.collins@gmail.com> wrote:
On 17/10/2013 00:06, Merlin Moncure wrote:
That being said, I do think it might be better behavior (and still
technically correct per the documentation) if volatile query
expressions were force-evaluated.

This sounds reasonable for a "yes or no" case like this, but wouldn't it raise the question of how many times the function should be evaluated?

What if the query looked more like this:

with tt_created as
(
    select fn_new_item(foo) as item
    from some_huge_table
)
select item
from tt_created
limit 10


Should the CTE be calculated in its entirety, running the function for every row in some_huge_table? Or should it run at most 10 times?

Which is desired would depend on the situation, but there's no real way to indicate in the syntax.
-- 
Rowan Collins
[IMSoP]

Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
ajelinek@gmail.com wrote
> but if I insert one row before I run the sql the CTE is
> executed and I get a new row in the table.  I was hoping that I would see
> a
> difference in the explain, but the explain with an empty table where the
> CTE is *not* executed is identical to the explain where there is one row
> in
> the table already and the CTE *is* executed resulting in a new row.

Would help to include the explain(s).  Did you ANALYZE after the insert; if
not the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus needed
to run the CTE.

Since the executor cannot fully trust the statistics, and a full scan of an
empty table would be very fast, scanning the table to delete would be a
necessary first step before running the CTE for the secondary conditions
(where clause).  An implicit first-condition/result is that a DELETE on an
empty table is effectively a No-Op.  The only reason to override that no-op
would be if a CTE needs to be run by policy as Tom noted.


> I thought maybe Postgres was not executing the CTE because it knows that
> there are no rows in the table for it to delete, however if I change the
> CTE to be an insert returning instead of a function I get different
> results.  Even when the table is empty I get new row created.

Like Tom said, if you don't hide the INSERT inside a function the CTE will
always be executed.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775095.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Merlin Moncure
Date:
On Wed, Oct 16, 2013 at 7:14 PM, Rowan Collins <rowan.collins@gmail.com> wrote:
> On 17/10/2013 00:06, Merlin Moncure wrote:
>
> That being said, I do think it might be better behavior (and still
> technically correct per the documentation) if volatile query
> expressions were force-evaluated.
>
>
> This sounds reasonable for a "yes or no" case like this, but wouldn't it
> raise the question of how many times the function should be evaluated?
>
> What if the query looked more like this:
>
> with tt_created as
> (
>     select fn_new_item(foo) as item
>     from some_huge_table
> )
> select item
> from tt_created
> limit 10
>
>
> Should the CTE be calculated in its entirety, running the function for every
> row in some_huge_table? Or should it run at most 10 times?
>
> Which is desired would depend on the situation, but there's no real way to
> indicate in the syntax.

ISTM the answer is clearly "in its entirety".  The premise is that the
optimization of non-evaluation of CTE queries is not dependent on
mechanics further down the chain if the CTE has volatile expressions.

If you wanted to structure the query so that the function was run only
10 times, that could be done trivially by moving the limit inside the
CTE.

merlin


Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
Merlin Moncure-2 wrote
> If you wanted to structure the query so that the function was run only
> 10 times, that could be done trivially by moving the limit inside the
> CTE.

It is not trivial if you want to wrap the CTE expression into a VIEW and the
caller of the view only wishes to see/evaluate a subset of the data - even
if just to check the structure with some sample contents?  A plain LIMIT is
limited though and usually an ORDER BY would be added for some kind of
top/bottom-N listing.  In such a case the presence of the ORDER BY would
trigger the full scan and operate before the LIMIT.

And why is volatile so special here?  A stable function seems just as good a
candidate for this behavior and even an immutable one.  You are taking the
presence of "volatile" to mean that the function may be wrapping an
INSERT/UPDATE/DELETE and so should be executed in case that is true.  The
examples are focused on avoiding expensive and needless SELECT evaluations.
Since INSERT/UPDATE/DELETE cannot occur in a non-volatile function, and
expensive table-processing/generating functions can be implemented as
stable, this would seem to work at the cost of further leveraging the
volatility modifier on the function - which defaults to VOLATILE.

Not sure if this is a good or bad decision to make but this thread hasn't
really provided a compelling use-case for making a change: the example
provided is too contrived.  And while the current situation is somewhat
difficult to learn attaching yet more logic to the function volatility is
going to be difficult to learn too.

Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
common and in those cases where it is used it is most often a standalone API
and not something that is going to be reasonably used in a CTE.  Putting it
into a CTE is an attempt to avoid procedural logic and/or SQL scripting; but
those tools are there to solve just this kind of problem.  The separation of
data insertion from data querying is just as desirable as separating data
from presentation.

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775098.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Merlin Moncure
Date:
On Fri, Oct 18, 2013 at 4:08 PM, David Johnston <polobo@yahoo.com> wrote:
> And why is volatile so special here?  A stable function seems just as good a
> candidate for this behavior and even an immutable one.

Absolutely disagree with this.  Stable operations do not have side
effects and volatile operations do (or at least they can).  Point
being: after the query runs the database is in the same state
regardless if a stable function is run or not.  So, for stable
operations (which covers most queries), the current behavior is good.
But, it's a completely different story in the case of volatile
expression and that's why I believe the optimization to skip is
plainly a POLA violation.  OP is not the first to bring it up by the
way.

> Wrapping INSERT/UPDATE/DELETE inside a non-procedural function is not that
> common and in those cases where it is used it is most often a standalone API

I'm not buying this either.  It may not be 'common' but people are
clearly exercising the behavior and 'data modifying with' is IMNSHO a
much better coding paradigm than functions wrapping same.
Regardless, the point at hand is whether specific plan semantics down
the chain can control whether or not volatile expressions should run.
Clearly, at least to me, they should not.

merlin


Re: Bug? Function with side effects not evaluated in CTE

From
Adam Jelinek
Date:
>Would help to include the explain(s).  Did you ANALYZE after the insert; if
>not the planner probably still thought the table was empty (thus the
>matching explain) but upon execution realized it had records and thus needed
>to run the CTE.

I did not do an ANALYZE after the insert, I think the plan would still be the same either way.  I did what I should have done to start with and ran explain analyze on the query which showed that it found (or did not find) matching rows.  After reading the remaining emails I think I understand Thanks for explaining.

Here is my two cents (take it for what it is worth). I agree with Merlin on this.  I work as a developer at a large corporation, and in my experience very few of the developers can write "good" SQL/data access, and then only a fraction of them even try to understand the planner.  Although the behavior makes sense (from what was explained above) I does not do what one would expect (the same thing every time).   Then when you read the manual on VOLATILE it states the optimizer makes no assumptions on such functions, resulting in people asking why is this happening.

 

On Fri, Oct 18, 2013 at 3:39 PM, David Johnston <polobo@yahoo.com> wrote:
ajelinek@gmail.com wrote
> but if I insert one row before I run the sql the CTE is
> executed and I get a new row in the table.  I was hoping that I would see
> a
> difference in the explain, but the explain with an empty table where the
> CTE is *not* executed is identical to the explain where there is one row
> in
> the table already and the CTE *is* executed resulting in a new row.

Would help to include the explain(s).  Did you ANALYZE after the insert; if
not the planner probably still thought the table was empty (thus the
matching explain) but upon execution realized it had records and thus needed
to run the CTE.

Since the executor cannot fully trust the statistics, and a full scan of an
empty table would be very fast, scanning the table to delete would be a
necessary first step before running the CTE for the secondary conditions
(where clause).  An implicit first-condition/result is that a DELETE on an
empty table is effectively a No-Op.  The only reason to override that no-op
would be if a CTE needs to be run by policy as Tom noted.


> I thought maybe Postgres was not executing the CTE because it knows that
> there are no rows in the table for it to delete, however if I change the
> CTE to be an insert returning instead of a function I get different
> results.  Even when the table is empty I get new row created.

Like Tom said, if you don't hide the INSERT inside a function the CTE will
always be executed.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775095.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
Merlin Moncure-2 wrote
> Regardless, the point at hand is whether specific plan semantics down
> the chain can control whether or not volatile expressions should run.
> Clearly, at least to me, they should not.

I don't personally see any solid reason to reject the always evaluate CTEs
with volatile expressions option but at the same time am then curious that
if it keeps coming up and constitutes a POLA violation why it hasn't been
done (e.g., insufficient demand for the cost or are there technical
concerns).  How severe is the POLA violation and are there compelling
use-cases for and/or against implementing this solution?

Put differently ideally this should be put either on the todo list or the
"we do not want" list and further inquiries can then go to building up
enough popular demand to convince someone to implement it.  Maybe there
should be a "convince us" list? Not a todo yet but something still be
considered.

Keeping in mind that there are likely volatile queries relying on the
current optimization that would become non-optimized and possibly very
poorly performing.  Since the "fix" for these would be a simple alter
function to make them stable the cost benefit seems worthwhile - since those
functions are probably mis-identified in the first place due to the use of
the default.

David J.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775125.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Moshe Jacobson
Date:

On Fri, Oct 18, 2013 at 5:08 PM, David Johnston <polobo@yahoo.com> wrote:
this thread hasn't really provided a compelling use-case for making a change: the example
provided is too contrived.

It seems contrived because I distilled it down from what it originally was. 
There is an actual use case in which this was failing:
We use stored procs to provide us the functionality of an UPSERT, which PostgreSQL lacks.
We are using this in the first CTE to create new entries in a table, and we are using the DELETE to delete the entries that already existed that we didn't just create.
I am of the belief that if the function in a CTE is volatile, that it should be executed unconditionally.


Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

Re: Bug? Function with side effects not evaluated in CTE

From
Merlin Moncure
Date:
On Fri, Oct 18, 2013 at 8:37 PM, David Johnston <polobo@yahoo.com> wrote:
> Merlin Moncure-2 wrote
>> Regardless, the point at hand is whether specific plan semantics down
>> the chain can control whether or not volatile expressions should run.
>> Clearly, at least to me, they should not.
>
> Put differently ideally this should be put either on the todo list or the
> "we do not want" list and further inquiries can then go to building up
> enough popular demand to convince someone to implement it.  Maybe there
> should be a "convince us" list? Not a todo yet but something still be
> considered.

I would vote it as todo.

> Keeping in mind that there are likely volatile queries relying on the
> current optimization that would become non-optimized and possibly very
> poorly performing.  Since the "fix" for these would be a simple alter
> function to make them stable the cost benefit seems worthwhile - since those
> functions are probably mis-identified in the first place due to the use of
> the default.

Any reliance on that behavior would be wrong because it's expressly
contraindicated by the documentation.  TBH, If it were not
specifically documented that way, I would be considering this to be
bugged behavior.  As for function mis-identification, fair point, but
that's a general problem and not specifically related to CTE
evaluation.

merlin


Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
Merlin Moncure-2 wrote
> Any reliance on that behavior would be wrong because it's expressly
> contraindicated by the documentation.

That makes no practical difference since the decision to make the function
volatile is not conscious due to it being the default; and the current
behavior hides the fact that what they are doing is unsupported since they
do nothing special to invoke the optimization. The people likely to be hit
by this are those with the least experience and so in making the change,
which I do support, communication of the behavior difference needs to be
done in such a way as to reasonably reach and gain understanding from these
people.

That aside, I'm not coming up with any standard idioms that would benefit
from this optimization so the scope of the problem may very well be small
enough to just bite the bullet and deal with regression complaints as they
are voiced.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775292.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Tom Lane
Date:
Moshe Jacobson <moshe@neadwerx.com> writes:
> I am of the belief that if the function in a CTE is volatile, that it
> should be executed unconditionally.

[ shrug... ]  Consider

    SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed?  If your answer
is not "10", how is this different from the CTE case?  This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

You could of course argue that our definition of LIMIT is wrong too,
but that's going to raise the bar for convincing people even higher,
because of the number of existing applications that such a redefinition
would break.

            regards, tom lane


Re: Bug? Function with side effects not evaluated in CTE

From
Moshe Jacobson
Date:

On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Consider

        SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;

How many times should the volatile function get executed?  If your answer
is not "10", how is this different from the CTE case?  This LIMIT clause
is restricting the number of times the function executes in pretty much
the same way that our definition of CTE evaluation does, AFAICS.

I don't think your example above is analogous, because in your example, you are asking how many times to execute the function, whereas in my example, the question is whether to execute the query at all.
If the outer statement of the CTE doesn't need to use the contents of the CTE, and there is no volatile function in there, then I agree that it's fine not to execute it.
But if there is a volatile function, then the CTE query should always be evaluated just like CREATE TEMP TABLE.
There is no question as to how many times to evaluate it here. It is just once.


Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

"Quality is not an act, it is a habit." -- Aristotle

Re: Bug? Function with side effects not evaluated in CTE

From
Tom Lane
Date:
Moshe Jacobson <moshe@neadwerx.com> writes:
> On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Consider
>>
>> SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;
>>
>> How many times should the volatile function get executed?  If your answer
>> is not "10", how is this different from the CTE case?  This LIMIT clause
>> is restricting the number of times the function executes in pretty much
>> the same way that our definition of CTE evaluation does, AFAICS.

> I don't think your example above is analogous, because in your example, you
> are asking *how many times* to execute the function, whereas in my example,
> the question is *whether* to execute the query at all.

No, it's about how many times to execute it, ie, how many rows to pull
from the CTE.  In particular, the optimization you're complaining about
is an early-termination rule that's not fundamentally different IMO from
what LIMIT does.

More generally, what you're arguing for is that the executor's behavior
should change depending on whether a query contains a volatile function.
That's a direction I think we shouldn't go in.  Up to now, the presence of
volatile functions has been something that can disable particular planner
optimizations, but it's not of concern to the executor.

            regards, tom lane


Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
Tom Lane-2 wrote
> Moshe Jacobson <

> moshe@

> > writes:
>> I am of the belief that if the function in a CTE is volatile, that it
>> should be executed unconditionally.
>
> [ shrug... ]  Consider
>
>     SELECT volatile_function(i) FROM generate_series(1, 10) i LIMIT 1;
>
> How many times should the volatile function get executed?  If your answer
> is not "10", how is this different from the CTE case?  This LIMIT clause
> is restricting the number of times the function executes in pretty much
> the same way that our definition of CTE evaluation does, AFAICS.
>
> You could of course argue that our definition of LIMIT is wrong too,
> but that's going to raise the bar for convincing people even higher,
> because of the number of existing applications that such a redefinition
> would break.

The CTE would functionally replace the generate_series() call as opposed to
the select-list evaluation.  Since the CTE establishes an optimization
boundary the parts of the query below the main (pulling) FROM clause should
not (or need not) influence the evaluation of the CTE.

The two comparable queries are:

A) WITH vf ( SELECT volatile_function(x) FROM generate_series(1,10) )
SELECT * FROM vf LIMIT 1

B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

In (A) the relation "vf" - which is a 10-row table with the result of
volatile_function as the only column - is limited to a single record and
that whole row is output as-is (because of the "*")

In (B) the relation "gs" - which is 10 rows having the result of
generate_series as the only column - is limited to a single row and then the
select-list project occurs against that single row (the volatile_function)

This is my naive, not technically informed, opinion of how these two
constructs differ.

This makes the optimization boundary characteristic of CTEs much stronger so
making such a boundary dependent upon whether the CTE contains any
volatile_functions seems desirable.  That way if "volatile_function" is
instead made stable then its evaluation 10-times can be avoided; though in
this case that would depend on whether pushing down the LIMIT is even valid.

As commented by Moshe the "number of times" is less and issue than "yes/no"
determination but I guess that any simple implementation would have to
handle both cases identically so its impossible to ignore the implication on
the "number of times" queries in solving the "yes/no" problem.

David J.








--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775321.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Tom Lane
Date:
David Johnston <polobo@yahoo.com> writes:
> The two comparable queries are:

> A) WITH vf ( SELECT volatile_function(x) FROM generate_series(1,10) )
> SELECT * FROM vf LIMIT 1

> B) SELECT volatile_function(x) FROM generate_series(1,10) gs (x) LIMIT 1

> In (A) the relation "vf" - which is a 10-row table with the result of
> volatile_function as the only column - is limited to a single record and
> that whole row is output as-is (because of the "*")

> In (B) the relation "gs" - which is 10 rows having the result of
> generate_series as the only column - is limited to a single row and then the
> select-list project occurs against that single row (the volatile_function)

Just for the record, your interpretation of (B) is wrong.  LIMIT acts
after select-list evaluation --- try a set-returning function in the
select list to see that this is true.

            regards, tom lane


Re: Bug? Function with side effects not evaluated in CTE

From
Merlin Moncure
Date:
On Mon, Oct 21, 2013 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moshe Jacobson <moshe@neadwerx.com> writes:
>> On Mon, Oct 21, 2013 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> How many times should the volatile function get executed?  If your answer
>>> is not "10", how is this different from the CTE case?  This LIMIT clause
>>> is restricting the number of times the function executes in pretty much
>>> the same way that our definition of CTE evaluation does, AFAICS.
>
>> I don't think your example above is analogous, because in your example, you
>> are asking *how many times* to execute the function, whereas in my example,
>> the question is *whether* to execute the query at all.
>
> No, it's about how many times to execute it, ie, how many rows to pull
> from the CTE.  In particular, the optimization you're complaining about
> is an early-termination rule that's not fundamentally different IMO from
> what LIMIT does.

Well, I think it *is* different precisely because it is in a CTE;
that's really the whole point.  LIMIT is expressly declared and has
obvious (well, mostly) user visible semantics.

> More generally, what you're arguing for is that the executor's behavior
> should change depending on whether a query contains a volatile function.
> That's a direction I think we shouldn't go in.  Up to now, the presence of
> volatile functions has been something that can disable particular planner
> optimizations, but it's not of concern to the executor.

Ah -- rats.  Well, it (arguably) should be concerned but I see your point.

merlin


Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
Tom Lane-2 wrote
> try a set-returning function in the
> select list to see that this is true.

Random thoughts...

Noted - though then there appears to be various optimizations at play here
then...

[somewhat dated 9.0.X version]

SELECT x, generate_series(x, 5) AS y FROM generate_series(1,3) gs (x) LIMIT
6

QUERY PLAN
Limit  (cost=0.00..0.08 rows=6 width=4) (actual time=0.011..0.015 rows=6
loops=1)
  ->  Function Scan on generate_series gs  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.011..0.015 rows=6 loops=1)
Total runtime: 0.031 ms

...unfortunately EXPLAIN doesn't provide much help in understanding where.


Related to other postings: can the planner generate an instruction for the
executor that in effect forces (complete?) execution of a specific node by
the executor?  The executor doesn't care that it is being told to force
execution due to a volatile function declaration or any other reason is just
knows it has to do it.

The original case is:

WITH cte AS ( volatile_function ) DELETE FROM empty_table

How about a compromise position making the above work but without altering
existing mechanics?  I would suppose that would require new syntax, and the
ability for the planner to force the executor to evaluate a CTE expression
unconditionally, but it would make the above query idiom work correctly.

WITH cte (ALWAYS EXECUTE) AS ( SELECT volatile_function(1) ) DELETE FROM
empty_table

I guess ignoring implementation considerations is this idiom, and means of
executing an UPSERT as noted by the OP, something that is likely to become
common given the currently provided CTE functionality.  This ability has
only been recently added to PostgreSQL and now we are seeing it in action
and coming across limitations.  In addition to evaluating the technical
limitations on why something wasn't implemented initially it is good to get
viewpoints on whether these unexpected uses of this feature are something
that should be made possible/easier with further enhancements to the system.
If the enhancements are desirable then at minimum a ToDo item should be
created wherein implementation factors can be discussed.

In evaluating the above discussion of whether the current implementation of
LIMIT is consistent with the current implementation of CTEs is tabled in
favor of discussing whether there is a demonstrable need/desire for having
some functions always evaluated when inside a CTE.  If so maybe altering the
function definition, instead of the CTE, would be a valid solution. But if
one is focused on why the CTE behaves the way it does one may tend to lose
sight of trying to solve the larger problem in preference to trying to
justify maintaining the status-quo.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775329.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Rowan Collins
Date:
On 21/10/2013 20:40, Moshe Jacobson wrote:
I don't think your example above is analogous, because in your example, you are asking how many times to execute the function, whereas in my example, the question is whether to execute the query at all.
If the outer statement of the CTE doesn't need to use the contents of the CTE, and there is no volatile function in there, then I agree that it's fine not to execute it.

Surely the point is that "not executing at all" is just a particular case of "executing N times" where N is calculated as zero?

Your original example was this:
with tt_created as
(   select fn_new_item(1) as item
)
That CTE has a maximum of 1 row, so the choices are only 0 or 1. But how is it fundamentally different from this:
with tt_created as
(   select fn_new_item(i) as item from generate_series(1, 10) as i
)
This CTE has up to 10 rows, but the current optimization (as I understand it) might mean that only 5 of those are calculated. It might also decide that 0 of them are calculated, but that's not a qualitatively different decision, it's just a different value for the "how many rows do you need" parameter.

Personally, I'm definitely in the "this is surprising behaviour" camp, although "surprising" and "wrong" aren't *necessarily* the same thing, so I'll leave it to greater minds to decide on the latter...
-- 
Rowan Collins
[IMSoP]

Re: Bug? Function with side effects not evaluated in CTE

From
BladeOfLight16
Date:
I've only skimmed this thread, but clearly, this is why using functions with side effects in the middle of complex queries is a bad idea. =) Something like SELECT func_with_side_effect(1); is probably fine, but beyond that, put the function in the middle of a DO block or something and actually code what you want to happen.

In terms of "expected" or "surprising" behavior, I don't think you can say ANY behavior could be expected. SQL is designed to be declarative. When it comes to retrieval (which is the issue originally raised since this involves a SELECT before the modification), you tell it what you want, and some engine figures out the best way to retrieve it. The engine is allowed to make whatever optimizations it chooses as long as the result set is correct. So if you really want to modify something, be explicit and don't drop a function with side effects in the middle of a complex query like this. God only knows what the engine will do with that.

In my opinion, the simplest and most correct way to handle this is to document that there are no guarantees about what will happen with volatile functions in these strange cases. PostgreSQL shouldn't have to make guarantees about whether functions are evaluated in CTEs or what have you; it should have the freedom to optimize those things away or not.

Re: Bug? Function with side effects not evaluated in CTE

From
John R Pierce
Date:
On 10/21/2013 3:52 PM, BladeOfLight16 wrote:
> I've only skimmed this thread, but clearly, this is why using
> functions with side effects in the middle of complex queries is a bad
> idea. =) Something like SELECT func_with_side_effect(1); is probably
> fine, but beyond that, put the function in the middle of a DO block or
> something and actually code what you want to happen.
>
> In terms of "expected" or "surprising" behavior, I don't think you can
> say ANY behavior could be expected. SQL is designed to be declarative.
> When it comes to retrieval (which is the issue originally raised since
> this involves a SELECT before the modification), you tell it what you
> want, and some engine figures out the best way to retrieve it. The
> engine is allowed to make whatever optimizations it chooses as long as
> the result set is correct. So if you really want to modify something,
> be explicit and don't drop a function with side effects in the middle
> of a complex query like this. God only knows what the engine will do
> with that.

indeed.   ran into someone who was calling pg_stop_backup() deep in a
heavily nested query+function call mess, totally blew my mind why anyone
would think that was a good idea.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Bug? Function with side effects not evaluated in CTE

From
BladeOfLight16
Date:
On Mon, Oct 21, 2013 at 6:52 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:
In my opinion, the simplest and most correct way to handle this is to document that there are no guarantees about what will happen with volatile functions in these strange cases. PostgreSQL shouldn't have to make guarantees about whether functions are evaluated in CTEs or what have you; it should have the freedom to optimize those things away or not.

I'd like to add one thing. I really appreciate that the maintainers of PostgreSQL are so open to there being use cases for seemingly weird things. They try as hard as they can to avoid answering behavior questions with, "Of course it doesn't work; you shouldn't do that." The prevalence of that kind of thinking in the Oracle community is something that causes me a lot of grief since I have to work with Oracle regularly, so I know how valuable PostgreSQL's work at being intuitive even in seemingly weird cases is. This is a special case, though. The "right behavior" isn't even close to clear here; there isn't even a majority consensus as near as I can tell. There are too many weird edge cases to account for them all. When the "right behavior" isn't even close to clear like this, I think it's better to simply avoid the issue entirely and discourage people from depending on any kind of particular behavior.

Regarding UPSERT in particular, are you working with a single row or a set of rows? If a single row, is there a reason you can't perform a SELECT before hand to see if the PK is already there and then INSERT or UPDATE accordingly? If multiple rows, is there a reason you can't UPDATE ... SELECT * FROM rows WHERE pk IN ... and then INSERT ... SELECT * FROM rows WHERE pk NOT IN ...? It seems to me that would be more readable/maintainable than relying on a particular CTE behavior.

Re: Bug? Function with side effects not evaluated in CTE

From
David Johnston
Date:
BladeOfLight16 wrote
> Regarding UPSERT in particular, are you working with a single row or a set
> of rows? If a single row, is there a reason you can't perform a SELECT
> before hand to see if the PK is already there and then INSERT or UPDATE
> accordingly? If multiple rows, is there a reason you can't UPDATE ...
> SELECT * FROM rows WHERE pk IN ... and then INSERT ... SELECT * FROM rows
> WHERE pk NOT IN ...? It seems to me that would be more
> readable/maintainable than relying on a particular CTE behavior.

Saw this it my inbox but not Nabble so I'll copy it here:


> From Moshe; the OP:
>
> We use stored procs to provide us the functionality of an UPSERT, which
> PostgreSQL lacks.
> We are using this in the first CTE to create new entries in a table, and
> we are using the DELETE to delete the entries that already existed that we
> didn't just create.
> I am of the belief that if the function in a CTE is volatile, that it
> should be executed unconditionally.

This seems like a terrible idea.  To rephrase to make sure I understand:

You "always" insert a new record.  Then, if an existing record was found you
DELETE that existing record.  Thus at no point do you actually issue an
UPDATE.

How are you dealing with Primary and Foreign Keys?  I guess you could defer
evaluate them so you can have a duplicate PK on the table before the
deletion occurs...though whether a FK will allow this kind of behavior I do
not know.

It would help is Moshe would post a minimally viable working example of the
entire use-case so that its desirability can be assessed and potential
short-term alternative provided since even if desired this could not be
released until 9.4 as it constitutes a behavior change (I don't think anyone
is going to accept this a being a bug-fix no matter what solution is
offered).

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Bug-Function-with-side-effects-not-evaluated-in-CTE-tp5774792p5775359.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Bug? Function with side effects not evaluated in CTE

From
Moshe Jacobson
Date:

On Mon, Oct 21, 2013 at 7:52 PM, David Johnston <polobo@yahoo.com> wrote:

It would help is Moshe would post a minimally viable working example of the
entire use-case so that its desirability can be assessed and potential
short-term alternative provided since even if desired this could not be
released until 9.4 as it constitutes a behavior change (I don't think anyone
is going to accept this a being a bug-fix no matter what solution is
offered).

Here is the full code. It is not “minimal”, but actually what we are using.
fn_get_create_or_update_space_sku() will create a non-existent row, or update it with the passed-in data if it already exists.
You’ll notice that in this version I don’t use NOT IN( ) but rather another CTE with a left join.
It behaves the same way.
I’ve put $varname in certain places to indicate that a value is going to go in there. Some of these are actually bound with placeholders, but I left it like this for clarity.

with tt_space_sku_data as
( select unnest(array[$sku_array]) as sku,        unnest(array[$quantity_array]) as quantity ,         unnest(array[$primary_array])  as primary ,        unnest(array[$position_array]) as position
),
tt_space_skus as
(  select fn_get_create_or_update_space_sku      (          $pk_space ,           tt.sku ,          tt.quantity ,          tt.primary ,          tt.position ,          TRUE       ) as space_sku  from tt_space_sku_data tt
),
tt_space_skus_to_delete as(  select ss.space_sku    from tb_space_sku ss  left join tt_space_skus tt      on tt.space_sku = ss.space_sku     where tt.space_sku is null      and ss.space = $pk_space
)
delete from tb_space_sku ss using tt_space_skus_to_delete ttwhere ss.space = $pk_space  and ss.space_sku = tt.space_sku

Moshe Jacobson
Manager of Systems Engineering, Nead Werx Inc.
2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

“Quality is not an act, it is a habit.” — Aristotle

Re: Bug? Function with side effects not evaluated in CTE

From
BladeOfLight16
Date:
On Tue, Oct 22, 2013 at 3:15 PM, Moshe Jacobson <moshe@neadwerx.com> wrote:

Here is the full code. It is not “minimal”, but actually what we are using.
fn_get_create_or_update_space_sku() will create a non-existent row, or update it with the passed-in data if it already exists.
You’ll notice that in this version I don’t use NOT IN( ) but rather another CTE with a left join.
It behaves the same way.
I’ve put $varname in certain places to indicate that a value is going to go in there. Some of these are actually bound with placeholders, but I left it like this for clarity.

with tt_space_sku_data as
( select unnest(array[$sku_array]) as sku,        unnest(array[$quantity_array]) as quantity ,         unnest(array[$primary_array])  as primary ,        unnest(array[$position_array]) as position
),
tt_space_skus as
(  select fn_get_create_or_update_space_sku      (          $pk_space ,           tt.sku ,          tt.quantity ,          tt.primary ,          tt.position ,          TRUE       ) as space_sku  from tt_space_sku_data tt
),
tt_space_skus_to_delete as(  select ss.space_sku    from tb_space_sku ss  left join tt_space_skus tt      on tt.space_sku = ss.space_sku     where tt.space_sku is null      and ss.space = $pk_space
)
delete from tb_space_sku ss using tt_space_skus_to_delete ttwhere ss.space = $pk_space  and ss.space_sku = tt.space_sku


Oops. Messed up and didn't include the PG user's list on the recipients the first time.

Original message:

I must say this is quite difficult to interpret, which in and of itself is a reason to rewrite it.

First, instead of having 1 array per column, pass in a single set of rows instead (could still be an array). If you can't pass your data to the database in that form, consider having a separate function that turns your multiple arrays into a set of rows and pass the result of that function into this one.

I've created a SQL Fiddle that implements UPSERT on an example table: http://sqlfiddle.com/#!12/4b716/1/0. Look over in the schema definition for the function and where the function is called. I'm sure you could do better than the very ugly SELECT query I have to call the function, but if you can't find a better way, at least it works. The basic idea is to have a function that takes a set of rows for the table, UPDATE the rows that are already there, and then INSERT the rows that are not. Straightforward and to the point. I'd appreciate any ideas from veterans. =) I believe my function requires only 2 SELECTs on the table itself, which I believe is the same number required for your definition above.

This doesn't depend on CTE behavior, and I find it simpler and easier to interpret (and therefore more maintainable). Does that suit your needs?

New Info:

I've improved that SQL query a bit: http://sqlfiddle.com/#!12/11ebc/1/0

Also, I forgot to mention that you'll need to remove the forward slashes. They're an artifact of using SQL Fiddle. It was trying to split my CRETE FUNCTION statement on the semicolon inside the definition string.

Hope this helps.