Thread: Volatile functions in WITH
Suppose I run the following query:

    WITH "exp" AS (
        DELETE FROM "item" i
            USING "item_expired" e
            WHERE e.oref = i.ref
                AND i.basket = $1
            RETURNING i.basket, e.oref, e.nref, i.count, e.msg
    ),
    "subst" AS (
        INSERT INTO "item" ("basket", "ref", "count")
            SELECT e.basket, e.nref, e.count
                FROM "exp" e
                WHERE e.nref IS NOT NULL
    )
    SELECT DISTINCT e.msg
        FROM "exp" e

This is a very convenient and somewhat more flexible alternative to
INSERT ... DELETE RETURNING (which doesn't work). However, the "item"
table has a unique constraint on (basket, ref), so sometimes I need to
update instead of insert; to handle this I have a VOLATILE function,
add_item. Unfortunately, if I call it the obvious way

    WITH "exp" AS ( -- as before
    ),
    "subst" AS (
        SELECT add_item(e.basket, e.nref, e.count)
            FROM "exp" e
            WHERE e.nref IS NOT NULL
    )
    SELECT DISTINCT e.msg
        FROM "exp" e

then the planner sees that the results of "subst" are not used, and
doesn't include it in the query plan at all.

Is there any way I can tell WITH that add_item is actually a
data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
didn't really expect it would).

Alternatively, are either of these safe (that is, are they guaranteed
to call the function once for every row returned by "exp", even if the
DISTINCT ends up eliminating some of those rows)?

    WITH "exp" AS ( -- as before
    ),
    "subst" AS ( -- SELECT add_item(...) as before
    )
    SELECT DISTINCT e.msg
        FROM "exp" e
        LEFT JOIN "subst" s ON FALSE

    WITH "exp" AS ( -- as before
    )
    SELECT DISTINCT s.msg
        FROM (
            SELECT e.msg, CASE
                WHEN e.nref IS NULL THEN NULL
                ELSE add_item(e.basket, e.nref, e.count)
            END "subst"
            FROM "exp" e
        ) s

I don't like the second alternative much, but I could live with it if I
had to.

Ben
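add_item itself is never shown in the thread; later on Ben says it is
essentially the update-then-insert example from the PL/pgSQL
documentation, so a minimal sketch along those lines might look like
the following (the parameter types, and whether the count is replaced
or accumulated, are guesses):

    CREATE FUNCTION add_item(p_basket int, p_ref int, p_count int)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        LOOP
            -- First try to update an existing (basket, ref) row.
            UPDATE "item" SET "count" = p_count
                WHERE "basket" = p_basket AND "ref" = p_ref;
            IF FOUND THEN
                RETURN;
            END IF;
            -- No such row: try to insert it. A concurrent insert of the
            -- same key can still beat us, so catch the unique_violation
            -- and loop back to the UPDATE.
            BEGIN
                INSERT INTO "item" ("basket", "ref", "count")
                    VALUES (p_basket, p_ref, p_count);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- lost the race; retry the UPDATE
            END;
        END LOOP;
    END;
    $$;

Keeping the retry loop inside the function is what makes it safe
against concurrent inserts, a property the bare UPDATE-then-INSERT CTE
chain suggested later in the thread does not have.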
On Sat, Feb 16, 2013 at 11:58 PM, Ben Morrow <ben@morrow.me.uk> wrote:
> WITH "exp" AS ( -- as before
> ),
> "subst" AS (
>     SELECT add_item(e.basket, e.nref, e.count)
>         FROM "exp" e
>         WHERE e.nref IS NOT NULL
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e

Alternatively I suppose you can try this one:

WITH "exp" AS (
    DELETE FROM "item" i
        USING "item_expired" e
        WHERE e.oref = i.ref
            AND i.basket = $1
        RETURNING i.basket, e.oref, e.nref, i.count, e.msg
),
"upd" AS (
    UPDATE "item" t SET "count" = e.count
        FROM "exp" e
        WHERE e.nref IS NOT NULL
            AND (t.basket, t.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
        RETURNING t.basket, t.ref
),
"ins" AS (
    INSERT INTO "item" ("basket", "ref", "count")
        SELECT e.basket, e.nref, e.count
            FROM "exp" e
                LEFT JOIN "upd" u
                    ON (u.basket, u.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
            WHERE e.nref IS NOT NULL
                AND (u.basket, u.ref) IS NULL
)
SELECT DISTINCT e.msg
    FROM "exp" e

> then the planner sees that the results of "subst" are not used, and
> doesn't include it in the query plan at all.
>
> Is there any way I can tell WITH that add_item is actually a
> data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
> didn't really expect it would).

In this regard I would like to hear the gurus' opinion too.

EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
 Total runtime: 0.063 ms
(2 rows)

EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1 from t;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=0.01..0.03 rows=1 width=0) (actual time=0.048..0.052 rows=1 loops=1)
   CTE t
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.038..0.039 rows=1 loops=1)
 Total runtime: 0.131 ms
(4 rows)

I couldn't manage to come to any solution except faking the reference
in the resulting query:

WITH t AS (SELECT random()) SELECT 1 UNION ALL (SELECT 1 FROM t LIMIT 0);

> Alternatively, are either of these safe (that is, are they guaranteed
> to call the function once for every row returned by "exp", even if the
> DISTINCT ends up eliminating some of those rows)?
>
> WITH "exp" AS ( -- as before
> ),
> "subst" AS ( -- SELECT add_item(...) as before
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e
>     LEFT JOIN "subst" s ON FALSE
>
> WITH "exp" AS ( -- as before
> )
> SELECT DISTINCT s.msg
>     FROM (
>         SELECT e.msg, CASE
>             WHEN e.nref IS NULL THEN NULL
>             ELSE add_item(e.basket, e.nref, e.count)
>         END "subst"
>         FROM "exp" e
>     ) s
>
> I don't like the second alternative much, but I could live with it if I
> had to.
>
> Ben

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
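The asymmetry Sergey demonstrates is documented behaviour: a
data-modifying statement in WITH is executed exactly once and to
completion whether or not the primary query reads its output, while a
plain SELECT CTE that nothing references is simply dropped. A
throwaway table (the name is invented) makes the contrast easy to see:

    CREATE TEMP TABLE demo (x int);

    -- The INSERT runs even though nothing refers to "w":
    WITH w AS (INSERT INTO demo VALUES (1) RETURNING x)
    SELECT 1;

    SELECT count(*) FROM demo;   -- returns 1

    -- The SELECT CTE, as in the EXPLAIN above, never appears in the plan:
    WITH w AS (SELECT random())
    SELECT 1;

That is why Ben's all-DML version works as intended while the add_item
variant, whose "subst" CTE is just a SELECT, does not.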
Quoth gray.ru@gmail.com (Sergey Konoplev):
> On Sat, Feb 16, 2013 at 11:58 PM, Ben Morrow <ben@morrow.me.uk> wrote:
> > WITH "exp" AS ( -- as before
> > ),
> > "subst" AS (
> >     SELECT add_item(e.basket, e.nref, e.count)
> >         FROM "exp" e
> >         WHERE e.nref IS NOT NULL
> > )
> > SELECT DISTINCT e.msg
> >     FROM "exp" e
>
> Alternatively I suppose you can try this one:
>
> WITH "exp" AS (
>     DELETE FROM "item" i
>         USING "item_expired" e
>         WHERE e.oref = i.ref
>             AND i.basket = $1
>         RETURNING i.basket, e.oref, e.nref, i.count, e.msg
> ),
> "upd" AS (
>     UPDATE "item" t SET "count" = e.count
>         FROM "exp" e
>         WHERE e.nref IS NOT NULL
>             AND (t.basket, t.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
>         RETURNING t.basket, t.ref
> ),
> "ins" AS (
>     INSERT INTO "item" ("basket", "ref", "count")
>         SELECT e.basket, e.nref, e.count
>             FROM "exp" e
>                 LEFT JOIN "upd" u
>                     ON (u.basket, u.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
>             WHERE e.nref IS NOT NULL
>                 AND (u.basket, u.ref) IS NULL
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e

That's not reliable. A concurrent txn could insert a conflicting row
between the update and the insert, which would cause the insert to fail
with a unique constraint violation.

> > then the planner sees that the results of "subst" are not used, and
> > doesn't include it in the query plan at all.
> >
> > Is there any way I can tell WITH that add_item is actually a
> > data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
> > didn't really expect it would).
>
> In this regard I would like to hear the gurus' opinion too.
>
> EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
>  Total runtime: 0.063 ms
> (2 rows)
>
> EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1 from t;
>                                           QUERY PLAN
> --------------------------------------------------------------------------------------------
>  CTE Scan on t  (cost=0.01..0.03 rows=1 width=0) (actual time=0.048..0.052 rows=1 loops=1)
>    CTE t
>      ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.038..0.039 rows=1 loops=1)
>  Total runtime: 0.131 ms
> (4 rows)
>
> I couldn't manage to come to any solution except faking the reference
> in the resulting query:
>
> WITH t AS (SELECT random()) SELECT 1 UNION ALL (SELECT 1 FROM t LIMIT 0);

Yes, I can do experiments too; the alternatives I gave before both work
on my test database. What I was asking was whether they are guaranteed
to work in all situations, given that the planner can in principle see
that the extra table reference won't affect the result.

Ben
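To make the window concrete (the key values are invented), under READ
COMMITTED the failure Ben describes plays out like this:

    -- Session A runs the combined statement; its "upd" CTE finds no row
    -- with ("basket", "ref") = (1, 42), so the UPDATE changes nothing.

    -- Session B commits a row with that key in the meantime:
    BEGIN;
    INSERT INTO "item" ("basket", "ref", "count") VALUES (1, 42, 3);
    COMMIT;

    -- Session A's "ins" CTE now inserts (1, 42, ...) as well, and the
    -- unique constraint on (basket, ref) rejects it with a
    -- unique_violation (SQLSTATE 23505), aborting the whole statement.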
On Wed, Feb 20, 2013 at 12:19 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> That's not reliable. A concurrent txn could insert a conflicting row
> between the update and the insert, which would cause the insert to fail
> with a unique constraint violation.

Okay, I think I got it. The function catches the exception when
INSERTing and does an UPDATE instead, correct?

If you got mixed up with plpgsql anyway, what is the reason for making
these WITH query constructions instead of implementing everything in a
plpgsql trigger on DELETE on exp?

> Yes, I can do experiments too; the alternatives I gave before both work
> on my test database. What I was asking was whether they are guaranteed
> to work in all situations, given that the planner can in principle see
> that the extra table reference won't affect the result.

From the documentation: "VOLATILE indicates that the function value can
change even within a single table scan, so no optimizations can be
made". So they are guaranteed to behave as you need in your last
example.

As for optimizing it out in WITH, I would like to hear the hackers'
opinion, because to me it looks like a bug.

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
At 8AM -0800 on 20/02/13 you (Sergey Konoplev) wrote:
> On Wed, Feb 20, 2013 at 12:19 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> > That's not reliable. A concurrent txn could insert a conflicting row
> > between the update and the insert, which would cause the insert to fail
> > with a unique constraint violation.
>
> Okay, I think I got it. The function catches the exception when
> INSERTing and does an UPDATE instead, correct?

Well, it tries the update first, but yes. It's pretty much exactly the
example in the PL/pgSQL docs.

> If you got mixed up with plpgsql anyway, what is the reason for making
> these WITH query constructions instead of implementing everything in a
> plpgsql trigger on DELETE on exp?

I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The
statement is deleting some entries from "item", and replacing some of
them with new entries, based on the information in the "item_expired"
view. I can't do anything with a trigger on "item", since there are
other circumstances where items are deleted that shouldn't trigger
replacement.

> > Yes, I can do experiments too; the alternatives I gave before both work
> > on my test database. What I was asking was whether they are guaranteed
> > to work in all situations, given that the planner can in principle see
> > that the extra table reference won't affect the result.
>
> From the documentation: "VOLATILE indicates that the function value can
> change even within a single table scan, so no optimizations can be
> made". So they are guaranteed to behave as you need in your last
> example.

Well, that's ambiguous. The return value can change even within a
single scan, so if you want 3 return values you have to make 3 calls.
But what if you don't actually need one of those three: is the planner
allowed to optimise the whole thing out?

For instance, given

    select *
    from (select j.type, random() r from item j) i
    where i.type = 1

the planner will transform it into

    select i.type, random() r
    from item i
    where i.type = 1

before planning, so even though random() is volatile it will only get
called for rows of item with type = 1. I don't know if this happens, or
may sometimes happen, or might happen in the future, for rows
eliminated because of DISTINCT.

(I think perhaps what I would ideally want is a PERFORM verb, which is
just like SELECT but says 'actually calculate all the rows implied
here, without pulling in additional filter conditions'. WITH would then
have to treat a top-level PERFORM inside a WITH the same as DML.)

Ben
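For what it's worth, the usual idiom at the time for blocking that
particular transformation was to put an optimization fence inside the
subquery: OFFSET 0 stops the pull-up (and keeps the outer qual from
being pushed down into the subquery), so random() is evaluated for
every row of item before the filter is applied. Whether such idioms
amount to a documented guarantee is, of course, exactly the question
being asked here.

    select *
    from (select j.type, random() r from item j offset 0) i  -- fence: not flattened
    where i.type = 1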
On Wed, Feb 20, 2013 at 10:16 AM, Ben Morrow <ben@morrow.me.uk> wrote:
>> If you got mixed up with plpgsql anyway, what is the reason for making
>> these WITH query constructions instead of implementing everything in a
>> plpgsql trigger on DELETE on exp?
>
> I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The

Sorry, I meant "item" of course; "exp" was a typo.

> statement is deleting some entries from "item", and replacing some of
> them with new entries, based on the information in the "item_expired"
> view. I can't do anything with a trigger on "item", since there are
> other circumstances where items are deleted that shouldn't trigger
> replacement.

Okay, I see.

If the case is that specific, you can make a simple plpgsql function
that processes it like this:

    FOR _row IN DELETE ... RETURNING * LOOP
        ...
        RETURN NEXT _row;
    END LOOP;

> select *
> from (select j.type, random() r from item j) i
> where i.type = 1
>
> the planner will transform it into
>
> select i.type, random() r
> from item i
> where i.type = 1
>
> before planning, so even though random() is volatile it will only get
> called for rows of item with type = 1.

Yes, functions are executed depending on the resulting plan: "A query
using a volatile function will re-evaluate the function at every row
where its value is needed".

> I don't know if this happens, or may sometimes happen, or might happen
> in the future, for rows eliminated because of DISTINCT.

It is a good point. Nothing guarantees it going forward. The optimizer
guarantees a stable result but not the way it is reached.

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
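Spelled out against the query from the first message, that suggestion
might look roughly like this; the function name and the parameter's
type are invented, it reuses add_item for the insert-or-update step,
and the DISTINCT is left to the caller:

    CREATE FUNCTION replace_expired(p_basket int)
    RETURNS SETOF text LANGUAGE plpgsql AS $$
    DECLARE
        _row record;
    BEGIN
        FOR _row IN
            DELETE FROM "item" i
                USING "item_expired" e
                WHERE e.oref = i.ref
                    AND i.basket = p_basket
                RETURNING i.basket, e.oref, e.nref, i.count, e.msg
        LOOP
            -- Replace the expired item if there is a new reference for it.
            IF _row.nref IS NOT NULL THEN
                PERFORM add_item(_row.basket, _row.nref, _row.count);
            END IF;
            RETURN NEXT _row.msg;
        END LOOP;
    END;
    $$;

    -- Caller:
    -- SELECT DISTINCT t.msg FROM replace_expired($1) AS t(msg);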
At 12PM -0800 on 20/02/13 you (Sergey Konoplev) wrote:
> On Wed, Feb 20, 2013 at 10:16 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> >> If you got mixed up with plpgsql anyway, what is the reason for making
> >> these WITH query constructions instead of implementing everything in a
> >> plpgsql trigger on DELETE on exp?
> >
> > I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The
>
> Sorry, I meant "item" of course; "exp" was a typo.

OK.

> > statement is deleting some entries from "item", and replacing some of
> > them with new entries, based on the information in the "item_expired"
> > view. I can't do anything with a trigger on "item", since there are
> > other circumstances where items are deleted that shouldn't trigger
> > replacement.
>
> Okay, I see.
>
> If the case is that specific, you can make a simple plpgsql function
> that processes it like this:
>
>     FOR _row IN DELETE ... RETURNING * LOOP
>         ...
>         RETURN NEXT _row;
>     END LOOP;

Yes, I *know* I can write a function if I have to. I can also send the
whole lot down to the client and do the inserts from there, or use a
temporary table. I was hoping to avoid that, since the plain INSERT
case works perfectly well.

> > select *
> > from (select j.type, random() r from item j) i
> > where i.type = 1
> >
> > the planner will transform it into
> >
> > select i.type, random() r
> > from item i
> > where i.type = 1
> >
> > before planning, so even though random() is volatile it will only get
> > called for rows of item with type = 1.
>
> Yes, functions are executed depending on the resulting plan: "A query
> using a volatile function will re-evaluate the function at every row
> where its value is needed".
>
> > I don't know if this happens, or may sometimes happen, or might happen
> > in the future, for rows eliminated because of DISTINCT.
>
> It is a good point. Nothing guarantees it going forward. The optimizer
> guarantees a stable result but not the way it is reached.

Well, it makes functions which perform DML a lot less useful, so I
wonder whether this is intentional behaviour.

Ben