Thread: Volatile functions in WITH
Suppose I run the following query:

    WITH "exp" AS (
        DELETE FROM "item" i
            USING "item_expired" e
            WHERE e.oref = i.ref
                AND i.basket = $1
            RETURNING i.basket, e.oref, e.nref, i.count, e.msg
    ),
    "subst" AS (
        INSERT INTO "item" ("basket", "ref", "count")
            SELECT e.basket, e.nref, e.count
                FROM "exp" e
                WHERE e.nref IS NOT NULL
    )
    SELECT DISTINCT e.msg
        FROM "exp" e

This is a very convenient and somewhat more flexible alternative to
INSERT ... DELETE RETURNING (which doesn't work). However, the "item"
table has a unique constraint on (basket, ref), so sometimes I need to
update instead of insert; to handle this I have a VOLATILE function,
add_item. Unfortunately, if I call it the obvious way

    WITH "exp" AS ( -- as before
    ),
    "subst" AS (
        SELECT add_item(e.basket, e.nref, e.count)
            FROM "exp" e
            WHERE e.nref IS NOT NULL
    )
    SELECT DISTINCT e.msg
        FROM "exp" e

then the planner sees that the results of "subst" are not used, and
doesn't include it in the query plan at all.

Is there any way I can tell WITH that add_item is actually a
data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
didn't really expect it would).

Alternatively, are either of these safe (that is, are they guaranteed
to call the function once for every row returned by "exp", even if the
DISTINCT ends up eliminating some of those rows)?

    WITH "exp" AS ( -- as before
    ),
    "subst" AS ( -- SELECT add_item(...) as before
    )
    SELECT DISTINCT e.msg
        FROM "exp" e
        LEFT JOIN "subst" s ON FALSE

    WITH "exp" AS ( -- as before
    )
    SELECT DISTINCT s.msg
        FROM (
            SELECT e.msg, CASE
                WHEN e.nref IS NULL THEN NULL
                ELSE add_item(e.basket, e.nref, e.count)
            END "subst"
            FROM "exp" e
        ) s

I don't like the second alternative much, but I could live with it if I
had to.

Ben
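add_item itself is never shown in the thread; later on Ben says it is
essentially the update-then-insert example from the PL/pgSQL
documentation, so a minimal sketch along those lines might look like
the following (the parameter types, and whether the count is replaced
or accumulated, are guesses):

    CREATE FUNCTION add_item(p_basket int, p_ref int, p_count int)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        LOOP
            -- First try to update an existing (basket, ref) row.
            UPDATE "item" SET "count" = p_count
                WHERE "basket" = p_basket AND "ref" = p_ref;
            IF FOUND THEN
                RETURN;
            END IF;
            -- No such row: try to insert it. A concurrent insert of the
            -- same key can still beat us, so catch the unique_violation
            -- and loop back to the UPDATE.
            BEGIN
                INSERT INTO "item" ("basket", "ref", "count")
                    VALUES (p_basket, p_ref, p_count);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- lost the race; retry the UPDATE
            END;
        END LOOP;
    END;
    $$;

Keeping the retry loop inside the function is what makes it safe
against concurrent inserts, a property the bare UPDATE-then-INSERT CTE
chain suggested later in the thread does not have.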
On Sat, Feb 16, 2013 at 11:58 PM, Ben Morrow <ben@morrow.me.uk> wrote:
> WITH "exp" AS ( -- as before
> ),
> "subst" AS (
>     SELECT add_item(e.basket, e.nref, e.count)
>         FROM "exp" e
>         WHERE e.nref IS NOT NULL
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e

Alternatively I suppose you can try this one:

WITH "exp" AS (
    DELETE FROM "item" i
        USING "item_expired" e
        WHERE e.oref = i.ref
            AND i.basket = $1
        RETURNING i.basket, e.oref, e.nref, i.count, e.msg
),
"upd" AS (
    UPDATE "item" t SET "count" = e.count
        FROM "exp" e
        WHERE e.nref IS NOT NULL
            AND (t.basket, t.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
        RETURNING t.basket, t.ref
),
"ins" AS (
    INSERT INTO "item" ("basket", "ref", "count")
        SELECT e.basket, e.nref, e.count
            FROM "exp" e
                LEFT JOIN "upd" u
                    ON (u.basket, u.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
            WHERE e.nref IS NOT NULL
                AND (u.basket, u.ref) IS NULL
)
SELECT DISTINCT e.msg
    FROM "exp" e

> then the planner sees that the results of "subst" are not used, and
> doesn't include it in the query plan at all.
>
> Is there any way I can tell WITH that add_item is actually a
> data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
> didn't really expect it would).

In this regard I would like to hear the gurus' opinion too.

EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
 Total runtime: 0.063 ms
(2 rows)

EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1 from t;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=0.01..0.03 rows=1 width=0) (actual time=0.048..0.052 rows=1 loops=1)
   CTE t
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.038..0.039 rows=1 loops=1)
 Total runtime: 0.131 ms
(4 rows)

I couldn't manage to come to any solution except faking the reference
in the resulting query:

WITH t AS (SELECT random()) SELECT 1 UNION ALL (SELECT 1 FROM t LIMIT 0);

> Alternatively, are either of these safe (that is, are they guaranteed
> to call the function once for every row returned by "exp", even if the
> DISTINCT ends up eliminating some of those rows)?
>
> WITH "exp" AS ( -- as before
> ),
> "subst" AS ( -- SELECT add_item(...) as before
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e
>     LEFT JOIN "subst" s ON FALSE
>
> WITH "exp" AS ( -- as before
> )
> SELECT DISTINCT s.msg
>     FROM (
>         SELECT e.msg, CASE
>             WHEN e.nref IS NULL THEN NULL
>             ELSE add_item(e.basket, e.nref, e.count)
>         END "subst"
>         FROM "exp" e
>     ) s
>
> I don't like the second alternative much, but I could live with it if I
> had to.
>
> Ben

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
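The asymmetry Sergey demonstrates is documented behaviour: a
data-modifying statement in WITH is executed exactly once and to
completion whether or not the primary query reads its output, while a
plain SELECT CTE that nothing references is simply dropped. A
throwaway table (the name is invented) makes the contrast easy to see:

    CREATE TEMP TABLE demo (x int);

    -- The INSERT runs even though nothing refers to "w":
    WITH w AS (INSERT INTO demo VALUES (1) RETURNING x)
    SELECT 1;

    SELECT count(*) FROM demo;   -- returns 1

    -- The SELECT CTE, as in the EXPLAIN above, never appears in the plan:
    WITH w AS (SELECT random())
    SELECT 1;

That is why Ben's all-DML version works as intended while the add_item
variant, whose "subst" CTE is just a SELECT, does not.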
Quoth gray.ru@gmail.com (Sergey Konoplev):
> On Sat, Feb 16, 2013 at 11:58 PM, Ben Morrow <ben@morrow.me.uk> wrote:
> > WITH "exp" AS ( -- as before
> > ),
> > "subst" AS (
> >     SELECT add_item(e.basket, e.nref, e.count)
> >         FROM "exp" e
> >         WHERE e.nref IS NOT NULL
> > )
> > SELECT DISTINCT e.msg
> >     FROM "exp" e
>
> Alternatively I suppose you can try this one:
>
> WITH "exp" AS (
>     DELETE FROM "item" i
>         USING "item_expired" e
>         WHERE e.oref = i.ref
>             AND i.basket = $1
>         RETURNING i.basket, e.oref, e.nref, i.count, e.msg
> ),
> "upd" AS (
>     UPDATE "item" t SET "count" = e.count
>         FROM "exp" e
>         WHERE e.nref IS NOT NULL
>             AND (t.basket, t.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
>         RETURNING t.basket, t.ref
> ),
> "ins" AS (
>     INSERT INTO "item" ("basket", "ref", "count")
>         SELECT e.basket, e.nref, e.count
>             FROM "exp" e
>                 LEFT JOIN "upd" u
>                     ON (u.basket, u.ref) IS NOT DISTINCT FROM (e.basket, e.nref)
>             WHERE e.nref IS NOT NULL
>                 AND (u.basket, u.ref) IS NULL
> )
> SELECT DISTINCT e.msg
>     FROM "exp" e

That's not reliable. A concurrent txn could insert a conflicting row
between the update and the insert, which would cause the insert to fail
with a unique constraint violation.

> > then the planner sees that the results of "subst" are not used, and
> > doesn't include it in the query plan at all.
> >
> > Is there any way I can tell WITH that add_item is actually a
> > data-modifying statement? Adding FOR UPDATE doesn't seem to help (I
> > didn't really expect it would).
>
> In this regard I would like to hear the gurus' opinion too.
>
> EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
>  Total runtime: 0.063 ms
> (2 rows)
>
> EXPLAIN ANALYZE WITH t AS (SELECT random()) SELECT 1 from t;
>                                           QUERY PLAN
> --------------------------------------------------------------------------------------------
>  CTE Scan on t  (cost=0.01..0.03 rows=1 width=0) (actual time=0.048..0.052 rows=1 loops=1)
>    CTE t
>      ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.038..0.039 rows=1 loops=1)
>  Total runtime: 0.131 ms
> (4 rows)
>
> I couldn't manage to come to any solution except faking the reference
> in the resulting query:
>
> WITH t AS (SELECT random()) SELECT 1 UNION ALL (SELECT 1 FROM t LIMIT 0);

Yes, I can do experiments too; the alternatives I gave before both work
on my test database. What I was asking was whether they are guaranteed
to work in all situations, given that the planner can in principle see
that the extra table reference won't affect the result.

Ben
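To make the window concrete (the key values are invented), under READ
COMMITTED the failure Ben describes plays out like this:

    -- Session A runs the combined statement; its "upd" CTE finds no row
    -- with ("basket", "ref") = (1, 42), so the UPDATE changes nothing.

    -- Session B commits a row with that key in the meantime:
    BEGIN;
    INSERT INTO "item" ("basket", "ref", "count") VALUES (1, 42, 3);
    COMMIT;

    -- Session A's "ins" CTE now inserts (1, 42, ...) as well, and the
    -- unique constraint on (basket, ref) rejects it with a
    -- unique_violation (SQLSTATE 23505), aborting the whole statement.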
On Wed, Feb 20, 2013 at 12:19 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> That's not reliable. A concurrent txn could insert a conflicting row
> between the update and the insert, which would cause the insert to fail
> with a unique constraint violation.

Okay, I think I got it. The function catches the exception when
INSERTing and does an UPDATE instead, correct?

If you got mixed up with plpgsql anyway, what is the reason for making
these WITH query constructions instead of implementing everything in a
plpgsql trigger on DELETE on exp?

> Yes, I can do experiments too; the alternatives I gave before both work
> on my test database. What I was asking was whether they are guaranteed
> to work in all situations, given that the planner can in principle see
> that the extra table reference won't affect the result.

From the documentation: "VOLATILE indicates that the function value can
change even within a single table scan, so no optimizations can be
made". So they are guaranteed to behave as you need in your last
example.

As for optimizing it out in WITH, I would like to hear the hackers'
opinion, because to me it looks like a bug.

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
At 8AM -0800 on 20/02/13 you (Sergey Konoplev) wrote:
> On Wed, Feb 20, 2013 at 12:19 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> > That's not reliable. A concurrent txn could insert a conflicting row
> > between the update and the insert, which would cause the insert to fail
> > with a unique constraint violation.
>
> Okay, I think I got it. The function catches the exception when
> INSERTing and does an UPDATE instead, correct?

Well, it tries the update first, but yes. It's pretty much exactly the
example in the PL/pgSQL docs.

> If you got mixed up with plpgsql anyway, what is the reason for making
> these WITH query constructions instead of implementing everything in a
> plpgsql trigger on DELETE on exp?

I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The
statement is deleting some entries from "item", and replacing some of
them with new entries, based on the information in the "item_expired"
view. I can't do anything with a trigger on "item", since there are
other circumstances where items are deleted that shouldn't trigger
replacement.

> > Yes, I can do experiments too; the alternatives I gave before both work
> > on my test database. What I was asking was whether they are guaranteed
> > to work in all situations, given that the planner can in principle see
> > that the extra table reference won't affect the result.
>
> From the documentation: "VOLATILE indicates that the function value can
> change even within a single table scan, so no optimizations can be
> made". So they are guaranteed to behave as you need in your last
> example.

Well, that's ambiguous. The return value can change even within a
single scan, so if you want 3 return values you have to make 3 calls.
But what if you don't actually need one of those three: is the planner
allowed to optimise the whole thing out?

For instance, given

    select *
    from (select j.type, random() r from item j) i
    where i.type = 1

the planner will transform it into

    select i.type, random() r
    from item i
    where i.type = 1

before planning, so even though random() is volatile it will only get
called for rows of item with type = 1. I don't know if this happens, or
may sometimes happen, or might happen in the future, for rows
eliminated because of DISTINCT.

(I think perhaps what I would ideally want is a PERFORM verb, which is
just like SELECT but says 'actually calculate all the rows implied
here, without pulling in additional filter conditions'. WITH would then
have to treat a top-level PERFORM inside a WITH the same as DML.)

Ben
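For what it's worth, the usual idiom at the time for blocking that
particular transformation was to put an optimization fence inside the
subquery: OFFSET 0 stops the pull-up (and keeps the outer qual from
being pushed down into the subquery), so random() is evaluated for
every row of item before the filter is applied. Whether such idioms
amount to a documented guarantee is, of course, exactly the question
being asked here.

    select *
    from (select j.type, random() r from item j offset 0) i  -- fence: not flattened
    where i.type = 1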
On Wed, Feb 20, 2013 at 10:16 AM, Ben Morrow <ben@morrow.me.uk> wrote:
>> If you got mixed up with plpgsql anyway, what is the reason for making
>> these WITH query constructions instead of implementing everything in a
>> plpgsql trigger on DELETE on exp?
>
> I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The

Sorry, I meant "item" of course; "exp" was a typo.

> statement is deleting some entries from "item", and replacing some of
> them with new entries, based on the information in the "item_expired"
> view. I can't do anything with a trigger on "item", since there are
> other circumstances where items are deleted that shouldn't trigger
> replacement.

Okay, I see.

If the case is that specific, you can make a simple plpgsql function
that processes it like this:

    FOR _row IN DELETE ... RETURNING * LOOP
        ...
        RETURN NEXT _row;
    END LOOP;

> select *
> from (select j.type, random() r from item j) i
> where i.type = 1
>
> the planner will transform it into
>
> select i.type, random() r
> from item i
> where i.type = 1
>
> before planning, so even though random() is volatile it will only get
> called for rows of item with type = 1.

Yes, functions are executed depending on the resulting plan: "A query
using a volatile function will re-evaluate the function at every row
where its value is needed".

> I don't know if this happens, or may sometimes happen, or might happen
> in the future, for rows eliminated because of DISTINCT.

It is a good point. Nothing guarantees it going forward. The optimizer
guarantees a stable result but not the way it is reached.

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray.ru@gmail.com
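Spelled out against the query from the first message, that suggestion
might look roughly like this; the function name and the parameter's
type are invented, it reuses add_item for the insert-or-update step,
and the DISTINCT is left to the caller:

    CREATE FUNCTION replace_expired(p_basket int)
    RETURNS SETOF text LANGUAGE plpgsql AS $$
    DECLARE
        _row record;
    BEGIN
        FOR _row IN
            DELETE FROM "item" i
                USING "item_expired" e
                WHERE e.oref = i.ref
                    AND i.basket = p_basket
                RETURNING i.basket, e.oref, e.nref, i.count, e.msg
        LOOP
            -- Replace the expired item if there is a new reference for it.
            IF _row.nref IS NOT NULL THEN
                PERFORM add_item(_row.basket, _row.nref, _row.count);
            END IF;
            RETURN NEXT _row.msg;
        END LOOP;
    END;
    $$;

    -- Caller:
    -- SELECT DISTINCT t.msg FROM replace_expired($1) AS t(msg);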
At 12PM -0800 on 20/02/13 you (Sergey Konoplev) wrote:
> On Wed, Feb 20, 2013 at 10:16 AM, Ben Morrow <ben@morrow.me.uk> wrote:
> >> If you got mixed up with plpgsql anyway, what is the reason for making
> >> these WITH query constructions instead of implementing everything in a
> >> plpgsql trigger on DELETE on exp?
> >
> > I'm not sure what you mean. "exp" isn't a table, it's a WITH CTE. The
>
> Sorry, I meant "item" of course; "exp" was a typo.

OK.

> > statement is deleting some entries from "item", and replacing some of
> > them with new entries, based on the information in the "item_expired"
> > view. I can't do anything with a trigger on "item", since there are
> > other circumstances where items are deleted that shouldn't trigger
> > replacement.
>
> Okay, I see.
>
> If the case is that specific, you can make a simple plpgsql function
> that processes it like this:
>
>     FOR _row IN DELETE ... RETURNING * LOOP
>         ...
>         RETURN NEXT _row;
>     END LOOP;

Yes, I *know* I can write a function if I have to. I can also send the
whole lot down to the client and do the inserts from there, or use a
temporary table. I was hoping to avoid that, since the plain INSERT
case works perfectly well.

> > select *
> > from (select j.type, random() r from item j) i
> > where i.type = 1
> >
> > the planner will transform it into
> >
> > select i.type, random() r
> > from item i
> > where i.type = 1
> >
> > before planning, so even though random() is volatile it will only get
> > called for rows of item with type = 1.
>
> Yes, functions are executed depending on the resulting plan: "A query
> using a volatile function will re-evaluate the function at every row
> where its value is needed".
>
> > I don't know if this happens, or may sometimes happen, or might happen
> > in the future, for rows eliminated because of DISTINCT.
>
> It is a good point. Nothing guarantees it going forward. The optimizer
> guarantees a stable result but not the way it is reached.

Well, it makes functions which perform DML a lot less useful, so I
wonder whether this is intentional behaviour.

Ben