Re: odd behavior: function not atomic/not seeing it's own - Mailing list pgsql-bugs
From | Robert Treat |
---|---|
Subject | Re: odd behavior: function not atomic/not seeing it's own updates/not acting serializable nor read committed |
Date | |
Msg-id | 1040149953.25972.289.camel@camel |
In response to | Re: odd behavior: function not atomic/not seeing it's own updates/not acting serializable nor read committed (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
On Mon, 2002-12-16 at 18:53, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > "this is broken on so many levels..."
>
> The issue here is that the second transaction has already set its
> snapshot when it enters the function body; so even though its LOCK
> command faithfully waits for the first transaction to commit, the
> second transaction can't see (and so can't update) the two rows
> inserted by the first transaction. It doesn't really see the update
> of the original row either, except for purposes of its own UPDATE.

I understand that, but what I don't understand is why the first function call sees its own changes (as evidenced by the different values in the raise notices -- 0, 10) while the second call is unable to see either the first call's changes or its own (its raise notices return 0, 0). The fact that functions behave differently when run concurrently seems dangerous to me. Consider rewriting the test case function as this:

CREATE OR REPLACE FUNCTION locktest1(integer) RETURNS INTEGER AS '
DECLARE
    x INTEGER := 1;
    y ALIAS FOR $1;
    foo INTEGER;
BEGIN
    LOCK TABLE locktest IN EXCLUSIVE MODE;
    INSERT INTO locktest(a) VALUES (y);
    LOOP
        x := x + 1;
        EXIT WHEN x >= 1000000;
    END LOOP;
    INSERT INTO locktest(a) VALUES (y+1);
    SELECT a FROM locktest ORDER BY c ASC LIMIT 1 INTO foo;
    RAISE NOTICE \'original entry in column a is %\', foo;
    UPDATE locktest SET a = a + y;
    SELECT a FROM locktest ORDER BY c ASC LIMIT 1 INTO foo;
    IF foo > 0 THEN
        UPDATE locktest SET a = a + 100;
    END IF;
    RAISE NOTICE \'original entry in column a is now %\', foo;
    RETURN y;
END;
' language 'plpgsql';

The check on foo returns true in the first function call but false in the second, which causes the two calls to behave differently given the same logic. This seems very dangerous to me.

> It would work more like you're expecting if you'd issued the LOCK
> before calling the function. I realize that's not convenient in
> many cases.

Inconvenient to say the least. One of the major reasons to use functions is the ability to capture business logic within the database so that it can be used by multiple apps. Given the need to issue locks in the application code, this functionality takes a major hit. (You're left hoping the application programmers put locks in all the right places, otherwise your data may get corrupted.)

> > OTOH if functions ran as if they were in serializable mode, the second
> > function would, upon attempting to update the first record, see that the
> > record was already updated, and throw an "ERROR: Can't serialize access
> > due to concurrent update", which could then be dealt with accordingly.
>
> It would do that, if you had run it in serializable mode. I get:
>
> regression=# begin;
> BEGIN
> regression=# set TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
> SET
> regression=# select locktest1(20);
> NOTICE:  original entry in column a is 0
> WARNING:  Error occurred while executing PL/pgSQL function locktest1
> WARNING:  line 15 at SQL statement
> ERROR:  Can't serialize access due to concurrent update
> regression=#

True. However, again, this workaround requires code in the application, since there is no way to set the transaction isolation level within a function. What we need is a self-contained solution for functions.
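Just to spell out what both workarounds mean in practice, here is roughly what every caller would have to do on the application side (a rough, untested sketch against my locktest1 example above):

-- workaround 1: take the lock yourself, before the calling statement
-- sets its snapshot
BEGIN;
LOCK TABLE locktest IN EXCLUSIVE MODE;
SELECT locktest1(20);
COMMIT;

-- workaround 2: run the calling transaction serializable and be prepared
-- to retry on "Can't serialize access due to concurrent update"
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT locktest1(20);
COMMIT;

Multiply that by every application and every call site and you can see why I'd rather have the function handle it itself.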
> In the read-committed case, there has been some talk of executing
> SetQuerySnapshot between statements of a function; search the archives
> for "SetQuerySnapshot" to find past threads. I'm leaning in favor
> of that myself, but no one's yet given a convincing analysis of what
> this would change and what the downside might be.

I've done a bit of searching on this and I don't see any cases where the current way SetQuerySnapshot works is improving the situation. My feeling, though, is that there might be cases where functions would not work atomically if they always saw other people's changes. Imagine a function that selects a value, does some computation, then selects again (see the rough sketch in the P.S. below). If that first value changes in the middle of the function, it might throw off what is supposed to happen. This is a result of the fact that some functions should behave as if they ran instantaneously, but can't always do so.

> (A compromise less likely to break existing code might be to do
> SetQuerySnapshot only after a function issues LOCK, but still the real
> effects of this would need to be clearly understood before such a
> proposal is likely to pass.)

I've thought about this too, and I think it is a decent compromise. If there is one case where you're really likely to want an updated QuerySnapshot, it's when you grab a lock on a table; after all, why would you want to lock data that you can't see? I can't think of a case where this would break things, given that right now we are blindly acquiring locks; how can actually knowing what you've locked cause you problems? Furthermore, this would help ensure that further queries inside the function are looking at a more consistent snapshot. I would lobby for this to go into 7.3.1, as it at least gives people a viable, self-contained workaround.

Ideally, I think what would be good is the ability to create functions as inherently serializable or read committed (perhaps through a parameter passed in at function creation, like iscacheable). This would force the function to behave in one of the two modes as documented for the transaction isolation levels, and update the query snapshots accordingly. I think the default would be serializable in order to preserve atomicity.

One last thing, and correct me if I'm wrong, but any fix would need to happen for all of the function languages, not just plpgsql, right?

Robert Treat
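P.S. To make the "selects twice" concern above a bit more concrete, here is a rough, untested sketch (the function itself is hypothetical; it just reuses the locktest table from the earlier examples) of the kind of logic that could be thrown off if SetQuerySnapshot ran between every statement:

CREATE OR REPLACE FUNCTION locktest2() RETURNS INTEGER AS '
DECLARE
    v1 INTEGER;
    v2 INTEGER;
BEGIN
    -- first read of the oldest row
    SELECT a FROM locktest ORDER BY c ASC LIMIT 1 INTO v1;
    -- ... some computation based on v1 would go here ...
    -- second read of the same row
    SELECT a FROM locktest ORDER BY c ASC LIMIT 1 INTO v2;
    IF v1 <> v2 THEN
        RAISE EXCEPTION \'value changed mid-function\';
    END IF;
    RETURN v1;
END;
' language 'plpgsql';

Today the two SELECTs are guaranteed to agree because they share one snapshot; with a per-statement SetQuerySnapshot, a commit from another backend between the two reads could make the IF fire.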