Thread: SetQuerySnapshot, once again
I've been busy working on my presentation on concurrency for the upcoming O'Reilly conference. While doing so, I've been thinking more about the question of when to do SetQuerySnapshot calls inside functions. We've gone around on that before, without much of a consensus on what to do; see for example the thread starting at http://fts.postgresql.org/db/mw/msg.html?mid=1029236 I have now become convinced that it is correct, in fact necessary, to do SetQuerySnapshot for each new user-supplied query, whether it's inside a function or not. A CommandCounterIncrement without an associated SetQuerySnapshot is okay internally within system utility operations (eg, to make visible a catalog entry we just created), but it is highly suspect otherwise. In serializable mode, SetQuerySnapshots after the first one of a transaction are no-ops, so there's really no difference in that case. All we need to think about is read-committed mode. And in read-committed mode, we can have situations like this: UPDATE webpages SET hits = hits + 1 WHERE url = '...';SELECT hits FROM webpages WHERE url = '...'; If there are no concurrent updates going on, this will work as expected: the SELECT will see the updated row. But if there are concurrent updates and we do not do SetQuerySnapshots in plpgsql, then the SELECT may see two versions of the target row as valid: both the one that was valid as of the last SetQuerySnapshot before we entered the function, and the one created by the UPDATE. This happens if and only if some other client updated the same row and committed after the last SetQuerySnapshot. The UPDATE will see that other client's row as current and will update it, as expected. But then the SELECT will consider the previous version of the row to be still good, because it was after all deleted by a transaction that committed later than the query snapshot! And the version produced by the UPDATE is good too, since it was produced within the current transaction (and we've done CommandCounterIncrement to make it visible). An example of exactly this misbehavior can be seen in http://archives.postgresql.org/pgsql-bugs/2002-02/msg00142.php Particularly in 7.2, it's a tossup which version of the row will be found first by the SELECT, so the bug might appear and disappear depending on the phase of the moon, making it even worse. We get sensible behavior in the normal interactive case *only* because there will be a SetQuerySnapshot between UPDATE and SELECT, and so the SELECT will certainly consider any versions seen as obsolete by UPDATE to be obsolete also. So I've come around to agree with the position that Tatsuo and Hiroshi put forward in the thread mentioned above: plpgsql (and the other PL languages) need to do SetQuerySnapshot not only CommandCounterIncrement between user-supplied queries. Is anyone still unconvinced? If not, I'll try to fix it sometime soon. As that thread pointed out, there also seem to be some problems with plpgsql not doing enough CommandCounterIncrements when it's executing already-planned queries; I'll take a look at that issue at the same time. regards, tom lane
Tom Lane wrote: > > I've been busy working on my presentation on concurrency for the > upcoming O'Reilly conference. While doing so, I've been thinking > more about the question of when to do SetQuerySnapshot calls inside > functions. We've gone around on that before, without much of a > consensus on what to do; see for example the thread starting at > http://fts.postgresql.org/db/mw/msg.html?mid=1029236 > > I have now become convinced that it is correct, in fact necessary, > to do SetQuerySnapshot for each new user-supplied query, whether > it's inside a function or not. I have a question. Could the functions which contain no queries other than SELECT be stable(returns the definite result for a query) with it ? regards, Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > I have a question. Could the functions which contain no > queries other than SELECT be stable(returns the definite > result for a query) with it ? Sorry, I don't understand ... regards, tom lane
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > I have a question. Could the functions which contain no > > queries other than SELECT be stable(returns the definite > > result for a query) with it ? > > Sorry, I don't understand ... Let t be a table which is defined as create table t (id serial primary key, dt text); Then is the following function *stable* ? create function f1(int4) returns text as ' declare txt text; begin select dtinto txt from t where id = $1; return txt; end ' language plpgsql; If SetQuerySnapshot is called for the above *select*, the result isn't determined by the snapshot of the function. regards, Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Tom Lane wrote: >> Sorry, I don't understand ... > Let t be a table which is defined as > create table t (id serial primary key, dt text); > Then is the following function *stable* ? > create function f1(int4) returns text as > ' > declare > txt text; > begin > select dt into txt from t where id = $1; > return txt; > end > ' language plpgsql; I'm not sure exactly what you mean by "stable" here. And I'm even less sure whether you are arguing for or against adding SetQuerySnapshot calls into plpgsql... regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > Tom Lane wrote: > >> Sorry, I don't understand ... > > > Let t be a table which is defined as > > create table t (id serial primary key, dt text); > > Then is the following function *stable* ? > > create function f1(int4) returns text as > > ' > > declare > > txt text; > > begin > > select dt into txt from t where id = $1; > > return txt; > > end > > ' language plpgsql; > > I'm not sure exactly what you mean by "stable" here. Wasn't it you who defined *stable* as Cachable within a single command: given fixed input values, the result will not changeif the function were to be repeatedly evaluated within a single SQL command; but the result could change over time. ? > And I'm even less sure whether you are arguing for or > against adding SetQuerySnapshot calls into plpgsql... I already mentioned an opinion in 2001/09/08. Both the command counters and the snapshots in a function should advance exceptthe leading SELECT statements. regards, Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> I'm not sure exactly what you mean by "stable" here. > Wasn't it you who defined *stable* as > Cachable within a single command: given fixed input values, the > result will not change if the function were to be repeatedly evaluated > within a single SQL command; but the result could change over time. Oh, *that* "stable" ;-) Okay, I get your point now. You are right --- a function that references a table that others might be concurrently changing would not be stable under read-committed rules. (But you could probably get away with marking it stable anyway.) > I already mentioned an opinion in 2001/09/08. > Both the command counters and the snapshots in a > function should advance except the leading SELECT > statements. I do not like the idea of treating the first select in a function differently from the rest. And such a rule wouldn't let you build guaranteed-stable functions anyway; what if the outer query was calling both your function, and another one that did cause the snapshot to advance? The behavior of your function would then vary depending on whether the other function was invoked or not. regards, tom lane
Tom Lane wrote: > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > > I already mentioned an opinion in 2001/09/08. > > Both the command counters and the snapshots in a > > function should advance except the leading SELECT > > statements. > > I do not like the idea of treating the first select in a function > differently from the rest. And such a rule wouldn't let you build > guaranteed-stable functions anyway; AFAIK there has been no analysis where we can get *stable* functions. As far as I see, we can expect SELECT-only functions to be *stable* if and only if they are surrounded by SELECT-only *stable* functions. regards, Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/
Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> I do not like the idea of treating the first select in a function >> differently from the rest. And such a rule wouldn't let you build >> guaranteed-stable functions anyway; > AFAIK there has been no analysis where we can get *stable* > functions. As far as I see, we can expect SELECT-only functions > to be *stable* if and only if they are surrounded by SELECT-only > *stable* functions. This idea might be a bit off-the-wall, but how about: 1. If a plpgsql function is declared immutable or stable, then all its queries run with the same snapshot *and* CommandCounterId as prevail in the calling query. Probably we should disallow it from making any updating queries, too; allow only SELECTs. 2. If it's declared volatile (the default), then snapshot and CommandCounterId are both updated for each query in the function, including the first one. So the default behavior would be equivalent to issuing the same queries interactively, which I think is a good default. The non-default behavior would allow truly stable functions to be built. regards, tom lane
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > >> I do not like the idea of treating the first select in a function > >> differently from the rest. And such a rule wouldn't let you build > >> guaranteed-stable functions anyway; > > > AFAIK there has been no analysis where we can get *stable* > > functions. As far as I see, we can expect SELECT-only functions > > to be *stable* if and only if they are surrounded by SELECT-only > > *stable* functions. Oops I was wrong. The last *stable* isn't needed. > This idea might be a bit off-the-wall, Probably I mentioned once long before. We can't expect reasonable result for select fn1(..), fn2(..), ... from ... ; if there are some fnx()-s with strong side effect. > but how about: > > 1. If a plpgsql function is declared immutable or stable, then all its > queries run with the same snapshot *and* CommandCounterId as prevail > in the calling query. IMHO it's impossible to handle anything with one concept. Functions could be *immutable*(? deterministic in SQL99) or *stable* even though they have strong side effect. regards, Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/