While fooling around with moving FOR UPDATE into a plan node (WIP
version attached), I came across this interesting behavior:
1. Create test tables:
create table t1(f1 int, f2 int);
insert into t1 values (1,1);
insert into t1 values (2,2);
create table t2(f3 int, f4 int);
insert into t2 values (1,111);
insert into t2 values (1,112);
insert into t2 values (2,113);
insert into t2 values (2,114);
2. Execute test query:
select * from t1 join t2 on f1=f3 for update;
f1 | f2 | f3 | f4
----+----+----+-----
1 | 1 | 1 | 111
1 | 1 | 1 | 112
2 | 2 | 2 | 113
2 | 2 | 2 | 114
(4 rows)
3. In another session, execute:
begin;
update t1 set f2=42 where f1=2;
4. In first session, repeat test query:
select * from t1 join t2 on f1=f3 for update;
As expected, it blocks waiting for the second session to commit.
5. Now commit in the second session. First session resumes and prints
f1 | f2 | f3 | f4
----+----+----+-----
1 | 1 | 1 | 111
1 | 1 | 1 | 112
2 | 42 | 2 | 113
2 | 42 | 2 | 114
2 | 42 | 2 | 113
2 | 42 | 2 | 114
(6 rows)
Of course the expected answer is
f1 | f2 | f3 | f4
----+----+----+-----
1 | 1 | 1 | 111
1 | 1 | 1 | 112
2 | 42 | 2 | 113
2 | 42 | 2 | 114
(4 rows)
which is what you'll get if you simply repeat the test query.
This isn't a new bug ... the same behavior can be demonstrated as far
back as 7.0.
The problem is that execMain.c is set up to pull as many rows as it can
from execution of an EvalPlanQual query. Then, once that's exhausted,
it steps to the next result from the original query. So if a row that
requires locking joins to more than one row in some other table, you
get duplicated output rows.
The attached patch alleviates some cases of this by having the new
LockRows plan node lock *all* the target rows, not just one, before
firing the EvalPlanQual query. It doesn't fix the example above
because only one of the rows being joined is seen to need EPQ treatment.
We could improve that by feeding successfully locked rows into the EPQ
machinery as well as ones that were found to be outdated. But that
would still leave us with two failure cases:
1. if some of the tables being joined are not selected FOR UPDATE.
2. if the select involves any set-returning functions in the targetlist.
I think we could get around #1 by having *all* tables in the query
marked FOR UPDATE at least in a dummy form, ie give them entries in
the rowMarks list and create junk tlist entries to report their current
ctid. Then we'd feed all the relevant rows into the EPQ machinery.
We'd just not lock the ones we weren't asked to lock.
I do not see any very good way around #2. I'm tempted to propose
that we just forbid SRFs in the targetlist of a FOR UPDATE query.
This could be justified on the same grounds that we forbid aggregate
functions there, ie, they destroy the one-to-one correspondence between
table rows and SELECT output rows. If you really had to have it you
could do something like
select srf(...) from (select ... for update) ss;
Anyone have a better idea?
regards, tom lane