Re: BUG #18472: SELECT FOR UPDATE locking more rows than expected

From Tom Lane
Subject Re: BUG #18472: SELECT FOR UPDATE locking more rows than expected
Msg-id 2760373.1716044766@sss.pgh.pa.us
In response to BUG #18472: SELECT FOR UPDATE locking more rows than expected  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> And then retry I can see that I get the expected 5 rows back in the query
> from tab 2 leading me to believe that when both the LIMIT and ORDER BY are
> present postgres is locking more rows than I believe should be necessary
> based on the documentation that I've read and experiments that I've done.

This is not a bug, because there is no guarantee that SELECT FOR
UPDATE will lock only what you think is the minimum number of rows.

I realize that this is probably an oversimplified example of your real
problem, but in the given case the issue is use of the sub-select;
do you really need that?  EXPLAIN shows the plan as

 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Nested Loop
                     ->  Seq Scan on tab1 t1
                     ->  Subquery Scan on t2
                           ->  Limit
                                 ->  LockRows
                                       ->  Seq Scan on tab2
                                             Filter: (t1.a = a_ref)

The upper LockRows node should indeed not lock any rows that
aren't returned, but the trouble is the lower one, which will
probably end up locking every tab2 row that has a match in tab1.
(Remember the Sort will not be able to return any row until
it's run the Nested Loop to completion.)

The reason that's there is that the outer query interprets
application of FOR UPDATE to a sub-SELECT-in-FROM as being
a request to push down a FOR UPDATE into the sub-SELECT.
You could get rid of that by specifying just "FOR UPDATE OF t1"
in the outer query, but of course then you'd end up with no lock
applied to the joined t2 row, so I don't know if that works
for you.
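For illustration, that variant might look like the following (a
sketch only, reconstructing the original query's shape from the
plan above, inner LIMIT included):

SELECT *
FROM public.tab1 t1,
LATERAL (SELECT *
FROM public.tab2
WHERE "a"="a_ref"
LIMIT 1
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE OF t1;

Here only the tab1 row returned through the outer LIMIT is
locked; the joined tab2 row is read without any row lock.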

I think the answer could be to get rid of the inner LIMIT,
which is what's preventing optimization of the subquery.
I dislike that on semantic grounds anyway, because as this
is written it's totally undefined which tab2 rows get joined
to which tab1 rows.  Without that I get

explain (costs off)
SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE;
                     QUERY PLAN                      
-----------------------------------------------------
 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Merge Join
                     Merge Cond: (t1.a = tab2.a_ref)
                     ->  Sort
                           Sort Key: t1.a
                           ->  Seq Scan on tab1 t1
                     ->  Sort
                           Sort Key: tab2.a_ref
                           ->  Seq Scan on tab2
(12 rows)

which would behave a lot better in terms of locking
only the returned row(s).

            regards, tom lane
