Thread: BUG #18472: SELECT FOR UPDATE locking more rows than expected

BUG #18472: SELECT FOR UPDATE locking more rows than expected

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18472
Logged by:          Colin Steifel
Email address:      csteifel@gmail.com
PostgreSQL version: 16.3
Operating system:   Official docker container 16.3 tag
Description:

I'm using the following to set up what I believe to be the issue at hand:


CREATE TABLE IF NOT EXISTS public.tab1
(
    a integer NOT NULL,
    b integer NOT NULL
);


CREATE TABLE IF NOT EXISTS public.tab2
(
    a_ref integer NOT NULL,
    c integer NOT NULL
);

INSERT INTO public.tab1 VALUES (1,1),(2,2),(3,3);
INSERT INTO public.tab2 VALUES (1,5),(1,6),(1,7),(2,8),(2,9),(3,10);




Then in pgadmin I'm opening two query tabs for the same database and running
the following in tab 1

BEGIN;
SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
LIMIT 1
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE;


Keeping the transaction open and running the following in tab 2:

SELECT * FROM public.tab2 FOR UPDATE SKIP LOCKED;


What is happening:
I'm only getting 3 rows back in query tab 2

What I'm expecting:
Expecting to get 5 rows in query tab 2


If I change the original query to either remove the LIMIT clause in the
subquery eg:
SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE;



Or if I remove the ORDER BY b ASC in the top level query eg:

SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
LIMIT 1
) t2
LIMIT 1
FOR UPDATE;

And then retry I can see that I get the expected 5 rows back in the query
from tab 2 leading me to believe that when both the LIMIT and ORDER BY are
present postgres is locking more rows than I believe should be necessary
based on the documentation that I've read and experiments that I've done.

Thanks for your time,
Colin


PG Bug reporting form <noreply@postgresql.org> writes:
> And then retry I can see that I get the expected 5 rows back in the query
> from tab 2 leading me to believe that when both the LIMIT and ORDER BY are
> present postgres is locking more rows than I believe should be necessary
> based on the documentation that I've read and experiments that I've done.

This is not a bug, because there is no guarantee that SELECT FOR
UPDATE will lock only what you think is the minimum number of rows.

I realize that this is probably an oversimplified example of your real
problem, but in the given case the issue is use of the sub-select;
do you really need that?  EXPLAIN shows the plan as

 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Nested Loop
                     ->  Seq Scan on tab1 t1
                     ->  Subquery Scan on t2
                           ->  Limit
                                 ->  LockRows
                                       ->  Seq Scan on tab2
                                             Filter: (t1.a = a_ref)

The upper LockRows node should indeed not lock any rows that
aren't returned, but the trouble is the lower one, which will
probably end up locking every tab2 row that has a match in tab1.
(Remember the Sort will not be able to return any row until
it's run the Nested Loop to completion.)

The reason that's there is that the outer query interprets
application of FOR UPDATE to a sub-SELECT-in-FROM as being
a request to push down a FOR UPDATE into the sub-SELECT.
You could get rid of that by specifying just "FOR UPDATE OF t1"
in the outer query, but of course then you'd end with no lock
applied to the joined t2 row, so I don't know if that works
for you.

I think the answer could be to get rid of the inner LIMIT,
which is what's preventing optimization of the subquery.
I dislike that on semantic grounds anyway, because as this
is written it's totally undefined which tab2 rows get joined
to which tab1 rows.  Without that I get

explain (costs off)
SELECT *
FROM public.tab1 t1,
LATERAL (SELECT * 
FROM public.tab2
WHERE "a"="a_ref" 
) t2
ORDER BY b ASC
LIMIT 1
FOR UPDATE;
                     QUERY PLAN                      
-----------------------------------------------------
 Limit
   ->  LockRows
         ->  Sort
               Sort Key: t1.b
               ->  Merge Join
                     Merge Cond: (t1.a = tab2.a_ref)
                     ->  Sort
                           Sort Key: t1.a
                           ->  Seq Scan on tab1 t1
                     ->  Sort
                           Sort Key: tab2.a_ref
                           ->  Seq Scan on tab2
(12 rows)

which would behave a lot better in terms of locking
only the returned row(s).

            regards, tom lane