Thread: BUG #16676: SELECT ... FETCH FIRST ROW WITH TIES FOR UPDATE SKIP LOCKED locks rows it doesn't return

The following bug has been logged on the website:

Bug reference:      16676
Logged by:          David Christensen
Email address:      david@endpoint.com
PostgreSQL version: 13.0
Operating system:   OS X
Description:

Test case (found when demonstrating a queue mechanism for batches):

create table queue (id int generated always as identity, batch_id int not
null);

insert into queue (batch_id) values (1),(1),(2),(3);

postgres=# select * from queue;
 id | batch_id 
----+----------
  1 |        1
  2 |        1
  3 |        2
  4 |        3
(4 rows)

-- backend 1:

postgres=# begin;
BEGIN
postgres=*# select * from queue order by batch_id fetch first row with ties
for update skip locked;
 id | batch_id 
----+----------
  1 |        1
  2 |        1
(2 rows)

-- backend 2:

postgres=# select * from queue order by batch_id fetch first row with ties
for update skip locked;       
 id | batch_id 
----+----------
  4 |        3
(1 row)

-- QED

As you can see, the row with id 3 and batch_id 2 is not returned, as would be
implied by the strict ordering and the skipped locks.  The working theory here
is that FETCH FIRST ROW WITH TIES locks the rows before deciding whether they
should be included or not.
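
For reference, the plan for this query has roughly the following shape
(costs omitted here; exact output may vary), with LockRows sitting
underneath the WITH TIES Limit:

 Limit
   ->  LockRows
         ->  Sort
               Sort Key: batch_id
               ->  Seq Scan on queue

so every row the Limit node pulls while checking for ties gets locked,
whether or not it is ultimately returned.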


Proposed fix:


Reorder Limit/LockRows nodes to prevent locking extra tuples in FETCH FIRST WITH TIES

Reference bug report: 16676

1 file changed, 12 insertions(+), 12 deletions(-)
src/backend/optimizer/plan/planner.c | 24 ++++++++++++------------

modified   src/backend/optimizer/plan/planner.c
@@ -2293,6 +2293,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
     {
         Path       *path = (Path *) lfirst(lc);

+        /*
+         * If there is a LIMIT/OFFSET clause, add the LIMIT node.
+         */
+        if (limit_needed(parse))
+        {
+            path = (Path *) create_limit_path(root, final_rel, path,
+                                              parse->limitOffset,
+                                              parse->limitCount,
+                                              parse->limitOption,
+                                              offset_est, count_est);
+        }
+
         /*
          * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
          * (Note: we intentionally test parse->rowMarks not root->rowMarks
@@ -2307,18 +2319,6 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
                                                  assign_special_exec_param(root));
         }

-        /*
-         * If there is a LIMIT/OFFSET clause, add the LIMIT node.
-         */
-        if (limit_needed(parse))
-        {
-            path = (Path *) create_limit_path(root, final_rel, path,
-                                              parse->limitOffset,
-                                              parse->limitCount,
-                                              parse->limitOption,
-                                              offset_est, count_est);
-        }
-
         /*
          * If this is an INSERT/UPDATE/DELETE, and we're not being called from
          * inheritance_planner, add the ModifyTable node.
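
With this reordering, the same query should plan with LockRows above the
Limit node instead, roughly:

 LockRows
   ->  Limit
         ->  Sort
               Sort Key: batch_id
               ->  Seq Scan on queue

so only the rows that survive the WITH TIES limit are ever handed to
LockRows.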



David Christensen <david@endpoint.com> writes:
> Proposed fix:
> Reorder Limit/LockRows nodes to prevent locking extra tuples in FETCH FIRST WITH TIES

Isn't that going to break more cases than it fixes?

            regards, tom lane



> On Oct 19, 2020, at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Christensen <david@endpoint.com> writes:
>> Proposed fix:
>> Reorder Limit/LockRows nodes to prevent locking extra tuples in FETCH FIRST WITH TIES
>
> Isn't that going to break more cases than it fixes?

In the case of Limit, isn’t LockRows supposed to only lock the number of actual rows returned?

What are the scenarios that this might break and do you have any ideas for alternate fixes?

David


> On Oct 19, 2020, at 6:59 PM, David Christensen <david@endpoint.com> wrote:
>
>> On Oct 19, 2020, at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> David Christensen <david@endpoint.com> writes:
>>> Proposed fix:
>>> Reorder Limit/LockRows nodes to prevent locking extra tuples in FETCH FIRST WITH TIES
>>
>> Isn't that going to break more cases than it fixes?
>
> In the case of Limit, isn’t LockRows supposed to only lock the number of actual rows returned?
>
> What are the scenarios that this might break and do you have any ideas for alternate fixes?

Well, now that I think about it, if the LockRows node is responsible for
skipping locked rows, etc., then my proposed solution is clearly wrong.

Maybe splitting LockRows into two nodes, one for locking and one for
emitting unlocked nodes, then interleaving Limit in between? (Or only
doing something along these lines for this admittedly narrow use case.)

Open to ideas on the appropriate fix.

David


On 2020-Oct-19, David Christensen wrote:

> > On Oct 19, 2020, at 6:59 PM, David Christensen <david@endpoint.com> wrote:
> > 
> >> On Oct 19, 2020, at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> 
> >> David Christensen <david@endpoint.com> writes:
> >>> Proposed fix:
> >>> Reorder Limit/LockRows nodes to prevent locking extra tuples in FETCH FIRST WITH TIES
> >> 
> >> Isn't that going to break more cases than it fixes?
> > 
> > In the case of Limit, isn’t LockRows supposed to only lock the number of actual rows returned?
> > 
> > What are the scenarios that this might break and do you have any ideas for alternate fixes?
> 
> Well, now that I think about it, if the LockRows node is responsible
> for skipping locked rows, etc., then my proposed solution is clearly
> wrong.

I tried your proposed patch yesterday and found that the second query
returns no rows rather than two rows, which is not an improvement.
(Presumably because, with the Limit underneath, the tied rows are chosen
before LockRows ever sees them; since they are already locked, SKIP LOCKED
just discards them instead of pulling further rows.)

> Maybe splitting LockRows into two nodes, one for locking and one for
> emitting unlocked nodes then interleaving Limit in between? (Or only
> doing something along these lines for this admittedly narrow use case.)

I was having a similar idea, but the main reason I don't think it's a
good fix is that we can't backpatch such a change to pg13.

I was wondering whether it would be possible to modify Limit so that it
could "peek" into its child node without "actually reading" it.  But that's not
possible in our executor implementation AFAIK -- you only have
ExecProcNode as an interface, there's no way to affect what happens
underneath.

Maybe an approach is to forbid a FOR UPDATE clause in a query that has a
LIMIT WITH TIES clause; so we'd force the user to write a subquery for
LIMIT WITH TIES and then apply FOR UPDATE to an outer query.  However,
I'm surprised to find that we have *two* LockRows nodes when doing that:

explain select * from (select * from queue order by batch_id fetch first 2 rows with ties) t for update skip locked;
                                       QUERY PLAN                                       
────────────────────────────────────────────────────────────────────────────────────────
 LockRows  (cost=55.20..55.27 rows=2 width=40)
   ->  Subquery Scan on t  (cost=55.20..55.25 rows=2 width=40)
         ->  Limit  (cost=55.20..55.23 rows=2 width=14)
               ->  LockRows  (cost=55.20..83.45 rows=2260 width=14)
                     ->  Sort  (cost=55.20..60.85 rows=2260 width=14)
                           Sort Key: queue.batch_id
                           ->  Seq Scan on queue  (cost=0.00..32.60 rows=2260 width=14)
(7 rows)

and of course, we still have the behavior where the third row is
skipped.




Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Oct-19, David Christensen wrote:
>> Maybe splitting LockRows into two nodes, one for locking and one for
>> emitting unlocked nodes then interleaving Limit in between? (Or only
>> doing something along these lines for this admittedly narrow use case.)

> I was having a similar idea, but the main reason I don't think it's a
> good fix is that we can't backpatch such a change to pg13.

Um, why not?  I don't have a position yet on whether this is a good way
to fix it; but if we did do it, AFAICS the only thing we'd have to be
careful of in v13 is not renumbering existing NodeTag values.

If your concern is just that EXPLAIN plans will look different, the same
could be said of David's other proposal.

            regards, tom lane



> On Oct 20, 2020, at 9:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> On 2020-Oct-19, David Christensen wrote:
>>> Maybe splitting LockRows into two nodes, one for locking and one for
>>> emitting unlocked nodes then interleaving Limit in between? (Or only
>>> doing something along these lines for this admittedly narrow use case.)
>
>> I was having a similar idea, but the main reason I don't think it's a
>> good fix is that we can't backpatch such a change to pg13.
>
> Um, why not?  I don't have a position yet on whether this is a good way
> to fix it; but if we did do it, AFAICS the only thing we'd have to be
> careful of in v13 is not renumbering existing NodeTag values.
>
> If your concern is just that EXPLAIN plans will look different, the same
> could be said of David's other proposal.

If we can determine the scope to which these different nodes might be used
and make it so it still uses the existing node structure for most existing
cases (i.e., maybe an additional parameter to create_lockrows_plan or
similar) then that might work; AFAIK, this is the first time we’ve had to
consider this sort of thing (though I do wonder if there might be something
in window functions which might bump into this sort of issue).

I’m working on an isolation test to formalize this failure, which I can do
as a separate patch if we want to have this prior to implementing a fix.

--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171



> On Oct 20, 2020, at 9:23 AM, David Christensen <david@endpoint.com> wrote:
>
>> On Oct 20, 2020, at 9:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> On 2020-Oct-19, David Christensen wrote:
>>>> Maybe splitting LockRows into two nodes, one for locking and one for
>>>> emitting unlocked nodes then interleaving Limit in between? (Or only
>>>> doing something along these lines for this admittedly narrow use case.)
>>
>>> I was having a similar idea, but the main reason I don't think it's a
>>> good fix is that we can't backpatch such a change to pg13.
>>
>> Um, why not?  I don't have a position yet on whether this is a good way
>> to fix it; but if we did do it, AFAICS the only thing we'd have to be
>> careful of in v13 is not renumbering existing NodeTag values.
>>
>> If your concern is just that EXPLAIN plans will look different, the same
>> could be said of David's other proposal.
>
> If we can determine the scope to which these different nodes might be used
> and make it so it still uses the existing node structure for most existing
> cases (i.e., maybe an additional parameter to create_lockrows_plan or
> similar) then that might work; AFAIK, this is the first time we’ve had to
> consider this sort of thing (though I do wonder if there might be something
> in window functions which might bump into this sort of issue).
>
> I’m working on an isolation test to formalize this failure, which I can do
> as a separate patch if we want to have this prior to implementing a fix.

Enclosed is the (currently failing) isolation test for the expected behavior.
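
In rough outline (the actual attached spec may differ), it sets up the queue
table from the report and interleaves the two sessions' queries as isolation
steps:

# Rough sketch only; the attached spec may differ.
setup
{
  CREATE TABLE queue (id int GENERATED ALWAYS AS IDENTITY, batch_id int NOT NULL);
  INSERT INTO queue (batch_id) VALUES (1), (1), (2), (3);
}

teardown
{
  DROP TABLE queue;
}

session "s1"
setup            { BEGIN; }
step "s1_fetch"  { SELECT * FROM queue ORDER BY batch_id FETCH FIRST ROW WITH TIES FOR UPDATE SKIP LOCKED; }
teardown         { COMMIT; }

session "s2"
setup            { BEGIN; }
step "s2_fetch"  { SELECT * FROM queue ORDER BY batch_id FETCH FIRST ROW WITH TIES FOR UPDATE SKIP LOCKED; }
teardown         { COMMIT; }

permutation "s1_fetch" "s2_fetch"

The expected output has "s2_fetch" returning the id=3/batch_id=2 row; on
13.0 it returns the id=4/batch_id=3 row instead, so the test fails until the
underlying behavior is fixed.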


--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171




> On Oct 19, 2020, at 7:33 PM, David Christensen <david@endpoint.com> wrote:
>
> Maybe splitting LockRows into two nodes, one for locking and one for
> emitting unlocked nodes, then interleaving Limit in between? (Or only
> doing something along these lines for this admittedly narrow use case.)

This does appear to be quite a bit harder on the surface than originally
hoped; creating a node to “pass-thru” unlocked tuples without locking them
would require building out some additional infrastructure that does not
appear to be present in all AMs; something to allow a test for locks without
actually calling table_tuple_lock() formally.  Perhaps adding a param to
indicate “test” only; not sure if something similar to heapam's
“test_lockmode_for_conflict()” is something we’d need to add.  (Or even a
new lock mode which doesn’t create RowMarks but skips any conflicting modes.)

That said, I think it might be worth trying to go down the road of “forbid
this behavior”, particularly if a workaround is available.  Perhaps looking
into why Álvaro’s plan was generating the multiple LockRows nodes would be
more fruitful and less likely to cause backpatching pain.

David
--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171



