Re: SKIP LOCKED DATA (work in progress) - Mailing list pgsql-hackers

From David Rowley
Subject Re: SKIP LOCKED DATA (work in progress)
Date
Msg-id CAApHDvqOZGAdSexbMsVamr21QvPV=ELMhZiUPbzQeSKcbMo56Q@mail.gmail.com
Whole thread Raw
In response to Re: SKIP LOCKED DATA (work in progress)  (Thomas Munro <munro@ip9.org>)
Responses Re: SKIP LOCKED DATA (work in progress)  (Thomas Munro <munro@ip9.org>)
List pgsql-hackers
On Sun, Jun 29, 2014 at 9:01 PM, Thomas Munro <munro@ip9.org> wrote:

Please find attached a rebased version of my SKIP LOCKED
patch (formerly SKIP LOCKED DATA), updated to support only the
Oracle-like syntax.


Hi Thomas,

Apologies for taking this long to get to reviewing this, I'd gotten a bit side tracked with my own patches during this commitfest.

I'm really glad to see this patch is back again. I think it will be very useful for processing queues. I could have made good use of it in my last work, using it for sending unsent emails which were "queued" up in a table in the database.

I've so far read over the patch and done only some brief tests of the functionality.

Here's what I've picked up on so far:

* In heap_lock_tuple() there's a few places where you test the wait_policy, but you're not always doing this in the same order. The previous code did if (nolock) first each time, but since there's now 3 values I think if (wait_policy == LockWaitError) should be first each time as likely this is the most common case.

* The following small whitespace change can be removed in gram.y:

@@ -119,7 +119,6 @@ typedef struct PrivTarget
 #define CAS_NOT_VALID 0x10
 #define CAS_NO_INHERIT 0x20
 
-
 #define parser_yyerror(msg)  scanner_yyerror(msg, yyscanner)
 #define parser_errposition(pos)  scanner_errposition(pos, yyscanner)  
 
* analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy); 
I'm not quite sure I understand this yet, but I'm guessing the original code was there so that something like:

SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
Would give:
ERROR:  could not obtain lock on row in relation "a"

So it seems that with the patch as you've defined in by the order of the enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE. I'm wondering if this is dangerous.

Should the following really skip locked tuples?
SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;

But on the other hand perhaps I've missed a discussion on this, if so then I think the following comment should be updated to explain it all:

* We also consider that NOWAIT wins if it's specified both ways. This
* is a bit more debatable but raising an error doesn't seem helpful.
* (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
* internally contains a plain FOR UPDATE spec.)
*

* plannodes.h -> RowWaitPolicy waitPolicy;   /* NOWAIT and SKIP LOCKED DATA options */
Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed from the syntax.

* nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy == RWP_SKIP )

I'm also wondering about this block of code in general:

if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;

Just after this code heap_lock_tuple() is called, and if that returns HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that whole block again. I'm wondering why there's 2 enums that are for the same thing? if you just had 1 then you'd not need this block of code at all, you could just pass erm->waitPolicy to heap_lock_tuple().

* parsenodes.h comment does not meet project standards (http://www.postgresql.org/docs/devel/static/source-format.html)

typedef enum LockClauseWaitPolicy
{
/* order is important (see applyLockingClause which takes the greatest
  value when several wait policies have been specified), and values must
  match RowWaitPolicy from plannodes.h */

* parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA */


I have noticed the /* TODO -- work out what needs to be released here */ comments in head_lock_tuple(), and perhaps the lack of cleaning up is what is causing the following:

create table skiptest (id int primary key); insert into skiptest (id) select x.x from generate_series(1,1000000) x(x);

Session 1: begin work; select * from skiptest for update limit 999999;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING:  out of shared memory
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

Yet if I do:

session 1: begin work; select * from skiptest where id > 1 for update;
session 2:  select * from skiptest for update skip locked limit 1;
 id
----
  1
(1 row)

That test makes me think your todo comments are in the right place, something is missing there for sure!

* plays about with patch for a bit *

I don't quite understand how heap_lock_tuple works, as if I debug session 2 from the above set of queries the first call to ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have thought it would fail, since session 1 should be locking this tuple? Later at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)), it fails to grab the lock and returns HeapTupleWouldBlock. The above query seems to work ok if I just apply the following patch:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d0e..b3e9dcc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4525,8 +4525,10 @@ l3:
                                else /* wait_policy == LockWaitSkip */
                                {
                                        if (!ConditionalXactLockTableWait(xwait))
-                                               /* TODO -- work out what needs to be released here */
+                                       {
+                                               UnlockTupleTuplock(relation, tid, mode);
                                                return HeapTupleWouldBlock;
+                                       }
                                }

                                /* if there are updates, follow the update chain */

Running the test query again, I get:
select * from skiptest for update skip locked limit 1;
   id
---------
 1000000
(1 row)

Would you also be able to supply a rebased patch with current master? The applying the patch is a bit noisy and the isolation tests part does not seem to apply at all now, so I've not managed to look at those yet.

I've still got more to look at, but hopefully this will get things moving again.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: PDF builds broken again
Next
From: Magnus Hagander
Date:
Subject: Re: PDF builds broken again