Re: SKIP LOCKED DATA (work in progress) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: SKIP LOCKED DATA (work in progress)
Date
Msg-id CADLWmXWakSyLGyYQPCM-Mob=tZDEWEfAJ-ghN6-SXozHbksTaA@mail.gmail.com
In response to Re: SKIP LOCKED DATA (work in progress)  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
> * In heap_lock_tuple() there's a few places where you test the wait_policy,
> but you're not always doing this in the same order. The previous code did if
> (nolock) first each time, but since there's now 3 values I think if
> (wait_policy == LockWaitError) should be first each time as likely this is
> the most common case.

Ok, I have made them consistent -- though I put LockWaitBlock
first as that is surely the most common case (plain old FOR UPDATE).
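
To make that concrete, the wait-policy tests in heap_lock_tuple() now
all follow roughly this shape (a simplified sketch, not the literal
patch text):

    if (wait_policy == LockWaitBlock)
    {
        /* plain FOR UPDATE: block (XactLockTableWait) and retry */
    }
    else if (wait_policy == LockWaitSkip)
    {
        /* FOR UPDATE SKIP LOCKED: never wait, just report "would block" */
        if (!ConditionalXactLockTableWait(xwait))
            return HeapTupleWouldBlock;
    }
    else    /* wait_policy == LockWaitError, i.e. FOR UPDATE NOWAIT */
    {
        if (!ConditionalXactLockTableWait(xwait))
            ereport(ERROR,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("could not obtain lock on row in relation \"%s\"",
                            RelationGetRelationName(relation))));
    }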

> * The following small whitespace change can be removed in gram.y:
> [snip]

Fixed.

> * analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
> I'm not quite sure I understand this yet, but I'm guessing the original code
> was there so that something like:
>
> SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
> Would give:
> ERROR:  could not obtain lock on row in relation "a"

Surely only if another session already has one or more tuples locked?

> So it seems that with the patch as you've defined it by the order of the
> enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE.
> I'm wondering if this is dangerous.

My understanding was that we have to choose a single policy for
attempts to lock rows in that relation, and if you've stated in at
least one place that you don't want to wait, then it makes sense for
NOWAIT to take precedence.  I had to figure out how to fit SKIP LOCKED
into that hierarchy, and figured that NOWAIT should still be the
strongest policy, overriding all others, and SKIP LOCKED should
overrule default FOR UPDATE (ie blocking).

> Should the following really skip locked tuples?
> SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;

I agree that it's somewhat arbitrary...  I suppose the two most
obvious options are either to reduce to a single policy using some
rule (such as the NOWAIT > SKIP LOCKED > default hierarchy I propose),
or simply reject such queries (which I'm guessing wouldn't fly at this
point since existing valid queries would become invalid).
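
For what it's worth, the reduction itself is just the Max() David
quoted from analyze.c.  Here is a tiny self-contained sketch of the
idea -- the enum member names and the helper function are made up for
illustration, the real names live in parsenodes.h:

    /* values ordered by increasing "strength" of the wait policy */
    typedef enum LockClauseWaitPolicy
    {
        LCW_BLOCK,   /* plain FOR UPDATE: wait for locked rows */
        LCW_SKIP,    /* FOR UPDATE SKIP LOCKED: skip locked rows */
        LCW_NOWAIT   /* FOR UPDATE NOWAIT: error out on locked rows */
    } LockClauseWaitPolicy;

    /*
     * Combine the policy already recorded for a relation with a newly
     * specified one: because the values are ordered, the strongest one
     * always wins.  This is what the Max() in applyLockingClause does.
     */
    static LockClauseWaitPolicy
    combine_wait_policy(LockClauseWaitPolicy a, LockClauseWaitPolicy b)
    {
        return (a > b) ? a : b;
    }

So NOWAIT beats SKIP LOCKED, and SKIP LOCKED beats the default,
no matter where in the query each one was written.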

> But on the other hand perhaps I've missed a discussion on this, if so then I
> think the following comment should be updated to explain it all:
>
> * We also consider that NOWAIT wins if it's specified both ways. This
> * is a bit more debatable but raising an error doesn't seem helpful.
> * (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
> * internally contains a plain FOR UPDATE spec.)

Modified to address SKIP LOCKED (there are also a couple of words to
the same effect in select.sgml).

> * plannodes.h -> RowWaitPolicy waitPolicy;   /* NOWAIT and SKIP LOCKED DATA
> options */
> Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed
> from the syntax.
>
> * nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy ==
> RWP_SKIP )

Fixed.

> I'm also wondering about this block of code in general:
>
> if (erm->waitPolicy == RWP_WAIT)
> wait_policy = LockWaitBlock;
> else if (erm->waitPolicy == RWP_SKIP )
> wait_policy = LockWaitSkip;
> else /* erm->waitPolicy == RWP_NOWAIT */
> wait_policy = LockWaitError;
>
> Just after this code heap_lock_tuple() is called, and if that returns
> HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
> whole block again. I'm wondering why there's 2 enums that are for the same
> thing? if you just had 1 then you'd not need this block of code at all, you
> could just pass erm->waitPolicy to heap_lock_tuple().

True.  Somewhere upthread I mentioned the difficulty I had deciding
how many enumerations were needed for the various subsystems, i.e.
which headers and types they were allowed to share.  Then I put off
working on this for so long that a nice example came along that showed
me the way: the lock strength enums LockTupleMode (heapam.h) and
RowMarkType (plannodes.h).  The wait policy enums LockWaitPolicy
(heapam.h) and RowWaitPolicy (plannodes.h) mirror them closely, and
the same type of enumeration translation takes place in nodeLockRows.c
immediately above the code you pasted.  I don't have any more
principled argument than monkey-see-monkey-do for that one...
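
For reference, the existing strength translation just above that spot
looks roughly like this (quoting from memory, so the details may be
slightly off), and the new wait-policy block simply mirrors it:

    switch (erm->markType)
    {
        case ROW_MARK_EXCLUSIVE:
            lockmode = LockTupleExclusive;
            break;
        case ROW_MARK_NOKEYEXCLUSIVE:
            lockmode = LockTupleNoKeyExclusive;
            break;
        case ROW_MARK_SHARE:
            lockmode = LockTupleShare;
            break;
        case ROW_MARK_KEYSHARE:
            lockmode = LockTupleKeyShare;
            break;
        default:
            elog(ERROR, "unsupported rowmark type");
            lockmode = LockTupleNoKeyExclusive; /* keep compiler quiet */
            break;
    }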

> * parsenodes.h comment does not meet project standards
> (http://www.postgresql.org/docs/devel/static/source-format.html)
>
> typedef enum LockClauseWaitPolicy
> {
> /* order is important (see applyLockingClause which takes the greatest
>   value when several wait policies have been specified), and values must
>   match RowWaitPolicy from plannodes.h */

Fixed.

> * parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT
> and SKIP LOCKED DATA */

Fixed.

> I have noticed the /* TODO -- work out what needs to be released here */
> comments in heap_lock_tuple(), and perhaps the lack of cleaning up is what
> is causing the following:
>
> create table skiptest (id int primary key); insert into skiptest (id) select
> x.x from generate_series(1,1000000) x(x);
>
> Session 1: begin work; select * from skiptest for update limit 999999;
> Session 2: select * from skiptest for update skip locked limit 1;
> WARNING:  out of shared memory
> ERROR:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction.
>
> Yet if I do:
>
> session 1: begin work; select * from skiptest where id > 1 for update;
> session 2:  select * from skiptest for update skip locked limit 1;
>  id
> ----
>   1
> (1 row)
>
> That test makes me think your todo comments are in the right place,
> something is missing there for sure!
>
> * plays about with patch for a bit *
>
> I don't quite understand how heap_lock_tuple works, as if I debug session 2
> from the above set of queries the first call to
> ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have
> thought it would fail, since session 1 should be locking this tuple? Later
> at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)), it fails
> to grab the lock and returns HeapTupleWouldBlock. The above query seems to
> work ok if I just apply the following patch:
>
> diff --git a/src/backend/access/heap/heapam.c
> b/src/backend/access/heap/heapam.c
> index b661d0e..b3e9dcc 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -4525,8 +4525,10 @@ l3:
>                                 else /* wait_policy == LockWaitSkip */
>                                 {
>                                         if
> (!ConditionalXactLockTableWait(xwait))
> -                                               /* TODO -- work out what
> needs to be released here */
> +                                       {
> +                                               UnlockTupleTuplock(relation,
> tid, mode);
>                                                 return HeapTupleWouldBlock;
> +                                       }
>                                 }
>
>                                 /* if there are updates, follow the update
> chain */
>
> Running the test query again, I get:
> select * from skiptest for update skip locked limit 1;
>    id
> ---------
>  1000000
> (1 row)

Ahh!  Ok, take a look now: I included that change, but also made the
unlocking conditional on have_tuple_lock in the latter two of the
three cases where HeapTupleWouldBlock is returned.  In the first case
it doesn't look possible for have_tuple_lock to be true.
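
So the skip-locked exit path now looks along these lines (a sketch of
the pattern rather than the literal hunk):

    else    /* wait_policy == LockWaitSkip */
    {
        if (!ConditionalXactLockTableWait(xwait))
        {
            /* don't leak the heavyweight tuple lock, if we took one */
            if (have_tuple_lock)
                UnlockTupleTuplock(relation, tid, mode);
            return HeapTupleWouldBlock;
        }
    }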

> Would you also be able to supply a rebased patch with current master?
> Applying the patch is a bit noisy and the isolation tests part does not seem
> to apply at all now, so I've not managed to look at those yet.

Done, and attached with the above changes.

> I've still got more to look at, but hopefully this will get things moving
> again.

Thank you very much for taking the time to look at this!

Best regards,
Thomas Munro

