Thread: SKIP LOCKED DATA (work in progress)

SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
Hi

A couple of years ago I posted an outline of a plan [1] and an
initial patch [2] for implementing SKIP LOCKED DATA.  I have
recently come back to this idea, rebased the patch and added a
simple isolation test -- please see attached.

However, heap_lock_tuple is clearly not for the faint hearted,
and I freely admit that I don't understand half of the things
going on in there yet.  My general approach has been to follow
the example set by NOWAIT, generalising that flag into a 3-way
policy...  but of course NOWAIT doesn't have to worry about
cleaning anything up, because it uses ereport, hence my TODOs.

As always, I would be grateful for any feedback.

Thanks!

Thomas Munro


Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Craig Ringer
Date:
On 05/14/2014 07:06 AM, Thomas Munro wrote:
> Hi
> 
> A couple of years ago I posted an outline of a plan [1] and an
> initial patch [2] for implementing SKIP LOCKED DATA.  I have
> recently come back to this idea, rebased the patch and added a
> simple isolation test -- please see attached.

Simon did some similar work a while ago, so I've cc'd him for his input.



-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 13 May 2014 23:06, Thomas Munro <munro@ip9.org> wrote:
A couple of years ago I posted an outline of a plan [1] and an
initial patch [2] for implementing SKIP LOCKED DATA.  I have
recently come back to this idea, rebased the patch and added a
simple isolation test -- please see attached.

I have attached a new version which copies the skipLocked member
in _copyPlanRowMark, _copyRowMarkClause and _copyLockClause.
Without these, SKIP LOCKED DATA didn't work in pl/pgsql (as reported
by David Rowley off-list).  (Also some whitespace changes.)

Best regards,
Thomas Munro
Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Craig Ringer
Date:
I'm rebasing another implementation of this against current HEAD at the
moment. It was well tested but has bitrotted a bit, in particular it
needs merging with the multixact changes (eep).

That should provide a useful basis for comparison and a chance to share
ideas.

I'll follow up with the patch and a git tree when it's ready, hopefully
tonight.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Craig Ringer
Date:
On 05/16/2014 04:46 PM, Craig Ringer wrote:
>
> I'll follow up with the patch and a git tree when it's ready, hopefully
> tonight.

Here's a rebased version of Simon's original patch that runs on current
master.

I still need to merge the isolation tests for it merged and sorted out,
and after re-reading it I'd like to change "waitMode" into an enum, not
just some #defines .

Hope it's useful for comparison and ideas.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 16 May 2014 13:21, Craig Ringer <craig@2ndquadrant.com> wrote:
On 05/16/2014 04:46 PM, Craig Ringer wrote:
>
> I'll follow up with the patch and a git tree when it's ready, hopefully
> tonight.

Here's a rebased version of Simon's original patch that runs on current
master.

I still need to merge the isolation tests for it merged and sorted out,
and after re-reading it I'd like to change "waitMode" into an enum, not
just some #defines .

Hope it's useful for comparison and ideas.

Thank you!  At first glance they're sort of similar which is reassuring.  I'm especially interested in the buffer release semantics which I was confused about and will look into that (to resolve the TODO notes in my patch).

I noticed that in applyLockingClause, Simon has "rc->waitMode |= waitMode". Is that right? The values are 0, 1, and 2, but if you had both NOWAIT and SKIP LOCKED somewhere in your query you could up with rc->waitMode == 3 unless I'm mistaken. In my patch I have code that would give precedence to NOWAIT, though looking at it again it could be simpler. (One reviewer pointed out, that it should really be a single unified enum. In fact I have been a bit unsure about what scope such an enumeration should have in the application -- could it even be used in parser code? I tried to follow existing examples which is why I used #define macros in gram.y).

From a bikeshed colour point of view:
* I used SKIP LOCKED DATA  like DB2, and Simon used SKIP LOCKED like Oracle, and I guess shorter is sweeter
* I used the term wait_policy and an enum, Simon used waitMode and an int
* I had noWait and skipLocked travelling separately in some places, Simon had a single parameter, which is much better

Best regards,
Thomas Munro

Re: SKIP LOCKED DATA (work in progress)

From
Craig Ringer
Date:
On 05/17/2014 05:24 AM, Thomas Munro wrote:

> I noticed that in applyLockingClause, Simon has "rc->waitMode |=
> waitMode". Is that right? The values are 0, 1, and 2, but if you had
> both NOWAIT and SKIP LOCKED somewhere in your query you could up with
> rc->waitMode == 3 unless I'm mistaken.

I do not think that |= is correct there.

It may be that no case can arise where you get the bogus value, but
since in all other places the values are tested for equalty not as bit
fields ( waitMode == NOWAIT not waitMode & NOWAIT ) it doesn't make
sense to |= it there.

? In my patch I have code that
> would give precedence to NOWAIT, though looking at it again it could be
> simpler.

I agree; if NOWAIT is present anywhere it should be preferred to
SKIP_LOCKED, as it's OK to apply NOWAIT where SKIP LOCKED appears, but
possibly semantically incorrect to apply SKIP LOCKED where only NOWAIT
was expected.

> (One reviewer pointed out, that it should really be a single
> unified enum. In fact I have been a bit unsure about what scope such an
> enumeration should have in the application -- could it even be used in
> parser code? I tried to follow existing examples which is why I used
> #define macros in gram.y).

Not sure there.

> From a bikeshed colour point of view:
> * I used SKIP LOCKED DATA  like DB2, and Simon used SKIP LOCKED like
> Oracle, and I guess shorter is sweeter

We have a long tradition of trying to allow noise keywords where it's
harmless.

So the clause should probably be
    SKIP LOCKED [DATA]

in much the same way we have
   BEGIN [ WORK | TRANSACTION ] ...

There won't be any ambiguity there.

> * I used the term wait_policy and an enum, Simon used waitMode and an int

I prefer an enum and intended to change Simon's patch but didn't have
the time.

I have some isolation tester and regression tests that are still to follow.

> * I had noWait and skipLocked travelling separately in some places,
> Simon had a single parameter, which is much better

Yes, I strongly prefer that.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 17 May 2014 05:02, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 05/17/2014 05:24 AM, Thomas Munro wrote:
> > I noticed that in applyLockingClause, Simon has "rc->waitMode |=                           > > waitMode". Is that right? The values are 0, 1, and 2, but if you had
> > both NOWAIT and SKIP LOCKED somewhere in your query you could up with
> > rc->waitMode == 3 unless I'm mistaken.
>
> I do not think that |= is correct there.
>
> It may be that no case can arise where you get the bogus value, but
> since in all other places the values are tested for equalty not as bit
> fields ( waitMode == NOWAIT not waitMode & NOWAIT ) it doesn't make
> sense to |= it there.

There are indeed ways (subselects etc) to have more than one
locking clause applying to the same table, which are supposed to
be merged into the appropriate wait policy by that line.

> ? In my patch I have code that
> > would give precedence to NOWAIT, though looking at it again it could be
> > simpler.
>
> I agree; if NOWAIT is present anywhere it should be preferred to                                 > SKIP_LOCKED, as it's OK to apply NOWAIT where SKIP LOCKED appears, but
> possibly semantically incorrect to apply SKIP LOCKED where only NOWAIT
> was expected.

In the new v5 patch (attached) I used the same approach as
the "lock strength" support, that is, relying on the values of
the enumeration being arranged so that the enumerators that
should take precendence have higher numerical values so I could
just write:

                rc->strength = Max(rc->strength, strength);                                                                                                                                                                                                                                      
+               rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);                                                                                                                                                                                                                                

> > (One reviewer pointed out, that it should really be a single
> > unified enum. In fact I have been a bit unsure about what scope such an
> > enumeration should have in the application -- could it even be used in
> > parser code? I tried to follow existing examples which is why I used
> > #define macros in gram.y).
>
> Not sure there.

I removed those macros and went for an enumeration -- see below.

> > From a bikeshed colour point of view:
> > * I used SKIP LOCKED DATA  like DB2, and Simon used SKIP LOCKED like
> > Oracle, and I guess shorter is sweeter
>
> We have a long tradition of trying to allow noise keywords where it's
> harmless.
>
> So the clause should probably be
>
>      SKIP LOCKED [DATA]

Ok, done in the v5 patch.

> I have some isolation tester and regression tests that are still to follow.

Great, I am looking forward to seeing those.  I have added an
isolation test for NOWAIT, and I am planning to design some tests
that will test the various affected branched of
heap_lock_tuple (eg multi-transaction ID etc), and test
interaction with various other features (various lock stregths,
UPDATEs, combinations of SKIP LOCKED DATA, NOWAIT and default
policies etc).

> > * I had noWait and skipLocked travelling separately in some places,
> > Simon had a single parameter, which is much better
>
> Yes, I strongly prefer that.

I agree in principal but couldn't decide *how many* enumerations
to use.  In v5 I have added two more:

* one called LockClauseWaitPolicy defined in parsenodes.h that is
  used in LockClause and RowMarkCause

* another called RowWaitPolicy defined in plannodes.h that is
  used in PlanRowMark and ExecRowMark

The parser produces LockClauseWaitPolicy values, which InitPlan
converts to RowWaitPolicy values by simple assignment (the values
are the same, which obviously wouldn't work if this were compiled
as C++).  Then ExecLockRows converts those into LockWaitPolicy
values when calling heal_lock_tuple.

Obviously there is an opportunity to unify all three, or at least
LockClauseWaitPolicy (parser) and RowWaitPolicy (plan and
execution), but I didn't know how appropriate that would be, and
followed the new lock strength feature for inspiration.

One difference between my patch and Simon's is that mine
introduces a new return value for heap_lock_type called
HeapTupleWouldBlock, whereas his returns the existin
HeapTupleBeingUpdated and then handles it differently in
ExecLockRows if in SKIP LOCKED mode.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
Hello

As a simple example for people wondering what the point of this
feature is, I created a table work (id, data, status)
and then create 10,000 items with status 'NEW' and then started
a number of worker threads that did the following pair of
transactions, with and without SKIP LOCKED DATA on the end of the
SELECT statement, until all rows were deleted:

  BEGIN
  SELECT id, data FROM work WHERE status = 'NEW' LIMIT 1 FOR UPDATE
  -- if no rows returned, then finish
  UPDATE work SET status = 'WORK' WHERE id = $id
  COMMIT

  BEGIN
  DELETE FROM work WHERE id = $id
  COMMIT

Here are the times taken to process all items, in elapsed seconds, on
a slow laptop (i5 4 core with an SSD, with default postgresql.conf
except for checkpoint_segments set to 300):

  | Threads | default |  SKIP |
  |       1 |   26.78 | 27.02 |
  |       2 |   23.46 | 22.00 |
  |       3 |   22.02 | 14.83 |
  |       4 |   22.59 | 11.16 |
  |       5 |   22.37 |  9.05 |
  |       6 |   22.55 |  7.66 |
  |       7 |   22.46 |  6.69 |
  |       8 |   22.57 |  8.39 |
  |       9 |   22.40 |  8.38 |
  |      10 |   22.38 |  7.93 |
  |      11 |   22.43 |  6.86 |
  |      12 |   22.34 |  6.77 |

I am not experienced at benchmarking and I don't claim that this
particular workload or configuration is particularly sensible or
representative of anything but it might give some idea of the
motivation.

Best regards,
Thomas Munro

Re: SKIP LOCKED DATA (work in progress)

From
Robert Haas
Date:
On Sat, May 17, 2014 at 1:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

> We have a long tradition of trying to allow noise keywords where it's
> harmless.
>
> So the clause should probably be
>
>      SKIP LOCKED [DATA]
>
> in much the same way we have
>
>     BEGIN [ WORK | TRANSACTION ] ...
>
> There won't be any ambiguity there.

We've had some problems in the past where allowing optional noise
words resulted in grammar conflicts that made future features harder
to add.  See previous discussions about LOCK TABLE, wherein we almost
went to the extreme of adding a completely separate ACQUIRE LOCK
command.  A lot of these things seem harmless when you first do them,
and then later they seem less harmless.

Anyway, +1 for the general idea of this feature.  It's come up a
number of times on this mailing list, and we've had customer requests
for it, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SKIP LOCKED DATA (work in progress)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, May 17, 2014 at 1:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> We have a long tradition of trying to allow noise keywords where it's
>> harmless.
>> 
>> So the clause should probably be
>> 
>> SKIP LOCKED [DATA]
>> 
>> in much the same way we have
>> 
>> BEGIN [ WORK | TRANSACTION ] ...
>> 
>> There won't be any ambiguity there.

> We've had some problems in the past where allowing optional noise
> words resulted in grammar conflicts that made future features harder
> to add.

In this particular case, I'd be worried about whether we'd not end up
having to fully reserve DATA in order to allow it to be optional here.
That would be necessary if this clause could be followed immediately
by an identifier, either now or in the future.  That would be a mighty
high price to pay for failing to make up our minds about which syntax
to use.  (How many tables out there do you think have "data" as a column
name?)

A different concern is that this patch adds not one but two new unreserved
keywords, ie SKIP and LOCKED.  That bloats our parser tables, which are
too darn large already, and it has a nonzero compatibility cost (since
we only allow AS-less column aliases when they are no keyword at all).
If we're pulling syntax out of the air it'd be nice if we could avoid
adding new keywords to the grammar.
        regards, tom lane



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
<div dir="ltr">On 23 May 2014 15:40, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/>> A different concern is that this patch adds not one but two new unreserved<br />> keywords, ie SKIP
andLOCKED.  That bloats our parser tables, which are<br /> > too darn large already, and it has a nonzero
compatibilitycost (since<br />> we only allow AS-less column aliases when they are no keyword at all).<br />> If
we'repulling syntax out of the air it'd be nice if we could avoid<br /> > adding new keywords to the grammar.<br
/> <br/>How about some of these combinations of existing words:<br /><br />EXCLUDE LOCK<br />NOWAIT EXCLUDE<br />NOWAIT
NEXT<br/>NOWAIT FOLLOWING<br />NOWAIT DISCARD<br /><br />Of those I think I prefer NOWAIT EXCLUDE (perhaps with NOWAIT
ABORTas a long version of the existing NOWAIT behaviour for contrast).<br /><br />Or adding just one new keyword:<br
/><br/>NOWAIT SKIP<br />SKIP LOCK<br /><br />Regards,<br />Thomas Munro</div> 

Re: SKIP LOCKED DATA (work in progress)

From
Simon Riggs
Date:
On 23 May 2014 10:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If we're pulling syntax out of the air it'd be nice if we could avoid
> adding new keywords to the grammar.

Oracle, SQLServer and DB2 have this capability. MySQL does not.

SQLServer implements that using the table hint of READPAST. Since that
whole syntax area is radically different to what we have, it isn't
easy to maintain code compatibility.

DB2 z/OS 10 provides SKIP LOCKED DATA clause to allow moving past
already locked rows. That's fairly recent and I don't believe there
will be many programs using that. DB2 UDB supports some complex
functionality using DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and
DB2_SKIPDELETED, all of which is complex and mostly exists for
benchmarks, AFAICS.

Oracle uses both SKIP LOCKED and NOWAIT.

PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.

I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Pavel Stehule
Date:



2014-05-23 21:24 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:
On 23 May 2014 10:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If we're pulling syntax out of the air it'd be nice if we could avoid
> adding new keywords to the grammar.

Oracle, SQLServer and DB2 have this capability. MySQL does not.

SQLServer implements that using the table hint of READPAST. Since that
whole syntax area is radically different to what we have, it isn't
easy to maintain code compatibility.

DB2 z/OS 10 provides SKIP LOCKED DATA clause to allow moving past
already locked rows. That's fairly recent and I don't believe there
will be many programs using that. DB2 UDB supports some complex
functionality using DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and
DB2_SKIPDELETED, all of which is complex and mostly exists for
benchmarks, AFAICS.

Oracle uses both SKIP LOCKED and NOWAIT.

PostgreSQL already chose to follow the Oracle syntax when we
implemented NOWAIT. So my proposal is that we follow the Oracle syntax
again and use the words SKIP LOCKED.

I don't see any advantage in inventing new syntax that leaves us
incompatible with Oracle, nor do I see any need to be compatible with
both Oracle and DB2 since the latter is much less likely to gain us
anything in practice.

+1

Pavel
 

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: SKIP LOCKED DATA (work in progress)

From
Robert Haas
Date:
On Fri, May 23, 2014 at 3:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> PostgreSQL already chose to follow the Oracle syntax when we
> implemented NOWAIT. So my proposal is that we follow the Oracle syntax
> again and use the words SKIP LOCKED.
>
> I don't see any advantage in inventing new syntax that leaves us
> incompatible with Oracle, nor do I see any need to be compatible with
> both Oracle and DB2 since the latter is much less likely to gain us
> anything in practice.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 23 May 2014 21:18, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 23, 2014 at 3:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> PostgreSQL already chose to follow the Oracle syntax when we
>> implemented NOWAIT. So my proposal is that we follow the Oracle syntax
>> again and use the words SKIP LOCKED.
>>
>> I don't see any advantage in inventing new syntax that leaves us
>> incompatible with Oracle, nor do I see any need to be compatible with
>> both Oracle and DB2 since the latter is much less likely to gain us
>> anything in practice.
>
> +1.

Hello

Please find attached a rebased version of my SKIP LOCKED
patch (formerly SKIP LOCKED DATA), updated to support only the
Oracle-like syntax.

I posted a small test program here:
https://github.com/macdice/skip-locked-tests

Would anyone who is interested in a SKIP LOCKED feature and
attending the CHAR(14)/PGDay UK conference next week be
interested in a birds-of-a-feather discussion?

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Simon Riggs
Date:
On 29 June 2014 10:01, Thomas Munro <munro@ip9.org> wrote:

> Would anyone who is interested in a SKIP LOCKED feature and
> attending the CHAR(14)/PGDay UK conference next week be
> interested in a birds-of-a-feather discussion?

Sounds like a plan. I'll check my schedule.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
On Sun, Jun 29, 2014 at 9:01 PM, Thomas Munro <munro@ip9.org> wrote:

Please find attached a rebased version of my SKIP LOCKED
patch (formerly SKIP LOCKED DATA), updated to support only the
Oracle-like syntax.


Hi Thomas,

Apologies for taking this long to get to reviewing this, I'd gotten a bit side tracked with my own patches during this commitfest.

I'm really glad to see this patch is back again. I think it will be very useful for processing queues. I could have made good use of it in my last work, using it for sending unsent emails which were "queued" up in a table in the database.

I've so far read over the patch and done only some brief tests of the functionality.

Here's what I've picked up on so far:

* In heap_lock_tuple() there's a few places where you test the wait_policy, but you're not always doing this in the same order. The previous code did if (nolock) first each time, but since there's now 3 values I think if (wait_policy == LockWaitError) should be first each time as likely this is the most common case.

* The following small whitespace change can be removed in gram.y:

@@ -119,7 +119,6 @@ typedef struct PrivTarget
 #define CAS_NOT_VALID 0x10
 #define CAS_NO_INHERIT 0x20
 
-
 #define parser_yyerror(msg)  scanner_yyerror(msg, yyscanner)
 #define parser_errposition(pos)  scanner_errposition(pos, yyscanner)  
 
* analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy); 
I'm not quite sure I understand this yet, but I'm guessing the original code was there so that something like:

SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
Would give:
ERROR:  could not obtain lock on row in relation "a"

So it seems that with the patch as you've defined in by the order of the enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE. I'm wondering if this is dangerous.

Should the following really skip locked tuples?
SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;

But on the other hand perhaps I've missed a discussion on this, if so then I think the following comment should be updated to explain it all:

* We also consider that NOWAIT wins if it's specified both ways. This
* is a bit more debatable but raising an error doesn't seem helpful.
* (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
* internally contains a plain FOR UPDATE spec.)
*

* plannodes.h -> RowWaitPolicy waitPolicy;   /* NOWAIT and SKIP LOCKED DATA options */
Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed from the syntax.

* nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy == RWP_SKIP )

I'm also wondering about this block of code in general:

if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;

Just after this code heap_lock_tuple() is called, and if that returns HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that whole block again. I'm wondering why there's 2 enums that are for the same thing? if you just had 1 then you'd not need this block of code at all, you could just pass erm->waitPolicy to heap_lock_tuple().

* parsenodes.h comment does not meet project standards (http://www.postgresql.org/docs/devel/static/source-format.html)

typedef enum LockClauseWaitPolicy
{
/* order is important (see applyLockingClause which takes the greatest
  value when several wait policies have been specified), and values must
  match RowWaitPolicy from plannodes.h */

* parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA */


I have noticed the /* TODO -- work out what needs to be released here */ comments in head_lock_tuple(), and perhaps the lack of cleaning up is what is causing the following:

create table skiptest (id int primary key); insert into skiptest (id) select x.x from generate_series(1,1000000) x(x);

Session 1: begin work; select * from skiptest for update limit 999999;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING:  out of shared memory
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

Yet if I do:

session 1: begin work; select * from skiptest where id > 1 for update;
session 2:  select * from skiptest for update skip locked limit 1;
 id
----
  1
(1 row)

That test makes me think your todo comments are in the right place, something is missing there for sure!

* plays about with patch for a bit *

I don't quite understand how heap_lock_tuple works, as if I debug session 2 from the above set of queries the first call to ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have thought it would fail, since session 1 should be locking this tuple? Later at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)), it fails to grab the lock and returns HeapTupleWouldBlock. The above query seems to work ok if I just apply the following patch:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d0e..b3e9dcc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4525,8 +4525,10 @@ l3:
                                else /* wait_policy == LockWaitSkip */
                                {
                                        if (!ConditionalXactLockTableWait(xwait))
-                                               /* TODO -- work out what needs to be released here */
+                                       {
+                                               UnlockTupleTuplock(relation, tid, mode);
                                                return HeapTupleWouldBlock;
+                                       }
                                }

                                /* if there are updates, follow the update chain */

Running the test query again, I get:
select * from skiptest for update skip locked limit 1;
   id
---------
 1000000
(1 row)

Would you also be able to supply a rebased patch with current master? The applying the patch is a bit noisy and the isolation tests part does not seem to apply at all now, so I've not managed to look at those yet.

I've still got more to look at, but hopefully this will get things moving again.

Regards

David Rowley

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
> * In heap_lock_tuple() there's a few places where you test the wait_policy,
> but you're not always doing this in the same order. The previous code did if
> (nolock) first each time, but since there's now 3 values I think if
> (wait_policy == LockWaitError) should be first each time as likely this is
> the most common case.

Ok, I have made the them consistent -- though I put LockWaitBlock
first as that is surely the most common case (plain old FOR UPDATE).

> * The following small whitespace change can be removed in gram.y:
> [snip]

Fixed.

> * analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
> I'm not quite sure I understand this yet, but I'm guessing the original code
> was there so that something like:
>
> SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
> Would give:
> ERROR:  could not obtain lock on row in relation "a"

Surely only if another session already has one or more tuples locked?

> So it seems that with the patch as you've defined in by the order of the
> enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE.
> I'm wondering if this is dangerous.

My understanding was that we have to choose a single policy for
attempts to lock rows in that relation, and if you've stated in at
least one place that you don't want to wait, then it makes sense for
NOWAIT to take precedence.  I had to figure out how to fit SKIP LOCKED
into that hierarchy, and figured that NOWAIT should still be the
strongest policy, overriding all others, and SKIP LOCKED should
overrule default FOR UPDATE (ie blocking).

> Should the following really skip locked tuples?
> SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;

I agree that it's somewhat arbitrary...  I suppose the two most
obvious options are either to reduce to a single policy using some
rule (such as the NOWAIT > SKIP LOCKED > default hierarchy I propose),
or simply reject such queries (which I'm guessing wouldn't fly at this
point since existing valid queries would become invalid).

> But on the other hand perhaps I've missed a discussion on this, if so then I
> think the following comment should be updated to explain it all:
>
> * We also consider that NOWAIT wins if it's specified both ways. This
> * is a bit more debatable but raising an error doesn't seem helpful.
> * (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
> * internally contains a plain FOR UPDATE spec.)

Modified to address SKIP LOCKED (there are also couple of words to the
same effect in select.sgml).

> * plannodes.h -> RowWaitPolicy waitPolicy;   /* NOWAIT and SKIP LOCKED DATA
> options */
> Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed
> from the syntax.
>
> * nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy ==
> RWP_SKIP )

Fixed.

> I'm also wondering about this block of code in general:
>
> if (erm->waitPolicy == RWP_WAIT)
> wait_policy = LockWaitBlock;
> else if (erm->waitPolicy == RWP_SKIP )
> wait_policy = LockWaitSkip;
> else /* erm->waitPolicy == RWP_NOWAIT */
> wait_policy = LockWaitError;
>
> Just after this code heap_lock_tuple() is called, and if that returns
> HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
> whole block again. I'm wondering why there's 2 enums that are for the same
> thing? if you just had 1 then you'd not need this block of code at all, you
> could just pass erm->waitPolicy to heap_lock_tuple().

True.  Somewhere upthread I mentioned the difficulty I had deciding
how many enumerations were needed, for the various subsystems, ie
which headers and type they were allowed to share.  Then I put off
working on this for so long that a nice example came along that showed
me the way: the lock strength enums LockTupleMode (heapam.h) and
RowMarkType (plannodes.h).  The wait policy enums LockWaitPolicy
(heapam.h) and RowWaitPolicy (plannodes.h) mirror them closely, and
the same type of enumeration translation takes place in nodeLockRows.c
immediately above the code you pasted.  I don't have any more
principled argument than monkey-see-monkey-do for that one...

> * parsenodes.h comment does not meet project standards
> (http://www.postgresql.org/docs/devel/static/source-format.html)
>
> typedef enum LockClauseWaitPolicy
> {
> /* order is important (see applyLockingClause which takes the greatest
>   value when several wait policies have been specified), and values must
>   match RowWaitPolicy from plannodes.h */

Fixed.

> * parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT
> and SKIP LOCKED DATA */

Fixed.

> I have noticed the /* TODO -- work out what needs to be released here */
> comments in head_lock_tuple(), and perhaps the lack of cleaning up is what
> is causing the following:
>
> create table skiptest (id int primary key); insert into skiptest (id) select
> x.x from generate_series(1,1000000) x(x);
>
> Session 1: begin work; select * from skiptest for update limit 999999;
> Session 2: select * from skiptest for update skip locked limit 1;
> WARNING:  out of shared memory
> ERROR:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction.
>
> Yet if I do:
>
> session 1: begin work; select * from skiptest where id > 1 for update;
> session 2:  select * from skiptest for update skip locked limit 1;
>  id
> ----
>   1
> (1 row)
>
> That test makes me think your todo comments are in the right place,
> something is missing there for sure!
>
> * plays about with patch for a bit *
>
> I don't quite understand how heap_lock_tuple works, as if I debug session 2
> from the above set of queries the first call to
> ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have
> thought it would fail, since session 1 should be locking this tuple? Later
> at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)), it fails
> to grab the lock and returns HeapTupleWouldBlock. The above query seems to
> work ok if I just apply the following patch:
>
> diff --git a/src/backend/access/heap/heapam.c
> b/src/backend/access/heap/heapam.c
> index b661d0e..b3e9dcc 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -4525,8 +4525,10 @@ l3:
>                                 else /* wait_policy == LockWaitSkip */
>                                 {
>                                         if
> (!ConditionalXactLockTableWait(xwait))
> -                                               /* TODO -- work out what
> needs to be released here */
> +                                       {
> +                                               UnlockTupleTuplock(relation,
> tid, mode);
>                                                 return HeapTupleWouldBlock;
> +                                       }
>                                 }
>
>                                 /* if there are updates, follow the update
> chain */
>
> Running the test query again, I get:
> select * from skiptest for update skip locked limit 1;
>    id
> ---------
>  1000000
> (1 row)

Ahh!  Ok, take a look now, I included that change, but also made the
unlocking conditional on have_tuple_lock, in the second two cases
where HeapTupleWouldBlock is returned.  In the first case it doesn't
look possible for it to be true.

> Would you also be able to supply a rebased patch with current master? The
> applying the patch is a bit noisy and the isolation tests part does not seem
> to apply at all now, so I've not managed to look at those yet.

Done, and attached with the above changes.

> I've still got more to look at, but hopefully this will get things moving
> again.

Thank you very much for taking the time to look at this!

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 24 July 2014 00:52, Thomas Munro <munro@ip9.org> wrote:
> On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
>> I'm also wondering about this block of code in general:
>>
>> if (erm->waitPolicy == RWP_WAIT)
>> wait_policy = LockWaitBlock;
>> else if (erm->waitPolicy == RWP_SKIP )
>> wait_policy = LockWaitSkip;
>> else /* erm->waitPolicy == RWP_NOWAIT */
>> wait_policy = LockWaitError;
>>
>> Just after this code heap_lock_tuple() is called, and if that returns
>> HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
>> whole block again. I'm wondering why there's 2 enums that are for the same
>> thing? if you just had 1 then you'd not need this block of code at all, you
>> could just pass erm->waitPolicy to heap_lock_tuple().
>
> True.  Somewhere upthread I mentioned the difficulty I had deciding
> how many enumerations were needed, for the various subsystems, ie
> which headers and type they were allowed to share.  Then I put off
> working on this for so long that a nice example came along that showed
> me the way: the lock strength enums LockTupleMode (heapam.h) and
> RowMarkType (plannodes.h).  The wait policy enums LockWaitPolicy
> (heapam.h) and RowWaitPolicy (plannodes.h) mirror them closely, and
> the same type of enumeration translation takes place in nodeLockRows.c
> immediately above the code you pasted.  I don't have any more
> principled argument than monkey-see-monkey-do for that one...

On reflection, I agree that this sucks, and I would like to unify the
three new enums in the current patch (see below for recap) into one
that can be passed between parser, planner, executor and heap access
manager code as I think you are suggesting.  My only question is: in
which header should the enum be defined, that all those modules could
include?

Best regards,
Thomas Munro


Enumeration explosion recap:

* parsenode.h defines enum LockClauseWaitPolicy, which is used in the structs LockClause and RowMarkClause, for use by
theparser code
 

* plannodes.h defines enum RowWaitPolicy, which is used in the structs PlanRowMark and ExecRowMark, for use by the
plannerand executor code (numbers coincide with LockClauseWaitPolicy)
 

* heapam.h defines enum LockWaitPolicy, which is used as a parameter to heap_lock_tuple, for use by heap access code

The parser produces LockClauseWaitPolicy values.  InitPlan
converts these to RowWaitPolicy values in execMain.c.  Then
nodeLockRows.c converts RowWaitPolicy values to LockWaitPolicy
values (by if-then-else) so it can call heap_lock_tuple.

This roughly mirrors what happens to lock strength information.
The unpatched code simply passes a boolean 'nowait' flag around.
An earlier version of my patch passed a pair of booleans around.
Simon's independent patch[1] uses an int in the various node structs
and the heap_lock_tuple function, and in execNode.h it has macros to
give names to the values, and that is included by access/heapm.h.

[1] http://www.postgresql.org/message-id/537610D5.3090405@2ndquadrant.com



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 24 July 2014 00:52, Thomas Munro <munro@ip9.org> wrote:
> On 23 July 2014 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
>> I'm also wondering about this block of code in general:
>>
>> if (erm->waitPolicy == RWP_WAIT)
>> wait_policy = LockWaitBlock;
>> else if (erm->waitPolicy == RWP_SKIP )
>> wait_policy = LockWaitSkip;
>> else /* erm->waitPolicy == RWP_NOWAIT */
>> wait_policy = LockWaitError;
>>
>> Just after this code heap_lock_tuple() is called, and if that returns
>> HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that
>> whole block again. I'm wondering why there's 2 enums that are for the same
>> thing? if you just had 1 then you'd not need this block of code at all, you
>> could just pass erm->waitPolicy to heap_lock_tuple().
>
> True.  Somewhere upthread I mentioned the difficulty I had deciding
> how many enumerations were needed, for the various subsystems, ie
> which headers and type they were allowed to share. [...]

I tried getting rid of the offending if-then-else enum conversion code
and replaced it with a simple assignment -- please see attached.  I
also added compile time assertions that the enum values line up to
make that work, and are correctly ordered for use in that 'Max'
expression.  Please let me know if you think this is an improvement or
an abomination.

I couldn't find an existing reasonable place to share a single wait
policy enumeration between parser/planner/executor and the heap access
module, and I get the feeling that it would be unacceptable to
introduce one.

I suppose that the LockClauseWaitPolicy and RowWaitPolicy could at
least be merged into a single enum defined in nodes.h following the
example of CmdType, which is also used by both parsenodes.h and
plannnode.h, but do I detect a tiny hint of reluctance in its comment,
"so put it here..."?

(The attached patch also has a couple of trivial typo fixes in
documentation and comments).

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
On Sat, Jul 26, 2014 at 9:34 PM, Thomas Munro <munro@ip9.org> wrote:
I couldn't find an existing reasonable place to share a single wait
policy enumeration between parser/planner/executor and the heap access
module, and I get the feeling that it would be unacceptable to
introduce one.


I guess the way I justify it in my head is something like, "the 3 enums are for the same purpose, so having 3 exist all with different names is confusing and it makes the code harder to follow". So to fix that up I think, "oh we can just give them all the same name... But then, how can be we be sure each definition matches the other 2?" ... hmm, "just merge it into one and put it somewhere that can be accessed from everywhere."

Saying that I don't know what the project best practises are for locations for sharing such things, but if nothing exists then maybe this would be a good time to invent somewhere.

Maybe someone with more experience can chime in and give advice on this?

Regards

David Rowley

Re: SKIP LOCKED DATA (work in progress)

From
Tom Lane
Date:
Thomas Munro <munro@ip9.org> writes:
> I couldn't find an existing reasonable place to share a single wait
> policy enumeration between parser/planner/executor and the heap access
> module, and I get the feeling that it would be unacceptable to
> introduce one.

There is a precedent in the form of AclMode, which is needed throughout
the system and is currently declared in parsenodes.h.  I can't say I've
ever been particularly pleased with that arrangement though, since it
forces inclusion of parsenodes.h in many places that might not otherwise
have any interest in parse nodes.

It might be better if we'd declared AclMode in a single-purpose header,
say utils/aclmode.h, and then #include'd that into parsenodes.h.
There's certainly plenty of other single-datatype headers laying about.
        regards, tom lane



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 26 July 2014 15:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <munro@ip9.org> writes:
>> I couldn't find an existing reasonable place to share a single wait
>> policy enumeration between parser/planner/executor and the heap access
>> module, and I get the feeling that it would be unacceptable to
>> introduce one.
>
> There is a precedent in the form of AclMode, which is needed throughout
> the system and is currently declared in parsenodes.h.  I can't say I've
> ever been particularly pleased with that arrangement though, since it
> forces inclusion of parsenodes.h in many places that might not otherwise
> have any interest in parse nodes.
>
> It might be better if we'd declared AclMode in a single-purpose header,
> say utils/aclmode.h, and then #include'd that into parsenodes.h.
> There's certainly plenty of other single-datatype headers laying about.

Here is a new version of the patch with a single enum LockWaitPolicy
defined in utils/lockwaitpolicy.h.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
On Sun, Jul 27, 2014 at 4:49 AM, Thomas Munro <munro@ip9.org> wrote:
Here is a new version of the patch with a single enum LockWaitPolicy
defined in utils/lockwaitpolicy.h.


That seems much cleaner

A few more comments:
You seem to have lost the comment which indicates that the values of the enum are important due to the code in applyLockingClause(), but I see now instead that you've added some assert checks in applyLockingClause(), likely these should use Assert() rather than StaticAssertExpr().

I've also been looking at the isolation tests and I see that you've added a series of tests for NOWAIT. I was wondering why you did that as that's really existing code, probably if you thought the tests were a bit thin around NOWAIT then maybe that should be a separate patch?

In ExecLockRows(), is there a need to define the wait_policy variable now? It's just used once so you could probably just pass erm->waitPolicy directly to heap_lock_tuple().

I'm a bit confused at some of the code in heap_lock_tuple(). If I'm not wrong then after the line that does have_tuple_lock = true; it's never possible for have_tuple_lock to be false, but I see you've added checks to ensure we only unlock if have_tuple_lock is true. I'm thinking you probably did this because in the goto failed situation the check is done, but I was thinking that was perhaps put there in case a goto jump was added before have_tuple_lock is set to true. I'm wondering if it would be ok just to replace the test with an Assert() instead, or maybe just no check. 

Also, I'm just looking at some of the changes that you've done to function signatures... I see quite a few of them are now beyond 80 chars wide (see http://www.postgresql.org/docs/devel/static/source-format.html). 

Regards

David Rowley

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 27 July 2014 14:31, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sun, Jul 27, 2014 at 4:49 AM, Thomas Munro <munro@ip9.org> wrote:
>>
>> Here is a new version of the patch with a single enum LockWaitPolicy
>> defined in utils/lockwaitpolicy.h.
>>
>
> That seems much cleaner
>
> A few more comments:
> You seem to have lost the comment which indicates that the values of the
> enum are important due to the code in applyLockingClause(), but I see now
> instead that you've added some assert checks in applyLockingClause(), likely
> these should use Assert() rather than StaticAssertExpr().

Here's a new version with explicit numerical values and a comment in
lockwaitpolicy.h to explain that the order is important and point to
the relevant code.  The assertions are about the relationship between
constant values known at compile time, so why would we want a runtime
assertion?  I have changed it from StaticAssertExpr to
StaticAssertStmt though.

> I've also been looking at the isolation tests and I see that you've added a
> series of tests for NOWAIT. I was wondering why you did that as that's
> really existing code, probably if you thought the tests were a bit thin
> around NOWAIT then maybe that should be a separate patch?

Since I was meddling with code that controls both NOWAIT and SKIP
LOCKED, I wanted to convince myself that I had not broken NOWAIT using
at least a basic smoke test .  I suppose by the same logic I should
also wite isolation tests for default blocking FOR UPDATE...  Ok, I've
taken nowait.spec out for now.

On the subject of isolation tests, I think skip-locked.spec is only
producing schedules that reach third of the three 'return
HeapTupleWouldBlock' statements in heap_lock_tuple.  I will follow up
with some more thorough isolation tests in the next week or so to
cover the other two, and some other scenarios and interactions with
other feature.

> In ExecLockRows(), is there a need to define the wait_policy variable now?
> It's just used once so you could probably just pass erm->waitPolicy directly
> to heap_lock_tuple().

Fixed.

> I'm a bit confused at some of the code in heap_lock_tuple(). If I'm not
> wrong then after the line that does have_tuple_lock = true; it's never
> possible for have_tuple_lock to be false, but I see you've added checks to
> ensure we only unlock if have_tuple_lock is true. I'm thinking you probably
> did this because in the goto failed situation the check is done, but I was
> thinking that was perhaps put there in case a goto jump was added before
> have_tuple_lock is set to true. I'm wondering if it would be ok just to
> replace the test with an Assert() instead, or maybe just no check.

Right, I have removed the redundant conditionals.

> Also, I'm just looking at some of the changes that you've done to function
> signatures... I see quite a few of them are now beyond 80 chars wide (see
> http://www.postgresql.org/docs/devel/static/source-format.html).

Fixed.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 27 July 2014 23:19, Thomas Munro <munro@ip9.org> wrote:
> On the subject of isolation tests, I think skip-locked.spec is only
> producing schedules that reach third of the three 'return
> HeapTupleWouldBlock' statements in heap_lock_tuple.  I will follow up
> with some more thorough isolation tests in the next week or so to
> cover the other two, and some other scenarios and interactions with
> other feature.

Now with extra isolation tests so that the three different code
branches that can skip rows are covered.  I temporarily added some
logging lines to double check that the expected branches are reached
by each permutation while developing the specs.  They change the
output and are not part of the patch -- attaching separately.

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Tom Lane wrote:

> It might be better if we'd declared AclMode in a single-purpose header,
> say utils/aclmode.h, and then #include'd that into parsenodes.h.
> There's certainly plenty of other single-datatype headers laying about.

Do you mean src/include/datatype/aclmode.h?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
David Rowley wrote:

> I've also been looking at the isolation tests and I see that you've added a
> series of tests for NOWAIT. I was wondering why you did that as that's
> really existing code, probably if you thought the tests were a bit thin
> around NOWAIT then maybe that should be a separate patch?

The isolation tester is new so we don't have nearly enough tests for it.
Adding more meaningful tests is good even if they're unrelated to the
patch at hand.

FWIW you can use configure --enable-coverage and "make coverage-html" to
get coverage reports.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> It might be better if we'd declared AclMode in a single-purpose header,
>> say utils/aclmode.h, and then #include'd that into parsenodes.h.
>> There's certainly plenty of other single-datatype headers laying about.

> Do you mean src/include/datatype/aclmode.h?

I was thinking src/include/utils/, actually, but maybe datatype/ would
be a good choice.

OTOH, what we've got in there now is just timestamp.h, and IIRC it was put
there because it needed to be accessible from both frontend and backend
contexts.  That would not be true of aclmode.h, so perhaps aclmode.h
doesn't belong there.
        regards, tom lane



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 29 July 2014 02:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> David Rowley wrote:
>
>> I've also been looking at the isolation tests and I see that you've added a
>> series of tests for NOWAIT. I was wondering why you did that as that's
>> really existing code, probably if you thought the tests were a bit thin
>> around NOWAIT then maybe that should be a separate patch?
>
> The isolation tester is new so we don't have nearly enough tests for it.
> Adding more meaningful tests is good even if they're unrelated to the
> patch at hand.

Here are my isolation tests for NOWAIT as a separate patch,
independent of SKIP LOCKED.  They cover the tuple lock, regular row
lock and multixact row lock cases.  I guess this might be called white
box testing, since it usese knowledge of how to construct schedules
that hit the three interesting code paths that trigger the error, even
though you can't see from the output why the error was raised in each
case without extra instrumentation (though it did cross my mind that
it could be interesting at the very least for testing if the error
message were different in each case).  If there are no objections I
will add this to the next commitfest.

Best regards
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
On Tue, Jul 29, 2014 at 1:35 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
David Rowley wrote:

> I've also been looking at the isolation tests and I see that you've added a
> series of tests for NOWAIT. I was wondering why you did that as that's
> really existing code, probably if you thought the tests were a bit thin
> around NOWAIT then maybe that should be a separate patch?

The isolation tester is new so we don't have nearly enough tests for it.
Adding more meaningful tests is good even if they're unrelated to the
patch at hand.


I completely agree that some more isolation tests coverage would be a good thing. I just saw it as something not directly related to this feature, so thought it would be better as a separate patch. From my experience with the project, normally when I try to sneak something extra in, it either does not make the final commit, or gets added in a separate commit.
 
Regards

David Rowley

Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jul 29, 2014 at 9:48 AM, Thomas Munro <span
dir="ltr"><<ahref="mailto:munro@ip9.org" target="_blank">munro@ip9.org</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class="">On27 July 2014 23:19, Thomas Munro <<a href="mailto:munro@ip9.org">munro@ip9.org</a>> wrote:<br /> >
Onthe subject of isolation tests, I think skip-locked.spec is only<br /> > producing schedules that reach third of
thethree 'return<br /> > HeapTupleWouldBlock' statements in heap_lock_tuple.  I will follow up<br /> > with some
morethorough isolation tests in the next week or so to<br /> > cover the other two, and some other scenarios and
interactionswith<br /> > other feature.<br /><br /></div>Now with extra isolation tests so that the three different
code<br/> branches that can skip rows are covered.  I temporarily added some<br /> logging lines to double check that
theexpected branches are reached<br /> by each permutation while developing the specs.  They change the<br /> output
andare not part of the patch -- attaching separately.<br /></blockquote></div><br /></div><div class="gmail_extra">I've
hada look over the isolation tests now and I can't see too much wrong, just a couple of typos... </div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><div class="gmail_extra">* skip-locked-2.spec</div><div
class="gmail_extra"><br/></div><div class="gmail_extra"># s2 againt skips because it can't acquired a multi-xact
lock</div><divclass="gmail_extra"><br /></div><div class="gmail_extra"> "againt" should be "again"</div><div
class="gmail_extra"><br/></div><div class="gmail_extra">also mixed use of multixact and multi-xact, probably would be
betterto stick to just 1.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">Also this file would
probablybe slightly easier to read with a new line after each "permutation" line.</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">* skip_locked-3.spec</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">#s3 skips to second record due to tuple lock held by s2</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">There's a missing "the" after "skips to"</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">Also,won't the lock be held by s1 not s2?</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">There'sjust a couple of other tiny things.</div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><divclass="gmail_extra">* Some whitespace issues shown by git diff --check</div><div
class="gmail_extra"><br/></div><div class="gmail_extra">src/backend/parser/gram.y:9221: trailing whitespace.</div><div
class="gmail_extra">+opt_nowait_or_skip:</div><divclass="gmail_extra">src/backend/rewrite/rewriteHandler.c:65: trailing
whitespace.</div><divclass="gmail_extra">+                                                              
LockClauseStrengthstrength, LockWaitPolicy waitPolicy,</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">*analyze.c</div><div class="gmail_extra"><br /></div><div class="gmail_extra">The
StaticAssertStmt'sI think these should be removed. The only place where this behaviour can be changed</div><div
class="gmail_extra">isin lockwaitpolicy.h, I think it would be better to just strengthen the comment on the enum's
definition.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Perhaps something more along the lines
of:</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Policy for what to do when a row lock cannot be
obtainedimmediately.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">The enum values defined here
havecritical importance to how the parser</div><div class="gmail_extra">treats multiple FOR UPDATE/SHARE statements in
differentnested levels of</div><div class="gmail_extra">the query. Effectively if multiple locking clauses are defined
inthe query</div><div class="gmail_extra"> then the one with the highest enum value takes precedence over the
others.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Apart
fromthis I can't see any other problems with the patch and I'd be very inclined, once the above are fixed up to mark
thepatch ready for commiter.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">Good work</div><div
class="gmail_extra"><br/></div><div class="gmail_extra">Regards</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">DavidRowley </div></div></div></div> 

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 1 August 2014 10:37, David Rowley <dgrowleyml@gmail.com> wrote:
> * skip-locked-2.spec
>
> # s2 againt skips because it can't acquired a multi-xact lock
>
> "againt" should be "again"

Fixed.

> also mixed use of multixact and multi-xact, probably would be better to
> stick to just 1.

Changed to multixact as seen in other places.

> Also this file would probably be slightly easier to read with a new line
> after each "permutation" line.

Done.

> * skip_locked-3.spec
>
> # s3 skips to second record due to tuple lock held by s2
>
> There's a missing "the" after "skips to"

Fixed.

> Also, won't the lock be held by s1 not s2?

s1 holds the row lock, and s2 holds the tuple lock (because it is at
the head of the queue waiting for the row lock).  s3 skips because it
couldn't acquire the tuple lock held by s2.  The other two specs are
about not being able to acquire a row lock and only need two sessions.
This one tests the code path when there is already a queue forming and
you can't even get the tuple lock, which requires an extra session.  I
have updated the comment to make that clearer.

> There's just a couple of other tiny things.
>
> * Some whitespace issues shown by git diff --check
>
> src/backend/parser/gram.y:9221: trailing whitespace.
> +opt_nowait_or_skip:
> src/backend/rewrite/rewriteHandler.c:65: trailing whitespace.
> +
> LockClauseStrength strength, LockWaitPolicy waitPolicy,

Fixed.

> * analyze.c
>
> The StaticAssertStmt's I think these should be removed. The only place where
> this behaviour can be changed
> is in lockwaitpolicy.h, I think it would be better to just strengthen the
> comment on the enum's definition.

Removed.

> Perhaps something more along the lines of:
>
> Policy for what to do when a row lock cannot be obtained immediately.
>
> The enum values defined here have critical importance to how the parser
> treats multiple FOR UPDATE/SHARE statements in different nested levels of
> the query. Effectively if multiple locking clauses are defined in the query
> then the one with the highest enum value takes precedence over the others.

Added something along those lines.

> Apart from this I can't see any other problems with the patch and I'd be
> very inclined, once the above are fixed up to mark the patch ready for
> commiter.
>
> Good work

Thanks for all the guidance, I appreciate it!  My review karma account
is now well overdrawn.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
David Rowley
Date:
On Sat, Aug 2, 2014 at 3:55 AM, Thomas Munro <munro@ip9.org> wrote:
On 1 August 2014 10:37, David Rowley <dgrowleyml@gmail.com> wrote:
> Apart from this I can't see any other problems with the patch and I'd be
> very inclined, once the above are fixed up to mark the patch ready for
> commiter.
>
> Good work

Thanks for all the guidance, I appreciate it!  My review karma account
is now well overdrawn.

Ok, then I have nothing more so it's time to pass this one along.

The only notes I can think to leave for the commiter would be around the precedence order of the lock policy, especially around a query such as:

SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE; -- skip locked wins

Of course the current behaviour is that NOWAIT wins over the standard FOR UPDATE, but with NOWAIT, there's only a chance of an error, there's no chance of giving incorrect results. 

I checked what Oracle did in this situation and I see that they completely disallow FOR UPDATE inside of views and subqueries.

I could see an argument here that the outer most FOR UPDATE clause should be used, but I guess that ship has sailed when NOWAIT was added.

Marking as ready for commiter.

Regards

David Rowley

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
David Rowley wrote:

> The only notes I can think to leave for the commiter would be around the
> precedence order of the lock policy, especially around a query such as:
> 
> SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE; --
> skip locked wins
> 
> Of course the current behaviour is that NOWAIT wins over the standard FOR
> UPDATE, but with NOWAIT, there's only a chance of an error, there's no
> chance of giving incorrect results.

Another option is to throw an error at parse analysis time if there is a
conflict in the specified locking policies, as in the above case.  Are
there cases in which it would make sense to have one clause trump the
other?  It seems reasonable to have NOWAIT trump regular FOR UPDATE (as
it already does), since, as you say, there's chance of error being
thrown at runtime, but not of incorrect result.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
One thing I just noticed is that we uselessly set an error context
callback when "waiting" in ConditionalMultiXactIdWait, which is pretty
useless (because we don't actually wait there at all) -- we don't set
one in ConditionalXactLockTableWait, which makes sense, but for some
reason I failed to realize this in review of f88d4cfc9.  I think I will
take that out instead of cluttering this patch with it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
heap_lock_tuple() has the following comment on top:

 * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
 * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
 * (the last only for HeapTupleSelfUpdated, since we
 * cannot obtain cmax from a combocid generated by another transaction).
 * See comments for struct HeapUpdateFailureData for additional info.

With the patch as submitted, this struct is not filled in the
HeapTupleWouldBlock case.  I'm not sure this is okay, though I admit the
only caller that passes LockWaitSkip doesn't care, so maybe we could
just ignore the issue (after properly modifying the comment).  It seems
easy to add a LockBuffer() and "goto failed" rather than a return; that
would make that failure case conform to HeapTupleUpdated and
HeapTupleSelfUpdated.  OTOH it might be pointless to worry about what
would be essentially dead code currently ...


Did you consider heap_lock_updated_tuple?  A rationale for saying it
doesn't need to pay attention to the wait policy is: if you're trying to
lock-skip-locked an updated tuple, then you either skip it because its
updater is running, or you return it because it's no longer running; and
if you return it, it is not possible for the updater to be locking the
updated version.  However, what if there's a third transaction that
locked the updated version?  It might be difficult to hit this case or
construct an isolationtester spec file though, since there's a narrow
window you need to race to.

I also pgindented.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 22 August 2014 23:02, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> heap_lock_tuple() has the following comment on top:
>
>  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
>  * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
>  * (the last only for HeapTupleSelfUpdated, since we
>  * cannot obtain cmax from a combocid generated by another transaction).
>  * See comments for struct HeapUpdateFailureData for additional info.
>
> With the patch as submitted, this struct is not filled in the
> HeapTupleWouldBlock case.  I'm not sure this is okay, though I admit the
> only caller that passes LockWaitSkip doesn't care, so maybe we could
> just ignore the issue (after properly modifying the comment).  It seems
> easy to add a LockBuffer() and "goto failed" rather than a return; that
> would make that failure case conform to HeapTupleUpdated and
> HeapTupleSelfUpdated.  OTOH it might be pointless to worry about what
> would be essentially dead code currently ...

Thanks Alvaro.

Forgive me if I have misunderstood but it looks like your incremental
patch included a couple of unrelated changes, namely
s/0/InvalidCommandId/ and a reversion of ConditionalMultiXactIdWait.
Undoing those gave me skip-locked-v12-b.patch, attached for reference,
and I've included those changes in a new full patch
skip-locked-v13.patch (also rebased).

+1 for the change from if-then-else to switch statements.

I was less sure about the 'goto failed' change, but I couldn't measure
any change in concurrent throughput in my simple benchmark, so I guess
that extra buffer lock/unlock doesn't matter and I can see why you
prefer that control flow.

> Did you consider heap_lock_updated_tuple?  A rationale for saying it
> doesn't need to pay attention to the wait policy is: if you're trying to
> lock-skip-locked an updated tuple, then you either skip it because its
> updater is running, or you return it because it's no longer running; and
> if you return it, it is not possible for the updater to be locking the
> updated version.  However, what if there's a third transaction that
> locked the updated version?  It might be difficult to hit this case or
> construct an isolationtester spec file though, since there's a narrow
> window you need to race to.

Hmm.  I will look into this, thanks.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 24 August 2014 22:04, Thomas Munro <munro@ip9.org> wrote:
> On 22 August 2014 23:02, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Did you consider heap_lock_updated_tuple?  A rationale for saying it
>> doesn't need to pay attention to the wait policy is: if you're trying to
>> lock-skip-locked an updated tuple, then you either skip it because its
>> updater is running, or you return it because it's no longer running; and
>> if you return it, it is not possible for the updater to be locking the
>> updated version.  However, what if there's a third transaction that
>> locked the updated version?  It might be difficult to hit this case or
>> construct an isolationtester spec file though, since there's a narrow
>> window you need to race to.
>
> Hmm.  I will look into this, thanks.

While trying to produce the heap_lock_updated_tuple_rec case you
describe (so far unsuccessfully), I discovered I could make SELECT ...
FOR UPDATE NOWAIT block indefinitely on unpatched 9.3 in a different
code path after heap_lock_tuple returns: in another session, UPDATE,
COMMIT, then UPDATE, all after the first session has taken its
snapshot but before it tries to lock a given row.  The code in
EvalPlanQualFetch (reached from the HeapTupleUpdated case in
ExecLockRow) finishes up waiting for the uncommitted transaction.

I think I see how to teach EvalPlanQualFetch how to handle wait
policies: for NOWAIT it should ereport (fixing a pre-existing bug
(?)), and I guess it should handle SKIP LOCKED by returning NULL,
similarly to the way it handles deleted rows, and of course in all
cases passing the wait policy forward to heap_lock_tuple, which it
eventually calls.

Still looking at heap_lock_updated_tuple.

The difficulty of course will be testing all these racy cases reproducibly...

Best regards,
Thomas Munro



Re: SKIP LOCKED DATA (work in progress)

From
Craig Ringer
Date:
On 08/25/2014 09:44 AM, Thomas Munro wrote:
> On 24 August 2014 22:04, Thomas Munro <munro@ip9.org> wrote:
>> On 22 August 2014 23:02, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Did you consider heap_lock_updated_tuple?  A rationale for saying it
>>> doesn't need to pay attention to the wait policy is: if you're trying to
>>> lock-skip-locked an updated tuple, then you either skip it because its
>>> updater is running, or you return it because it's no longer running; and
>>> if you return it, it is not possible for the updater to be locking the
>>> updated version.  However, what if there's a third transaction that
>>> locked the updated version?  It might be difficult to hit this case or
>>> construct an isolationtester spec file though, since there's a narrow
>>> window you need to race to.
>>
>> Hmm.  I will look into this, thanks.
> 
> While trying to produce the heap_lock_updated_tuple_rec case you
> describe (so far unsuccessfully), I discovered I could make SELECT ...
> FOR UPDATE NOWAIT block indefinitely on unpatched 9.3 in a different
> code path after heap_lock_tuple returns: in another session, UPDATE,
> COMMIT, then UPDATE, all after the first session has taken its
> snapshot but before it tries to lock a given row.  The code in
> EvalPlanQualFetch (reached from the HeapTupleUpdated case in
> ExecLockRow) finishes up waiting for the uncommitted transaction.

I think that's the issue Andres and I patched for 9.3, but I don't know
if it got committed.

I'll need to have a look. A search in the archives for heap_lock_tuple
and nowait might be informative.

> The difficulty of course will be testing all these racy cases reproducibly...

Yep, especially as isolationtester can only really work at the statement
level, and can only handle one blocked connection at a time.

It's possible a helper extension could be used - set up some locks in
shmem, register two sessions for different "test roles" within a given
test to install appropriate hooks, wait until they're both blocked on
the locks the hooks acquire, then release the locks.

That might land up with hook points scattered everywhere, but they could
be some pretty minimal macros at least.

IMO that's a separate project, though. For this kind of testing I've
tended to just set conditional breakpoints in gdb, wait until they trap,
then once everything's lined up release the breakpoints in both sessions
as simultaneously as possible. Not perfect, but has tended to work.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:

> While trying to produce the heap_lock_updated_tuple_rec case you
> describe (so far unsuccessfully), I discovered I could make SELECT ...
> FOR UPDATE NOWAIT block indefinitely on unpatched 9.3 in a different
> code path after heap_lock_tuple returns: in another session, UPDATE,
> COMMIT, then UPDATE, all after the first session has taken its
> snapshot but before it tries to lock a given row.  The code in
> EvalPlanQualFetch (reached from the HeapTupleUpdated case in
> ExecLockRow) finishes up waiting for the uncommitted transaction.

Interesting -- perhaps this helps explain the deadlock issue reported in
http://www.postgresql.org/message-id/CAKrjmhdN+GhAjNwqfHsOtGp+7YN27zR79m99RcAJMNazt5NJrA@mail.gmail.com

> I think I see how to teach EvalPlanQualFetch how to handle wait
> policies: for NOWAIT it should ereport (fixing a pre-existing bug
> (?)), and I guess it should handle SKIP LOCKED by returning NULL,
> similarly to the way it handles deleted rows, and of course in all
> cases passing the wait policy forward to heap_lock_tuple, which it
> eventually calls.
> 
> Still looking at heap_lock_updated_tuple.
> 
> The difficulty of course will be testing all these racy cases reproducibly...

Does this help?
http://www.postgresql.org/message-id/51FB4305.3070600@2ndquadrant.com
The useful trick there is forcing a query to get its snapshot and then
go to sleep before actually doing anything, by way of an advisory lock.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:
> On 22 August 2014 23:02, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Forgive me if I have misunderstood but it looks like your incremental
> patch included a couple of unrelated changes, namely
> s/0/InvalidCommandId/ and a reversion of ConditionalMultiXactIdWait.

Yeah, sorry about those, will push separately.

> Undoing those gave me skip-locked-v12-b.patch, attached for reference,
> and I've included those changes in a new full patch
> skip-locked-v13.patch (also rebased).
>
> +1 for the change from if-then-else to switch statements.
>
> I was less sure about the 'goto failed' change, but I couldn't measure
> any change in concurrent throughput in my simple benchmark, so I guess
> that extra buffer lock/unlock doesn't matter and I can see why you
> prefer that control flow.

I was also thinking in reducing the lock level acquired to shared rather
than exclusive in all the paths that "goto failed".  Since the lock is
only used to read a couple of fields from the tuple, shared is enough
and should give slightly better concurrency.  Per buffer access rules in
src/backend/storage/buffer/README:

: 1. To scan a page for tuples, one must hold a pin and either shared or
: exclusive content lock.  To examine the commit status (XIDs and status bits)
: of a tuple in a shared buffer, one must likewise hold a pin and either shared
: or exclusive lock.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Heikki Linnakangas
Date:
On 07/31/2014 12:29 AM, Thomas Munro wrote:
> On 29 July 2014 02:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> David Rowley wrote:
>>
>>> I've also been looking at the isolation tests and I see that you've added a
>>> series of tests for NOWAIT. I was wondering why you did that as that's
>>> really existing code, probably if you thought the tests were a bit thin
>>> around NOWAIT then maybe that should be a separate patch?
>>
>> The isolation tester is new so we don't have nearly enough tests for it.
>> Adding more meaningful tests is good even if they're unrelated to the
>> patch at hand.
>
> Here are my isolation tests for NOWAIT as a separate patch,
> independent of SKIP LOCKED.  They cover the tuple lock, regular row
> lock and multixact row lock cases.

Thanks, committed.

> I guess this might be called white
> box testing, since it usese knowledge of how to construct schedules
> that hit the three interesting code paths that trigger the error, even
> though you can't see from the output why the error was raised in each
> case without extra instrumentation (though it did cross my mind that
> it could be interesting at the very least for testing if the error
> message were different in each case).

Yeah, seems reasonable. This kind of tests might become obsolete in the 
future if the internals change a lot, so that e.g. we don't use 
multixids anymore. But even if that happens, there's little harm in 
keeping the tests, and meanwhile it's good to have coverage of these cases.

- Heikki




Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 25 August 2014 02:57, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>> The difficulty of course will be testing all these racy cases reproducibly...
>
> Does this help?
> http://www.postgresql.org/message-id/51FB4305.3070600@2ndquadrant.com
> The useful trick there is forcing a query to get its snapshot and then
> go to sleep before actually doing anything, by way of an advisory lock.

Yes it does, thanks Alvaro and Craig.  I think the attached spec
reproduces the problem using that trick, ie shows NOWAIT blocking,
presumably in EvalPlanQualFetch (though I haven't stepped through it
with a debugger yet).  I'm afraid I'm out of Postgres hacking cycles
for a few days, but next weekend I should have a new patch that fixes
this by teaching EvalPlanQualFetch about wait policies, with isolation
tests for NOWAIT and SKIP LOCKED.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:
> On 25 August 2014 02:57, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Thomas Munro wrote:
> >> The difficulty of course will be testing all these racy cases reproducibly...
> >
> > Does this help?
> > http://www.postgresql.org/message-id/51FB4305.3070600@2ndquadrant.com
> > The useful trick there is forcing a query to get its snapshot and then
> > go to sleep before actually doing anything, by way of an advisory lock.
> 
> Yes it does, thanks Alvaro and Craig.  I think the attached spec
> reproduces the problem using that trick, ie shows NOWAIT blocking,
> presumably in EvalPlanQualFetch (though I haven't stepped through it
> with a debugger yet).  I'm afraid I'm out of Postgres hacking cycles
> for a few days, but next weekend I should have a new patch that fixes
> this by teaching EvalPlanQualFetch about wait policies, with isolation
> tests for NOWAIT and SKIP LOCKED.

Hmm, http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 27 August 2014 17:18, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>> On 25 August 2014 02:57, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> > Thomas Munro wrote:
>> >> The difficulty of course will be testing all these racy cases reproducibly...
>> >
>> > Does this help?
>> > http://www.postgresql.org/message-id/51FB4305.3070600@2ndquadrant.com
>> > The useful trick there is forcing a query to get its snapshot and then
>> > go to sleep before actually doing anything, by way of an advisory lock.
>>
>> Yes it does, thanks Alvaro and Craig.  I think the attached spec
>> reproduces the problem using that trick, ie shows NOWAIT blocking,
>> presumably in EvalPlanQualFetch (though I haven't stepped through it
>> with a debugger yet).  I'm afraid I'm out of Postgres hacking cycles
>> for a few days, but next weekend I should have a new patch that fixes
>> this by teaching EvalPlanQualFetch about wait policies, with isolation
>> tests for NOWAIT and SKIP LOCKED.
>
> Hmm, http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com

Thanks, I hadn't seen this, I should have checked the archives better.
I have actually already updated my patch to handle EvalPlanQualFetch
with NOWAIT and SKIP LOCKED with isolation specs, see attached.  I
will compare with Craig's and see if I screwed anything up... of
course I am happy to merge and submit a new patch on top of Craig's if
it's going to be committed.

I haven't yet figured out how to get get into a situation where
heap_lock_updated_tuple_rec waits.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:
> On 27 August 2014 17:18, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Thomas Munro wrote:

> >> Yes it does, thanks Alvaro and Craig.  I think the attached spec
> >> reproduces the problem using that trick, ie shows NOWAIT blocking,
> >> presumably in EvalPlanQualFetch (though I haven't stepped through it
> >> with a debugger yet).  I'm afraid I'm out of Postgres hacking cycles
> >> for a few days, but next weekend I should have a new patch that fixes
> >> this by teaching EvalPlanQualFetch about wait policies, with isolation
> >> tests for NOWAIT and SKIP LOCKED.
> >
> > Hmm, http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com
> 
> Thanks, I hadn't seen this, I should have checked the archives better.
> I have actually already updated my patch to handle EvalPlanQualFetch
> with NOWAIT and SKIP LOCKED with isolation specs, see attached.  I
> will compare with Craig's and see if I screwed anything up... of
> course I am happy to merge and submit a new patch on top of Craig's if
> it's going to be committed.

I tried Craig's patch with your test case and found that it stalls in
XactLockTableWait inside EPQFetch because it doesn't throw an error in
the noWait case before waiting.  I think I will fix that and push,
including both test cases.  Then we can see about rebasing your patch.

I am wondering about backpatching Craig's fix.  It looks to me like it
should be backpatched as far back as NOWAIT exists, but that was in 8.1
and we haven't ever gotten a complaint until Craig's report AFAIK, which
I understand wasn't coming from a user finding a problem but rather some
new development.  So I hesitate.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:

> Thanks, I hadn't seen this, I should have checked the archives better.
> I have actually already updated my patch to handle EvalPlanQualFetch
> with NOWAIT and SKIP LOCKED with isolation specs, see attached.  I
> will compare with Craig's and see if I screwed anything up... of
> course I am happy to merge and submit a new patch on top of Craig's if
> it's going to be committed.

Thanks, please rebase.

> I haven't yet figured out how to get get into a situation where
> heap_lock_updated_tuple_rec waits.

Well, as I think I said in the first post I mentioned this, maybe there
is no such situation.  In any case, like the EvalPlanQualFetch issue, we
can fix it later if we find it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 28 August 2014 00:25, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>
>> Thanks, I hadn't seen this, I should have checked the archives better.
>> I have actually already updated my patch to handle EvalPlanQualFetch
>> with NOWAIT and SKIP LOCKED with isolation specs, see attached.  I
>> will compare with Craig's and see if I screwed anything up... of
>> course I am happy to merge and submit a new patch on top of Craig's if
>> it's going to be committed.
>
> Thanks, please rebase.

Thanks -- new patch attached.

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 28 August 2014 00:25, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>> I haven't yet figured out how to get get into a situation where
>> heap_lock_updated_tuple_rec waits.
>
> Well, as I think I said in the first post I mentioned this, maybe there
> is no such situation.  In any case, like the EvalPlanQualFetch issue, we
> can fix it later if we find it.

I finally came up with a NOWAIT spec that reaches
heap_lock_updated_rec and then blocks.  I can't explain why exactly...
but please see attached.  The fix seems fairly straightforward.  Do
you think I should submit an independent patch to fix this case (well
there are really two cases, since there is a separate multixact path)
for the existing NOWAIT support and then tackle the SKIP LOCKED
equivalent separately?

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 31 August 2014 01:36, Thomas Munro <munro@ip9.org> wrote:
> On 28 August 2014 00:25, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Thomas Munro wrote:
>>> I haven't yet figured out how to get get into a situation where
>>> heap_lock_updated_tuple_rec waits.
>>
>> Well, as I think I said in the first post I mentioned this, maybe there
>> is no such situation.  In any case, like the EvalPlanQualFetch issue, we
>> can fix it later if we find it.
>
> I finally came up with a NOWAIT spec that reaches
> heap_lock_updated_rec and then blocks.  I can't explain why exactly...
> but please see attached.  The fix seems fairly straightforward.  Do
> you think I should submit an independent patch to fix this case (well
> there are really two cases, since there is a separate multixact path)
> for the existing NOWAIT support and then tackle the SKIP LOCKED
> equivalent separately?

Oops, that isolation wasn't right, please disregard.  Unless you think
this obscure case needs to be addressed before the SKIP LOCKED patch
can be committed, I will investigate it separately and maybe submit
something to a future commitfest.

Best regards,
Thomas Munro



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:

> @@ -2022,7 +2030,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode, bool noWait,
>               */
>              test = heap_lock_tuple(relation, &tuple,
>                                     estate->es_output_cid,
> -                                   lockmode, noWait,
> +                                   lockmode, wait_policy,
>                                     false, &buffer, &hufd);
>              /* We now have two pins on the buffer, get rid of one */
>              ReleaseBuffer(buffer);

Doesn't this heap_lock_tuple() need to check for a WouldBlock result?
Not sure that this is the same case that you were trying to test in
heap_lock_updated_tuple; I think this requires an update chain (so that
EPQFetch is invoked) and some tuple later in the chain is locked by a
third transaction.

I attach some additional minor suggestions to your patch.  Please feel
free to reword comments differently if you think my wording isn't an
improvements (or I've maked an english mistakes).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Robert Haas
Date:
On Wed, Sep 10, 2014 at 9:47 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I attach some additional minor suggestions to your patch.

I don't see any attachment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Sep 10, 2014 at 9:47 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I attach some additional minor suggestions to your patch.
>
> I don't see any attachment.

Uh.  Here it is.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 10 September 2014 14:47, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>
>> @@ -2022,7 +2030,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode, bool noWait,
>>                        */
>>                       test = heap_lock_tuple(relation, &tuple,
>>                                                                  estate->es_output_cid,
>> -                                                                lockmode, noWait,
>> +                                                                lockmode, wait_policy,
>>                                                                  false, &buffer, &hufd);
>>                       /* We now have two pins on the buffer, get rid of one */
>>                       ReleaseBuffer(buffer);
>
> Doesn't this heap_lock_tuple() need to check for a WouldBlock result?
> Not sure that this is the same case that you were trying to test in
> heap_lock_updated_tuple; I think this requires an update chain (so that
> EPQFetch is invoked) and some tuple later in the chain is locked by a
> third transaction.

You're right, that case was clearly lacking.  The new patch, attached,
releases the buffer and returns NULL in that case.  But I have no idea
how to reach it in an isolation test!  skip-locked-4.spec gets pretty
close, it creates the right kind of update chain to get into
EvalPlanQualFetch and then enter the conditional block beginning:

            if (TransactionIdIsValid(SnapshotDirty.xmax))

But to reach the case you mentioned, it would need to get past that
(xmax is not a valid transaction) but then the tuple would need to be
locked by another session before heap_lock_tuple is called a few lines
below.  That's a race scenario that I don't believe we can create
using advisory lock tricks in an isolation test.

> I attach some additional minor suggestions to your patch.  Please feel
> free to reword comments differently if you think my wording isn't an
> improvements (or I've maked an english mistakes).

Thanks, these are incorporated in the new version (also rebased).

Best regards,
Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:

> > Doesn't this heap_lock_tuple() need to check for a WouldBlock result?
> > Not sure that this is the same case that you were trying to test in
> > heap_lock_updated_tuple; I think this requires an update chain (so that
> > EPQFetch is invoked) and some tuple later in the chain is locked by a
> > third transaction.
> 
> You're right, that case was clearly lacking.  The new patch, attached,
> releases the buffer and returns NULL in that case.  But I have no idea
> how to reach it in an isolation test!  skip-locked-4.spec gets pretty
> close, it creates the right kind of update chain to get into
> EvalPlanQualFetch and then enter the conditional block beginning:
> 
>             if (TransactionIdIsValid(SnapshotDirty.xmax))
> 
> But to reach the case you mentioned, it would need to get past that
> (xmax is not a valid transaction) but then the tuple would need to be
> locked by another session before heap_lock_tuple is called a few lines
> below.  That's a race scenario that I don't believe we can create
> using advisory lock tricks in an isolation test.

Hm, are you able to reproduce it using GDB?

Craig Ringer was saying elsewhere that there are other cases that are
impossible to test reliably and was proposing addings hooks or
something to block backends at convenient times.  Not an easy problem ...

> > I attach some additional minor suggestions to your patch.  Please feel
> > free to reword comments differently if you think my wording isn't an
> > improvements (or I've maked an english mistakes).
> 
> Thanks, these are incorporated in the new version (also rebased).

Great, thanks; I'll look at it again soon to commit, as I think we're
done now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: SKIP LOCKED DATA (work in progress)

From
Thomas Munro
Date:
On 12 September 2014 03:56, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>> But to reach the case you mentioned, it would need to get past that
>> (xmax is not a valid transaction) but then the tuple would need to be
>> locked by another session before heap_lock_tuple is called a few lines
>> below.  That's a race scenario that I don't believe we can create
>> using advisory lock tricks in an isolation test.
>
> Hm, are you able to reproduce it using GDB?
>
> Craig Ringer was saying elsewhere that there are other cases that are
> impossible to test reliably and was proposing addings hooks or
> something to block backends at convenient times.  Not an easy problem ...

+1, I think that is a great idea.

FWIW here's some throwaway code that I used to do that:

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 79667f1..fbb3b55 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
 #include "storage/lmgr.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
+#include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
@@ -2029,6 +2030,20 @@ EvalPlanQualFetch(EState *estate, Relation
relation, int lockmode,
                        }

                        /*
+                        * Begin wait point debugging hack...
+                        * TODO: Only in a special build mode...
+                        * We tell anyone waiting that we have reached
wait point #42.
+                        * We wait for permission to proceed from wait
point #43.
+                        */
+                       elog(WARNING, "XXX reached point 42, waiting
at point 43");
+                       DirectFunctionCall1(pg_advisory_unlock_int8,
Int64GetDatum(42));
+                       DirectFunctionCall1(pg_advisory_lock_int8,
Int64GetDatum(43));
+                       elog(WARNING, "XXX continuing after point 43");
+                       /*
+                        * End wait point debugging hack.
+                        */
+
+                       /*
                         * This is a live tuple, so now try to lock it.
                         */
                        test = heap_lock_tuple(relation, &tuple,

Using the attached isolation spec, that race case is reached.  Yeah,
it's crude and confusing having those three advisory locks (one to
allow an update chain to be created after s1 takes a snapshot, and the
other two so that s2 can block s1 at the right point to produce that
race case), but I found this less messy than trying to reproduce
complicated concurrency scenarios with GDB.

IMHO it would be great if there were a tidy and supported way to do
this kind of thing, perhaps with a formal notion of named wait points
which are only compiled in in special test builds, and an optional set
of extra isolation specs that use them.

>> > I attach some additional minor suggestions to your patch.  Please feel
>> > free to reword comments differently if you think my wording isn't an
>> > improvements (or I've maked an english mistakes).
>>
>> Thanks, these are incorporated in the new version (also rebased).
>
> Great, thanks; I'll look at it again soon to commit, as I think we're
> done now.

Thanks!

Thomas Munro

Attachment

Re: SKIP LOCKED DATA (work in progress)

From
Alvaro Herrera
Date:
Thomas Munro wrote:

> > I attach some additional minor suggestions to your patch.  Please feel
> > free to reword comments differently if you think my wording isn't an
> > improvements (or I've maked an english mistakes).
> 
> Thanks, these are incorporated in the new version (also rebased).

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services