Thread: buffer README is out of date

buffer README is out of date

From: Jeff Janes
The buffer/README section on buffer clean up locks never got updated for the creation of Heap Only Tuples and their associated compaction logic.

I've attached a patch to change the explanation.  I'm sure someone can word it better than I have.

Cheers,

Jeff

Re: buffer README is out of date

From: Jeff Janes


On Sat, Aug 29, 2015 at 11:45 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
The buffer/README section on buffer clean up locks never got updated for the creation of Heap Only Tuples and their associated compaction logic.

I've attached a patch to change the explanation.  I'm sure someone can word it better than I have.

...particularly if I won't let you see my wording. 

Actually attached this time.
Attachment

Re: buffer README is out of date

From: Jim Nasby
On 8/29/15 2:21 PM, Jeff Janes wrote:
>     The buffer/README section on buffer clean up locks never got updated
>     for the creation of Heap Only Tuples and their associated compaction
>     logic.
>
>     I've attached a patch to change the explanation.  I'm sure someone
>     can word it better than I have.

> ! Obtaining the necessary lock is done by the bufmgr routines
> ! LockBufferForCleanup() or ConditionalLockBufferForCleanup().
> ! They first get an exclusive lock and then check to see if the shared pin
> ! count is currently 1.  If not, ConditionalLockBufferForCleanup() releases
> ! the exclusive lock and then returns false, while LockBufferForCleanup()
> ! releases the exclusive lock (but not the caller's pin) and waits until
> ! signaled by another backend, whereupon it tries again.  The signal will
> ! occur when UnpinBuffer decrements the shared pin count to 1.  As

I don't think that's true. If 2 other backends have a pin then AFAIK 
you'd wake up twice. There's also this comment in LockBufferForCleanup:

/*
 * Remove flag marking us as waiter. Normally this will not be set
 * anymore, but ProcWaitForSignal() can return for other signals as
 * well.  We take care to only reset the flag if we're the waiter, as
 * theoretically another backend could have started waiting. That's
 * impossible with the current usages due to table level locking, but
 * better be safe.
 */
 

> ! indicated above, this operation might have to wait a good while before
> ! it acquires lock, but that shouldn't matter much for concurrent VACUUM.
> ! The current implementation only supports a single waiter for pin-count-1
> ! on any particular shared buffer.  This is enough for VACUUM's use, since
> ! we don't allow multiple VACUUMs concurrently on a single relation anyway.
> ! Anyone wishing to obtain a clean up lock outside of a VACUUM must use
> ! the conditional variant of the function.

That last statement should possibly be worded more strongly or just 
removed. If someone thinks they want to use this mechanism for something 
other than VACUUM there's probably a lot of other things to consider 
beyond just buffer locking.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: buffer README is out of date

From: Jeff Janes
On Sat, Aug 29, 2015 at 1:27 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 8/29/15 2:21 PM, Jeff Janes wrote:
    The buffer/README section on buffer clean up locks never got updated
    for the creation of Heap Only Tuples and their associated compaction
    logic.

    I've attached a patch to change the explanation.  I'm sure someone
    can word it better than I have.

! Obtaining the necessary lock is done by the bufmgr routines
! LockBufferForCleanup() or ConditionalLockBufferForCleanup().
! They first get an exclusive lock and then check to see if the shared pin
! count is currently 1.  If not, ConditionalLockBufferForCleanup() releases
! the exclusive lock and then returns false, while LockBufferForCleanup()
! releases the exclusive lock (but not the caller's pin) and waits until
! signaled by another backend, whereupon it tries again.  The signal will
! occur when UnpinBuffer decrements the shared pin count to 1.  As

I don't think that's true. If 2 other backends have a pin then AFAIK you'd wake up twice. There's also this comment in LockBufferForCleanup:

/*
 * Remove flag marking us as waiter. Normally this will not be set
 * anymore, but ProcWaitForSignal() can return for other signals as
 * well.  We take care to only reset the flag if we're the waiter, as
 * theoretically another backend could have started waiting. That's
 * impossible with the current usages due to table level locking, but
 * better be safe.
 */

If 2 other backends have a pin, only the last one to drop it should do the waking.  I don't see a way they could both try to do the waking; the interlock on the buffer header seems to prevent that.  But if it did, that would just be another way you could have a spurious wake-up, which can already happen for other reasons.  I don't think the spurious wake-up needs to be documented in the higher-level README file.
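For what it's worth, the waker side can be modeled roughly like this.  This is only a toy sketch loosely based on UnpinBuffer(); the struct fields and helper names below are made up for illustration, while the real code uses the buffer-header spinlock, the BM_PIN_COUNT_WAITER flag, and ProcSendSignal():

#include <pthread.h>
#include <stdio.h>

typedef struct
{
    pthread_mutex_t header_lock;   /* stands in for the buffer-header spinlock */
    int             refcount;      /* shared pin count */
    int             waiter_pid;    /* backend waiting for pin count 1, or 0 */
} BufferModel;

static void
signal_backend(int pid)            /* stands in for ProcSendSignal() */
{
    printf("waking backend %d\n", pid);
}

void
unpin_buffer(BufferModel *buf)
{
    pthread_mutex_lock(&buf->header_lock);
    buf->refcount--;

    /*
     * Only the unpin that leaves exactly the waiter's own pin behind sends
     * the wakeup, and the decision is made while holding the header lock,
     * so two backends cannot both conclude they are the last one out.
     */
    if (buf->waiter_pid != 0 && buf->refcount == 1)
    {
        int     waiter = buf->waiter_pid;

        buf->waiter_pid = 0;       /* clear the waiter flag before signaling */
        pthread_mutex_unlock(&buf->header_lock);
        signal_backend(waiter);
    }
    else
        pthread_mutex_unlock(&buf->header_lock);
}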
 

! indicated above, this operation might have to wait a good while before
! it acquires lock, but that shouldn't matter much for concurrent VACUUM.
! The current implementation only supports a single waiter for pin-count-1
! on any particular shared buffer.  This is enough for VACUUM's use, since
! we don't allow multiple VACUUMs concurrently on a single relation anyway.
! Anyone wishing to obtain a clean up lock outside of a VACUUM must use
! the conditional variant of the function.

That last statement should possibly be worded more strongly or just removed. If someone thinks they want to use this mechanism for something other than VACUUM there's probably a lot of other things to consider beyond just buffer locking.

But the point of this is that HOT updates are **already** doing this outside of VACUUM.  That is why the README is out of date.
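Concretely, the pattern is roughly what heap_page_prune_opt() does: try the conditional variant and just skip the compaction if the lock isn't available.  This is a simplified sketch only; the helper declarations below are placeholders for the real bufmgr/heapam calls, not their actual names or signatures:

#include <stdbool.h>

/* Placeholder declarations standing in for the real bufmgr/heapam calls. */
bool conditional_lock_buffer_for_cleanup(int buffer);
void unlock_buffer_content_lock(int buffer);
void prune_page(int buffer);

/*
 * Opportunistic HOT pruning: a foreground backend cannot afford to sleep
 * waiting for pin count 1 the way VACUUM can, so it uses the conditional
 * variant and simply skips the compaction if anyone else holds a pin.
 */
void
maybe_prune_page(int buffer)
{
    if (!conditional_lock_buffer_for_cleanup(buffer))
        return;                    /* someone else holds a pin; try again later */

    prune_page(buffer);            /* reclaim dead HOT chain members */
    unlock_buffer_content_lock(buffer);
}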

Cheers,

Jeff

Re: buffer README is out of date

From: Jim Nasby
On 8/29/15 5:52 PM, Jeff Janes wrote:
>         ! Obtaining the necessary lock is done by the bufmgr routines
>         ! LockBufferForCleanup() or ConditionalLockBufferForCleanup().
>         ! They first get an exclusive lock and then check to see if the shared pin
>         ! count is currently 1.  If not, ConditionalLockBufferForCleanup() releases
>         ! the exclusive lock and then returns false, while LockBufferForCleanup()
>         ! releases the exclusive lock (but not the caller's pin) and waits until
>         ! signaled by another backend, whereupon it tries again.  The signal will
>         ! occur when UnpinBuffer decrements the shared pin count to 1.  As
>
>
>     I don't think that's true. If 2 other backends have a pin then AFAIK
>     you'd wake up twice. There's also this comment in LockBufferForCleanup:
>
>     /*
>       * Remove flag marking us as waiter. Normally this will not be set
>       * anymore, but ProcWaitForSignal() can return for other signals as
>       * well.  We take care to only reset the flag if we're the waiter, as
>       * theoretically another backend could have started waiting. That's
>       * impossible with the current usages due to table level locking, but
>       * better be safe.
>       */
>
>
> If 2 other backends have a pin, only the last one to drop it should do
> the waking.  I don't see a way they could both try to do the waking; the
> interlock on the buffer header seems to prevent that.  But if it did,
> that would just be another way you could have a spurious wake-up, which
> can already happen for other reasons.  I don't think the spurious
> wake-up needs to be documented in the higher-level README file.

My concern is someone will read too much into "signal will occur ... 
count to 1" and think there are no other ways to be woken up. I realize 
I'm being pedantic here, but given the nasty race condition bugs we've 
had lately maybe it's warranted.
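To be fair, the waiter never acts on the signal itself, only on the re-checked pin count, so a stray wakeup just costs one more trip around the loop.  Roughly like this, as a simplified model of the LockBufferForCleanup() loop; the helpers below are placeholders, not the real bufmgr/proc primitives:

/* Placeholder declarations standing in for the real bufmgr/proc primitives. */
void lock_buffer_exclusive(int buffer);
void unlock_buffer_content_lock(int buffer);
int  shared_pin_count(int buffer);
void register_pin_count_waiter(int buffer);
void clear_pin_count_waiter_if_still_mine(int buffer);
void wait_for_signal(void);        /* may return early, like ProcWaitForSignal() */

void
lock_buffer_for_cleanup_model(int buffer)
{
    for (;;)
    {
        lock_buffer_exclusive(buffer);
        if (shared_pin_count(buffer) == 1)
            return;                          /* cleanup lock acquired */

        register_pin_count_waiter(buffer);   /* ask to be signaled at pin count 1 */
        unlock_buffer_content_lock(buffer);
        wait_for_signal();                   /* wakeup may be spurious or unrelated */
        clear_pin_count_waiter_if_still_mine(buffer);
        /* loop back and re-check; the signal itself proves nothing */
    }
}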
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: buffer README is out of date

From: Noah Misch
On Tue, Sep 01, 2015 at 12:33:26AM -0500, Jim Nasby wrote:
> On 8/29/15 5:52 PM, Jeff Janes wrote:
> >If 2 other backends have a pin, only the last one to drop it should do
> >the waking.  I don't see a way they could both try to do the waking; the
> >interlock on the buffer header seems to prevent that.  But if it did,
> >that would just be another way you could have a spurious wake-up, which
> >can already happen for other reasons.  I don't think the spurious
> >wake-up needs to be documented in the higher-level README file.
> 
> My concern is someone will read too much into "signal will occur ... count
> to 1" and think there are no other ways to be woken up. I realize I'm being
> pedantic here, but given the nasty race condition bugs we've had lately
> maybe it's warranted.

Fair concern.  I'm not too worried in this instance, because I can't think of
a choice the caller would make differently based on such a misunderstanding.
Precise wakeup conditions are not really part of the LockBufferForCleanup()
API contract[1].  The submission seems to move this README in the right
direction, so I committed it with some changes:

1. s/clean up/cleanup/
2. remove patch-added trailing whitespace
3. re-add to the first sentence a citation of rule #5
4. say that recovery can use LockBufferForCleanup(); heap_xlog_clean() does so


[1] If there's more to improve, it might be to move finer implementation
details out of the README and into code comments.