Re: Removing PD_ALL_VISIBLE - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Removing PD_ALL_VISIBLE
Date
Msg-id 1353967029.13162.40.camel@sussancws0025
Whole thread Raw
In response to Re: Removing PD_ALL_VISIBLE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2012-11-26 at 16:10 -0500, Tom Lane wrote:
> There's still the issue of whether the IOS code is safe in machines with
> weak memory ordering.  I think that it probably is safe, but the
> argument for it in the current code comment is wrong; most likely, a
> correct argument has to depend on read/write barriers associated with
> taking snapshots.  I think what you ought to do is work through that,
> fix the existing comment, and then see whether the same argument works
> for what you want to do.

As a part of the patch, I did change the comment, and here's what I came
up with:
 * Note on Memory Ordering Effects: visibilitymap_test does not lock * the visibility map buffer, and therefore the
resultwe read here * could be slightly stale.  However, it can't be stale enough to * matter. * * We need to detect
clearinga VM bit due to an insert right away, * because the tuple is present in the index page but not visible. The *
readingof the TID by this scan (using a shared lock on the index * buffer) is serialized with the insert of the TID
intothe index * (using an exclusive lock on the index buffer). Because the VM bit is * cleared before updating the
index,and locking/unlocking of the * index page acts as a full memory barrier, we are sure to see the * cleared bit if
wesee a recently-inserted TID. * * Deletes do not update the index page (only VACUUM will clear out the * TID), so the
clearingof the VM bit by a delete is not serialized * with this test below, and we may see a value that is
significantly* stale. However, we don't care about the delete right away, because * the tuple is still visible until
thedeleting transaction commits or * the statement ends (if it's our transaction). In either case, the * lock on the VM
bufferwill have been released (acting as a write * barrier) after clearing the bit. And for us to have a snapshot that
*includes the deleting transaction (making the tuple invisible), we * must have acquired ProcArrayLock after that time,
actingas a read * barrier. * * It's worth going through this complexity to avoid needing to lock * the VM buffer, which
couldcause significant contention.
 

And I updated the comment in visibilitymap.c as well (reformatted for
this email):

"To test a bit in the visibility map, most callers should have a pin on
the VM buffer, and at least a shared lock on the data buffer. Any
process that clears the VM bit must have an exclusive lock on the data
buffer, so that will serialize access to the appropriate bit. Because
lock acquisition and release are full memory barriers, then there is no
danger of seeing the state of the bit before it was last cleared.
Callers that don't have the data buffer yet, such as an index only scan
or a VACUUM that is skipping pages, must handle the concurrency as
appropriate."

Regards,Jeff Davis




pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Date:
Subject: Re: Suggestion for --truncate-tables to pg_restore
Next
From: Bruce Momjian
Date:
Subject: Re: Further pg_upgrade analysis for many tables