On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Well, one, commits are irrelevant; the page ceases to be all-visible
>> as soon as the delete happens.
> Its not irrelevant for the code as it stands if non-mvcc snapshots were
> allowed. Unless I miss something, even disregarding memory ordering issues,
> there is no interlocking providing protection against the index doing the
> visibilitymap_test() and some concurrent backend doing a heap_delete + commit
> directly after that.
Hmm. Well, the backend that did the heap_delete would clear the
visibility map bit when it did the delete, before releasing the lock
on the heap buffer. So normally we'll see that buffer as
not-all-visible as soon as the delete is performed, even if no commit
has happened yet. But I guess there could be a memory-ordering issue
there for a SnapshotNow scan, because as you say there's no barrier on
the read side in that case. The delete + commit would have to happen
after we finish fetching the index TID but before we test the
visibility map bit. So I suppose that if we wanted to support
index-only scans under SnapshotNow maybe we'd want to lock and unlock
the visibility map page when testing each bit. But that almost seems
like overkill anyway: for it to matter, someone would have to be
relying on starting a scan, deleting a tuple in another transaction
while the scan was in progress, and having the scan see the delete
when it reached the deleted tuple. But of course if the scan had take
a microsecond longer to reach the deleted tuple it wouldn't have seen
it as deleted anyway, so it's a bit hard to see how any such code
could be race-free anyhow.
>> Proposed patch attached. This adds some more comments in various
>> places, and implements your suggestion of retesting the visibility-map
>> bit when we detect a possible mismatch with the page-level bit.
> Thanks, will look at it in a bit.
Thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company