Re: "page is not marked all-visible" warning in regression tests - Mailing list pgsql-hackers

From Robert Haas
Subject Re: "page is not marked all-visible" warning in regression tests
Date
Msg-id CA+Tgmoaa+pS_6kr80PaBnKLxbMprU7y_94VWV-EU01=NBxNz6Q@mail.gmail.com
Whole thread Raw
In response to Re: "page is not marked all-visible" warning in regression tests  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: "page is not marked all-visible" warning in regression tests
List pgsql-hackers
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On a cursory lock it might just be a race condition in
> vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the
> warning to be visible, all_visible_according_to_vm is determined before we
> loop over all blocks. At the point where one specific heap block is actually
> read and locked that knowledge might be completely outdated by any concurrent
> backend. Am I missing something?

No, I think you're right.  I think that warning is bogus.  I added it
in place of some older warning which no longer made sense, but I think
this one doesn't make sense either.

> I have to say the whole visibilitymap correctness and crash-safety seems to be
> quite under documented, especially as it seems to be somewhat intricate (to
> me). E.g. not having any note why visibilitymap_test doesn't need locking. (I
> guess the theory is that a 1 byte read will always be consistent. But how does
> that ensure other backends see an up2date value?).

It's definitely intricate, and it's very possible that we should have
some more documentation.  I am not sure exactly what and where, but
feel free to suggest something.

visibilitymap_test() does have a comment saying that:
       /*        * We don't need to lock the page, as we're only looking at a
single bit.        */

But that's a bit unsatisfying, because, as you say, it doesn't address
the question of memory-ordering issues.  I think that there's no
situation in which it causes a problem to see the visibility map bit
as unset when in reality it has just recently been set by some other
back-end.  It would be bad if someone did something like:

if (visibilitymap_test(...))   visibilitymap_clear();

...because then memory-ordering issues could cause us to accidentally
fail to clear the bit.   No one should be doing that, though; the
relevant locking and conditional logic is built directly into
visibilitymap_clear().

On the flip side, if someone sees the visibility map bit as set when
it's actually just been cleared, that could cause a problem - most
seriously, index-only scans could return wrong answers.  For that to
happen, someone would have to insert a heap tuple onto a previously
all-visible page, clearing the visibility map bit, and then insert an
index tuple; concurrently, some other backend would need to see the
index tuple but not the fact that the visibility map bit had been
cleared.  I don't think that can happen: after inserting the heap
tuple, the inserting backend would release buffer content lock, which
acts as a full memory barrier; before reading the index tuple, the
index-only-scanning backend would have to take the content lock on the
index buffer, which also acts as a full memory barrier.  So the
inserter can't do the writes out of order, and the index-only-scanner
can't do the reads out of order, so I think it's safe.... but we
probably do need to explain that somewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: incorrect handling of the timeout in pg_receivexlog
Next
From: Fujii Masao
Date:
Subject: Re: pg_receivexlog and feedback message