On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
>> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
>> > where PD_ALL_VISIBLE is set:
>> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
>> > PD_ALL_VISIBLE/the vm is touched and that causes recovery
>> > conflicts. The heap page is locked for cleanup at that point. As the
>> > logging of xl_heap_clean sets the page's LSN there's no way the page
>> > can appear on the standby too early.
>> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
>> > no need for conflicts. The only other way I can see to end up there
>> > is a previous heap_page_prune() that repaired fragmentation. But that
>> > logs a WAL record with conflict information.
>>
>> I don't think there's any reason to believe that lazy_scan_heap() can
>> only hit pages that are empty or have just been defragged. Suppose
>> that there's a tuple on the page which was recently inserted; the
>> inserting transaction has committed but there are some backends that
>> still have older snapshots. The page won't be marked all-visible
>> because it isn't. Now, eventually those older snapshots will go away,
>> and sometime after that the relation will get vacuumed again, and
>> we'll once again look the page. But this time we notice that it is
>> all-visible, and mark it so.
>
> Right now I am missing how this isn't an actual correctness problem
> after a crash. Without an LSN interlock we could crash *after* the heap
> page has been written out, but *before* the vm WAL record has been
> flushed to disk.
Yes. In that case, the PD_ALL_VISIBLE bit will be set on the page,
but the corresponding visibility map bit will be unset.
> Combined with synchronous_commit=off there could be
> transactions that appeared as safely committed for vacuum (i.e. are
> below GetOldestXmin()), but which are actually aborted after the
> commit.
> Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
> but that doesn't work here.
Well, we'd better not try to mark a page all-visible if it's only
all-visible on the assumption that unwritten xlog will be successfully
flushed to disk. But lazy_scan_heap() has code that only regards the
tuple as all-visible once the tuple is hinted committed, and there's
code elsewhere to keep hint bits from being set too early. And
heap_page_is_all_visible() follows the same pattern. So I think it's
OK, but maybe you see something I don't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company