Re: heapgetpage() and ->takenDuringRecovery - Mailing list pgsql-hackers

From Robert Haas
Subject Re: heapgetpage() and ->takenDuringRecovery
Date
Msg-id CA+TgmoYKmt_+MLRu=emWR-z-07hENZnKsB1-GyixMwxUN3bmDw@mail.gmail.com
Whole thread Raw
In response to Re: heapgetpage() and ->takenDuringRecovery  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
>> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
>> > where PD_ALL_VISIBLE is set:
>> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
>> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
>> >    conflicts. The heap page is locked for cleanup at that point. As the
>> >    logging of xl_heap_clean sets the page's LSN there's no way the page
>> >    can appear on the standby too early.
>> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
>> >    no need for conflicts. The only other way I can see to end up there
>> >    is a previous heap_page_prune() that repaired fragmentation. But that
>> >    logs a WAL record with conflict information.
>>
>> I don't think there's any reason to believe that lazy_scan_heap() can
>> only hit pages that are empty or have just been defragged.  Suppose
>> that there's a tuple on the page which was recently inserted; the
>> inserting transaction has committed but there are some backends that
>> still have older snapshots.  The page won't be marked all-visible
>> because it isn't.  Now, eventually those older snapshots will go away,
>> and sometime after that the relation will get vacuumed again, and
>> we'll once again look the page.  But this time we notice that it is
>> all-visible, and mark it so.
>
> Right now I am missing how this isn't an actual correctness problem
> after a crash. Without an LSN interlock we could crash *after* the heap
> page has been written out, but *before* the vm WAL record has been
> flushed to disk.

Yes.  In that case, the PD_ALL_VISIBLE bit will be set on the page,
but the corresponding visibility map bit will be unset.

> Combined with synchronous_commit=off there could be
> transactions that appeared as safely committed for vacuum (i.e. are
> below GetOldestXmin()), but which are actually aborted after the
> commit.
> Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
> but that doesn't work here.

Well, we'd better not try to mark a page all-visible if it's only
all-visible on the assumption that unwritten xlog will be successfully
flushed to disk.  But lazy_scan_heap() has code that only regards the
tuple as all-visible once the tuple is hinted committed, and there's
code elsewhere to keep hint bits from being set too early.  And
heap_page_is_all_visible() follows the same pattern.  So I think it's
OK, but maybe you see something I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: ALTER TABLE lock strength reduction patch is unsafe
Next
From: Robert Haas
Date:
Subject: Re: Performance Improvement by reducing WAL for Update Operation