Re: heapgetpage() and ->takenDuringRecovery - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: heapgetpage() and ->takenDuringRecovery
Date	March 3, 2014 13:33:43
Msg-id	20140303133338.GD23352@awork2.anarazel.de
In response to	Re: heapgetpage() and ->takenDuringRecovery (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: heapgetpage() and ->takenDuringRecovery
List	pgsql-hackers

On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't think this is necessary >= 9.2. There are only two "interesting"
> > places where PD_ALL_VISIBLE is set:
> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
> >    conflicts. The heap page is locked for cleanup at that point. As the
> >    logging of xl_heap_clean sets the page's LSN there's no way the page
> >    can appear on the standby too early.
> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
> >    no need for conflicts. The only other way I can see to end up there
> >    is a previous heap_page_prune() that repaired fragmentation. But that
> >    logs a WAL record with conflict information.
> 
> I don't think there's any reason to believe that lazy_scan_heap() can
> only hit pages that are empty or have just been defragged.  Suppose
> that there's a tuple on the page which was recently inserted; the
> inserting transaction has committed but there are some backends that
> still have older snapshots.  The page won't be marked all-visible
> because it isn't.  Now, eventually those older snapshots will go away,
> and sometime after that the relation will get vacuumed again, and
> we'll once again look at the page.  But this time we notice that it is
> all-visible, and mark it so.

Right now I am missing how this isn't an actual correctness problem
after a crash. Without an LSN interlock we could crash *after* the heap
page has been written out, but *before* the vm WAL record has been
flushed to disk. Combined with synchronous_commit=off there could be
transactions that appeared as safely committed for vacuum (i.e. are
below GetOldestXmin()), but which are actually aborted after the
crash.
Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
but that doesn't work here.
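
To make the scenario concrete, here is a toy model in Python. It is not
PostgreSQL code: ToyWal, ToyPage, set_commit_hint, set_all_visible, write_page
and simulate_crash are all hypothetical names, LSNs are plain integers, and
"crash" simply means that WAL past the flushed position is lost. It only
illustrates the ordering argument: the hint-bit rule refuses to act on an
unflushed commit, while an all-visible flag set without advancing the page LSN
can reach disk ahead of the commit record.

```python
from dataclasses import dataclass, replace


@dataclass
class ToyWal:
    """Hypothetical stand-in for the WAL; LSNs are plain integers."""
    insert_lsn: int = 0    # end of WAL generated so far
    flushed_lsn: int = 0   # end of WAL known to be on disk

    def append(self) -> int:
        """Generate one WAL record and return its LSN (not yet flushed)."""
        self.insert_lsn += 1
        return self.insert_lsn

    def flush(self, upto: int) -> None:
        """Make WAL durable up to 'upto' (never beyond what exists)."""
        self.flushed_lsn = max(self.flushed_lsn, min(upto, self.insert_lsn))

    def needs_flush(self, lsn: int) -> bool:
        """True if the record at 'lsn' is not durable yet."""
        return lsn > self.flushed_lsn


@dataclass
class ToyPage:
    lsn: int = 0
    hint_committed: bool = False
    all_visible: bool = False


def set_commit_hint(wal: ToyWal, page: ToyPage, commit_lsn: int) -> bool:
    """Hint-bit rule: do not set the hint while the commit is unflushed."""
    if wal.needs_flush(commit_lsn):
        return False
    page.hint_committed = True
    return True


def set_all_visible(wal: ToyWal, page: ToyPage, bump_page_lsn: bool) -> int:
    """Log a vm record and set the flag; optionally advance the page LSN."""
    vm_lsn = wal.append()
    page.all_visible = True
    if bump_page_lsn:
        page.lsn = vm_lsn  # the "LSN interlock" assumed in this sketch
    return vm_lsn


def write_page(wal: ToyWal, page: ToyPage) -> ToyPage:
    """WAL-before-data: flush WAL up to the page LSN, then write the page."""
    wal.flush(page.lsn)
    return replace(page)  # copy representing the on-disk image


def simulate_crash(bump_page_lsn: bool) -> bool:
    """Return True if disk says all-visible although the commit was lost."""
    wal, page = ToyWal(), ToyPage()
    page.lsn = wal.append()      # heap insert, stamps the page LSN
    commit_lsn = wal.append()    # asynchronous commit: record not flushed
    set_all_visible(wal, page, bump_page_lsn)
    on_disk = write_page(wal, page)
    # Crash here: everything past wal.flushed_lsn is gone.
    commit_survived = not wal.needs_flush(commit_lsn)
    return on_disk.all_visible and not commit_survived
```

In this model simulate_crash(False) reports the inconsistent state and
simulate_crash(True) does not, because writing the page then forces the WAL
out past the commit record first.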

Am I missing something?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services