Re: heapgetpage() and ->takenDuringRecovery - Mailing list pgsql-hackers

From Andres Freund
Subject Re: heapgetpage() and ->takenDuringRecovery
Date
Msg-id 20140303120729.GC23352@awork2.anarazel.de
Whole thread Raw
In response to Re: heapgetpage() and ->takenDuringRecovery  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: heapgetpage() and ->takenDuringRecovery  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > While reading around which references to SnapshotData's members exist, I
> > once more came about the following tidbit in heapgetpage():
> >         /*
> >          * If the all-visible flag indicates that all tuples on the page are
> >          * visible to everyone, we can skip the per-tuple visibility tests.
> >          *
> >          * Note: In hot standby, a tuple that's already visible to all
> >          * transactions in the master might still be invisible to a read-only
> >          * transaction in the standby. We partly handle this problem by tracking
> >          * the minimum xmin of visible tuples as the cut-off XID while marking a
> >          * page all-visible on master and WAL log that along with the visibility
> >          * map SET operation. In hot standby, we wait for (or abort) all
> >          * transactions that can potentially may not see one or more tuples on the
> >          * page. That's how index-only scans work fine in hot standby. A crucial
> >          * difference between index-only scans and heap scans is that the
> >          * index-only scan completely relies on the visibility map where as heap
> >          * scan looks at the page-level PD_ALL_VISIBLE flag. We are not sure if
> >          * the page-level flag can be trusted in the same way, because it might
> >          * get propagated somehow without being explicitly WAL-logged, e.g. via a
> >          * full page write. Until we can prove that beyond doubt, let's check each
> >          * tuple for visibility the hard way.
> >          */
> >         all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
> >
> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
> > where PD_ALL_VISIBLE is set:
> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
> >    conflicts. The heap page is locked for cleanup at that point. As the
> >    logging of xl_heap_clean sets the page's LSN there's no way the page
> >    can appear on the standby too early.
> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
> >    no need for conflicts. The only other way I can see to end up there
> >    is a previous heap_page_prune() that repaired fragmentation. But that
> >    logs a WAL record with conflict information.
> 
> I don't think there's any reason to believe that lazy_scan_heap() can
> only hit pages that are empty or have just been defragged.  Suppose
> that there's a tuple on the page which was recently inserted; the
> inserting transaction has committed but there are some backends that
> still have older snapshots.  The page won't be marked all-visible
> because it isn't.  Now, eventually those older snapshots will go away,
> and sometime after that the relation will get vacuumed again, and
> we'll once again look the page.  But this time we notice that it is
> all-visible, and mark it so.

Hm, right, that can happen. How about adding a LSN interlock in
visibilitymap_set() for those cases as well, not just for checksums?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Defining macro LSNOID for pg_lsn in pg_type.h
Next
From: Robert Haas
Date:
Subject: Re: Dump pageinspect to 1.2: page_header with pg_lsn datatype