Re: PageIsAllVisible()'s trustworthiness in Hot Standby - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: PageIsAllVisible()'s trustworthiness in Hot Standby |
Date | |
Msg-id | 20121204140014.GA8717@alap2 Whole thread Raw |
In response to | Re: PageIsAllVisible()'s trustworthiness in Hot Standby (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: PageIsAllVisible()'s trustworthiness in Hot Standby
|
List | pgsql-hackers |
On 2012-12-04 08:38:48 -0500, Robert Haas wrote: > On Tue, Dec 4, 2012 at 8:08 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > > > > I was looking at the following code in heapam.c: > > > > 261 /* > > 262 * If the all-visible flag indicates that all tuples on the page > > are > > 263 * visible to everyone, we can skip the per-tuple visibility tests. > > But > > 264 * not in hot standby mode. A tuple that's already visible to all > > 265 * transactions in the master might still be invisible to a > > read-only > > 266 * transaction in the standby. > > 267 */ > > 268 all_visible = PageIsAllVisible(dp) && > > !snapshot->takenDuringRecovery; > > > > Isn't the check for !snapshot->takenDuringRecovery redundant now in master > > or whenever since we added crash-safety for VM ? In fact, this comment made > > me think if we are really handling index-only scans correctly or not on the > > Hot Standby. But apparently we are by forcing conflicting transactions to > > abort before redoing VM bit set operation on the standby. The same mechanism > > should protect us against the above case. Now I concede that the entire > > magic around setting and clearing the page level all-visible bit and the VM > > bit and our ability to keep them in sync is something I don't fully > > understand, but I see that every operation that sets the page level > > PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer lock and > > emits a WAL record. So AFAICS the conflict resolution logic will take care > > of the above too. > > I wasn't sure whether that could be safely changed. There's a subtle > distinction here: the PD_ALL_VISIBLE bit isn't the same as the > visibility map bit. And, technically, the WAL record only fully > protects the setting of *the visibility map bit* not the > PD_ALL_VISIBLE page-level bit. The purpose of that WAL logging is to > make sure that the page-level bit is never clear while the > visibility-map bit is set; it does not guarantee that the page-level > bit can never be set without issuing a WAL record. So, for example, > it's perfectly possible for a crash on the master might leave the > page-level bit set while the VM bit is clear. Now, if that page > somehow makes its way to the standby - via a base backup or a > full-page image - before the tuples it contains are all-visible > according to the standby's xmin horizon, we've got a problem. Can > that happen? It seems unlikely, but can we prove it's not possible? > Perhaps, but I wasn't sure. Youre right, it currently seems to be possible, there's no LSN interlock prohibiting this as far as I can see. in lazy_scan_heap we do: if (!PageIsAllVisible(page)) { PageSetAllVisible(page); MarkBufferDirty(buf); visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid); } So buf can be written out independently from the xlog record that visibilitymap_set emits. ISTM we should set the LSN of the heap page as well as the one of the visibilitymap page. Not sure about the performance impact of that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: