Re: PageIsAllVisible()'s trustworthiness in Hot Standby - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: PageIsAllVisible()'s trustworthiness in Hot Standby |
Date | |
Msg-id | CA+TgmoY=n70HT4SgxZjj-YCr8NpR4pSXzfQ5dUD8m8rXC629Mg@mail.gmail.com |
In response to | PageIsAllVisible()'s trustworthiness in Hot Standby (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses | Re: PageIsAllVisible()'s trustworthiness in Hot Standby |
List | pgsql-hackers |
On Tue, Dec 4, 2012 at 8:08 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> I was looking at the following code in heapam.c:
>
>     /*
>      * If the all-visible flag indicates that all tuples on the page are
>      * visible to everyone, we can skip the per-tuple visibility tests. But
>      * not in hot standby mode. A tuple that's already visible to all
>      * transactions in the master might still be invisible to a read-only
>      * transaction in the standby.
>      */
>     all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
>
> Isn't the check for !snapshot->takenDuringRecovery redundant now in master
> or whenever since we added crash-safety for VM? In fact, this comment made
> me think if we are really handling index-only scans correctly or not on the
> Hot Standby. But apparently we are by forcing conflicting transactions to
> abort before redoing VM bit set operation on the standby. The same mechanism
> should protect us against the above case. Now I concede that the entire
> magic around setting and clearing the page level all-visible bit and the VM
> bit and our ability to keep them in sync is something I don't fully
> understand, but I see that every operation that sets the page level
> PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer lock and
> emits a WAL record. So AFAICS the conflict resolution logic will take care
> of the above too.

I wasn't sure whether that could be safely changed. There's a subtle distinction here: the PD_ALL_VISIBLE bit isn't the same as the visibility map bit. And, technically, the WAL record only fully protects the setting of *the visibility map bit*, not the PD_ALL_VISIBLE page-level bit. The purpose of that WAL logging is to make sure that the page-level bit is never clear while the visibility-map bit is set; it does not guarantee that the page-level bit can never be set without issuing a WAL record.
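A toy C model may help show why the invariant above only runs in one direction. The types and functions below are hypothetical and are not PostgreSQL's code. They assume, per the explanation above, that the page-level bit can be set and reach disk without a WAL record of its own, while setting the VM bit is the WAL-logged step. In the model, the state with PD_ALL_VISIBLE set and the VM bit clear still satisfies the invariant. The invariant alone therefore does not make the page-level bit trustworthy on a standby.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one heap page and its VM bit. */
typedef struct ModelPageState {
    bool pd_all_visible; /* page-level PD_ALL_VISIBLE flag */
    bool vm_bit;         /* corresponding visibility map bit */
} ModelPageState;

/* The only invariant the WAL logging is meant to guarantee:
 * the VM bit is never set while the page-level bit is clear. */
static bool model_invariant_holds(const ModelPageState *s)
{
    return !s->vm_bit || s->pd_all_visible;
}

/* Step 1 (assumption): set the page-level bit. It has no WAL record of its
 * own in this model, so a crash right after this step can leave it on disk
 * without the VM bit. */
static void model_set_page_bit(ModelPageState *s)
{
    s->pd_all_visible = true;
    assert(model_invariant_holds(s));
}

/* Step 2: set the VM bit; this is the step covered by a WAL record. */
static void model_set_vm_bit_logged(ModelPageState *s)
{
    s->vm_bit = true;
    assert(model_invariant_holds(s));
}

/* A change that makes some tuple not all-visible clears both bits. */
static void model_clear_both(ModelPageState *s)
{
    s->pd_all_visible = false;
    s->vm_bit = false;
    assert(model_invariant_holds(s));
}
```

Consider calling model_set_page_bit() and then simulating a crash before model_set_vm_bit_logged(). That leaves pd_all_visible true and vm_bit false, and model_invariant_holds() still returns true. Only the opposite combination (VM bit set, page bit clear) violates it.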
So, for example, it's perfectly possible for a crash on the master to leave the page-level bit set while the VM bit is clear. Now, if that page somehow makes its way to the standby (via a base backup or a full-page image) before the tuples it contains are all-visible according to the standby's xmin horizon, we've got a problem. Can that happen? It seems unlikely, but can we prove it's not possible? Perhaps, but I wasn't sure. Index-only scans are safe, because they look at the visibility map itself, not the page-level bit, but the analysis is a little murkier for sequential scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
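The scan-side consequence of that crash scenario can be sketched the same way. This is a hypothetical model, not heapam.c. ModelPage and ModelSnapshot stand in for the real page and snapshot structures. seqscan_skips_tuple_checks() mirrors the quoted PageIsAllVisible(dp) && !snapshot->takenDuringRecovery test. The unguarded variant is the simplification being questioned in the thread. The model assumes a page whose PD_ALL_VISIBLE bit survived a crash on the master while its VM bit did not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real page and snapshot structures. */
typedef struct ModelPage {
    bool pd_all_visible;
} ModelPage;

typedef struct ModelSnapshot {
    bool taken_during_recovery; /* true for snapshots taken on a hot standby */
} ModelSnapshot;

/* Mirrors the quoted heapam.c test: skip per-tuple visibility checks only if
 * the page-level bit is set AND the snapshot was not taken during recovery. */
static bool seqscan_skips_tuple_checks(const ModelPage *page,
                                       const ModelSnapshot *snap)
{
    return page->pd_all_visible && !snap->taken_during_recovery;
}

/* The questioned simplification: trust the page-level bit everywhere. */
static bool seqscan_skips_tuple_checks_unguarded(const ModelPage *page)
{
    return page->pd_all_visible;
}

/* An index-only scan consults the VM bit, never PD_ALL_VISIBLE. */
static bool index_only_skips_heap_fetch(bool vm_bit)
{
    return vm_bit;
}

/* Walks through the crash scenario described in the message. */
static void model_crash_scenario(void)
{
    ModelPage page = { .pd_all_visible = true }; /* page bit survived the crash */
    bool vm_bit = false;                         /* VM bit update was lost */
    ModelSnapshot standby_snap = { .taken_during_recovery = true };

    /* With the guard, the standby still checks every tuple. */
    assert(!seqscan_skips_tuple_checks(&page, &standby_snap));
    /* Without it, the standby would trust the page-level bit even though the
     * tuples may not yet be all-visible to the standby's xmin horizon. */
    assert(seqscan_skips_tuple_checks_unguarded(&page));
    /* The index-only scan sees the clear VM bit and visits the heap. */
    assert(!index_only_skips_heap_fetch(vm_bit));
}
```

Keeping the takenDuringRecovery guard costs only the per-tuple checks on the standby. Dropping it would be safe only if one could prove that such a page never reaches the standby early, which is the open question raised above.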