Re: PageIsAllVisible()'s trustworthiness in Hot Standby - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: PageIsAllVisible()'s trustworthiness in Hot Standby
Date
Msg-id CABOikdNbwdJcgd70VJC6oKwQ0T-foRPe0+R0fmd2P5mOHMHtig@mail.gmail.com
Whole thread Raw
In response to Re: PageIsAllVisible()'s trustworthiness in Hot Standby  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: PageIsAllVisible()'s trustworthiness in Hot Standby
List pgsql-hackers


On Tue, Dec 4, 2012 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 4, 2012 at 9:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Youre right, it currently seems to be possible, there's no LSN interlock
> prohibiting this as far as I can see.

Yeah, there certainly isn't that.  Now you could perhaps make an
argument that no operation that can propagate a set bit from master to
standby can arrive until after the standby's xmin horizon is
sufficiently current, but that does feel a little fragile to me...
even if it's true now, new WAL record types might break it, for
example.


Hmm. Yeah, I do not have guts to prove that either. I'll probably write up a comment for your consideration to explain why we don't trust PD_ALL_VISIBLE in Hot standby for seq scans, but still trust VM for index-only scans.

Another piece of code that caught my attention is this (sorry for digging up topics that probably have been discussed to death before and I might not have paid attention to then):

Whenever we clear PD_ALL_VISIBLE flag, we do that while holding the heap page lock and also WAL log that action (in fact, piggy-back with insert/update/delete). The only exception that I saw is in lazy_scan_heap()

 916         else if (PageIsAllVisible(page) && has_dead_tuples)
 917         {
 918             elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
 919                  relname, blkno);
 920             PageClearAllVisible(page);
 921             MarkBufferDirty(buf);
 922             visibilitymap_clear(onerel, blkno, vmbuffer);
 923         }
 
Obviously, this is not a case that we expect to occur. But if it does occur for whatever reason, then even though we will correct the page flag on master, this would never be populated to the standby. So the standby may continue to have that tiny bit set on the page, at least until another insert/update/delete happens on the page. Not sure if there is anything to worry about, but looks a bit strange. I looked at the archive but can't see any report of the warning, so may be we are safe after all.

Another side issue: The way we set visibility map bits in the VACUUM, we do that in the first phase of the vacuum. Now the very fact that we have selected the page for vacuuming, it will have one or more dead tuples. As soon as we find a dead tuple in the page, we decide not to mark the page all-visible and rightly so. But this page would not get marked all-visible even if we remove all dead tuples in the second phase. Why don't we push that work to the second phase or at least retry that again ? Of course, we can't just do it that way without scanning the page again because new tuples may get added or updated/deleted in the meanwhile. But that can be addressed with some tricks - like tracking LSN of every such (which has only dead and live tuples, but not recently dead or insert-in-progress or delete-in-progress) and then comparing that LSN with the page LSN during the second phase. If an update had happened in between, we will know that and we won't deal with the visibility bits.

Another idea would be to track this intermediate state in the page header itself, something like PD_ALL_VISIBLE_OR_DEAD. If an intermediate update/delete/insert comes in, it will clear the bit. If we see the bit set during the second phase of the vacuum, we will know that it contains only dead and live tuples and the dead ones are being removed now. So we can mark the page PD_ALL_VISIBLE.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: autovacuum truncate exclusive lock round two
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCH] Patch to fix libecpg.so for isinf missing