Simon Riggs wrote: > During VACUUM of btrees, we need to pin all pages, even those where tuples > are not removed, which I am calling here the "pin scan". This is especially > a problem during redo, where removing one tuple from a 100GB btree can take > a minute or more. That causes replication lags. Bad Thing.
Agreed on there being a problem here.
As a reminder, this "pin scan" is implemented in the HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is in charge of replaying an action recorded by _bt_delitems_vacuum. That replay works by acquiring cleanup lock on all btree blocks from lastBlockVacuumed to "this one" (i.e. the one in which elements are being removed). In essence, this means pinning *all* the blocks in the index, which is bad. The new code sets lastBlockVacuumed to Invalid; a new test in the replay code makes that value not pin anything. This is supposed to optimize the case in question.
Nice summary, thanks.
One thing that this patch should update is the comment above struct xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one of the options to apply to each block, but that routine doesn't exist.
Updated
One possible naive optimization is to avoid locking pages that aren't leaf pages, but that would still require fetching the block from disk, so it wouldn't actually optimize anything other than the cleanup lock acquisition. (And probably not too useful, since leaf pages are said to be 95% - 99% of index pages anyway.)
Agreed, not worth it.
Also of interest is the comment above the call to _bt_delitems_vacuum in btvacuumpage(); that needs an update too.
Updated
It appears to me that your patch changes the call in btvacuumscan() but not the one in btvacuumpage(). I assume you should be changing both.
Yes, that was correct. Updated.
I think the new comment that talks about Toast Index should explain *why* we can skip the pinning in all cases except that one, instead of just saying we can do it.
Greatly expanded comments.
Thanks for the review.
New patch attached.
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services