Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Re: Proposal: Another attempt at vacuum improvements |
Date | |
Msg-id | BANLkTi=6kR01m0Oe9vFknB6M3fsDwDO6Zw@mail.gmail.com |
In response to | Re: Proposal: Another attempt at vacuum improvements (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: Proposal: Another attempt at vacuum improvements
|
List | pgsql-hackers |
On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> So are there any other objections/suggestions ? Anyone else cares to
> look at the brief design that we discussed above ? Otherwise, I would
> go ahead and work on this in the coming days. Of course, I will keep
> the list posted about any new issues that I see.
>

I went on to create a WIP patch based on our discussion. There are a couple of issues that I stumbled upon while testing it.

1. The start-of-index-vacuum LSN that we want to track must be noted even before the heap scan is started. This is because we must be absolutely sure that the index vacuum removes index pointers to all dead line pointers generated by any operation with an LSN less than the start-of-index-vacuum LSN. If we don't remember the LSN before the heap scan starts, and instead delay it until the start of the index vacuum, new dead line pointers may be generated, before the index scan starts, on a page that the heap scan has already passed. Since the index pointers to these new dead line pointers haven't been vacuumed, we should really not be removing them.

But as a consequence of using an LSN from the start of the heap scan, at the end of vacuum all pruned pages will have a vacuum LSN greater than the index vacuum LSN that we are going to remember in pg_class. And by our design, we can't remove dead line pointers on those pages, because we don't know whether the index pointers have been vacuumed or not. We might not be able to reclaim any dead line pointers at all if the page is HOT pruned again before the next vacuum cycle, because that will overwrite the page vacuum LSN with a newer value.

I think we definitely need to track the dead line pointers that a heap scan has collected. The index pointers to them will be removed if the vacuum completes successfully. That gets us back to the original idea that we had discussed a while back, about marking such dead line pointers as LP_DEAD_RECLAIMED or something like that.
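To make the LSN ordering problem described above concrete, here is a small standalone C sketch. It is not taken from the WIP patch: the type and function names are made up for illustration, LSNs are modelled as plain integers, and the reclaim rule is only my reading of the design under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an LSN; a plain 64-bit counter in this sketch. */
typedef uint64_t Lsn;

/* Hypothetical per-page state: LSN stamped by the last prune of the page. */
typedef struct PageState {
    Lsn page_vacuum_lsn;
} PageState;

/* Hypothetical per-relation state: index-vacuum LSN remembered in pg_class. */
typedef struct RelState {
    Lsn index_vacuum_lsn;
} RelState;

/* Pruning (by vacuum's heap scan or by HOT cleanup) stamps the page. */
static void prune_page(PageState *page, Lsn current_lsn)
{
    page->page_vacuum_lsn = current_lsn;
}

/* On success, vacuum records the LSN it noted BEFORE its heap scan began. */
static void finish_vacuum(RelState *rel, Lsn lsn_at_heap_scan_start)
{
    assert(lsn_at_heap_scan_start >= rel->index_vacuum_lsn);
    rel->index_vacuum_lsn = lsn_at_heap_scan_start;
}

/*
 * Assumed rule: dead line pointers on a page are reclaimable only if the
 * page was last pruned before the recorded index-vacuum LSN.
 */
static bool can_reclaim_dead_pointers(const PageState *page, const RelState *rel)
{
    return page->page_vacuum_lsn < rel->index_vacuum_lsn;
}
```

If a vacuum notes LSN 100 before its heap scan and prunes a page at LSN 120, that page still fails the check after the vacuum finishes, even though its index pointers were removed. A later HOT prune pushes the page LSN forward again, so the page may never pass the check.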
When vacuum runs its heap scan, it would collect all dead line pointers, mark them dead-reclaimed, and also store an identifier of the vacuum operation that will remove the associated index pointers. During HOT cleanup or the next vacuum, we can safely remove the LP_DEAD_RECLAIMED line pointers, provided we can reliably check whether that vacuum completed successfully. We don't have any free flags in ItemIdData, but we can use a special lp_off value to distinguish a dead line pointer from a dead-reclaimed one. The identifier itself can be an LSN, an XID, or anything else. Also, since we need just one identifier, I think this technique would work for unlogged and temp relations, with minor adjustments.

2. Another issue is with analyze counting dead line pointers as dead rows. While it's correct in principle, because a vacuum is needed to remove these dead line pointers, the overhead of a dead line pointer is much lower than that of a dead tuple. Also, with single-pass vacuum, there will be many dead line pointers waiting to be cleaned up by the next vacuum or HOT prune. We should not really count them as dead rows, because they don't require a vacuum per se, and counting them as dead will force more vacuum cycles than required. If we go with the idea described above, we can skip the dead-reclaimed line pointers, certainly when we know that the index vacuum completed successfully.

Thoughts?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
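For reference, a rough standalone C sketch of the dead-reclaimed idea from point 1 and the analyze counting from point 2. Everything here is hypothetical and not from the WIP patch: the `Sketch*` names, the page-level identifier, and the use of an enum in place of a special lp_off encoding. It also assumes that vacuum identifiers only increase and that every vacuum re-collects and re-stamps any dead-reclaimed pointers it finds. Counting dead-reclaimed pointers as dead rows while their vacuum is unconfirmed is one possible choice, not something settled above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ITEMS 8

/* Hypothetical line pointer states; the real thing would be encoded in lp_off. */
typedef enum {
    SKETCH_LP_UNUSED = 0,
    SKETCH_LP_NORMAL,
    SKETCH_LP_DEAD,
    SKETCH_LP_DEAD_RECLAIMED
} SketchLpState;

typedef struct SketchPage {
    SketchLpState items[SKETCH_MAX_ITEMS];
    size_t nitems;
    uint64_t reclaim_vacuum_id; /* vacuum that marked the dead-reclaimed items */
} SketchPage;

/* Vacuum heap scan: mark dead pointers dead-reclaimed and stamp the page. */
static size_t sketch_collect_dead(SketchPage *page, uint64_t vacuum_id)
{
    size_t collected = 0;
    assert(page->nitems <= SKETCH_MAX_ITEMS);
    for (size_t i = 0; i < page->nitems; i++) {
        if (page->items[i] == SKETCH_LP_DEAD ||
            page->items[i] == SKETCH_LP_DEAD_RECLAIMED) {
            page->items[i] = SKETCH_LP_DEAD_RECLAIMED;
            collected++;
        }
    }
    if (collected > 0)
        page->reclaim_vacuum_id = vacuum_id;
    return collected;
}

/* True if the vacuum that stamped this page is known to have completed. */
static bool sketch_index_vacuum_done(const SketchPage *page, uint64_t last_completed_id)
{
    return page->reclaim_vacuum_id <= last_completed_id;
}

/* HOT cleanup or next vacuum: free dead-reclaimed pointers only when safe. */
static size_t sketch_reclaim(SketchPage *page, uint64_t last_completed_id)
{
    size_t freed = 0;
    if (!sketch_index_vacuum_done(page, last_completed_id))
        return 0;
    for (size_t i = 0; i < page->nitems; i++) {
        if (page->items[i] == SKETCH_LP_DEAD_RECLAIMED) {
            page->items[i] = SKETCH_LP_UNUSED;
            freed++;
        }
    }
    return freed;
}

/* Analyze: skip dead-reclaimed pointers once their index vacuum is done. */
static size_t sketch_count_dead_rows(const SketchPage *page, uint64_t last_completed_id)
{
    size_t dead = 0;
    bool done = sketch_index_vacuum_done(page, last_completed_id);
    for (size_t i = 0; i < page->nitems; i++) {
        if (page->items[i] == SKETCH_LP_DEAD)
            dead++;
        else if (page->items[i] == SKETCH_LP_DEAD_RECLAIMED && !done)
            dead++;
    }
    return dead;
}
```

In this sketch, a page whose pointers were collected by vacuum 7 keeps them while the last completed vacuum is 6. Once vacuum 7 is recorded as complete, they can be freed and they stop counting as dead rows.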