Re: Avoiding second heap scan in VACUUM - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Re: Avoiding second heap scan in VACUUM |
Date | |
Msg-id | 2e78013d0805282127g27c9e8c0re25010bcbd221753@mail.gmail.com |
In response to | Re: Avoiding second heap scan in VACUUM (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Avoiding second heap scan in VACUUM
(Simon Riggs <simon@2ndquadrant.com>)
Re: Avoiding second heap scan in VACUUM (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-hackers |
On Thu, May 29, 2008 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> I'm not happy that the VACUUM waits. It might wait a very long time and
> cause worse overall performance than the impact of the second scan.

Let's not get too paranoid about the wait. It's a minor detail in the whole theory. I would suggest that the benefit of avoiding the second scan would be huge. Remember, it's not just a scan; it also dirties those blocks again, forcing them to be written to disk. Also, if you really have a situation where vacuum needs to wait for a very long time, then you are already in trouble: the long-running transactions would prevent vacuuming many tuples.

I think we can easily tweak the "wait" so that it doesn't wait indefinitely. If the "wait" times out, vacuum can still proceed, but it can mark the DEAD line pointers as DEAD_RECLAIMED. It would then have a choice of making a second pass and reclaiming the DEAD line pointers (like it does today).

> So the idea is to have one pass per VACUUM, but make that one pass do
> the first pass of *this* VACUUM and the second pass of the *last*
> VACUUM.
>
> We mark the xid of the VACUUM in pg_class as you suggest, but we do it
> after VACUUM has completed the pass.

The trick is to correctly know whether the last vacuum removed the index pointers or not. There could be several ways to do that. But you need to explain in detail how it would work in cases of vacuum failures and database crashes.

> In single pass we mark DEAD line pointers as RECENTLY_DEAD. If the last
> VACUUM xid is old enough we mark RECENTLY_DEAD as UNUSED, as well,
> during this first pass. If last xid is not old enough we do second pass
> to remove them.

Let's not call them RECENTLY_DEAD :-) DEAD is already stricter than that. We need something even stronger. That's why I used DEAD_RECLAIMED, to note that the line pointer is DEAD and the index pointer may have been removed as well.
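To make the line-pointer transitions discussed above concrete, here is a minimal sketch in C of the single-pass idea, using the DEAD_RECLAIMED name. It is not PostgreSQL code: the enum, the function names, and the `last_vacuum_done` flag are all hypothetical. The flag simply stands for "the last VACUUM is known to have removed the index pointers and its xid is old enough"; how to establish that safely across vacuum failures and crashes is exactly the open question and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line-pointer states; DEAD_RECLAIMED is the proposed state. */
typedef enum {
    SKETCH_LP_UNUSED,
    SKETCH_LP_NORMAL,
    SKETCH_LP_DEAD,
    SKETCH_LP_DEAD_RECLAIMED
} SketchLpState;

/*
 * One line pointer, one pass. last_vacuum_done is an assumed input meaning
 * the previous VACUUM removed the index pointers and is old enough.
 */
static SketchLpState
sketch_step(SketchLpState state, bool last_vacuum_done)
{
    switch (state) {
    case SKETCH_LP_DEAD_RECLAIMED:
        /* "Second pass" of the last VACUUM: reclaim only if known safe. */
        return last_vacuum_done ? SKETCH_LP_UNUSED : SKETCH_LP_DEAD_RECLAIMED;
    case SKETCH_LP_DEAD:
        /* "First pass" of this VACUUM: mark, reclaim on a later pass. */
        return SKETCH_LP_DEAD_RECLAIMED;
    default:
        return state;
    }
}

/* Apply the step to every line pointer; return how many became UNUSED. */
static size_t
sketch_single_pass(SketchLpState *lps, size_t n, bool last_vacuum_done)
{
    size_t reclaimed = 0;

    assert(lps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        SketchLpState next = sketch_step(lps[i], last_vacuum_done);

        if (next == SKETCH_LP_UNUSED && lps[i] != SKETCH_LP_UNUSED)
            reclaimed++;
        lps[i] = next;
    }
    return reclaimed;
}
```

When `last_vacuum_done` is false, the DEAD_RECLAIMED pointers left by the previous VACUUM stay in place, which corresponds to the case where a second pass would still be needed to remove them.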

> That has the effect that large tables that are infrequently VACUUMed
> will need only a single scan. Smaller tables that require almost
> continual VACUUMing will probably do two scans, but who cares?

Yeah, I think we need to target the large-table case. The second pass is obviously much more costly for large tables. I think the timed wait answers your concern.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com