On Tue, 2007-09-18 at 12:10 -0400, Tom Lane wrote:
> I wrote:
> > * The patch makes undocumented changes that cause autovacuum's decisions
> > to be driven by total estimated dead space rather than total number of
> > dead tuples. Do we like this?
>
> No one seems to have picked up on this point, but after reflection
> I think there's actually a pretty big problem here. Per-page pruning
> is perfectly capable of keeping dead space in check. In a system with
> HOT running well, the reasons to vacuum a table will be:
>
> 1. Remove dead index entries.
> 2. Remove LP_DEAD line pointers.
> 3. Truncate off no-longer-used end pages.
> 4. Transfer knowledge about free space into FSM.
>
> Pruning cannot accomplish #1, #2, or #3, and without significant changes
> in the FSM infrastructure it has no hope about #4 either. What I'm
> afraid of is that steady page-level pruning will keep the amount of dead
> space low, causing autovacuum never to fire, causing the indexes to
> bloat indefinitely because of #1 and the table itself to bloat
> indefinitely because of #2 and #4. Thus, the proposed change in
> autovacuum seems badly misguided: instead of making autovacuum trigger
> on things that only it can fix, it makes autovacuum trigger on something
> that per-page pruning can deal with perfectly well.
>
> I'm inclined to think that we should continue to drive autovac off a
> count of dead rows, as this is directly related to points #1 and #2,
> and doesn't seem any worse for #3 and #4 than an estimate based on space
> would be. Possibly it would be sensible for per-page pruning to report
> a reduction in number of dead rows when it removes heap-only tuples,
> but I'm not entirely sure --- any thoughts?

Some behavioural comments only: I was part of the earlier discussion
about when-to-VACUUM and don't have any fixed view of how to do this.
If HOT is running well, then there will be less need for #1, #3 and #4,
as I understand it. Deletes will still cause the need for #1, #3, #4 as
well as dead-space removal. Many tables have only Inserts and Deletes,
so we need to take that into account.
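
To make the two trigger policies concrete, here is a rough sketch. It is not taken from the patch: the function names, the `TableStats` fields, the space-based rule and all the numbers are assumptions for illustration only. The row-count rule follows the usual autovacuum form of a base threshold plus a scale factor times the estimated row count.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics; field names are illustrative."""
    reltuples: int    # estimated number of live rows
    dead_tuples: int  # dead rows reported since the last VACUUM
    dead_bytes: int   # estimated dead space still left after pruning
    table_bytes: int  # total heap size in bytes


def trigger_on_dead_rows(s: TableStats,
                         base_threshold: int = 50,
                         scale_factor: float = 0.2) -> bool:
    """Row-count policy: fire when dead rows exceed
    base_threshold + scale_factor * reltuples (values are illustrative)."""
    return s.dead_tuples > base_threshold + scale_factor * s.reltuples


def trigger_on_dead_space(s: TableStats,
                          space_fraction: float = 0.2) -> bool:
    """Assumed space-based policy: fire when the estimated dead space
    exceeds a fraction of the heap size. The rule itself is a guess."""
    return s.dead_bytes > space_fraction * s.table_bytes


# A heavily updated table where pruning has reclaimed nearly all tuple
# space, leaving only small dead line pointers (assumed 4 bytes each).
pruned = TableStats(reltuples=1_000_000,
                    dead_tuples=400_000,
                    dead_bytes=400_000 * 4,
                    table_bytes=100_000_000)

print("row-count trigger:", trigger_on_dead_rows(pruned))
print("dead-space trigger:", trigger_on_dead_space(pruned))
```

With these made-up numbers the row-count rule fires and the space-based rule does not, which is the scenario Tom describes: pruning keeps dead space low while dead line pointers and index entries keep accumulating.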

On large tables, VACUUM hurts very badly, so I would like to see it run
significantly less often.

In your last post you mentioned multiple UPDATEs. Pruning multiple times
for successive UPDATEs isn't going to release more space, so why do it?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com