Re: Open issues for HOT patch - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Open issues for HOT patch
Msg-id 1190190339.4164.25.camel@ebony.site
In response to Re: Open issues for HOT patch  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 2007-09-18 at 12:10 -0400, Tom Lane wrote:
> I wrote:
> > * The patch makes undocumented changes that cause autovacuum's decisions
> > to be driven by total estimated dead space rather than total number of
> > dead tuples.  Do we like this?
> 
> No one seems to have picked up on this point, but after reflection
> I think there's actually a pretty big problem here.  Per-page pruning
> is perfectly capable of keeping dead space in check.  In a system with
> HOT running well, the reasons to vacuum a table will be:
> 
> 1. Remove dead index entries.
> 2. Remove LP_DEAD line pointers.
> 3. Truncate off no-longer-used end pages.
> 4. Transfer knowledge about free space into FSM.
> 
> Pruning cannot accomplish #1, #2, or #3, and without significant changes
> in the FSM infrastructure it has no hope about #4 either.  What I'm
> afraid of is that steady page-level pruning will keep the amount of dead
> space low, causing autovacuum never to fire, causing the indexes to
> bloat indefinitely because of #1 and the table itself to bloat
> indefinitely because of #2 and #4.  Thus, the proposed change in
> autovacuum seems badly misguided: instead of making autovacuum trigger
> on things that only it can fix, it makes autovacuum trigger on something
> that per-page pruning can deal with perfectly well.
> 
> I'm inclined to think that we should continue to drive autovac off a
> count of dead rows, as this is directly related to points #1 and #2,
> and doesn't seem any worse for #3 and #4 than an estimate based on space
> would be.  Possibly it would be sensible for per-page pruning to report
> a reduction in number of dead rows when it removes heap-only tuples,
> but I'm not entirely sure --- any thoughts?
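
To make the quoted concern concrete, here is a rough sketch (not code from the patch) comparing the two trigger styles under discussion. The dead-tuple rule follows the documented autovacuum formula, base threshold plus scale factor times the table's row count. The dead-space rule, the `TableStats` fields, the `space_fraction` parameter and all the numbers are hypothetical and only illustrate the shape of the problem; the one hard figure is that a line pointer occupies 4 bytes.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    live_tuples: int   # estimated row count for the table
    dead_tuples: int   # dead rows counted since the last VACUUM
    dead_bytes: int    # hypothetical: dead space left after per-page pruning
    total_bytes: int   # hypothetical: total heap size in bytes


def vacuum_threshold(stats, base_threshold=50, scale_factor=0.2):
    # Documented form: base threshold + scale factor * number of tuples.
    # The default values used here are illustrative only.
    return base_threshold + scale_factor * stats.live_tuples


def trigger_on_dead_tuples(stats, base_threshold=50, scale_factor=0.2):
    # Count-driven rule: fire once dead rows exceed the threshold.
    return stats.dead_tuples > vacuum_threshold(stats, base_threshold, scale_factor)


def trigger_on_dead_space(stats, space_fraction=0.2):
    # Hypothetical space-driven rule: fire once dead space exceeds a
    # fraction of the heap. This is an assumption, not the patch's formula.
    return stats.dead_bytes > space_fraction * stats.total_bytes


# A 100-page table where pruning has reclaimed tuple storage for 5,000 dead
# rows, leaving only their 4-byte line pointers behind.
stats = TableStats(
    live_tuples=10_000,
    dead_tuples=5_000,
    dead_bytes=5_000 * 4,
    total_bytes=100 * 8192,
)

print(trigger_on_dead_tuples(stats))
print(trigger_on_dead_space(stats))
```

With these made-up numbers the count-driven rule fires while the space-driven rule does not, even though the dead index entries and line pointers (#1 and #2) can only be removed by VACUUM.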

Some behavioural comments only: I was part of the earlier discussion
about when to VACUUM and don't have a fixed view on how to do this.

If HOT is running well, then there will be less need for #1, #3 and #4,
as I understand it. Deletes will still cause the need for #1, #3, #4 as
well as dead-space removal. Many tables have only Inserts and Deletes,
so we need to take that into account.

On large tables, VACUUM hurts very badly, so I would like to see it run
significantly less often.

In your last post you mentioned multiple UPDATEs. Pruning multiple times
for successive UPDATEs isn't going to release more space, so why do it?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com


