Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Proposal: Another attempt at vacuum improvements
Date
Msg-id BANLkTi=Arq+vFwmFO9v7JOdcgFdJYi0UeQ@mail.gmail.com
In response to Re: Proposal: Another attempt at vacuum improvements  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, May 25, 2011 at 1:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

>> At the moment we scan indexes if we have > 0 rows to remove, which is
>> probably wasteful. Perhaps it would be better to keep a running total
>> of rows to remove, by updating pg_stats, then when we hit a certain
>> threshold in total we can do the index scan. So we don't need to
>> remember the TIDs, just remember how many there were and use that to
>> avoid cleaning too vigorously.
>
> That occurred to me, too.  If we're being launched by autovacuum then
> we know that a number of updates and deletes equal ~20% (or whatever
> autovacuum_vacuum_scale_factor is set to) of the table size have
> occurred since the last autovacuum.  But it's possible that many of
> those were HOT updates, in which case the number of index entries to
> be cleaned up might be much less than 20% of the table size.
> Alternatively, it's possible that we'd be better off vacuuming the
> table more often (say, autovacuum_vacuum_scale_factor=0.10 or 0.08 or
> something) but only doing the index scans every once in a while when
> enough dead line pointers have accumulated.  After all, it's the first
> heap pass that frees up most of the space; cleaning dead line pointers
> seems a bit less urgent.  But not having done any real analysis of how
> this would work out in practice, I'm not sure whether it's a good idea
> or not.

We know whether a TID was once in the index or not, so we can keep an
exact count. HOT doesn't come into it.
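As a rough illustration of the counting idea, here is a minimal Python sketch, not PostgreSQL code. It assumes each dead item found by a heap pass carries a flag saying whether it ever had an index entry, so heap-only (HOT) tuples never add to the total. The names `DeadItem`, `IndexCleanupCounter` and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeadItem:
    tid: tuple          # (block number, offset), illustrative only
    was_indexed: bool   # False for heap-only (HOT) tuples: no index entry ever existed


class IndexCleanupCounter:
    """Running total of dead TIDs that still need index cleanup (hypothetical)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.pending = 0

    def record_heap_pass(self, dead_items):
        # Only items that were once in the index add to the total,
        # so the count is exact regardless of how many updates were HOT.
        self.pending += sum(1 for item in dead_items if item.was_indexed)

    def should_scan_indexes(self) -> bool:
        # Defer the index scan until enough index-visible dead TIDs accumulate.
        return self.pending >= self.threshold

    def reset_after_index_scan(self):
        self.pending = 0
```

Only the count is kept between heap passes, not the TIDs themselves, which matches the "remember how many there were" suggestion quoted above.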

We can also remove TIDs from the index without VACUUM, during btree
split avoidance. We can optimise the second scan by skipping htids no
longer present in the index, though we'd need a spare bit to mark
that usage, which I'm not sure we have.
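
One possible reading of the spare-bit idea, as a minimal Python sketch rather than PostgreSQL code: each dead line pointer carries a flag that is set when its index entry has already been removed by some other path, and the later cleanup pass skips flagged entries. The `index_entry_removed` field stands in for the spare bit, and all names here are hypothetical. A single bit also assumes one index per table, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class DeadLinePointer:
    tid: tuple                          # (block number, offset), illustrative only
    index_entry_removed: bool = False   # hypothetical spare bit


def mark_removed_by_split_avoidance(pointers, removed_tids):
    """Set the flag on pointers whose index entries were already removed."""
    removed = set(removed_tids)
    for lp in pointers:
        if lp.tid in removed:
            lp.index_entry_removed = True


def tids_needing_index_cleanup(pointers):
    """Return only the TIDs the later scan still has to look for."""
    return [lp.tid for lp in pointers if not lp.index_entry_removed]
```

For example, given pointers for TIDs (0,1), (0,2) and (3,7), marking (0,2) as removed leaves only (0,1) and (3,7) for the later scan.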

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

