On Mon, 2006-05-08 at 11:26 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > That wasn't the proposal. Every split would be marked and stay marked
> > until those blocks were VACUUMed. The data used to mark is readily
> > available and doesn't rely on whether or not VACUUM is running.
> > IMHO this does work.
>
> OK, I misunderstood what you had in mind, but now that I do understand
> it doesn't seem terribly efficient. What you're suggesting is that we
> take as a "vacuum group" all the pages that have been split off from a
> single original page since that page was last vacuumed, and that this
> group must be vacuumed as a whole. That works, but it seems that the
> groups would get awfully large. In particular, this substantially
> penalizes btbulkdelete in hopes of someday improving matters for what
> remains an entirely fictional partial vacuum.
OK, so we have the germ of a new mechanism - and I very much agree that
the idea of a partial vacuum is at present entirely fictional...but we
at least have a starting place.
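
To pin that germ down, here's a minimal, self-contained C sketch of the
marking scheme as I understand it. It's a toy model only - every name in
it (ToyPage, split_page, vacuum_group, the origin field) is hypothetical
and not anything in nbtree. The point is just that the mark is set at
split time, chains of splits stay in one group, and only a VACUUM of the
whole group clears the marks:

/*
 * Toy model of "mark splits until VACUUMed" - not nbtree code.
 * A split stamps the new page with the block it ultimately descends
 * from; VACUUM must visit every page sharing an origin as one group
 * and clear their marks together.
 */
#include <stdio.h>

#define NPAGES      16
#define NO_ORIGIN   (-1)

typedef struct
{
    int     origin;     /* block this page was split off from, or NO_ORIGIN */
} ToyPage;

static ToyPage pages[NPAGES];
static int     npages_used = 1;     /* block 0 exists from the start */

/* Simulate a page split: allocate a new block and mark its origin. */
static int
split_page(int oldblk)
{
    int newblk = npages_used++;

    /* propagate the original ancestor, so chained splits stay in one group */
    pages[newblk].origin =
        (pages[oldblk].origin == NO_ORIGIN) ? oldblk : pages[oldblk].origin;
    return newblk;
}

/* VACUUM of one group: the origin page plus everything split off it. */
static void
vacuum_group(int origin)
{
    for (int blk = 0; blk < npages_used; blk++)
    {
        if (blk == origin || pages[blk].origin == origin)
        {
            printf("vacuuming block %d (group %d)\n", blk, origin);
            pages[blk].origin = NO_ORIGIN;  /* mark cleared only by VACUUM */
        }
    }
}

int
main(void)
{
    for (int i = 0; i < NPAGES; i++)
        pages[i].origin = NO_ORIGIN;

    int a = split_page(0);      /* block 0 splits -> block 1, group 0 */
    int b = split_page(a);      /* block 1 splits -> block 2, still group 0 */
    (void) b;

    vacuum_group(0);            /* the whole group is visited together */
    return 0;
}

The cost Tom points out shows up here directly: the group only ever
grows between VACUUMs, so the longer a page goes unvacuumed, the more
pages have to be swept in one go.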
> As it stands today,
> btbulkdelete only has to worry about page groups formed since it began
> to run, not since the last vacuum. Changing the data representation
> like this would force it to retrace much more often and over much larger
> page groups.
Yes, I saw the potential issue you mention - but in many cases the
index grows forwards, so we wouldn't care either way. Page splits that
go to lower blockids are limited by available space, so they would be
less of a problem. I'm balancing the additional cost on page splits
against the additional cost on the vacuum. I would prefer to keep
in-line ops faster and pay a little extra on the out-of-line operations,
if that's what it takes. I note your point that there is little
contention, but there is still a cost, and in many cases that cost is
being paid on tables that will never be VACUUMed.
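
To illustrate why forward growth is the cheap case, a toy check (again
with hypothetical names, not nbtree code): during a physical-order scan
like btbulkdelete's, only a split whose new page lands at a block number
the scan has already passed forces a revisit; splits into higher-numbered
blocks are picked up by the ongoing scan anyway.

#include <stdbool.h>
#include <stdio.h>

/* Does a just-split page need an out-of-order revisit by the scan? */
static bool
split_needs_revisit(int scan_position, int new_block)
{
    return new_block < scan_position;
}

int
main(void)
{
    int scan_position = 100;            /* blocks 0..99 already scanned */

    printf("split to block 250: revisit? %s\n",
           split_needs_revisit(scan_position, 250) ? "yes" : "no");
    printf("split to block 40:  revisit? %s\n",
           split_needs_revisit(scan_position, 40) ? "yes" : "no");
    return 0;
}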
For insert-intensive apps, this adds cost for little benefit.
For update-intensive apps, we're VACUUMing continually anyway, so
there's no benefit from doing this work only during VACUUM.
So we just optimised for slowly-but-continually churning tables (i.e.
DELETEs match INSERTs, or just UPDATEs). In other words, we just
improved VACUUM performance for those that don't need it that often.
That might be the common case, but it isn't the one that's hurting most.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com