Re: [PATCHES] Resurrecting per-page cleaner for btree - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: [PATCHES] Resurrecting per-page cleaner for btree
Date
Msg-id 20060728171147.GP66525@pervasive.com
Whole thread Raw
In response to Re: [PATCHES] Resurrecting per-page cleaner for btree  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Thu, Jul 27, 2006 at 05:24:35PM -0400, Greg Stark wrote:
>
> Jim Nasby <jnasby@pervasive.com> writes:
>
> > Even if we stopped right there it would still be a huge win in many  (most?)
> > cases. How often do the indexes on a table comprise even 50%  of the table's
> > size?
>
> I would say they're usually roughly comparable actually. It depends on how
> wide your table is of course but the wider your table rows the more indexes
> you're likely to have on the table too.

I think the number of fields in a table will correlate with the number
of indexes, but I don't think the width of those fields matters.

> > Even in the  50% case, you've gone from 1.5X to .6X
>
> Sure, and a 3x speedup is nothing to sneeze at, that would be a great
> improvement to vacuum. But it's still just a linear speedup and doesn't
> address the algorithmic problem.
>
> The fundamental problem is we have a process that's O(m) where m is the total
> space taken by a table and its indexes. The actual amount of space it has to
> reclaim is n. Other than n<m there's basically no relationship between these
> figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).
>
> We frequently assume -- and often it's a valid assumption -- that these
> figures are roughly proportional. Hence all the talk about databases reaching
> a "steady-state" where the amount of dead space is constantly being reclaimed
> at more or less the same speed it's being generated. But there are also plenty
> of use cases where a complete vacuum pass takes thousands of times longer than
> the i/o it took to generate those dead tuples. Currently Postgres just isn't
> that great a tool for those use cases.

This is exactly why I'm suggesting that we stop waiting for the perfect
vacuum that will only hit the exact tuples in both the heap and indexes
that it needs to and at least take the giant leap forward of only
hitting heap tuples/pages that we know are dead. Even if indexes are the
same size as the heap, you've still gained nearly a 2x speed improvement
(depending on how long you wait to vacuum).

> Unfortunately while I'm convinced of the problem I'm equally unconvinced of
> the solution. I tried to solve online index builds using retail index lookups
> in a very similar way to what's being discussed here. And ran into the same
> problems. I eventually decided that while it could be made to work that way it
> would be far too much code, far too unsafe, and far too invasive in the index
> access methods to be the right approach.
>
> Our existing method works with minimal help from the index access methods
> which allows for an enormous degree of freedom in the index design.. To be
> able to support retail vacuum you would have to force index method
> implementors to keep information in a way that allowed them to look up a
> particular value/tid efficiently which would limit the kinds of indexes you
> could support drastically.
>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Role incompatibilities
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Hash indexes (was: On-disk bitmap index patch)