Re: Reviewing new index types (was Re: [PATCHES] Updatedbitmap indexpatch) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reviewing new index types (was Re: [PATCHES] Updatedbitmap indexpatch)
Date
Msg-id 1185228715.4284.433.camel@ebony.site
Whole thread Raw
In response to Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Reviewing new index types (was Re: [PATCHES]Updatedbitmap indexpatch)  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > ... BMI is not useful at all
> > for PKs, whilst GIT is specifically designed to handle them.
> 
> This seems a strange statement, because GIT doesn't look particularly
> efficient for unique indexes AFAICS.  In the worst case you'd have to
> look individually at each tuple on a heap page to check for uniqueness
> conflict (no binary search, because you couldn't assume they are
> ordered).

That is one of a few heuristics about the patch that need some active
discussion, so I'm glad you asked.

The main use case is nearly-unique, so for cases where we have a
Master:Detail relationship, e.g. Order:OrderItem. The Order index is a
PK, with the OrderItem index as a nearly unique key. The index is not
brilliant for the Order index, but is good for the OrderItem index.

Heikki designed the grouping so that there is a state change between
non-grouped and non-grouped (normal) index entries. By default the patch
uses a threshold of non-grouped -> grouped at N=2 index entries and then
no limit on the number of rows/block. Currently you can tune N, but we
might also envisage setting a limit on the width of the range of values
to limit the number of tids stored in a grouped index entry. That could
control the uniqueness overhead.

On an I/O bound workload the space saving on the index outweighs the CPU
loss from uniqueness checking. When I/O is not an issue then
unfortunately there is a CPU overhead.

For GIT it would appear that the summary is that it gives a slight loss
on medium sized PK indexes and an increasing win as index size
increases. We struggled to come up with ways of making it Just Work with
as few parameters as possible.

> > B-TREE INDEXES (Integers) 
> 
> > Rows/value    Best time    Size in blocks
> > 10000000    49s        21899
> > 1000000        49s        21899
> > 100000        49s        21899
> > 10000        47s        21899
> > 1000        43s        21899
> > 100        38s        21899
> > 10        38s        21899
> > 1        33s        21899
> 
> Surely the GIT code failed to kick in at all here?  That's just about
> exactly the index size I'd expect for 10 million integers with the
> existing btree code (at least when MAXALIGN=4).

That was the b-tree test, i.e. the control. The GIT patch has bitrot, so
not able to test just yet.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
Next
From: "Mark Wong"
Date:
Subject: Re: Why so many out-of-disk-space failures on buildfarm machines?