Re: why is gist index taking so much space on the disc - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: why is gist index taking so much space on the disc
Date
Msg-id 20051121194527.GG16764@svana.org
Whole thread Raw
In response to Re: why is gist index taking so much space on the disc  (Grzegorz Jaskiewicz <gj@pointblue.com.pl>)
Responses Re: why is gist index taking so much space on the disc
List pgsql-hackers
On Mon, Nov 21, 2005 at 08:14:44PM +0100, Grzegorz Jaskiewicz wrote:
> >You mean you sometimes put the same elements in the two halves? You
> >shouldn't do that. The whole point is that the search will descend any
> >node that matches consistant, but any single key should only appear
> >once in each index.
> >
> >picksplit should *split* the set, not return two sets about the same
> >size as you started...
>
> Nope, I mean that 'masks' created to match either 'half' sometimes
> match elements in the other one.
> This shouldn't be a big deal, just one level to go down on query to
> much more specific result set.
> I have fixed that with, somewhat hack.

It's not a hack, that's how it's supposed to work. An entry should only
appear once in the index, but it could appear in multiple places. Like
you say, some entries can go into either half.

B-Trees are the rather special case that you can always split a set of
values into two non-overlapping sets. With geometric types (like your
bitmasks) you can't avoid overlap sometimes so you have to follow
multiple branches to check if an element is there or not.

Your pseudo code is good and will work fine. However, ideally you want
to divide the overlap in such a way that later splits work better.
Maybe by trying to decide which mask is "closer".

The better the splitting the more efficient your tree will become.
Ofcourse, perfect splitting may be expensive but then it depends on how
many inserts vs how many selects. If you do a lot of searches it may be
worth the time.

BTW, I glad you're making progress and hopefully you might be able to
publish some code. PostgreSQL could do with some example GiST indexes
on bitmaps.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PostgreSQL 8.1.0 catalog corruption
Next
From: Tino Wildenhain
Date:
Subject: Re: plpython and bytea