Re: GiST: PickSplit and multi-attr indexes - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: GiST: PickSplit and multi-attr indexes
Date	November 17, 2004 00:37:42
Msg-id	2761.1100641040@sss.pgh.pa.us
In response to	Re: GiST: PickSplit and multi-attr indexes (Greg Stark <gsstark@mit.edu>)
List	pgsql-hackers

Greg Stark <gsstark@MIT.EDU> writes:
> The approach they take is to have a function which calculates an
> abstract "distance" between any two entries. There's an algorithm that
> they use to pick the split based on this distance function.

> If you abandoned "PickSplit" and instead exposed this distance
> function as the external API then the behaviour for multi-column
> indexes is clear. You calculate the distance along all the axes and
> calculate the diagonal distance.

Hmm ... the problem with that is the assumption that different opclasses
will compute similarly-scaled distances.  If opclass A generates
distances in the range (0,1e6) while B generates in the range (0,1),
combining them with Euclidean distance won't work well at all.  OTOH you
can't blindly normalize, because in some cases maybe the data is such
that a massive difference in distances is truly appropriate.
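
To make the scaling concern concrete, here is a minimal sketch in Python. It is not GiST code. The function names, the two hypothetical opclasses "A" and "B", and the sample distances are all assumptions chosen for illustration; only the ranges (0,1e6) and (0,1) come from the message above.

```python
import math

def combined_distance(per_column):
    """Euclidean ("diagonal") combination of per-column distances."""
    return math.sqrt(sum(d * d for d in per_column))

def normalized_distance(per_column, scales):
    """Same combination after dividing each column by an assumed scale."""
    return math.sqrt(sum((d / s) ** 2 for d, s in zip(per_column, scales)))

# Hypothetical: column A reports distances in (0, 1e6), column B in (0, 1).
# Both pairs are equally far apart in A; they differ only in B.
near_in_b = [500_000.0, 0.01]   # almost identical in column B
far_in_b = [500_000.0, 0.99]    # almost maximally distant in column B

raw_near = combined_distance(near_in_b)
raw_far = combined_distance(far_in_b)

# Relative effect of column B on the unscaled combined distance.
raw_relative_change = (raw_far - raw_near) / raw_near

# Blind normalization by each column's nominal range (an assumption,
# and not always appropriate, as noted above).
scales = [1e6, 1.0]
norm_near = normalized_distance(near_in_b, scales)
norm_far = normalized_distance(far_in_b, scales)

print(f"raw relative change from column B: {raw_relative_change:.3e}")
print(f"normalized distances: near={norm_near:.4f}, far={norm_far:.4f}")
```

Without scaling, column B has essentially no influence on the combined distance, so a split chosen from it would effectively ignore that column. Dividing by the nominal range restores B's influence, but it also hard-codes the guess that both columns should count equally, which is exactly the blind normalization that may be wrong for some data.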

I'm also a bit leery of the assumption that every GiST application can
reduce its PickSplit logic to Euclidean distances.
        regards, tom lane
