Re: knngist - 0.8 - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: knngist - 0.8
Date
Msg-id 20101228082252.GA3085@svana.org
Whole thread Raw
In response to Re: knngist - 0.8  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: knngist - 0.8
List pgsql-hackers
On Sun, Dec 26, 2010 at 08:13:40PM -0500, Tom Lane wrote:
> [ thinks for a bit... ]  One reason for having a different structure
> would be if we needed to represent abstract semantics for some operators
> that couldn't be associated with a btree opclass.  This is clearly not
> an issue for what RANGE needs, since anything you can order by will
> surely have a btree opclass for that, and in fact we probably need to
> tie those operators to specific orderings if a datatype has more than
> one sort ordering.  But maybe there could be some other situation where
> we'd need to describe operator behavior for a datatype independently of
> any sort ordering.  Can anyone come up with a plausible example?

One thing that comes to mind is the operators used for hash indexes,
namely the hash() function. It is closely related to the collation but
isn't actually used for sorting. For every btree class you can make a
hash class with the same equality operator and an appropriate hash
function.

I've had the idea of defining a parent object and deriving the btree
and hash operator classes from that, but it gets messy once you get
into cross-type operators (i.e. operator families).

With respect to the collation of strings I have thought it useful to be
able to define a sortkey() function, which would map the input space to
a 8 byte integer and satisfies the rule:

sortkey(a) < sortkey(b)  implies a < b

The idea being that you can use this as an efficient first step to
speed up sorting strings, since adding it to the sort list implicitly
before the actual column doesn't change the result. In the case of
strings the sortkey() could be generated with strxfrm().

A similar idea could be used with other expensive comparison
operations, but I can't think of any at the moment. Actually, perhaps
you could use it with ints/floats/etc as well, since you could skip the
function call overhead. You'd be trading (n log n int compares + n
sortkeys) with (n log n comparisions).

Just some thoughts,

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

pgsql-hackers by date:

Previous
From: Shigeru HANADA
Date:
Subject: Re: SQL/MED - core functionality
Next
From: Heikki Linnakangas
Date:
Subject: Re: SQL/MED - core functionality