Re: KNNGiST for knn-search - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: KNNGiST for knn-search
Date
Msg-id 4B0BD389.5000509@enterprisedb.com
Whole thread Raw
In response to Re: KNNGiST for knn-search  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Teodor Sigaev wrote:
>>> 1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
>>>   contains or contained by, because of some overhead of new algorithm of
>>>   tree traversal
>>
>> Is it possible to use the regular GiST traversal algorithm on a
>> KNNGiST-tree, when performing regular GiST searches that don't require a
>> particular order?
> New algorithm works much more with memory for allocation/free to manage
> lists and it's a single reason of performance loss. Choosing of
> algorithm could not be done by consistent function, it should be done at
> least in amrescan method or even earlier - in planner.

Ok, that sounds good. The bottom line is that you can use the same
on-disk tree with both algorithms. No need for a separate indexam in
that case.

> One idea:
> SELECT p FROM pt WHERE p << '5.0,5.0'::point ORDER BY (p <->
> '5.0,5.0'::point) DESC LIMIT 10;
> And add <-> to opclass (but for now any indexable operation should
> return boolean type).

You really shouldn't need to have a WHERE clause.

> Of course, KNNGiST should be modified to support
> not only k-nearest search but k-"farest" search and NULLS LAST/FIRST.

Well, as long as the planner knows the capabilities of the indexam, it
can just fall back to a seqscan+sort if the query can't be sped up with
the index.

> And now you can specify p >< 'one point' AND p >< 'another
> point', but it's impossible to do that by ORDER BY clause.

Huh, what does that mean? Is it like "ORDER BY (min( p >< 'one point', p
>< 'another point')" ?

> Second idea with non-standard syntax.
> SELECT ... ORDER BY PROXIMITY OF expression[, expression [..]] TO
> expression[, expression [..]] USING [operator [, operator [..]]
> and operator is distance operator, i.e. it's not a member of btree
> opclass, but returns non-negative float8 value.
> 
> Without index it will be essentially the same as
> ORDER BY expression operator expression[ + ..] DESC NULLS LAST

We already have the syntax to represent the query, using ORDER BY. IMHO
we just need to teach the planner that when it sees a query like that,
it can use a GiST index to speed it up. A number of indexam and operator
class API changes are probably required, but it should be invisible to
the user.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: enable-thread-safety defaults?
Next
From: Pavel Stehule
Date:
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION