Re: KNNGiST for knn-search - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: KNNGiST for knn-search
Date
Msg-id 1259115190.27757.11194.camel@ebony
Whole thread Raw
In response to KNNGiST for knn-search  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: KNNGiST for knn-search
List pgsql-hackers
On Mon, 2009-11-23 at 20:44 +0300, Teodor Sigaev wrote:
> Old way:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM
> spots
> order by dist asc LIMIT 10;
> 
> Time: 1024.242 ms
> 
> knn-search:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM
> spots
> WHERE coordinates  >< '5.0,5.0'::point LIMIT 10;
> 
> Time: 3.158 ms
> 
> 
> We didn't patch existing implementation of GiST for several reasons:
> 
> 1. KNNGiST is about 5% slower than GiST on non-knn search queries,
> like
>    contains or contained by, because of some overhead of new algorithm
> of
>    tree traversal
> 2.  KNNGiST can't be used in  bitmap index scan, which destroys order
> of results,
>    We don't know the way to forbid bitmap index scan only for knn
> queries.
>    Current version of KNNGiST doesn't distinguish knn-search and usual
> search
>    and postgres doesn't know about ordered output from KNNGiST.

Sounds very cool.

Seems like you should look at the way sorted_path works in
query_planner().

If you have a query like this
 explain select col1 from s order by col1 limit 10;

then we currently understand that we should use an IndexScan for that.
We don't specifically exclude the bitmap scan, it's just that we know
that the results from the index are ordered and therefore the cost of
sorting the output need not be added. In the bitmap case the cost of the
sort must be added and that's enough to ensure we almost never do that.

I notice that a query like 
 explain select col1 from s order by abs(col1 - 5) limit 10;

is the one-dimensional equivalent of the type of query you're proposing
and that doesn't work either until you put an index on abs(col1 - 5),
then it just works, but only for k = 5.

Maybe you should look at the above query and see if there are any usable
similarities for the Knn index.

Part of your problem appears to be that cost_sort does not include
anything about the cost of the comparison operators for different
datatypes.

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Hot standby and removing VACUUM FULL
Next
From: Itagaki Takahiro
Date:
Subject: Re: Partitioning option for COPY