Re: ToDo: KNN Search should to support DISTINCT clasuse? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: ToDo: KNN Search should to support DISTINCT clasuse?
Date
Msg-id 2448.1350921455@sss.pgh.pa.us
Whole thread Raw
In response to ToDo: KNN Search should to support DISTINCT clasuse?  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: ToDo: KNN Search should to support DISTINCT clasuse?  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: ToDo: KNN Search should to support DISTINCT clasuse?  (Robert Haas <robertmhaas@gmail.com>)
Re: ToDo: KNN Search should to support DISTINCT clasuse?  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes:
> but using DISTINCT breaks KNN searching optimization

> postgres=# explain select distinct nazobce, nazobce <-> 'Benešov' from
> obce order by nazobce <-> 'Benešov' limit 10

Don't hold your breath.  There are two ways the system could implement
the DISTINCT clause: either sort and uniq, or hashaggregate.
hashaggregate will destroy any input ordering, so there's no value in
using the index as input.  sort and uniq requires the input to be sorted
by *all* the columns being distinct'ed, not just one, so again this
index isn't useful.  You could get a plan using the index if you only
wanted the <-> output column, eg

contrib_regression=# explain select distinct t <-> 'foo' from test_trgm order by t <-> 'foo' limit 10;
                 QUERY PLAN                                      
 
-------------------------------------------------------------------------------------Limit  (cost=0.00..0.87 rows=10
width=12) ->  Unique  (cost=0.00..86.75 rows=1000 width=12)        ->  Index Scan using ti on test_trgm
(cost=0.00..84.25rows=1000 width=12)              Order By: (t <-> 'foo'::text)
 
(4 rows)

Perhaps it would be close enough to what you want to use DISTINCT ON:

contrib_regression=# explain select distinct on( t <-> 'foo') *,t <-> 'foo' from test_trgm order by t <-> 'foo' limit
10;                                   QUERY PLAN                                      
 
-------------------------------------------------------------------------------------Limit  (cost=0.00..0.87 rows=10
width=12) ->  Unique  (cost=0.00..86.75 rows=1000 width=12)        ->  Index Scan using ti on test_trgm
(cost=0.00..84.25rows=1000 width=12)              Order By: (t <-> 'foo'::text)
 
(4 rows)
        regards, tom lane



pgsql-hackers by date:

Previous
From: Phil Sorber
Date:
Subject: Re: [WIP] pg_ping utility
Next
From: Pavel Stehule
Date:
Subject: Re: ToDo: KNN Search should to support DISTINCT clasuse?