Re: GIN improvements part 3: ordering in index - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: GIN improvements part 3: ordering in index
Date
Msg-id 51BF5527.1090203@vmware.com
Whole thread Raw
In response to Re: GIN improvements part 3: ordering in index  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GIN improvements part 3: ordering in index
List pgsql-hackers
On 17.06.2013 15:56, Alexander Korotkov wrote:
> On Sat, Jun 15, 2013 at 3:02 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:
>
>> This patch introduces new interface method of GIN which takes same
>> arguments as consistent but returns float8.
>> float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
>> nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
>> nullFlags[], Datum addInfo[], bool addInfoIsNull[])
>> This patch implements gingettuple method which can return ordering data
>> using KNN infrastructure. Also it introduces><  operator for fts which
>> support ordering in GIN index. Some example:
>>
>> postgres=# explain analyze select * from dblp_titles2 where tsvector @@
>> to_tsquery('english', 'statistics') order by tsvector><
>> to_tsquery('english', 'statistics') limit 10;
>>                                                                     QUERY
>> PLAN
>>
>>
-------------------------------------------------------------------------------------------------------------------------------------------------
>>   Limit  (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120
>> rows=10 loops=1)
>>     ->   Index Scan using dblp_titles2_idx on dblp_titles2
>>   (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115
>> rows=10 loops=1)
>>           Index Cond: (tsvector @@ '''statist'''::tsquery)
>>           Order By: (tsvector><  '''statist'''::tsquery)
>>   Total runtime: 7.556 ms
>> (5 rows)
>>
>
> Attached version of patch has some refactoring and bug fixes.

Thanks. There are no docs changes and not many comments, that needs to 
be fixed, but I think I understand how it works:

On the first call to gingettuple, the index is first scanned for all the 
matches, which are collected in an array in memory. Then, the array is 
sorted with qsort(), and the matches are returned one by one from the 
sorted array.

That has some obvious limitations. First of all, you can run out of 
memory. Secondly, is that really any faster than the plan you get 
without this patch? Ie. scan all the matches with a bitmap index scan, 
and sort the results on the rank function. If it is faster, why?

What parts of the previous two patches does this rely on?

- Heikki



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Support for REINDEX CONCURRENTLY
Next
From: Josh Berkus
Date:
Subject: Re: [9.4 CF 1] Commit Fest has started