Re: Using multidimensional indexes in ordinal queries - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Using multidimensional indexes in ordinal queries
Date
Msg-id AANLkTim8yryTqBV7rVucvN1h9SpIMgt8fFEmXJSO1_sE@mail.gmail.com
Whole thread Raw
In response to Re: Using multidimensional indexes in ordinal queries  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Using multidimensional indexes in ordinal queries  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Jun 21, 2010 at 5:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It seems like you can get more or less the same benefit from a
multicolumn btree index.  On my system, with the individual btree
indices, the query ran in 7625 ms; with an additional index on (v1,
v2, v3), it ran in 94 ms.  I didn't get the same plans as Alexander
did, though, so it may not really be apples to apples.  See attached
session trace.
Benefit of multicolumn btree index was more or less the same than cube benefit because of very bad picksplit behavior in this case. I attached the patch which significally improves cube index search performance:

test=# explain (analyze, buffers) select * from test where cube(ARRAY[v1,v2,v3]) <@ cube(ARRAY[480,480,'-inf'::float8], ARRAY[500,500,'+inf'::float8]) order by cube(ARRAY[v1,v2,v3]) <*> 4 LIMIT 10;
                                                             QUERY PLAN                                                      
       
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..38.07 rows=10 width=28) (actual time=0.495..0.570 rows=10 loops=1)
   Buffers: shared hit=21
   ->  Index Scan using test_cube_idx on test  (cost=0.00..38064.52 rows=10000 width=28) (actual time=0.489..0.537 rows=10 loops=1)
         Index Cond: (cube(ARRAY[v1, v2, v3]) <@ '(480, 480, -inf),(500, 500, inf)'::cube)
         Sort Cond: (cube(ARRAY[v1, v2, v3]) <*> 4)
         Buffers: shared hit=21
 Total runtime: 0.659 ms
(7 rows)

Now this patch greatly increases tree construction time, but I believe that picksplit implementation, that is good enough for tree search and tree construction, can be found.
 
The trouble is that it's hard to think of
a way of teaching the planner about these cases without hard-coding
lots and lots of special-case kludges into the planner.  Still, if
someone has a clever idea...

I think that two things can be done to improve the situation:
1) Make knngist deal with negative values. I think this will make easier using knngist just for sorting, not only k-neighbor searching.
2) Let gist interface methods take care about multicolumn indexes. I think that if cube index from the example above will be constructed on separate columns v1, v2, v3 then it would be easier for planner to use cube index for queries with filters on these columns. I don't know exactly how to do that.
Attachment

pgsql-hackers by date:

Previous
From: Andy Balholm
Date:
Subject: Re: dividing money by money
Next
From: Robert Haas
Date:
Subject: Re: Using multidimensional indexes in ordinal queries