Using multidimensional indexes in ordinal queries - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Using multidimensional indexes in ordinal queries
Date
Msg-id AANLkTimhFaq6hCibRnk0tlcQMIyhYWHwAQ2ZD87wbH86@mail.gmail.com
Responses Re: Using multidimensional indexes in ordinal queries  (Thom Brown <thombrown@gmail.com>)
List pgsql-hackers
Hello hackers,

I would like to share some thoughts about using multidimensional indexes for queries that deal with ordinary one-dimensional data types. I think that GiST indexes (especially with KNNGiST) can produce a great benefit for complex multi-criterion queries.
Let's consider an example. I use postgresql-9.0beta1 with the KNNGiST patch. I have also created a simple patch that allows KNNGiST to be used for ordinary sorting in the cube extension (patch is attached). The patch introduces the "<*>" operator: the first operand is a cube and the second operand is a number n. If n = 2*k, ascending ordering by the k-th dimension occurs; if n = 2*k + 1, descending ordering by the k-th dimension occurs. For now this operator has a limitation and works only with nonnegative coordinate values.
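To illustrate the encoding (a minimal sketch, assuming the attached patch is installed and dimensions are numbered from zero, consistent with the "<*> 4" example below):

```sql
-- ascending by the first dimension (k = 0, so n = 2*0 = 0)
select * from test order by cube(array[v1,v2,v3]) <*> 0 limit 10;

-- descending by the third dimension (k = 2, so n = 2*2 + 1 = 5)
select * from test order by cube(array[v1,v2,v3]) <*> 5 limit 10;
```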
Let's create a table with three floating-point columns and fill it with 10M rows:

create table test (id serial primary key, v1 double precision, v2 double precision, v3 double precision);
insert into test (v1,v2,v3) (select random()*1000, random()*1000, random()*1000 from generate_series(1,10000000,1));

Now, let's create three separate btree indexes and one GiST cube index.

create index test_v1_idx on test(v1);
create index test_v2_idx on test(v2);
create index test_v3_idx on test(v3);
create index test_cube_idx on test using gist(cube(ARRAY[v1,v2,v3]));

Let's consider a complex query with filtering, ordering, and a limit.

test=# select * from test where v1 between 480 and 500 and v2 between 480 and 500 order by v3 limit 10;
    id    |        v1        |        v2        |        v3         
----------+------------------+------------------+-------------------
 12283631 | 485.982828773558 | 496.795611456037 | 0.213871244341135
  4936086 | 497.239370364696 | 491.878624074161 |  1.26481195911765
  8963067 | 484.963194001466 | 497.094289399683 |  1.30057940259576
 12435440 | 498.670902103186 | 498.667187988758 |  1.33110675960779
 11667415 | 494.398592971265 | 497.440234292299 |  1.44533207640052
  8530558 |  482.85893118009 | 496.267838869244 |  1.48530444130301
  4004942 | 483.679085504264 | 489.547223784029 |  1.57393841072917
 14897796 |  491.37338064611 | 487.475242000073 |  1.81775307282805
  4105759 | 489.506138022989 |  486.91446846351 |  1.94038823246956
 12895656 | 499.508572742343 | 487.065799534321 |  2.34963605180383
(10 rows)

test=# explain analyze select * from test where v1 between 480 and 500 and v2 between 480 and 500 order by v3 limit 10;
                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22786.73..22786.75 rows=10 width=28) (actual time=3242.135..3242.162 rows=10 loops=1)
   ->  Sort  (cost=22786.73..22797.59 rows=4345 width=28) (actual time=3242.131..3242.144 rows=10 loops=1)
         Sort Key: v3
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on test  (cost=8755.91..22692.83 rows=4345 width=28) (actual time=1281.030..3234.934 rows=4027 loops=1)
               Recheck Cond: ((v1 >= 480::double precision) AND (v1 <= 500::double precision) AND (v2 >= 480::double precision) AND (v2 <= 500::double precision))
               ->  BitmapAnd  (cost=8755.91..8755.91 rows=4345 width=0) (actual time=1280.783..1280.783 rows=0 loops=1)
                     ->  Bitmap Index Scan on test_v1_idx  (cost=0.00..4243.12 rows=202177 width=0) (actual time=644.702..644.702 rows=200715 loops=1)
                           Index Cond: ((v1 >= 480::double precision) AND (v1 <= 500::double precision))
                     ->  Bitmap Index Scan on test_v2_idx  (cost=0.00..4510.37 rows=214902 width=0) (actual time=630.085..630.085 rows=200200 loops=1)
                           Index Cond: ((v2 >= 480::double precision) AND (v2 <= 500::double precision))
 Total runtime: 3242.253 ms
(12 rows)

This query can be rewritten so that the planner uses the GiST cube index.

test=# select * from test where cube(array[v1,v2,v3]) <@ cube(array[480,480,'-Infinity'::float],array[500,500,'+Infinity'::float]) order by cube(array[v1,v2,v3]) <*> 4 limit 10;
    id    |        v1        |        v2        |        v3         
----------+------------------+------------------+-------------------
 12283631 | 485.982828773558 | 496.795611456037 | 0.213871244341135
  4936086 | 497.239370364696 | 491.878624074161 |  1.26481195911765
  8963067 | 484.963194001466 | 497.094289399683 |  1.30057940259576
 12435440 | 498.670902103186 | 498.667187988758 |  1.33110675960779
 11667415 | 494.398592971265 | 497.440234292299 |  1.44533207640052
  8530558 |  482.85893118009 | 496.267838869244 |  1.48530444130301
  4004942 | 483.679085504264 | 489.547223784029 |  1.57393841072917
 14897796 |  491.37338064611 | 487.475242000073 |  1.81775307282805
  4105759 | 489.506138022989 |  486.91446846351 |  1.94038823246956
 12895656 | 499.508572742343 | 487.065799534321 |  2.34963605180383
(10 rows)

test=# explain analyze select * from test where cube(array[v1,v2,v3]) <@ cube(array[480,480,'-Infinity'::float],array[500,500,'+Infinity'::float]) order by cube(array[v1,v2,v3]) <*> 4 limit 10;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..39.10 rows=10 width=28) (actual time=5.186..20.047 rows=10 loops=1)
   ->  Index Scan using test_cube_idx on test  (cost=0.00..39100.88 rows=10000 width=28) (actual time=5.177..20.012 rows=10 loops=1)
         Index Cond: (cube(ARRAY[v1, v2, v3]) <@ '(480, 480, -inf),(500, 500, inf)'::cube)
         Sort Cond: (cube(ARRAY[v1, v2, v3]) <*> 4)
 Total runtime: 20.145 ms
(5 rows)

So, the result is the same, but the query runs about 150 times faster.
I think that using a multidimensional index on ordinary data types can produce the following benefits:
1) It will be possible to create one multidimensional index instead of many one-dimensional ones. (Of course, it will be slower on simple queries.)
2) On some complex queries (like the one above) it can produce a great performance benefit.
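For instance, the same single cube index can serve other combinations of criteria on these columns. A hypothetical variation (a sketch, again assuming the attached patch): filter on v3 and order by v1 ascending (k = 0, so n = 0), with no extra btree index needed:

```sql
-- bound only the third dimension, leave the first two unbounded
select *
from test
where cube(array[v1,v2,v3]) <@ cube(array['-Infinity'::float,'-Infinity'::float,480],
                                   array['+Infinity'::float,'+Infinity'::float,500])
order by cube(array[v1,v2,v3]) <*> 0
limit 10;
```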

And now my question: how extensive would the planner changes need to be for the planner to choose a multidimensional index freely, without the query being rewritten in this special manner? In my example I replaced "v1 between 480 and 500 and v2 between 480 and 500" with "cube(array[v1,v2,v3]) <@ cube(array[480,480,'-Infinity'::float],array[500,500,'+Infinity'::float])" and "order by v3" with "order by cube(array[v1,v2,v3]) <*> 4", but it would be a great option if the planner could consider a plan with the GiST cube index for the original query.

With best regards,
Alexander Korotkov.
