Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date
Msg-id CAPpHfduBWP15Ufc6F_W=+VYxQmBdSdXbjz3EygM-Uc-5i60W8Q@mail.gmail.com
Whole thread Raw
In response to Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi!

Studying this question little more I found that current approach of range indexing can be dramatically inefficient in some cases. It's not because of penalty or split implementation, but because of approach itself. Mapping intervals to two-dimensional space produce much better results in case of high-overlapping ranges and "@>", "<@" operators with low selectivity. 

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select (random()*10000)::int as l, (random()*1000 + 1)::int s from generate_series(1,1000000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@ int4range(5000,5010);
                                                         QUERY PLAN                                 
                         
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32) (actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x <@ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000 width=0) (actual time=37.288..37.288 rows=2 loops=1)
         Index Cond: (x <@ '[5000,5010)'::int4range)
         Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@ box(point(5000,5000),point(5010,5010));
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000 width=0) (actual time=0.182..0.182 rows=2 loops=1)
         Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
         Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @> int4range(5000,5990);
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32) (actual time=4.578..4.603 
rows=5 loops=1)
   Recheck Cond: (x @> '[5000,5990)'::int4range)
   Buffers: shared hit=52
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000 width=0) (actual time=4.561..4.561 rows=5 loops=1)
         Index Cond: (x @> '[5000,5990)'::int4range)
         Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@ box(point('-inf'::float,5990),point(5000,'+inf'::float));
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000 width=0) (actual time=0.312..0.312 rows=5 loops=1)
         Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
         Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-----
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Nikhil Sontakke
Date:
Subject: Re: Review: Non-inheritable check constraints
Next
From: Magnus Hagander
Date:
Subject: Re: JSON for PG 9.2