Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date
Msg-id CAPpHfds5M21x4idTOO4gYyPEu3B0JBtUqX2srTDPa4dDNT3UyQ@mail.gmail.com
Whole thread Raw
In response to Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
List pgsql-hackers
New version of GiST for range types patch is here. This version seems to be complete and ready for review.

Main features of this patch are:
1) Different handling of empty ranges. Problem of "column <@ const" acceleration is that all empty ranges satisfies to it. It's because empty range can be anywhere in the tree, because empty range is contained in any range. My solution is to use additional bit in range flags especially for GiST. This bit means that internal range have underlying empty ranges. This approach allows to accelerate "column <@ const" case and also fast search for empty ranges. 
2) Penalty and picksplit methods are trying not to mix ordinal ranges with special cases ((-inf;x), (x; +inf), (-inf, +inf) and empty). Picksplit divides ranges to classes (see get_gist_range_class for details). If several classes are presented in input vector, split will be between classes. If we have only one class, picksplit method splits ranges using double sorting split for ordinal ranges, and simple sorting split for (-inf,x) and (x,+inf) cases.

Questions:
1) I'm not sure about whether we need support of <> in GiST, because it always produces full index scan (except search for non-empty ranges).
2) With no selectivity estimation functions we have no GiST index usage for &&, <@, @>. Possible temporarty solution is to create lower constant selectivity estimation like we have for geometric datatypes.

A little testing:
Let's create dataset with 1000 empty ranges, 1000 (-inf, x) ranges, 1000 (x, inf) ranges, 1000 (-inf, + inf) ranges and 1000000 ordinal ranges and mix them.

create table temp (r int4range);
insert into temp (r) (select int4range() from generate_series(1,1000));
insert into temp (r) (select int4range(NULL, NULL) from generate_series(1,1000));
insert into temp (r) (select int4range((random()*1000000)::int, NULL) from generate_series(1,1000));
insert into temp (r) (select int4range(NULL,(random()*1000000)::int) from generate_series(1,1000));
insert into temp (select int4range(v,v+1+(random()*100)::int) from (select (random()*1000000)::int v from generate_series(1,1000000)) x);
create table range_test as (select * from temp order by random());
drop table temp;

There are some results without patch.

test=# create index range_idx on range_test using gist (r);
CREATE INDEX
Time: 174578,153 ms

test=# explain (analyze, buffers) select * from range_test where r @> int4range(500000,500010);
                                                           QUERY PLAN                               
                            
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=17659.03..29840.03 rows=502000 width=17) (actual time=87.961..116.074 rows=2028 loops=1)
   Recheck Cond: (r @> '[500000,500010)'::int4range)
   Buffers: shared hit=192 read=1856 written=886
   ->  Bitmap Index Scan on range_idx  (cost=0.00..17533.53 rows=502000 width=0) (actual time=87.591..87.591 rows=2028 loops=1)
         Index Cond: (r @> '[500000,500010)'::int4range)
         Buffers: shared hit=185 read=139
 Total runtime: 118.804 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r <@ int4range(500000,501000);
                                                             QUERY PLAN                             
                                
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=17659.03..29840.03 rows=502000 width=17) (actual time=1725.249..1744.327 rows=1994 loops=1)
   Recheck Cond: (r <@ '[500000,501000)'::int4range)
   Buffers: shared hit=3 read=8604
   ->  Bitmap Index Scan on range_idx  (cost=0.00..17533.53 rows=502000 width=0) (actual time=1724.834..1724.834 rows=1994 loops=1)
         Index Cond: (r <@ '[500000,501000)'::int4range)
         Buffers: shared hit=3 read=6880
 Total runtime: 1747.007 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r && int4range(500000,501000);
                                                           QUERY PLAN                               
                            
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=17659.03..29840.03 rows=502000 width=17) (actual time=88.196..105.423 rows=3082 loops=1)
   Recheck Cond: (r && '[500000,501000)'::int4range)
   Buffers: shared hit=1183 read=1545
   ->  Bitmap Index Scan on range_idx  (cost=0.00..17533.53 rows=502000 width=0) (actual time=87.638..87.638 rows=3082 loops=1)
         Index Cond: (r && '[500000,501000)'::int4range)
         Buffers: shared hit=44 read=286
 Total runtime: 109.486 ms
(7 rows)

And results for same queries with patch.

test=# create index range_idx on range_test using gist (r);
CREATE INDEX
Time: 79220,888 ms

test=# explain (analyze, buffers) select * from range_test where r @> int4range(500000,500010);
                                                          QUERY PLAN                                
                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=15954.99..28135.99 rows=502000 width=36) (actual time=9.699..36.352 rows=2028 loops=1)
   Recheck Cond: (r @> '[500000,500010)'::int4range)
   Buffers: shared hit=8 read=1734 written=556
   ->  Bitmap Index Scan on range_idx  (cost=0.00..15829.49 rows=502000 width=0) (actual time=9.297..9.297 rows=2028 loops=1)
         Index Cond: (r @> '[500000,500010)'::int4range)
         Buffers: shared hit=8 read=10
 Total runtime: 39.229 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r <@ int4range(500000,501000);
                                                           QUERY PLAN                               
                            
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=15954.99..28135.99 rows=502000 width=17) (actual time=15.886..25.571 rows=1994 loops=1)
   Recheck Cond: (r <@ '[500000,501000)'::int4range)
   Buffers: shared hit=1199 read=554
   ->  Bitmap Index Scan on range_idx  (cost=0.00..15829.49 rows=502000 width=0) (actual time=15.518..15.518 rows=1994 loops=1)
         Index Cond: (r <@ '[500000,501000)'::int4range)
         Buffers: shared hit=23 read=6
 Total runtime: 28.243 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r && int4range(500000,501000);
                                                          QUERY PLAN                                
                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=15954.99..28135.99 rows=502000 width=17) (actual time=7.845..19.686 rows=3082 loops=1)
   Recheck Cond: (r && '[500000,501000)'::int4range)
   Buffers: shared hit=2389 read=33
   ->  Bitmap Index Scan on range_idx  (cost=0.00..15829.49 rows=502000 width=0) (actual time=7.282..7.282 rows=3082 loops=1)
         Index Cond: (r && '[500000,501000)'::int4range)
         Buffers: shared hit=24
 Total runtime: 23.728 ms
(7 rows)

We can see pretty dramatical acceleration. Also index build becomes faster, actually I don't know why :)

------
With best regards,
Alexander Korotkov. 
Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: parallel make failure
Next
From: "Kevin Grittner"
Date:
Subject: Re: const correctness