Fix for gistchoose - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Fix for gistchoose
Date
Msg-id CAPpHfdunMMv3qUv-MXTzVwWP4L9mc-A9We_OKGQEB19pse6nGQ@mail.gmail.com
Whole thread Raw
Responses Re: Fix for gistchoose
List pgsql-hackers
I found that code of gistchoose doesn't follow it's logic. Idea of gistchoose is that first column penalty is more important than penalty of second column. If we meet same penalty values of first column then we choose minimal penalty value of second column among them.

Current code doesn't follow described logic. Let's illustrate it on example. Imagine we've following input data.

tuple col1 col2
1     1    0
2     0    2
3     0    1

According to function logic, tuple #3 should be selected. But gistchoose will choose #2, because after processing #2 which_grow will contain two zeros. It's enough to remove "&& i == FirstOffsetNumber" from gistchoose in order to fix it. I've discussed it with Teodor in person and he have confirmed my thoughts.

Let's consider an example.

Create table with two columns where first column contain only few distinct values.
test=# create table test as (select (random()*5)::integer, random() from generate_series(1,1000000));
test=# create index test_idx on test using gist (x,y);

Without patch:

test=# explain (analyze, buffers) select * from test where x = 1 and y between 0.1 and 0.2;
                                                         QUERY PLAN                                 
                         
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=986.89..6737.30 rows=19681 width=12) (actual time=40.543..52.033 rows=19894 loops=1)
   Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
   Buffers: shared hit=5957
   ->  Bitmap Index Scan on test_idx  (cost=0.00..981.96 rows=19681 width=0) (actual time=39.248..39.248 rows=19894 loops=1)
         Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
         Buffers: shared hit=699
 Total runtime: 53.494 ms
(7 rows)

test=# explain (analyze, buffers) select * from test where x = 1 and y between 0.5 and 0.6;
                                                         QUERY PLAN                                 
                         
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=998.29..6753.38 rows=19948 width=12) (actual time=38.646..50.136 rows=19830 loops=1)
   Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
   Buffers: shared hit=6243
   ->  Bitmap Index Scan on test_idx  (cost=0.00..993.30 rows=19948 width=0) (actual time=37.342..37.342 rows=19830 loops=1)
         Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
         Buffers: shared hit=974
 Total runtime: 51.561 ms
(7 rows)

With patch

test=# explain (analyze, buffers) select * from test2 where x = 1 and y between 0.1 and 0.2;
                                                          QUERY PLAN                                
                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=889.96..6645.37 rows=19966 width=12) (actual time=17.761..29.499 rows=19894 loops=1)
   Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
   Buffers: shared hit=5436
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..884.97 rows=19966 width=0) (actual time=16.598..16.598 rows=19894 loops=1)
         Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
         Buffers: shared hit=178
 Total runtime: 30.996 ms
(7 rows)


test=# explain (analyze, buffers) select * from test2 where x = 1 and y between 0.5 and 0.6;
                                                          QUERY PLAN                                
                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=885.34..6639.89 rows=19917 width=12) (actual time=21.734..33.655 rows=19830 loops=1)
   Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
   Buffers: shared hit=5452
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..880.36 rows=19917 width=0) (actual time=20.604..20.604 rows=19830 loops=1)
         Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
         Buffers: shared hit=183
 Total runtime: 35.136 ms
(7 rows)


You can see that difference in number of read buffers during index scan is pretty significant.

------
With best regards,
Alexander Korotkov.
Attachment

pgsql-hackers by date:

Previous
From: Rushabh Lathia
Date:
Subject: Re: Primary Key Constraint on inheritance table not getting route to child tables
Next
From: Alexander Korotkov
Date:
Subject: Re: gistchoose vs. bloat