Thread: Fix for GiST penalty

Fix for GiST penalty

From
Alexander Korotkov
Date:
Hi!

During my work on GSoC project, I found bad perfomace of GiST for point datatype. It appears even on uniform random data.

test=# create table test as (select point(random(), random()) from generate_series(1, 1000000));
SELECT 1000000
test=# create index test_idx on test using gist(point);
CREATE INDEX
test=# explain (analyze, buffers) select * from test where point <@ box(point(0.5,0.5), point(0.505,0.505));
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=48.40..2593.73 rows=1000 width=16) (actual time=97.479..97.551 rows=24 loops=1)
   Recheck Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box)
   Buffers: shared hit=5126
   ->  Bitmap Index Scan on test_idx  (cost=0.00..48.15 rows=1000 width=0) (actual time=97.462..97.462 rows=24 loops=1)
         Index Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box)
         Buffers: shared hit=5102
 Total runtime: 97.644 ms
(7 rows)

Search for the cause takes relatively long time from me, but finally I did. In gist_box_penalty function floating point error in line 
  *result = (float) (size_box(ud) - size_box(origentry->key));
sometimes makes *result a very small negative number. 
I beleive that best place to fix it is gistpenalty function. The attached patch makes this function treating negative number from user's penalty as zero. I didn't find mention of negative penalty value in documentation. So, AFAICS such behaviour shouldn't break anything.
After the patch index performance is ok.

test=# explain (analyze, buffers) select * from test where point <@ box(point(0.5,0.5), point(0.505,0.505));
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=44.35..2589.68 rows=1000 width=16) (actual time=0.988..1.116 rows=24 loops=1)
   Recheck Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box)
   Buffers: shared hit=44
   ->  Bitmap Index Scan on test_idx  (cost=0.00..44.10 rows=1000 width=0) (actual time=0.966..0.966 rows=24 loops=1)
         Index Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box)
         Buffers: shared hit=20
 Total runtime: 1.313 ms
(7 rows)

------
With best regards,
Alexander Korotkov.
Attachment

Re: Fix for GiST penalty

From
Heikki Linnakangas
Date:
On 31.05.2011 01:07, Alexander Korotkov wrote:
> In gist_box_penalty function floating point error in line
>    *result = (float) (size_box(ud) - size_box(origentry->key));
> sometimes makes *result a very small negative number.
> I beleive that best place to fix it is gistpenalty function. The attached
> patch makes this function treating negative number from user's penalty as
> zero. I didn't find mention of negative penalty value in documentation. So,
> AFAICS such behaviour shouldn't break anything.
> After the patch index performance is ok.

Yeah, there seems to be a hidden assumption that the return value of the 
penalty function is always >= 0. The logic in gistchoose() in particular 
seems to assume that, by using < 0 to mean uninitialized slots in the 
which_grow array.

The documentation should be fixed too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Fix for GiST penalty

From
Alexander Korotkov
Date:
On Tue, May 31, 2011 at 12:06 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
The documentation should be fixed too.
Patch with documentation fix is attached.

I tried to reproduce this problem on another computer with Windows, but problem doesn't occurs. So, seems that it can be OS, compiler, optimization level or CPU specific.
I provide some information about my laptop where I've encountered with problem. I hope this helps to reproduce this problem if needed.

$ uname -a
Linux smagen-notebook 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11 05:17:09 UTC 2011 i686 i686 i386 GNU/Linux

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.04
DISTRIB_CODENAME=natty
DISTRIB_DESCRIPTION="Ubuntu 11.04"

$ gcc --version
gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2

$ cat /proc/cpuinfo 
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips : 3990.66
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz
stepping : 10
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips : 3989.97
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

------
With best regards,
Alexander Korotkov.
Attachment

Re: Fix for GiST penalty

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Tue, May 31, 2011 at 12:06 PM, Heikki Linnakangas <
> heikki.linnakangas@enterprisedb.com> wrote:
>> The documentation should be fixed too.

> Patch with documentation fix is attached.

Applied, thanks.  I threw in an isnan() test too, just to be really sure
funny penalty results wouldn't break anything.
        regards, tom lane