Re: Incorrect behaviour when using a GiST index on points - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Incorrect behaviour when using a GiST index on points
Date
Msg-id CAPpHfds-sGWBaRaaPB7HpjUYTV0TgkN2i+3O0oe8QqyWSR1CQA@mail.gmail.com
Whole thread Raw
In response to Re: Incorrect behaviour when using a GiST index on points  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Incorrect behaviour when using a GiST index on points
List pgsql-hackers
On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
> <aekorotkov@gmail.com>wrote:
>> Described differences leads to incorrect behaviour of GiST index.
>> The question is: what is correct way to fix it? Should on_pb also use FP*
>> or consistent method should behave like on_pb?

> Any comments on this? Current behaviour definitely indicates a bug, and I'm
> ready to fix it. The only question: is this bug in on_pb or gist?

I'm inclined to think the right answer is to make on_pb use the FP*
macros, for consistency with other geometric operators.  But it's worth
asking whether that will actually fix the problem.  I've thought for
some time that we'd eventually find cases where geo_ops' use of fuzzy
comparisons breaks index behavior entirely, because it destroys natural
assumptions like the transitive law.  So that could eventually lead us
to rip out the FP* macros everywhere.

In any case, this doesn't seem like something we could back-patch;
it'd be a behavioral change in HEAD only.

Analyzing GiST interface methods more detail I found one other place where fuzzy comparison was used. It is gist_box_same function. And it's really scary thing. It means that entry of internal page is not extended when inserting index tuple extends it in less than EPSILON. Correspondingly current GiST search behaviour is neither exact search or fuzzy search with given EPSILON. It is something in the middle. See following example for proof.

test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2= 0 then box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=# 
test=# insert into test values (box(point(1.0000009,1.0000009), point(1.0000009,1.0000009)));
INSERT 0 1
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
                      a                      
---------------------------------------------
 (1.0000009,1.0000009),(1.0000009,1.0000009)
(1 row)

test=# set enable_seqscan = off;
SET
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
 a 
---
(0 rows)

But, I believe we still can bachpatch it in one of following ways:
1) Fix consistent and same functions. Add to release note notice that users should rebuild indexes if they want correct behaviour.
2) Leave same function as is. Add kluges to consistent functions which provides correct search on current not truly correct trees.

But anyway +1 for rip out the FP* macros everywhere in HEAD. Because I really dislike the way FP* are currently implemented. Why EPSILON is 1.0E-06? We don't know anything about nature of data that users store in geometrical datatypes. The lesser double precision value is 5E-324. For some datasets EPSILON can easily cover whole range. 

------
With bestregards,
Alexander Korotkov. 

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Next
From: Josh Berkus
Date:
Subject: Re: 16-bit page checksums for 9.2