Re: Incorrect behaviour when using a GiST index on points - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Incorrect behaviour when using a GiST index on points |
Date | |
Msg-id | CAPpHfds-sGWBaRaaPB7HpjUYTV0TgkN2i+3O0oe8QqyWSR1CQA@mail.gmail.com Whole thread Raw |
In response to | Re: Incorrect behaviour when using a GiST index on points (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Incorrect behaviour when using a GiST index on points
|
List | pgsql-hackers |
On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2= 0 then box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=#
test=# insert into test values (box(point(1.0000009,1.0000009), point(1.0000009,1.0000009)));
INSERT 0 1
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
a
---------------------------------------------
(1.0000009,1.0000009),(1.0000009,1.0000009)
(1 row)
test=# set enable_seqscan = off;
SET
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
a
---
(0 rows)
But, I believe we still can bachpatch it in one of following ways:
I'm inclined to think the right answer is to make on_pb use the FP*Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov
> <aekorotkov@gmail.com>wrote:
>> Described differences leads to incorrect behaviour of GiST index.
>> The question is: what is correct way to fix it? Should on_pb also use FP*
>> or consistent method should behave like on_pb?
> Any comments on this? Current behaviour definitely indicates a bug, and I'm
> ready to fix it. The only question: is this bug in on_pb or gist?
macros, for consistency with other geometric operators. But it's worth
asking whether that will actually fix the problem. I've thought for
some time that we'd eventually find cases where geo_ops' use of fuzzy
comparisons breaks index behavior entirely, because it destroys natural
assumptions like the transitive law. So that could eventually lead us
to rip out the FP* macros everywhere.
In any case, this doesn't seem like something we could back-patch;
it'd be a behavioral change in HEAD only.
Analyzing GiST interface methods more detail I found one other place where fuzzy comparison was used. It is gist_box_same function. And it's really scary thing. It means that entry of internal page is not extended when inserting index tuple extends it in less than EPSILON. Correspondingly current GiST search behaviour is neither exact search or fuzzy search with given EPSILON. It is something in the middle. See following example for proof.
test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2= 0 then box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=#
test=# insert into test values (box(point(1.0000009,1.0000009), point(1.0000009,1.0000009)));
INSERT 0 1
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
a
---------------------------------------------
(1.0000009,1.0000009),(1.0000009,1.0000009)
(1 row)
test=# set enable_seqscan = off;
SET
test=# select * from test where a && box(point(1.0000018,1.0000018), point(1.0000018,1.0000018));
a
---
(0 rows)
But, I believe we still can bachpatch it in one of following ways:
1) Fix consistent and same functions. Add to release note notice that users should rebuild indexes if they want correct behaviour.
2) Leave same function as is. Add kluges to consistent functions which provides correct search on current not truly correct trees.
But anyway +1 for rip out the FP* macros everywhere in HEAD. Because I really dislike the way FP* are currently implemented. Why EPSILON is 1.0E-06? We don't know anything about nature of data that users store in geometrical datatypes. The lesser double precision value is 5E-324. For some datasets EPSILON can easily cover whole range.
------
With bestregards,
Alexander Korotkov. ------
With bestregards,
pgsql-hackers by date: