Thread: GiST rtree logic is not right

GiST rtree logic is not right

From
Tom Lane
Date:
It occurred to me to wonder whether contrib/rtree_gist fixes the rtree
bug documented here:
http://archives.postgresql.org/pgsql-general/2004-03/msg01143.php

The answer is unfortunately "no".  In the regression database,
install rtree_gist and do:

regression=# create table gist_emp4000 as select * from slow_emp4000;
SELECT
regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
CREATE INDEX
regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';count 
------- 2144
(1 row)

The correct answer is

regression=# select count(*) from slow_emp4000 where home_base << '(35565,5404),(35546,5360)';count 
------- 2214
(1 row)

Now this is noticeably better than the rtree implementation, which finds
only 1363 rows, but broken is still broken :-(

This is doubtless not as high priority as the concurrency stuff you
are working on, but it'd be good to fix anyway.  I was thinking of
proposing that we move rtree_gist into the core --- but the case for
it would be stronger if it worked correctly ...
        regards, tom lane


Re: GiST rtree logic is not right

From
Teodor Sigaev
Date:
I'll look at problem after GiST concurrency. Fixing rtree_gist is bug a fix, not 
a new feature, so I'm not limited by 1 July.

> This is doubtless not as high priority as the concurrency stuff you
> are working on, but it'd be good to fix anyway.  I was thinking of
> proposing that we move rtree_gist into the core --- but the case for
> it would be stronger if it worked correctly ...
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GiST rtree logic is not right

From
"John Hansen"
Date:
> I'll look at problem after GiST concurrency. Fixing
> rtree_gist is bug a fix, not a new feature, so I'm not
> limited by 1 July.

Wont fixing rtree(_gist) require initdb, since the behaviour of the
operators will change?

... John



Re: GiST rtree logic is not right

From
Oleg Bartunov
Date:
Hmm,

something is broken in HEAD. I recall it worked in STABLE, here is what I have
Welcome to psql 8.0.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms       \h for help with SQL commands       \? for help with psql commands
\gor terminate with semicolon to execute query       \q to quit
 

regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
CREATE INDEX
regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404)
,(35546,5360)'; count 
-------  2214
(1 row)


btw, Teodor, I noticed gist_print from gevel package doesn't works in HEAD

regression=# select * from gist_print('grect2ind') as t(level int, a box) where level =1;
ERROR:  function return row and query-specified return row do not match
DETAIL:  Returned row contains 3 attributes, but query expects 2.


Oleg

On Wed, 22 Jun 2005, Tom Lane wrote:

> It occurred to me to wonder whether contrib/rtree_gist fixes the rtree
> bug documented here:
> http://archives.postgresql.org/pgsql-general/2004-03/msg01143.php
>
> The answer is unfortunately "no".  In the regression database,
> install rtree_gist and do:
>
> regression=# create table gist_emp4000 as select * from slow_emp4000;
> SELECT
> regression=# create index grect2ind ON gist_emp4000 USING gist (home_base);
> CREATE INDEX
> regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
> count
> -------
>  2144
> (1 row)
>
> The correct answer is
>
> regression=# select count(*) from slow_emp4000 where home_base << '(35565,5404),(35546,5360)';
> count
> -------
>  2214
> (1 row)
>
> Now this is noticeably better than the rtree implementation, which finds
> only 1363 rows, but broken is still broken :-(
>
> This is doubtless not as high priority as the concurrency stuff you
> are working on, but it'd be good to fix anyway.  I was thinking of
> proposing that we move rtree_gist into the core --- but the case for
> it would be stronger if it worked correctly ...
>
>             regards, tom lane
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: GiST rtree logic is not right

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> something is broken in HEAD. I recall it worked in STABLE, here is what I have

No, actually it is broken in 8.0 too.  The difference is that 8.0
defaults to not using an indexscan for this query.  In 8.0 I see:

regression=# set enable_seqscan TO 1;
SET
regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
         QUERY PLAN
 
---------------------------------------------------------------------Aggregate  (cost=65.53..65.53 rows=1 width=0)  ->
SeqScan on gist_emp4000  (cost=0.00..64.75 rows=310 width=0)        Filter: (home_base <<
'(35565,5404),(35546,5360)'::box)
(3 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';count
------- 2214
(1 row)

regression=# set enable_seqscan TO 0;
SET
regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
                   QUERY PLAN
 
----------------------------------------------------------------------------------------Aggregate  (cost=112.96..112.96
rows=1width=0)  ->  Index Scan using grect2ind on gist_emp4000  (cost=0.00..112.18 rows=310 width=0)        Index Cond:
(home_base<< '(35565,5404),(35546,5360)'::box)
 
(3 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';count
------- 2144
(1 row)

In CVS tip the default plan is:

regression=# explain select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';
               QUERY PLAN
 
--------------------------------------------------------------------------------Aggregate  (cost=35.74..35.74 rows=1
width=0) ->  Bitmap Heap Scan on gist_emp4000  (cost=5.08..34.96 rows=310 width=0)        Recheck Cond: (home_base <<
'(35565,5404),(35546,5360)'::box)       ->  Bitmap Index Scan on grect2ind  (cost=0.00..5.08 rows=310 width=0)
   Index Cond: (home_base << '(35565,5404),(35546,5360)'::box)
 
(5 rows)

regression=# select count(*) from gist_emp4000 where home_base << '(35565,5404),(35546,5360)';count
------- 2144
(1 row)
        regards, tom lane


Re: GiST rtree logic is not right

From
Tom Lane
Date:
"John Hansen" <john@geeknet.com.au> writes:
>> I'll look at problem after GiST concurrency. Fixing 
>> rtree_gist is bug a fix, not a new feature, so I'm not 
>> limited by 1 July.

> Wont fixing rtree(_gist) require initdb, since the behaviour of the
> operators will change?

Possibly, but we never guarantee no initdb until final release anyway.

Teodor is right that this is a bug fix and so can be postponed on its
own terms.  But moving rtree_gist into the core looks like a feature
change to me, so if that's going to happen it has to happen before
1 July.  It would be a lot easier to sell that if it gave the right
answers ;-)
        regards, tom lane


Re: GiST rtree logic is not right

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> It would be a lot easier to sell that if it gave the right answers ;-)

Of course given that the real rtree index gives the wrong answers perhaps
moving rtree_gist into core (after fixing this) is just a bug fix :)

-- 
greg