Thread: Index on points

Index on points

From
A B
Date:
Hello.

If I have a table like this

create table fleet ( ship_id   integer,  location point);

and fill it with a lot of ships and their locations and then want to
create an index on this to speed up operations on finding ships within
a certain region (let's say its a rectangular region), how do I do
this?

I tried:

CREATE INDEX my_index  ON fleet USING gist ( box(location,location)); ?

and that command worked, but I have no idea why and what I have just
done. The docs are a little to advanced.

How should one do it?

Re: Index on points

From
Jeff Davis
Date:
On Thu, 2010-09-23 at 12:45 +0200, A B wrote:
> Hello.
>
> If I have a table like this
>
> create table fleet ( ship_id   integer,  location point);

I recommend taking a look into PostGIS: http://postgis.org

Regards,
    Jeff Davis


Re: Index on points

From
A B
Date:
But how do I do it without Postgis?

Right now I have a table
fleet (id bigserial primary key, location point);

and I have filled it with random data and then tried selecting

explain analyze select count(1) from fleet where location <@
box(point(300,300),point(600,600));

to gather runningtime data, and then I have created and index (or I
think  I have atleast)

CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) );

but I still get almost exaclty the same run time of the query


explain analyze select count(1) from fleet where location <@
box(point(300,300),point(600,600));

Aggregate  (cost=100597.89..100597.90 rows=1 width=0) (actual
time=706.604..706.605 rows=1 loops=1)
   ->  Seq Scan on fleet  (cost=0.00..94347.90 rows=2499996 width=0)
(actual time=0.252..701.624 rows=4533 loops=1)
         Filter: (location <@ '(600,600),(300,300)'::box)
 Total runtime: 706.675 ms

I have then tried to avoid the seq. scan by

set enable_seqscan=off;
set seq_page_cost=4000; (which would make it more expensive to scan,
wouldn't it?)

and the result is the same

Aggregate  (cost=10127460749.89..10127460749.90 rows=1 width=0)
(actual time=799.077..799.078 rows=1 loops=1)
   ->  Seq Scan on fleet  (cost=10000000000.00..10127454499.90
rows=2499996 width=0) (actual time=0.221..792.374 rows=4533 loops=1)
         Filter: (location <@ '(600,600),(300,300)'::box)
 Total runtime: 799.117 ms

So how do I create an index that gets used?
(I've run the queries a thousand times to make sure the total runtime
is consistent, and it is)

2010/9/23 Jeff Davis <pgsql@j-davis.com>:
> On Thu, 2010-09-23 at 12:45 +0200, A B wrote:
>> Hello.
>>
>> If I have a table like this
>>
>> create table fleet ( ship_id   integer,  location point);
>
> I recommend taking a look into PostGIS: http://postgis.org
>
> Regards,
>        Jeff Davis
>
>

Re: Index on points

From
Richard Huxton
Date:
On 23/09/10 11:45, A B wrote:
> Hello.
>
> If I have a table like this
>
> create table fleet ( ship_id   integer,  location point);
>
> and fill it with a lot of ships and their locations and then want to
> create an index on this to speed up operations on finding ships within
> a certain region (let's say its a rectangular region), how do I do
> this?
>
> I tried:
>
> CREATE INDEX my_index  ON fleet USING gist ( box(location,location)); ?

That's the idea, but you'll need to be careful about how you're
searching against it. Remember, the index is on a box based on the
location, not the point location itself.

CREATE TABLE fleet (ship int, locn point);

INSERT INTO fleet SELECT (x*1000 + y), point(x,y)
FROM generate_series(0,999) x, generate_series(0,999) y;

CREATE INDEX fleet_locn_idx ON fleet USING gist( box(locn,locn) );
ANALYSE fleet;

EXPLAIN ANALYSE SELECT count(*) FROM fleet
WHERE box(locn,locn) <@ box '(10,10),(20,20)';
                                                             QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=2654.84..2654.85 rows=1 width=0) (actual
time=4.611..4.612 rows=1 loops=1)
    ->  Bitmap Heap Scan on fleet  (cost=44.34..2652.33 rows=1000
width=0) (actual time=4.344..4.491 rows=121 loops=1)
          Recheck Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box)
          ->  Bitmap Index Scan on fleet_locn_idx  (cost=0.00..44.09
rows=1000 width=0) (actual time=4.311..4.311 rows=121 loops=1)
                Index Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box)
  Total runtime: 4.694 ms
(6 rows)

DROP INDEX fleet_locn_idx;

EXPLAIN ANALYSE SELECT count(*) FROM fleet WHERE box(locn,locn) <@ box
'(10,10),(20,20)';
                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=20885.50..20885.51 rows=1 width=0) (actual
time=551.756..551.757 rows=1 loops=1)
    ->  Seq Scan on fleet  (cost=0.00..20883.00 rows=1000 width=0)
(actual time=5.142..551.624 rows=121 loops=1)
          Filter: (box(locn, locn) <@ '(20,20),(10,10)'::box)
  Total runtime: 551.831 ms
(4 rows)

--
   Richard Huxton
   Archonet Ltd

Re: Index on points

From
Jeff Davis
Date:
On Sat, 2010-09-25 at 09:18 +0200, A B wrote:
> fleet (id bigserial primary key, location point);
>
...
> CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) );
>
> but I still get almost exaclty the same run time of the query
>
>
> explain analyze select count(1) from fleet where location <@
> box(point(300,300),point(600,600));
>
> Aggregate  (cost=100597.89..100597.90 rows=1 width=0) (actual
> time=706.604..706.605 rows=1 loops=1)
>    ->  Seq Scan on fleet  (cost=0.00..94347.90 rows=2499996 width=0)
> (actual time=0.252..701.624 rows=4533 loops=1)
>          Filter: (location <@ '(600,600),(300,300)'::box)
>  Total runtime: 706.675 ms
>

...

> So how do I create an index that gets used?
> (I've run the queries a thousand times to make sure the total runtime
> is consistent, and it is)

Looks like there's no entry in the box_ops opclass for point <@ box, but
there is an entry for box <@ box.

So, try:

explain analyze
  select count(1) from fleet
  where box(location,location) <@ box(point(300,300),point(600,600));

There's no reason that there couldn't be a point <@ box operator in the
opclass, but nobody really uses these geometric types that come with
core postgres (at least, not that I can tell). PostGIS is a dedicated
project that has removed most of the justification for trying to improve
the built-in geometric types. However, keep in mind that PostGIS is
under a different license (GPL, I think).

Regards,
    Jeff Davis


Re: Index on points

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> There's no reason that there couldn't be a point <@ box operator in the
> opclass, but nobody really uses these geometric types that come with
> core postgres (at least, not that I can tell).

Actually, as of 9.0 there is a point_ops opclass for GIST, with these
indexable operators:

 >^(point,point)
 <<(point,point)
 >>(point,point)
 <^(point,point)
 ~=(point,point)
 <@(point,box)
 <@(point,polygon)
 <@(point,circle)

I agree that for any more than light-duty geometric work, you ought
to look at PostGIS.

            regards, tom lane

Re: Index on points

From
A B
Date:
2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>:
> Jeff Davis <pgsql@j-davis.com> writes:
>> There's no reason that there couldn't be a point <@ box operator in the
>> opclass, but nobody really uses these geometric types that come with
>> core postgres (at least, not that I can tell).
>
> Actually, as of 9.0 there is a point_ops opclass for GIST, with these
> indexable operators:
>
>  >^(point,point)
>  <<(point,point)
>  >>(point,point)
>  <^(point,point)
>  ~=(point,point)
>  <@(point,box)
>  <@(point,polygon)
>  <@(point,circle)
>
> I agree that for any more than light-duty geometric work, you ought
> to look at PostGIS.
>
>                        regards, tom lane

Thank you Jeff for your reply, that solved the problem.

Tom, would you like to elaborate on that  PostGIS should be used for
other than "light-duty" geometric work?
Is it speed, accuracy or features that is the difference?
For this project I think <@(point,box) is sufficient. What would it
take to motivate a switch to PostGIS for that?

Best wishes.

Re: Index on points

From
A B
Date:
Sorry, Gmail made med confused, my biggest "thank you" was to Richard
Huxton, who showed me code that worked.

2010/9/26 A B <gentosaker@gmail.com>:
> 2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>:
>> Jeff Davis <pgsql@j-davis.com> writes:
>>> There's no reason that there couldn't be a point <@ box operator in the
>>> opclass, but nobody really uses these geometric types that come with
>>> core postgres (at least, not that I can tell).
>>
>> Actually, as of 9.0 there is a point_ops opclass for GIST, with these
>> indexable operators:
>>
>>  >^(point,point)
>>  <<(point,point)
>>  >>(point,point)
>>  <^(point,point)
>>  ~=(point,point)
>>  <@(point,box)
>>  <@(point,polygon)
>>  <@(point,circle)
>>
>> I agree that for any more than light-duty geometric work, you ought
>> to look at PostGIS.
>>
>>                        regards, tom lane
>
> Thank you Jeff for your reply, that solved the problem.
>
> Tom, would you like to elaborate on that  PostGIS should be used for
> other than "light-duty" geometric work?
> Is it speed, accuracy or features that is the difference?
> For this project I think <@(point,box) is sufficient. What would it
> take to motivate a switch to PostGIS for that?
>
> Best wishes.
>