Thread: Index on points
Hello. If I have a table like this create table fleet ( ship_id integer, location point); and fill it with a lot of ships and their locations and then want to create an index on this to speed up operations on finding ships within a certain region (let's say its a rectangular region), how do I do this? I tried: CREATE INDEX my_index ON fleet USING gist ( box(location,location)); ? and that command worked, but I have no idea why and what I have just done. The docs are a little to advanced. How should one do it?
On Thu, 2010-09-23 at 12:45 +0200, A B wrote: > Hello. > > If I have a table like this > > create table fleet ( ship_id integer, location point); I recommend taking a look into PostGIS: http://postgis.org Regards, Jeff Davis
But how do I do it without Postgis? Right now I have a table fleet (id bigserial primary key, location point); and I have filled it with random data and then tried selecting explain analyze select count(1) from fleet where location <@ box(point(300,300),point(600,600)); to gather runningtime data, and then I have created and index (or I think I have atleast) CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) ); but I still get almost exaclty the same run time of the query explain analyze select count(1) from fleet where location <@ box(point(300,300),point(600,600)); Aggregate (cost=100597.89..100597.90 rows=1 width=0) (actual time=706.604..706.605 rows=1 loops=1) -> Seq Scan on fleet (cost=0.00..94347.90 rows=2499996 width=0) (actual time=0.252..701.624 rows=4533 loops=1) Filter: (location <@ '(600,600),(300,300)'::box) Total runtime: 706.675 ms I have then tried to avoid the seq. scan by set enable_seqscan=off; set seq_page_cost=4000; (which would make it more expensive to scan, wouldn't it?) and the result is the same Aggregate (cost=10127460749.89..10127460749.90 rows=1 width=0) (actual time=799.077..799.078 rows=1 loops=1) -> Seq Scan on fleet (cost=10000000000.00..10127454499.90 rows=2499996 width=0) (actual time=0.221..792.374 rows=4533 loops=1) Filter: (location <@ '(600,600),(300,300)'::box) Total runtime: 799.117 ms So how do I create an index that gets used? (I've run the queries a thousand times to make sure the total runtime is consistent, and it is) 2010/9/23 Jeff Davis <pgsql@j-davis.com>: > On Thu, 2010-09-23 at 12:45 +0200, A B wrote: >> Hello. >> >> If I have a table like this >> >> create table fleet ( ship_id integer, location point); > > I recommend taking a look into PostGIS: http://postgis.org > > Regards, > Jeff Davis > >
On 23/09/10 11:45, A B wrote: > Hello. > > If I have a table like this > > create table fleet ( ship_id integer, location point); > > and fill it with a lot of ships and their locations and then want to > create an index on this to speed up operations on finding ships within > a certain region (let's say its a rectangular region), how do I do > this? > > I tried: > > CREATE INDEX my_index ON fleet USING gist ( box(location,location)); ? That's the idea, but you'll need to be careful about how you're searching against it. Remember, the index is on a box based on the location, not the point location itself. CREATE TABLE fleet (ship int, locn point); INSERT INTO fleet SELECT (x*1000 + y), point(x,y) FROM generate_series(0,999) x, generate_series(0,999) y; CREATE INDEX fleet_locn_idx ON fleet USING gist( box(locn,locn) ); ANALYSE fleet; EXPLAIN ANALYSE SELECT count(*) FROM fleet WHERE box(locn,locn) <@ box '(10,10),(20,20)'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2654.84..2654.85 rows=1 width=0) (actual time=4.611..4.612 rows=1 loops=1) -> Bitmap Heap Scan on fleet (cost=44.34..2652.33 rows=1000 width=0) (actual time=4.344..4.491 rows=121 loops=1) Recheck Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box) -> Bitmap Index Scan on fleet_locn_idx (cost=0.00..44.09 rows=1000 width=0) (actual time=4.311..4.311 rows=121 loops=1) Index Cond: (box(locn, locn) <@ '(20,20),(10,10)'::box) Total runtime: 4.694 ms (6 rows) DROP INDEX fleet_locn_idx; EXPLAIN ANALYSE SELECT count(*) FROM fleet WHERE box(locn,locn) <@ box '(10,10),(20,20)'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Aggregate (cost=20885.50..20885.51 rows=1 width=0) (actual time=551.756..551.757 rows=1 loops=1) -> Seq Scan on fleet (cost=0.00..20883.00 rows=1000 width=0) (actual time=5.142..551.624 rows=121 loops=1) Filter: (box(locn, locn) <@ '(20,20),(10,10)'::box) Total runtime: 551.831 ms (4 rows) -- Richard Huxton Archonet Ltd
On Sat, 2010-09-25 at 09:18 +0200, A B wrote: > fleet (id bigserial primary key, location point); > ... > CREATE INDEX fleet_location ON fleet USING GIST ( box(location,location) ); > > but I still get almost exaclty the same run time of the query > > > explain analyze select count(1) from fleet where location <@ > box(point(300,300),point(600,600)); > > Aggregate (cost=100597.89..100597.90 rows=1 width=0) (actual > time=706.604..706.605 rows=1 loops=1) > -> Seq Scan on fleet (cost=0.00..94347.90 rows=2499996 width=0) > (actual time=0.252..701.624 rows=4533 loops=1) > Filter: (location <@ '(600,600),(300,300)'::box) > Total runtime: 706.675 ms > ... > So how do I create an index that gets used? > (I've run the queries a thousand times to make sure the total runtime > is consistent, and it is) Looks like there's no entry in the box_ops opclass for point <@ box, but there is an entry for box <@ box. So, try: explain analyze select count(1) from fleet where box(location,location) <@ box(point(300,300),point(600,600)); There's no reason that there couldn't be a point <@ box operator in the opclass, but nobody really uses these geometric types that come with core postgres (at least, not that I can tell). PostGIS is a dedicated project that has removed most of the justification for trying to improve the built-in geometric types. However, keep in mind that PostGIS is under a different license (GPL, I think). Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > There's no reason that there couldn't be a point <@ box operator in the > opclass, but nobody really uses these geometric types that come with > core postgres (at least, not that I can tell). Actually, as of 9.0 there is a point_ops opclass for GIST, with these indexable operators: >^(point,point) <<(point,point) >>(point,point) <^(point,point) ~=(point,point) <@(point,box) <@(point,polygon) <@(point,circle) I agree that for any more than light-duty geometric work, you ought to look at PostGIS. regards, tom lane
2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>: > Jeff Davis <pgsql@j-davis.com> writes: >> There's no reason that there couldn't be a point <@ box operator in the >> opclass, but nobody really uses these geometric types that come with >> core postgres (at least, not that I can tell). > > Actually, as of 9.0 there is a point_ops opclass for GIST, with these > indexable operators: > > >^(point,point) > <<(point,point) > >>(point,point) > <^(point,point) > ~=(point,point) > <@(point,box) > <@(point,polygon) > <@(point,circle) > > I agree that for any more than light-duty geometric work, you ought > to look at PostGIS. > > regards, tom lane Thank you Jeff for your reply, that solved the problem. Tom, would you like to elaborate on that PostGIS should be used for other than "light-duty" geometric work? Is it speed, accuracy or features that is the difference? For this project I think <@(point,box) is sufficient. What would it take to motivate a switch to PostGIS for that? Best wishes.
Sorry, Gmail made med confused, my biggest "thank you" was to Richard Huxton, who showed me code that worked. 2010/9/26 A B <gentosaker@gmail.com>: > 2010/9/25 Tom Lane <tgl@sss.pgh.pa.us>: >> Jeff Davis <pgsql@j-davis.com> writes: >>> There's no reason that there couldn't be a point <@ box operator in the >>> opclass, but nobody really uses these geometric types that come with >>> core postgres (at least, not that I can tell). >> >> Actually, as of 9.0 there is a point_ops opclass for GIST, with these >> indexable operators: >> >> >^(point,point) >> <<(point,point) >> >>(point,point) >> <^(point,point) >> ~=(point,point) >> <@(point,box) >> <@(point,polygon) >> <@(point,circle) >> >> I agree that for any more than light-duty geometric work, you ought >> to look at PostGIS. >> >> regards, tom lane > > Thank you Jeff for your reply, that solved the problem. > > Tom, would you like to elaborate on that PostGIS should be used for > other than "light-duty" geometric work? > Is it speed, accuracy or features that is the difference? > For this project I think <@(point,box) is sufficient. What would it > take to motivate a switch to PostGIS for that? > > Best wishes. >