Thread: index on a box

index on a box

From
Dustin Sallings
Date:
    I've got a site with a ton of geometric data and I'm using a
little of postgresql's geometrical types.  I've got very large polygons,
up to 12kilopoints or so, in individual rows with floats for my x and y
values.  I'm calculating a box that contains all of my points and am using
the @ operator to find my polygons by a point.

    I was wondering, however, if there's a way I can use an index to
avoid table scanning for this.  The relevant parts of my sample table look
like this:

create table tmp (
    id integer,
    name text,
    b box
)

and I added the following index:

create index tmp_bybox on tmp using rtree(b);

I've got 33,507 rows currently (still loading data).

Here are the problems I'm having:

explain select name from tmp where point(-121,37) @ b order by area(b);

Sort  (cost=2428.02..2428.02 rows=16754 width=44)
  ->  Seq Scan on tmp  (cost=0.00..969.84 rows=16754 width=44)

    Any ideas that might help me speed things up?

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: index on a box

From
Tom Lane
Date:
Dustin Sallings <dustin@spy.net> writes:
>     I was wondering, however, if there's a way I can use an index to
> avoid table scanning for this.

If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
get an indexscan plan?

I don't recommend doing such a SET for production purposes, but if this
works then the problem is just inaccurate selectivity/cost estimation.
I see that the on_pb operator has no selectivity estimator defined at
all :-( ... as a quick hack, try setting its oprrest and oprjoin to be
areasel and areajoinsel.

            regards, tom lane

Re: index on a box

From
Dustin Sallings
Date:
Around 11:08 on Jun 21, 2001, Tom Lane said:

# If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
# get an indexscan plan?

Seq Scan on tmp  (cost=100000000.00..100002500.84 rows=30434 width=92)

# I don't recommend doing such a SET for production purposes, but if
# this works then the problem is just inaccurate selectivity/cost
# estimation. I see that the on_pb operator has no selectivity estimator
# defined at all :-( ... as a quick hack, try setting its oprrest and
# oprjoin to be areasel and areajoinsel.

    I'm not sure what you just said.  :)

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: index on a box

From
Tom Lane
Date:
Dustin Sallings <dustin+postgres@spy.net> writes:
> Around 11:08 on Jun 21, 2001, Tom Lane said:
> # If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you
> # get an indexscan plan?

> Seq Scan on tmp  (cost=100000000.00..100002500.84 rows=30434 width=92)

Drat.

>     I'm not sure what you just said.  :)

Never mind, it wouldn't work anyway.

The problem is that "point @ box" isn't an rtree-indexable operation.
You could use "box @ box" instead, where the lefthand box is a zero-area
box with all corners at the point of interest.  Crufty, but unless you
want to go in and teach rtree about a new operator type...

            regards, tom lane

Re: index on a box

From
Dustin Sallings
Date:
Around 17:21 on Jun 21, 2001, Tom Lane said:

# Never mind, it wouldn't work anyway.
#
# The problem is that "point @ box" isn't an rtree-indexable operation.
# You could use "box @ box" instead, where the lefthand box is a
# zero-area box with all corners at the point of interest.  Crufty, but
# unless you want to go in and teach rtree about a new operator type...

select * from tmp where box(point(-121,37),point(-121,37)) @ b;

Index Scan using tmp_bybox on tmp  (cost=0.00..238.59 rows=61 width=92)

    Hmm...  That's interesting.  It seems that point @ box would be
more generally useful than box @ box.  Then again, I've only used this for
this one particular task I'm doing right now.  :)  It seems that rtree
already knows how to do what I'm trying to do if all I've got to do is
make a box containing the point twice to get the lookup to be fast.

    If anyone's interested in what I'm doing with this, you can see it
in action here:

http://bleu.west.spy.net/~dustin/geo/pointinfoform.jsp

    I've loaded about 60k polygons (consisting of a total of about
seven million points) describing the shape of various geographical areas
in the United States.  I've got a table with the descriptions of the
polygons and box boundaries, then another table with the actual ordered
polygon data.  I do a box match on the first table to get a list of
candidates, then examine them all in more detail with a point-in-polygon
algorithm in my application.  It's currently pretty slow because I'm don't
actually have a box column on my first table, just the boundaries, which I
cast to a box and use point @ box(point(),point()) with a table scan.  It
looks like, when the index works, it'll be as fast as it was when I had
very little data in the tables again.  :)

    Thanks for the help!

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: index on a box

From
Tom Lane
Date:
Dustin Sallings <dustin+postgres@spy.net> writes:
> create function box(point) returns box as
>     'select box($1, $1)'
>     language 'sql';

> misc=# explain select * from tmp where box(point(-121, 37)) @ b;
> NOTICE:  QUERY PLAN:

> Seq Scan on tmp  (cost=0.00..2653.01 rows=61 width=92)

You'd better declare the function as 'iscachable'.  As is, the planner
doesn't trust it to return a constant.

            regards, tom lane

Re: index on a box

From
Dustin Sallings
Date:
Around 15:14 on Jun 21, 2001, Dustin Sallings said:

# select * from tmp where box(point(-121,37),point(-121,37)) @ b;
#
# Index Scan using tmp_bybox on tmp  (cost=0.00..238.59 rows=61 width=92)

    Ugh, any idea here?

create function box(point) returns box as
    'select box($1, $1)'
    language 'sql';

misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE:  QUERY PLAN:

Seq Scan on tmp  (cost=0.00..2653.01 rows=61 width=92)

misc=# explain select * from tmp
misc-#    where box(point(-121,37),point(-121,37)) @ b;
NOTICE:  QUERY PLAN:

Index Scan using tmp_bybox on tmp  (cost=0.00..238.59 rows=61 width=92)

misc=# select box(point(-121, 37));
         box
---------------------
 (-121,37),(-121,37)

misc=# select box(point(-121,37),point(-121,37));
         box
---------------------
 (-121,37),(-121,37)

misc=# select 1 where box(point(-121,37),point(-121,37))=box(point(-121,
37));
 ?column?
----------
        1


--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: index on a box

From
Dustin Sallings
Date:
Around 18:26 on Jun 21, 2001, Tom Lane said:

# You'd better declare the function as 'iscachable'.  As is, the planner
# doesn't trust it to return a constant.

    Got it!  Tried a few variations on the create until I got this:

create function box(point) returns box as
    'select box($1, $1)'
    language 'sql'
    with (iscachable);

misc=# explain select * from tmp where box(point(-121, 37)) @ b;
NOTICE:  QUERY PLAN:

Index Scan using tmp_bybox on tmp  (cost=0.00..238.59 rows=61 width=92)

    Thanks!  :)

    (oh, and would it be reasonable to list at least some of the
possible attributes in psql?)

--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: Re: index on a box

From
ÀîÁ¢ÐÂ
Date:
Tom Lane,
   Does anyone who knows a user's information is storeed in what pgSQL's system table?  Thank all!






Stored user information

From
GH
Date:
On Tue, Jun 26, 2001 at 10:35:00PM +0800, some SMTP stream spewed forth:
>    Does anyone who knows a user's information is storeed in what pgSQL's system table?  Thank all!
>

You probably want pg_shadow, accessible by a database superuser.


gh


Re: Re: index on a box

From
"Thalis A. Kalfigopoulos"
Date:
On Tue, 26 Jun 2001, [ISO-8859-1] ������ wrote:

> Tom Lane��
>    Does anyone who knows a user's information is storeed in what pgSQL's system table?  Thank all!
>

pg_user
(rather obvious i guess)

t.

>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>


Re: Re: index on a box

From
ÀîÁ¢ÐÂ
Date:
Tom Lane,
   Does anyone who knows a user's information is storeed in what pgSQL's system table?  Thank all!



Re: Re: index on a box

From
"omid omoomi"
Date:
select * from pg_users;

>From: ������ <lilixin@cqu.edu.cn>
>Reply-To: lilixin@cqu.edu.cn
>To: Tom Lane <tgl@sss.pgh.pa.us>, Dustin Sallings <dustin+postgres@spy.net>
>CC: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
>Subject: Re: Re: [GENERAL] index on a box
>Date: Sun, 08 Jul 2001 21:02:32 +0800
>
>Tom Lane��
>    Does anyone who knows a user's information is storeed in what pgSQL's
>system table?  Thank all!
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html

_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.