Thread: index on a box
I've got a site with a ton of geometric data and I'm using a little of postgresql's geometrical types. I've got very large polygons, up to 12kilopoints or so, in individual rows with floats for my x and y values. I'm calculating a box that contains all of my points and am using the @ operator to find my polygons by a point. I was wondering, however, if there's a way I can use an index to avoid table scanning for this. The relevant parts of my sample table look like this: create table tmp ( id integer, name text, b box ) and I added the following index: create index tmp_bybox on tmp using rtree(b); I've got 33,507 rows currently (still loading data). Here are the problems I'm having: explain select name from tmp where point(-121,37) @ b order by area(b); Sort (cost=2428.02..2428.02 rows=16754 width=44) -> Seq Scan on tmp (cost=0.00..969.84 rows=16754 width=44) Any ideas that might help me speed things up? -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin@spy.net> writes: > I was wondering, however, if there's a way I can use an index to > avoid table scanning for this. If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you get an indexscan plan? I don't recommend doing such a SET for production purposes, but if this works then the problem is just inaccurate selectivity/cost estimation. I see that the on_pb operator has no selectivity estimator defined at all :-( ... as a quick hack, try setting its oprrest and oprjoin to be areasel and areajoinsel. regards, tom lane
Around 11:08 on Jun 21, 2001, Tom Lane said: # If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you # get an indexscan plan? Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92) # I don't recommend doing such a SET for production purposes, but if # this works then the problem is just inaccurate selectivity/cost # estimation. I see that the on_pb operator has no selectivity estimator # defined at all :-( ... as a quick hack, try setting its oprrest and # oprjoin to be areasel and areajoinsel. I'm not sure what you just said. :) -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin+postgres@spy.net> writes: > Around 11:08 on Jun 21, 2001, Tom Lane said: > # If you say "SET enable_seqscan TO off", and repeat the EXPLAIN, do you > # get an indexscan plan? > Seq Scan on tmp (cost=100000000.00..100002500.84 rows=30434 width=92) Drat. > I'm not sure what you just said. :) Never mind, it wouldn't work anyway. The problem is that "point @ box" isn't an rtree-indexable operation. You could use "box @ box" instead, where the lefthand box is a zero-area box with all corners at the point of interest. Crufty, but unless you want to go in and teach rtree about a new operator type... regards, tom lane
Around 17:21 on Jun 21, 2001, Tom Lane said: # Never mind, it wouldn't work anyway. # # The problem is that "point @ box" isn't an rtree-indexable operation. # You could use "box @ box" instead, where the lefthand box is a # zero-area box with all corners at the point of interest. Crufty, but # unless you want to go in and teach rtree about a new operator type... select * from tmp where box(point(-121,37),point(-121,37)) @ b; Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92) Hmm... That's interesting. It seems that point @ box would be more generally useful than box @ box. Then again, I've only used this for this one particular task I'm doing right now. :) It seems that rtree already knows how to do what I'm trying to do if all I've got to do is make a box containing the point twice to get the lookup to be fast. If anyone's interested in what I'm doing with this, you can see it in action here: http://bleu.west.spy.net/~dustin/geo/pointinfoform.jsp I've loaded about 60k polygons (consisting of a total of about seven million points) describing the shape of various geographical areas in the United States. I've got a table with the descriptions of the polygons and box boundaries, then another table with the actual ordered polygon data. I do a box match on the first table to get a list of candidates, then examine them all in more detail with a point-in-polygon algorithm in my application. It's currently pretty slow because I'm don't actually have a box column on my first table, just the boundaries, which I cast to a box and use point @ box(point(),point()) with a table scan. It looks like, when the index works, it'll be as fast as it was when I had very little data in the tables again. :) Thanks for the help! -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
Dustin Sallings <dustin+postgres@spy.net> writes: > create function box(point) returns box as > 'select box($1, $1)' > language 'sql'; > misc=# explain select * from tmp where box(point(-121, 37)) @ b; > NOTICE: QUERY PLAN: > Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92) You'd better declare the function as 'iscachable'. As is, the planner doesn't trust it to return a constant. regards, tom lane
Around 15:14 on Jun 21, 2001, Dustin Sallings said: # select * from tmp where box(point(-121,37),point(-121,37)) @ b; # # Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92) Ugh, any idea here? create function box(point) returns box as 'select box($1, $1)' language 'sql'; misc=# explain select * from tmp where box(point(-121, 37)) @ b; NOTICE: QUERY PLAN: Seq Scan on tmp (cost=0.00..2653.01 rows=61 width=92) misc=# explain select * from tmp misc-# where box(point(-121,37),point(-121,37)) @ b; NOTICE: QUERY PLAN: Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92) misc=# select box(point(-121, 37)); box --------------------- (-121,37),(-121,37) misc=# select box(point(-121,37),point(-121,37)); box --------------------- (-121,37),(-121,37) misc=# select 1 where box(point(-121,37),point(-121,37))=box(point(-121, 37)); ?column? ---------- 1 -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
Around 18:26 on Jun 21, 2001, Tom Lane said: # You'd better declare the function as 'iscachable'. As is, the planner # doesn't trust it to return a constant. Got it! Tried a few variations on the create until I got this: create function box(point) returns box as 'select box($1, $1)' language 'sql' with (iscachable); misc=# explain select * from tmp where box(point(-121, 37)) @ b; NOTICE: QUERY PLAN: Index Scan using tmp_bybox on tmp (cost=0.00..238.59 rows=61 width=92) Thanks! :) (oh, and would it be reasonable to list at least some of the possible attributes in psql?) -- SPY My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
Tom Lane, Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!
On Tue, Jun 26, 2001 at 10:35:00PM +0800, some SMTP stream spewed forth: > Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all! > You probably want pg_shadow, accessible by a database superuser. gh
On Tue, 26 Jun 2001, [ISO-8859-1] ������ wrote: > Tom Lane�� > Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all! > pg_user (rather obvious i guess) t. > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html >
Tom Lane, Does anyone who knows a user's information is storeed in what pgSQL's system table? Thank all!
select * from pg_users; >From: ������ <lilixin@cqu.edu.cn> >Reply-To: lilixin@cqu.edu.cn >To: Tom Lane <tgl@sss.pgh.pa.us>, Dustin Sallings <dustin+postgres@spy.net> >CC: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org> >Subject: Re: Re: [GENERAL] index on a box >Date: Sun, 08 Jul 2001 21:02:32 +0800 > >Tom Lane�� > Does anyone who knows a user's information is storeed in what pgSQL's >system table? Thank all! > > > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.