Thread: R-tree and start/end queries

R-tree and start/end queries

From
Sean Davis
Date:
I have a table like:

Create table gf (   pk    serial,   start int,   end   int,   gf    varchar
);

I want to do queries along the lines of:

"find all gf that overlap with (10000,20000)" or
"find all gf that overlap with each other"

And others.  I have read over the documentation, but I still remain unclear
about how to implement R-tree indexing in this situation.  Any suggestions?

Thanks,
Sean



Re: R-tree and start/end queries

From
Bruno Wolff III
Date:
On Wed, Sep 21, 2005 at 13:52:40 -0400, Sean Davis <sdavis2@mail.nih.gov> wrote:
> I have a table like:
> 
> Create table gf (
>     pk    serial,
>     start int,
>     end   int,
>     gf    varchar
> );
> 
> I want to do queries along the lines of:
> 
> "find all gf that overlap with (10000,20000)" or
> "find all gf that overlap with each other"
> 
> And others.  I have read over the documentation, but I still remain unclear
> about how to implement R-tree indexing in this situation.  Any suggestions?

There is a built in type for line segments that uses floating point. That
will probably be usable by you directly unless the integers can can large
enough that precision is a problem. There is an overlaps operator for the
geometric types that could be used to answer your sample questions.


Re: R-tree and start/end queries

From
Chris Mungall
Date:
On Wed, 21 Sep 2005, Sean Davis wrote:

> I have a table like:
>
> Create table gf (
>     pk    serial,
>     start int,
>     end   int,
>     gf    varchar
> );
>
> I want to do queries along the lines of:
>
> "find all gf that overlap with (10000,20000)" or
> "find all gf that overlap with each other"
>
> And others.  I have read over the documentation, but I still remain unclear
> about how to implement R-tree indexing in this situation.  Any suggestions?

Hi Sean

I'm guessing that this is for some kind of genome database, yep?

You may want to look at the chado database which has a growing library of
functions for this sort of thing; www.gmod.org/schema

Here is the code for doing range interval functions; our featureloc is
equivalent to your "gf" (though we separate the entity from the entity
being located). Our fmin and fmax may be equivalent to your start and end
above (unless you indicate directionality with start>end in which case the
intersection functions get a bit trickier). Our feature_id is probably
equivalent to your gf column.

We use the builtin pg types "point" and "box" and make an RTREE index over
this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
-- functions operating on featureloc ranges
--

-- create a point
CREATE OR REPLACE FUNCTION create_point (int, int) RETURNS point AS'SELECT point ($1, $2)'
LANGUAGE 'sql';

-- create a range box
-- (make this immutable so we can index it)
CREATE OR REPLACE FUNCTION boxrange (int, int) RETURNS box AS'SELECT box (create_point(0, $1),
create_point($2,500000000))'
LANGUAGE 'sql' IMMUTABLE;

-- create a query box
CREATE OR REPLACE FUNCTION boxquery (int, int) RETURNS box AS'SELECT box (create_point($1, $2), create_point($1, $2))'
LANGUAGE 'sql' IMMUTABLE;

--functional index that depends on the above functions
CREATE INDEX binloc_boxrange ON featureloc USING RTREE (boxrange(fmin, fmax));


CREATE OR REPLACE FUNCTION featureloc_slice(int, int) RETURNS setof featureloc AS 'SELECT * from featureloc where
boxquery($1,$2) @ boxrange(fmin,fmax)'
 
LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION featureloc_slice(varchar, int, int) RETURNS setof featureloc AS 'SELECT featureloc.*  FROM
featureloc INNER JOIN feature AS srcf ON (srcf.feature_id = featureloc.srcfeature_id)  WHERE boxquery($2, $3) @
boxrange(fmin,fmax) AND srcf.name = $1 '
 
LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION featureloc_slice(int, int, int) RETURNS setof featureloc AS 'SELECT *  FROM featureloc
WHEREboxquery($2, $3) @ boxrange(fmin,fmax)  AND srcfeature_id = $1 '
 
LANGUAGE 'sql';

-- can we not just do these as views?
CREATE OR REPLACE FUNCTION feature_overlaps(int)RETURNS setof feature AS'SELECT feature.* FROM feature  INNER JOIN
featurelocAS x ON (x.feature_id=feature.feature_id)  INNER JOIN featureloc AS y ON (y.feature_id=$1) WHERE
x.srcfeature_id= y.srcfeature_id            AND  ( x.fmax >= y.fmin AND x.fmin <= y.fmax ) '
 
LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION feature_disjoint_from(int)RETURNS setof feature AS'SELECT feature.* FROM feature  INNER JOIN
featurelocAS x ON (x.feature_id=feature.feature_id)  INNER JOIN featureloc AS y ON (y.feature_id=$1) WHERE
x.srcfeature_id= y.srcfeature_id            AND  ( x.fmax < y.fmin OR x.fmin > y.fmax ) '
 
LANGUAGE 'sql';

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cheers
Chris

> Thanks,
> Sean
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>




Re: R-tree and start/end queries

From
"Dmitri Bichko"
Date:
How does the performance of the R-tree searches compare to the Bio::GFF
binning approach?

I've been wondering for a while how well the postgres builtins would do
for this application.

Dmitri

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Chris Mungall
> Sent: Wednesday, September 21, 2005 3:14 PM
> To: Sean Davis
> Cc: pgsql-sql@postgresql.org; Hilmar Lapp
> Subject: Re: [SQL] R-tree and start/end queries
>
>
>
> On Wed, 21 Sep 2005, Sean Davis wrote:
>
> > I have a table like:
> >
> > Create table gf (
> >     pk    serial,
> >     start int,
> >     end   int,
> >     gf    varchar
> > );
> >
> > I want to do queries along the lines of:
> >
> > "find all gf that overlap with (10000,20000)" or
> > "find all gf that overlap with each other"
> >
> > And others.  I have read over the documentation, but I still remain
> > unclear about how to implement R-tree indexing in this
> situation.  Any
> > suggestions?
>
> Hi Sean
>
> I'm guessing that this is for some kind of genome database, yep?
>
> You may want to look at the chado database which has a
> growing library of functions for this sort of thing;
> www.gmod.org/schema
>
> Here is the code for doing range interval functions; our
> featureloc is equivalent to your "gf" (though we separate the
> entity from the entity being located). Our fmin and fmax may
> be equivalent to your start and end above (unless you
> indicate directionality with start>end in which case the
> intersection functions get a bit trickier). Our feature_id is
> probably equivalent to your gf column.
>
> We use the builtin pg types "point" and "box" and make an
> RTREE index over
> this:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> --
> -- functions operating on featureloc ranges
> --
>
> -- create a point
> CREATE OR REPLACE FUNCTION create_point (int, int) RETURNS
> point AS  'SELECT point ($1, $2)' LANGUAGE 'sql';
>
> -- create a range box
> -- (make this immutable so we can index it)
> CREATE OR REPLACE FUNCTION boxrange (int, int) RETURNS box AS
>  'SELECT box (create_point(0, $1),
> create_point($2,500000000))' LANGUAGE 'sql' IMMUTABLE;
>
> -- create a query box
> CREATE OR REPLACE FUNCTION boxquery (int, int) RETURNS box AS
>  'SELECT box (create_point($1, $2), create_point($1, $2))'
> LANGUAGE 'sql' IMMUTABLE;
>
> --functional index that depends on the above functions
> CREATE INDEX binloc_boxrange ON featureloc USING RTREE
> (boxrange(fmin, fmax));
>
>
> CREATE OR REPLACE FUNCTION featureloc_slice(int, int) RETURNS
> setof featureloc AS
>   'SELECT * from featureloc where boxquery($1, $2) @
> boxrange(fmin,fmax)' LANGUAGE 'sql';
>
> CREATE OR REPLACE FUNCTION featureloc_slice(varchar, int, int)
>   RETURNS setof featureloc AS
>   'SELECT featureloc.*
>    FROM featureloc
>    INNER JOIN feature AS srcf ON (srcf.feature_id =
> featureloc.srcfeature_id)
>    WHERE boxquery($2, $3) @ boxrange(fmin,fmax)
>    AND srcf.name = $1 '
> LANGUAGE 'sql';
>
> CREATE OR REPLACE FUNCTION featureloc_slice(int, int, int)
>   RETURNS setof featureloc AS
>   'SELECT *
>    FROM featureloc
>    WHERE boxquery($2, $3) @ boxrange(fmin,fmax)
>    AND srcfeature_id = $1 '
> LANGUAGE 'sql';
>
> -- can we not just do these as views?
> CREATE OR REPLACE FUNCTION feature_overlaps(int)
>  RETURNS setof feature AS
>  'SELECT feature.*
>   FROM feature
>    INNER JOIN featureloc AS x ON (x.feature_id=feature.feature_id)
>    INNER JOIN featureloc AS y ON (y.feature_id=$1)
>   WHERE
>    x.srcfeature_id = y.srcfeature_id            AND
>    ( x.fmax >= y.fmin AND x.fmin <= y.fmax ) '
> LANGUAGE 'sql';
>
> CREATE OR REPLACE FUNCTION feature_disjoint_from(int)
>  RETURNS setof feature AS
>  'SELECT feature.*
>   FROM feature
>    INNER JOIN featureloc AS x ON (x.feature_id=feature.feature_id)
>    INNER JOIN featureloc AS y ON (y.feature_id=$1)
>   WHERE
>    x.srcfeature_id = y.srcfeature_id            AND
>    ( x.fmax < y.fmin OR x.fmin > y.fmax ) '
> LANGUAGE 'sql';
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Cheers
> Chris
>
> > Thanks,
> > Sean
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >        subscribe-nomail command to majordomo@postgresql.org
> so that your
> >        message can get through to the mailing list cleanly
> >
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
The information transmitted is intended only for the person or entity to which it is addressed and may contain
confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any
actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you
receivedthis in error, please contact the sender and delete the material from any computer 


Re: R-tree and start/end queries

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> There is a built in type for line segments that uses floating point. That
> will probably be usable by you directly unless the integers can can large
> enough that precision is a problem. There is an overlaps operator for the
> geometric types that could be used to answer your sample questions.

However, there's no built-in rtree opclass for that datatype, so he'd
still be stuck with respect to getting indexing support for overlaps
queries.

I think the contrib/seg datatype might help, though the precision issue
is still a possible problem.
        regards, tom lane