several questions about R-tree index - Mailing list pgsql-sql

From TJ O'Donnell
Subject several questions about R-tree index
Date
Msg-id 3366.209.223.166.104.1114559525.squirrel@gnova.com
Whole thread Raw
Responses Re: several questions about R-tree index
List pgsql-sql
According to the manual at:
http://www.postgresql.org/docs/7.4/static/functions-geometry.html
" The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is
involved in a comparison using one of these operators: <<, &<, &>, >>, @, ~=, && (Refer to Section
9.9 about the meaning of these operators.)"

Shouldn't the ~ (contains) operator be included also?
Isn't ~ the commutator of @ ?

I am considering using R-tree's for other-than-geometric purposes.  Here is the story:
I have string data (column smiles) which represents chemical structures.  Comparing strings
for equality works perfectly using an ordinary B-tree index.  But locating chemical
substructures requires a more elaborate (read time-consuming) match function.
I search for substructures like this:
Select count(smiles) from structures where matches(smiles, subsmiles);

where subsmiles is the search pattern desired.  This is not a fast search - about
15 seconds for 0.25 million rows.
To speed up this search I have a function (called fingerprint) which produces a bit string
representation of some common and uncommon substructures.  I have populated my table
with column fp = fingerprint(smiles)
So, this is about 5 times faster:
 Select count(smiles) from structures where (fp & fingerprint(subsmiles)) = fingerprint(subsmiles) & matches(smiles,
subsmiles);
The sequence scan using the first where-clause takes about 3 seconds and the final
matches on the resulting subset is insignificant when the subset is small.

But I think I might be able to do better (faster) using R-trees.
Bitstrings can be thought of as "containing" when one bitstring has all the
same bits set as another, even if it has other bits set too - this is the gist of the
first where-clause above.

Can I expect to be able to use R-tree's to do this?
Will I simply have to define a new datatype and three R-tree functions
(union, intersection and size).
Will the use of the ~ (contains) operator cause the planner to consider
using an index (this is my first question, way above)?

I hope someone has had the patience and interest to read this far.

Thanks,
TJ




pgsql-sql by date:

Previous
From: Rodrigo Carvalhaes
Date:
Subject: Re: UPDATE WITH ORDER BY
Next
From: Andrew - Supernews
Date:
Subject: Re: several questions about R-tree index