Thread: several questions about R-tree index
According to the manual at: http://www.postgresql.org/docs/7.4/static/functions-geometry.html " The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is involved in a comparison using one of these operators: <<, &<, &>, >>, @, ~=, && (Refer to Section 9.9 about the meaning of these operators.)" Shouldn't the ~ (contains) operator be included also? Isn't ~ the commutator of @ ? I am considering using R-tree's for other-than-geometric purposes. Here is the story: I have string data (column smiles) which represents chemical structures. Comparing strings for equality works perfectly using an ordinary B-tree index. But locating chemical substructures requires a more elaborate (read time-consuming) match function. I search for substructures like this: Select count(smiles) from structures where matches(smiles, subsmiles); where subsmiles is the search pattern desired. This is not a fast search - about 15 seconds for 0.25 million rows. To speed up this search I have a function (called fingerprint) which produces a bit string representation of some common and uncommon substructures. I have populated my table with column fp = fingerprint(smiles) So, this is about 5 times faster: Select count(smiles) from structures where (fp & fingerprint(subsmiles)) = fingerprint(subsmiles) & matches(smiles, subsmiles); The sequence scan using the first where-clause takes about 3 seconds and the final matches on the resulting subset is insignificant when the subset is small. But I think I might be able to do better (faster) using R-trees. Bitstrings can be thought of as "containing" when one bitstring has all the same bits set as another, even if it has other bits set too - this is the gist of the first where-clause above. Can I expect to be able to use R-tree's to do this? Will I simply have to define a new datatype and three R-tree functions (union, intersection and size). Will the use of the ~ (contains) operator cause the planner to consider using an index (this is my first question, way above)? I hope someone has had the patience and interest to read this far. Thanks, TJ
On 2005-04-26, "TJ O'Donnell" <tjo@acm.org> wrote: > But I think I might be able to do better (faster) using R-trees. > Bitstrings can be thought of as "containing" when one bitstring has all the > same bits set as another, even if it has other bits set too - this is the > gist of the first where-clause above. > > Can I expect to be able to use R-tree's to do this? You may want to use GIST instead - it is more flexible. > Will I simply have to define a new datatype and three R-tree functions > (union, intersection and size). > Will the use of the ~ (contains) operator cause the planner to consider > using an index (this is my first question, way above)? What the planner actually looks for is, having identified the specific operator and types, whether an index opclass exists for them, and if so whether an index using that opclass exists. The actual names of the operators are irrelevent. So for either rtree or GIST, all you need is to define your new datatype, with its associated operators, and create an operator class for it with appropriate support functions, and create indexes using that opclass. Once all that is done, the planner will consider using them. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
"TJ O'Donnell" <tjo@acm.org> writes: > Shouldn't the ~ (contains) operator be included also? > Isn't ~ the commutator of @ ? Yeah, it looks like the documentation is in error: regression=# select amopopr::regoperator,amopstrategy from pg_amop where amopclaid in regression-# (select oid from pg_opclass where opcamid = 402); amopopr | amopstrategy ---------------------+--------------<<(box,box) | 1&<(box,box) | 2&&(box,box) | 3&>(box,box) | 4>>(box,box) | 5~=(box,box) | 6~(box,box) | 7@(box,box) | 8<<(polygon,polygon) | 1&<(polygon,polygon)| 2&&(polygon,polygon) | 3&>(polygon,polygon) | 4>>(polygon,polygon)| 5~=(polygon,polygon) | 6~(polygon,polygon) | 7@(polygon,polygon) | 8 (16 rows) Of course, the thing that leaps out here is that there are only two built-in opclasses for rtree. Not exactly confidence building. > I am considering using R-tree's for other-than-geometric purposes. I concur with Andrew's suggestion to consider GIST. GIST has its own issues, but at least there are people looking at it/using it/working on it. R-tree doesn't seem to have any user community that really cares. regards, tom lane