John Major wrote:
> Hello-
>
> I am a biologist, and work with large datasets (tables with millions of
> rows are common).
> These datasets can often be simplified as features with a name, and a
> start and end position (i.e., a range along a number line: GeneX is on
> some chromosome from position 10->40).
>
> I store these features in tables that generally have the form:
>
> SIMPLE_TABLE:
> FeatureID(PrimaryKey) -- FeatureName(varchar) --
> FeatureChromosomeName(varchar) -- StartPosition(int) -- EndPosition(int)
>
> My problem is, I often need to execute searches of tables like these
> which find "All features within a range". Ie: select FeatureID from
> SIMPLE_TABLE where FeatureChromosomeName like 'chrX' and StartPosition >
> 1000500 and EndPosition < 2000000;
>
> This kind of query is VERY slow, and I've tried tinkering with indexes
> to speed it up, but with little success.
> Indexes on Chromosome help a little, but I can't think of a way to
> avoid full table scans for each of the position range queries.
>
> Any advice on how I might be able to improve this situation would be
> very helpful.
Basic questions: which PostgreSQL version are you running, and which
indexes do you have now? Do you have EXPLAIN output for the slow query?
You could try something like:
CREATE INDEX index_name ON SIMPLE_TABLE ( FeatureChromosomeName
varchar_pattern_ops, StartPosition, EndPosition );
The varchar_pattern_ops operator class is the key: it lets LIKE use the
index, provided of course the pattern is anchored at the start
(LIKE 'something%') and not LIKE '%something'.
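Putting the suggestions above together, a rough sketch of what to run might look like the following. It assumes the table and column names from the original post; the index name simple_table_chr_start_end_idx is made up for illustration, and whether the planner actually picks the index depends on your version and data, so check the EXPLAIN output rather than taking it on faith.

```sql
-- Report the server version (useful to include when asking for help).
SELECT version();

-- Hypothetical index name; columns are those from the original post.
-- varchar_pattern_ops lets LIKE 'prefix%' use the index on the first column.
CREATE INDEX simple_table_chr_start_end_idx
    ON SIMPLE_TABLE (FeatureChromosomeName varchar_pattern_ops,
                     StartPosition,
                     EndPosition);

-- Refresh planner statistics after creating the index.
ANALYZE SIMPLE_TABLE;

-- Show the plan actually chosen, with real timings, for the slow query.
EXPLAIN ANALYZE
SELECT FeatureID
FROM SIMPLE_TABLE
WHERE FeatureChromosomeName LIKE 'chrX'
  AND StartPosition > 1000500
  AND EndPosition < 2000000;
```

If the plan still shows a sequential scan, post the EXPLAIN ANALYZE output. Since the chromosome name is matched exactly here, it may also be worth comparing against FeatureChromosomeName = 'chrX' with an ordinary index on the same three columns.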
>
> Thanks!
> John
>
Weslee