Hi,
I am working with a simple table similar to the following:
CREATE TABLE Seq ( SeqID int8 PRIMARY KEY, FileID int2 NOT NULL, FileOffset int8 NOT NULL, SeqLength int4,
FOREIGNKEY (FileID) REFERENCES File (FileID)
);
CREATE INDEX Seq_SeqLength ON Seq (SeqLength);
The table contains facts about a DNA sequence (eg its length) and how to access
that sequence from an ordinary Unix file (FileID and FileOffset) where it is
concatonated with all the other DNA sequences.
Currently the TABLE Seq contains 4,586,379 (times 2) rows and the overall size
of the Unix file is 3,660,117,293 bytes (ie I expect sum(SeqLength) to be in the
region of 3.6 Gbytes).
I want to use the PostgreSQL aggregate functions to learn more about the
sequences.
The following query works well and quickly (I need to cast as int8 to avoid
overflow of the sum() and perhaps for avg()):
SELECT count(seqlength::int8), min(seqlength::int8), max(seqlength::int8),
avg(seqlength::int8), sum(seqlength::int8)
FROM Seq
WHERE FileID=4 AND SeqLength>1000;
An EXPLAIN of the above query returns the following:
NOTICE: QUERY PLAN:
Aggregate (cost=202000.62 rows=918779 width=4) -> Index Scan using seq_seqlength on seq (cost=202000.62 rows=918779
width=4)
EXPLAIN
If I remove the SeqLength constraint, the query takes forever to execute and
then crashes with an out-of-memory (on my SGI, I have 256 Mbyte RAM and equal
sized swap space; I run out of swap space during the query which makes a very
unpleasent noise).
EXPLAIN
SELECT count(seqlength::int8), min(seqlength::int8), max(seqlength::int8), avg(seqlength::int8),
sum(seqlength::int8)
FROM Seq
WHERE FileID=4;
NOTICE: QUERY PLAN:
Aggregate (cost=397916.69 rows=2756337 width=4) -> Seq Scan on seq (cost=397916.69 rows=2756337 width=4)
EXPLAIN
I do not understand how to interpret this Query Plan, except that I am doing a
sequential scan of the DB (hence slow), but, since all the elements requested
are aggregates, why do I run out of memory? I am not trying to load the whole
table into memory at once, am I? Suggestions for how to make this query run
would be welcome; my database is rapidly growing larger!
Thanks,
Mark
--
Mark Dalphin email: mdalphin@amgen.com
Mail Stop: 29-2-A phone: +1-805-447-4951 (work)
One Amgen Center Drive +1-805-375-0680 (home)
Thousand Oaks, CA 91320 fax: +1-805-499-9955 (work)