Home > mailing lists

How to avoid "Out of memory" using aggregate functions? - Mailing list pgsql-sql

From	Mark Dalphin
Subject	How to avoid "Out of memory" using aggregate functions?
Date	December 2, 1999 14:31:09
Msg-id	3846C8D3.DB066E78@amgen.com Whole thread Raw
Responses	Re: [SQL] How to avoid "Out of memory" using aggregate functions?
List	pgsql-sql

Tree view

Hi,

I am working with a simple table similar to the following:

CREATE TABLE Seq (   SeqID int8 PRIMARY KEY,   FileID int2 NOT NULL,   FileOffset int8 NOT NULL,   SeqLength int4,
FOREIGNKEY (FileID) REFERENCES File (FileID)
 
);

CREATE INDEX Seq_SeqLength ON Seq (SeqLength);

The table contains facts about a DNA sequence (eg its length) and how to access
that sequence from an ordinary Unix file (FileID and FileOffset) where it is
concatonated with all the other DNA sequences.

Currently the TABLE Seq contains 4,586,379 (times 2) rows and the overall size
of the Unix file is 3,660,117,293 bytes (ie I expect sum(SeqLength) to be in the
region of 3.6 Gbytes).

I want to use the PostgreSQL aggregate functions to learn more about the
sequences.

The following query works well and quickly (I need to cast as int8 to avoid
overflow of the sum() and perhaps for avg()):
SELECT    count(seqlength::int8),         min(seqlength::int8),         max(seqlength::int8),
avg(seqlength::int8),        sum(seqlength::int8)
 
FROM Seq
WHERE FileID=4 AND SeqLength>1000;

An EXPLAIN of the above query returns the following:
NOTICE:  QUERY PLAN:

Aggregate  (cost=202000.62 rows=918779 width=4) ->  Index Scan using seq_seqlength on seq  (cost=202000.62 rows=918779
width=4)

EXPLAIN


If I remove the SeqLength constraint, the query takes forever to execute and
then crashes with an out-of-memory (on my SGI, I have 256 Mbyte RAM and equal
sized swap space; I run out of swap space during the query which makes a very
unpleasent noise).

EXPLAIN
SELECT  count(seqlength::int8),       min(seqlength::int8),       max(seqlength::int8),       avg(seqlength::int8),
 sum(seqlength::int8)
 
FROM Seq
WHERE FileID=4;

NOTICE:  QUERY PLAN:

Aggregate  (cost=397916.69 rows=2756337 width=4) ->  Seq Scan on seq  (cost=397916.69 rows=2756337 width=4)

EXPLAIN

I do not understand how to interpret this Query Plan, except that I am doing a
sequential scan of the DB (hence slow), but, since all the elements requested
are aggregates, why do I run out of memory? I am not trying to load the whole
table into memory at once, am I?  Suggestions for how to make this query run
would be welcome; my database is rapidly growing larger!

Thanks,
Mark

--
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)

pgsql-sql by date:

From: Chris Griffin
Date: 02 December 1999, 13:10:07
Subject: Logical and for int4?

From: "Zot O'Connor"
Date: 02 December 1999, 15:19:09
Subject: PHP/Postgres dieing in random but consistent place.

How to avoid "Out of memory" using aggregate functions? - Mailing list pgsql-sql

Previous

Next