Query memory usage

From
Tom Duffey
Date:
Hi Everyone,

I have a table with several hundred million rows of timestamped
values.  Using pg_dump we are able to dump the entire table to disk no
problem.  However, I would like to retrieve a large subset of data
from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap
and crash.  Can anyone help me figure out what needs to be done to
allow this query to execute?  How long it takes doesn't really matter
as long as it can be performed reliably.  The database currently lives
on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
PostgreSQL 8.3.7.  Possibly relevant lines from postgresql.conf:

shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

Tom

Re: Query memory usage

From
Karl Denninger
Date:
Tom Duffey wrote:
> Hi Everyone,
>
> I have a table with several hundred million rows of timestamped
> values.  Using pg_dump we are able to dump the entire table to disk no
> problem.  However, I would like to retrieve a large subset of data
> from this table using something like:
>
> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
>
> Executing this query causes our server to consume all available swap
> and crash.  Can anyone help me figure out what needs to be done to
> allow this query to execute?  How long it takes doesn't really matter
> as long as it can be performed reliably.  The database currently lives
> on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
> PostgreSQL 8.3.7.  Possibly relevant lines from postgresql.conf:
>
> shared_buffers = 4GB
> work_mem = 32MB
> maintenance_work_mem = 1GB
> effective_cache_size = 8GB
>
> Tom
>
Is there an index on the timestamp?  If not, create one - it will make it
possible to execute the SELECT without having to read the entire table
(only the index and the rows that match).
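For example (a sketch only - the table and column names are taken from the
quoted query, and the index name is just illustrative):

CREATE INDEX history_timestamp_idx ON history ("timestamp");

Building it on a table that size will take a while, but it gives the planner
the option of an index scan for the range predicate instead of scanning the
whole table.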

-- Karl

Re: Query memory usage

From
Tom Lane
Date:
Tom Duffey <tduffey@trillitech.com> writes:
> I have a table with several hundred million rows of timestamped
> values.  Using pg_dump we are able to dump the entire table to disk no
> problem.  However, I would like to retrieve a large subset of data
> from this table using something like:

> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

> Executing this query causes our server to consume all available swap
> and crash.

What's being done on the client side with the data?  AFAIK that
operation really shouldn't consume a lot of memory on the server side.
It would help if you'd be more specific about which process is consuming
swap space.

            regards, tom lane

Re: Query memory usage

From
Tom Duffey
Date:
On May 15, 2010, at 4:51 PM, Tom Lane wrote:

> Tom Duffey <tduffey@trillitech.com> writes:
>> I have a table with several hundred million rows of timestamped
>> values.  Using pg_dump we are able to dump the entire table to disk
>> no
>> problem.  However, I would like to retrieve a large subset of data
>> from this table using something like:
>
>> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO
>> STDOUT;
>
>> Executing this query causes our server to consume all available swap
>> and crash.
>
> What's being done on the client side with the data?  AFAIK that
> operation really shouldn't consume a lot of memory on the server side.
> It would help if you'd be more specific about which process is
> consuming
> swap space.

I am executing the query in psql at the command line and piping the
result to a file, e.g.,

psql < get_data.sql > data.sql

Tom

Re: Query memory usage

From
Tom Lane
Date:
Tom Duffey <tduffey@trillitech.com> writes:
> On May 15, 2010, at 4:51 PM, Tom Lane wrote:
>> What's being done on the client side with the data?

> I am executing the query in psql at the command line and piping the
> result to a file, e.g.,
> psql < get_data.sql > data.sql

Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql.  So there's
something relevant that you haven't told us.

Could we see the full schema (eg via psql \dt) for the table being
copied?

            regards, tom lane

Re: Query memory usage

From
Tom Duffey
Date:

On May 15, 2010, at 7:28 PM, Tom Lane wrote:

> Tom Duffey <tduffey@trillitech.com> writes:
>> On May 15, 2010, at 4:51 PM, Tom Lane wrote:
>>> What's being done on the client side with the data?
>
>> I am executing the query in psql at the command line and piping the
>> result to a file, e.g.,
>> psql < get_data.sql > data.sql
>
> Well, I tried executing a large "copy (select ...)" query and couldn't
> see any memory bloat at all in either the backend or psql.  So there's
> something relevant that you haven't told us.
>
> Could we see the full schema (eg via psql \dt) for the table being
> copied?

I hope you are right!  The actual query is different because I was not aware until right before I posted this question that you can have a WHERE clause with COPY.  Here is the actual query I ran:

SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';

And here is the table schema:

prod=> \dt point_history
                List of relations
 Schema |     Name      | Type  |     Owner      
--------+---------------+-------+----------------
 public | point_history | table | prod
(1 row)

prod=> \d point_history
            Table "public.point_history"
  Column   |            Type             | Modifiers 
-----------+-----------------------------+-----------
 point_id  | integer                     | not null
 value     | real                        | not null
 status    | integer                     | not null
 timestamp | timestamp without time zone | not null
Indexes:
    "point_history_pkey" PRIMARY KEY, btree (point_id, "timestamp")
Foreign-key constraints:
    "$1" FOREIGN KEY (point_id) REFERENCES point(id)

Tom

Re: Query memory usage

From
Tom Lane
Date:
Tom Duffey <tduffey@trillitech.com> writes:
> On May 15, 2010, at 7:28 PM, Tom Lane wrote:
>> Well, I tried executing a large "copy (select ...)" query and couldn't
>> see any memory bloat at all in either the backend or psql.  So there's
>> something relevant that you haven't told us.

> I hope you are right!  The actual query is different because I was not
> aware until right before I posted this question that you can have a
> WHERE clause with COPY.  Here is the actual query I ran:

> SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
> timestamp
> FROM point_history
> WHERE timestamp > NOW() - interval '18 months';

Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
STDOUT?  The former case will cause psql to eat memory, because it tries
to buffer the whole result of an ordinary query.  In the latter case
psql will just stream the data through to the output file.
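To put the distinction concretely (just a sketch - substitute your real
query for the ellipses):

-- plain query: psql collects the entire result set in client memory
-- before writing any of it out
SELECT ... FROM point_history WHERE ... ;

-- COPY form: each row is written to the output file as it is received
COPY (SELECT ... FROM point_history WHERE ...) TO STDOUT;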

            regards, tom lane

Re: Query memory usage

From
Tom Duffey
Date:
On May 15, 2010, at 8:00 PM, Tom Lane wrote:

> Tom Duffey <tduffey@trillitech.com> writes:
>> On May 15, 2010, at 7:28 PM, Tom Lane wrote:
>>> Well, I tried executing a large "copy (select ...)" query and
>>> couldn't
>>> see any memory bloat at all in either the backend or psql.  So
>>> there's
>>> something relevant that you haven't told us.
>
>> I hope you are right!  The actual query is different because I was
>> not
>> aware until right before I posted this question that you can have a
>> WHERE clause with COPY.  Here is the actual query I ran:
>
>> SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
>> timestamp
>> FROM point_history
>> WHERE timestamp > NOW() - interval '18 months';
>
> Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
> STDOUT?  The former case will cause psql to eat memory, because it
> tries
> to buffer the whole result of an ordinary query.  In the latter case
> psql will just stream the data through to the output file.


That's the whole query.  If I understand your reply correctly, it sounds
like psql was the culprit and that I should try again using
COPY (...) TO STDOUT, no?
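In other words, something like this (assuming I have the syntax right):

COPY (
    SELECT point_id, status, value, "timestamp"
    FROM point_history
    WHERE "timestamp" > NOW() - interval '18 months'
) TO STDOUT;

If COPY's default tab-delimited text output is acceptable, I suppose I can
drop the manual E'\t' concatenation as well.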

Tom