Thread: Query memory usage
Hi Everyone,

I have a table with several hundred million rows of timestamped values. Using pg_dump we are able to dump the entire table to disk no problem. However, I would like to retrieve a large subset of data from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap and crash. Can anyone help me figure out what needs to be done to allow this query to execute? How long it takes doesn't really matter as long as it can be performed reliably. The database currently lives on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:

shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

Tom
Tom Duffey wrote:
> Hi Everyone,
>
> I have a table with several hundred million rows of timestamped
> values. Using pg_dump we are able to dump the entire table to disk no
> problem. However, I would like to retrieve a large subset of data
> from this table using something like:
>
> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
>
> Executing this query causes our server to consume all available swap
> and crash. Can anyone help me figure out what needs to be done to
> allow this query to execute? How long it takes doesn't really matter
> as long as it can be performed reliably. The database currently lives
> on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
> PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:
>
> shared_buffers = 4GB
> work_mem = 32MB
> maintenance_work_mem = 1GB
> effective_cache_size = 8GB
>
> Tom

Is there an index on the timestamp? If not, create one - it will make it possible to run the select without having to read the entire table (only the index).

-- Karl
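Karl's suggestion, sketched as SQL (the index name is illustrative; note that "timestamp" is quoted because it is also a type name, and that an index only helps the server pick an index scan - it does not by itself change how much memory the client buffers):

```sql
-- A sketch of the suggested index; the name is illustrative.
CREATE INDEX history_timestamp_idx ON history ("timestamp");
```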
Tom Duffey <tduffey@trillitech.com> writes:
> I have a table with several hundred million rows of timestamped
> values. Using pg_dump we are able to dump the entire table to disk no
> problem. However, I would like to retrieve a large subset of data
> from this table using something like:

> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

> Executing this query causes our server to consume all available swap
> and crash.

What's being done on the client side with the data? AFAIK that operation really shouldn't consume a lot of memory on the server side. It would help if you'd be more specific about which process is consuming swap space.

regards, tom lane
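One rough way to answer Tom Lane's question about which process is eating memory is with ps; a sketch, assuming GNU procps options (RSS and VSZ are in kilobytes):

```shell
# List the ten largest processes by resident memory.
# RSS = resident set size, VSZ = virtual size, both in KB.
ps -eo pid,rss,vsz,comm --sort=-rss | head -n 10
```

Running this while the problem query executes should show whether it is the postgres backend or the psql client whose RSS is ballooning.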
On May 15, 2010, at 4:51 PM, Tom Lane wrote:
> Tom Duffey <tduffey@trillitech.com> writes:
>> I have a table with several hundred million rows of timestamped
>> values. Using pg_dump we are able to dump the entire table to disk no
>> problem. However, I would like to retrieve a large subset of data
>> from this table using something like:
>
>> COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
>
>> Executing this query causes our server to consume all available swap
>> and crash.
>
> What's being done on the client side with the data? AFAIK that
> operation really shouldn't consume a lot of memory on the server side.
> It would help if you'd be more specific about which process is
> consuming swap space.

I am executing the query in psql at the command line and piping the result to a file, e.g.,

psql < get_data.sql > data.sql

Tom
Tom Duffey <tduffey@trillitech.com> writes:
> On May 15, 2010, at 4:51 PM, Tom Lane wrote:
>> What's being done on the client side with the data?

> I am executing the query in psql at the command line and piping the
> result to a file, e.g.,
> psql < get_data.sql > data.sql

Well, I tried executing a large "copy (select ...)" query and couldn't see any memory bloat at all in either the backend or psql. So there's something relevant that you haven't told us. Could we see the full schema (eg via psql \dt) for the table being copied?

regards, tom lane
On May 15, 2010, at 7:28 PM, Tom Lane wrote:
> Tom Duffey <tduffey@trillitech.com> writes:
>> On May 15, 2010, at 4:51 PM, Tom Lane wrote:
>>> What's being done on the client side with the data?
>
>> I am executing the query in psql at the command line and piping the
>> result to a file, e.g.,
>> psql < get_data.sql > data.sql
>
> Well, I tried executing a large "copy (select ...)" query and couldn't
> see any memory bloat at all in either the backend or psql. So there's
> something relevant that you haven't told us.
>
> Could we see the full schema (eg via psql \dt) for the table being
> copied?
I hope you are right! The actual query is different because I was not aware until right before I posted this question that you can have a WHERE clause with COPY. Here is the actual query I ran:
SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';
And here is the table schema:
prod=> \dt point_history
           List of relations
 Schema |     Name      | Type  | Owner
--------+---------------+-------+-------
 public | point_history | table | prod
(1 row)

prod=> \d point_history
          Table "public.point_history"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 point_id  | integer                     | not null
 value     | real                        | not null
 status    | integer                     | not null
 timestamp | timestamp without time zone | not null
Indexes:
    "point_history_pkey" PRIMARY KEY, btree (point_id, "timestamp")
Foreign-key constraints:
    "$1" FOREIGN KEY (point_id) REFERENCES point(id)
Tom
Tom Duffey <tduffey@trillitech.com> writes:
> On May 15, 2010, at 7:28 PM, Tom Lane wrote:
>> Well, I tried executing a large "copy (select ...)" query and couldn't
>> see any memory bloat at all in either the backend or psql. So there's
>> something relevant that you haven't told us.

> I hope you are right! The actual query is different because I was not
> aware until right before I posted this question that you can have a
> WHERE clause with COPY. Here is the actual query I ran:

> SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
> FROM point_history
> WHERE timestamp > NOW() - interval '18 months';

Ermm ... is that the whole query, or did you wrap it in COPY (...) TO STDOUT? The former case will cause psql to eat memory, because it tries to buffer the whole result of an ordinary query. In the latter case psql will just stream the data through to the output file.

regards, tom lane
On May 15, 2010, at 8:00 PM, Tom Lane wrote:
> Tom Duffey <tduffey@trillitech.com> writes:
>> On May 15, 2010, at 7:28 PM, Tom Lane wrote:
>>> Well, I tried executing a large "copy (select ...)" query and couldn't
>>> see any memory bloat at all in either the backend or psql. So there's
>>> something relevant that you haven't told us.
>
>> I hope you are right! The actual query is different because I was not
>> aware until right before I posted this question that you can have a
>> WHERE clause with COPY. Here is the actual query I ran:
>
>> SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
>> FROM point_history
>> WHERE timestamp > NOW() - interval '18 months';
>
> Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
> STDOUT? The former case will cause psql to eat memory, because it tries
> to buffer the whole result of an ordinary query. In the latter case
> psql will just stream the data through to the output file.

That's the whole query. If I understand your reply correctly it sounds like psql was the culprit and that I should try again using COPY (...) TO STDOUT, no?

Tom
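For the record, wrapping the query as discussed in the thread would look like this (same query text as above, just placed inside COPY so the server streams rows rather than psql buffering the whole result):

```sql
COPY (
    SELECT point_id || E'\t' || status || E'\t' || value || E'\t' || timestamp
    FROM point_history
    WHERE timestamp > NOW() - interval '18 months'
) TO STDOUT;
```

An alternative for plain SELECTs is psql's FETCH_COUNT variable (available since 8.2), e.g. `\set FETCH_COUNT 1000`, which makes psql retrieve the result in batches via a cursor instead of holding it all in memory.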