selects from large tables - Mailing list pgsql-performance

From Nikk Anderson
Subject selects from large tables
Date
Msg-id DA1274E682D3734B8802904A9B36124C298A93@nic-nts1.nic.parallel.ltd.uk
Whole thread Raw
Responses Re: selects from large tables
List pgsql-performance

Hi All,

We are using Postgres 7.1, on Solaris 8 - hardware is a 400mhz Netra X1, 512Mb ram, with the database on a separate partition.

Our main result tables are getting really big, and we don't want to delete any data yet.  Currently, our largest table has around 10 million rows and is going up at a rate of around 1 million per month.  The table has 13 integer, one boolean and one timestamp column.  We index the table on an ID number and the timestamp.  We vacuum analyse the table every night.  The performance has steadily degraded, and the more data we try and select, the longer the select queries take.

The queries are not complex, and do not involve any unions etc, eg:

SELECT * FROM table_name WHERE column1 = 454 AND time BETWEEN '2002-10-13 13:44:00.0' AND '2002-11-14'

SELECT count(DISTINCT id) FROM table_name WHERE column1 = 454 AND time BETWEEN '2002-10-13 13:44:00.0' AND '2002-11-14

See various queries and explains at the end this email for more info on the type of queries we are doing.
Most of the queries use a sequence scan - disabling this and forcing index scan decreases performance further for those queries.

These queries are sometimes taking over 2 minutes to perform!!!! If we reduce the table size significantly (i.e. around 1 million rows)is is obviously faster - down to a few seconds.

We then tried the DB on a clean installation of Solaris 9, on a dual 400mhz processor SunE250 with 2Gb ram, and 2 scsi 17gb disks.  We put the database onto the second disk.  Surprisingly the performance is only 5-10% greater.  I expected far more, due to the increased power of the machine.  Looking at the os info on this machine, the IO wait is negligible as is the cpu usage.  So this machine is not working as hard as the Netra X1, though the time taken to perform queries is not too much different.

We have tried tweaking the shared buffers and sort mem (also tweaking kernel shared mem size), which make little difference, and in fact if we increase it to around 25% of total memory performance degrades slightly.  We have changed from the default amount of shared buffers, to 64000 to give us access to 25% of the total system memory.

Any ideas on how we can select data more quickly from large tables?

Other ideas we had was to split the data over multiple table by id (resulting in several thousand tables), however this would make management of the database in terms of keys, triggers and integrity very difficult and messy.

I hope someone can offer some advice.

Cheers

Nikk

- Queries and explain plans

select count(*) from table_name;
NOTICE:  QUERY PLAN:
Aggregate  (cost=488700.65..488700.65 rows=1 width=0)
  ->  Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412 width=0)

hawkdb=# explain select count(job_id) from table_name;
NOTICE:  QUERY PLAN:
Aggregate  (cost=488700.65..488700.65 rows=1 width=4)
  ->  Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412 width=4)

hawkdb=# explain select * from table_name;
NOTICE:  QUERY PLAN:
Seq Scan on table_name  (cost=0.00..439527.12 rows=19669412 width=57)

hawkdb=# explain select count(*) from table_name where job_id = 13;
NOTICE:  QUERY PLAN:
Aggregate  (cost=537874.18..537874.18 rows=1 width=0)
  ->  Seq Scan on table_name  (cost=0.00..488700.65 rows=19669412 width=0)

hawkdb=# explain select * from table_name where job_id = 13;
NOTICE:  QUERY PLAN:
Seq Scan on http_result  (cost=0.00..488700.65 rows=19669412 width=57)

hawkdb=# explain select * from table_name where job_id = 1;
NOTICE:  QUERY PLAN:
Index Scan using http_result_pk on table_name  (cost=0.00..5.01 rows=1 width=57)

hawkdb=#explain select * from table_name where time > '2002-10-10';
NOTICE:  QUERY PLAN:
Seq Scan on table_name  (cost=0.00..488700.65 rows=19649743 width=57)

hawkdb=# explain select * from http_result where time < '2002-10-10';
NOTICE:  QUERY PLAN:
Index Scan using table_name_time on table_name  (cost=0.00..75879.17 rows=19669 width=57)

Nikk Anderson

Parallel ltd.
Cranfield Innovation Centre
University Way
Cranfield
Bedfordshire
MK43 0BT

http://www.nexuswatch.com
http://www.parallel.ltd.uk

Tel: +44 (0)8700 PARALLEL (727255)
Fax: +44 (0)8700 PARAFAX  (727232)

******************************************************************
Privileged/Confidential Information may be contained in this
message.  If you are not the addressee indicated in this message
(or responsible for delivery of the message to such person), you
may not copy or deliver this message to anyone. In such case, you
should destroy this message and kindly notify the sender by reply
email. Please advise immediately if you or your employer do not
consent to Internet email for messages of this kind.  Opinions,
conclusions and other information in this message that do not
relate to the official business of Parallel shall be understood
as neither given nor endorsed by it.

Unless agreed otherwise by way of a signed agreement, any business
conducted by Parallel shall be subject to its Standard Terms
and Conditions which are available upon request.
******************************************************************

pgsql-performance by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Sort time
Next
From: Stephan Szabo
Date:
Subject: Re: selects from large tables