Hi,
Can someone provide some insight as to why this query takes so
long on 6655 rows of data? I'm beginning to think something is seriously
wrong with my config with respect to memory settings. I note vmstat (at the
bottom) indicates no swapping, so I'm not running out of memory system-wide,
but I don't know whether I am within postgres.
sql> explain ANALYZE SELECT MIN(ref), h_message_id, COUNT(h_message_id)
FROM mail_969 GROUP BY h_message_id HAVING COUNT(h_message_id) > 25;

Aggregate  (cost=85031.01..87894.10 rows=28631 width=44) (actual time=185449.57..185449.57 rows=0 loops=1)
  ->  Group  (cost=85031.01..85746.78 rows=286309 width=44) (actual time=185374.92..185413.32 rows=6655 loops=1)
        ->  Sort  (cost=85031.01..85031.01 rows=286309 width=44) (actual time=185374.91..185379.23 rows=6655 loops=1)
              ->  Seq Scan on mail_969  (cost=0.00..59081.09 rows=286309 width=44) (actual time=179.65..185228.19 rows=6655 loops=1)
Total runtime: 185451.08 msec
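(Incidentally, the planner estimates 286309 rows for the seq scan but only
6655 come back, which fits my vacuum suspicion below - the stored statistics
the planner works from can be checked against the catalog with something
like this sketch; relpages/reltuples are only refreshed by VACUUM/ANALYZE:)

```sql
-- Sketch: compare the planner's stored statistics with reality.
-- A relpages/reltuples figure far above a live COUNT(*) points at
-- dead-tuple bloat from the earlier delete runs.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'mail_969';
```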
To put this into perspective, we see similar results on a table with
over 300,000 rows :
sql> explain ANALYZE SELECT MIN(ref), h_message_id, COUNT(h_message_id)
FROM mail_650 GROUP BY h_message_id HAVING COUNT(h_message_id) > 25;

NOTICE: QUERY PLAN:

Aggregate  (cost=88503.85..91596.83 rows=30930 width=48) (actual time=142483.52..149102.57 rows=244 loops=1)
  ->  Group  (cost=88503.85..89277.10 rows=309298 width=48) (actual time=142444.19..148477.99 rows=309245 loops=1)
        ->  Sort  (cost=88503.85..88503.85 rows=309298 width=48) (actual time=142444.17..142652.83 rows=309245 loops=1)
              ->  Seq Scan on mail_650  (cost=0.00..60297.98 rows=309298 width=48) (actual time=445.75..105818.97 rows=309245 loops=1)
Total runtime: 149181.30 msec
These SELECTs are part of a cleanup operation on a 70GB DB (normal
conditions are around 25GB). They find duplicates, preserve one primary
key, and delete the rest. Currently I have an issue that the DB requires
a full vacuum from prior runs of the above; however, that is another
problem, detailed on the 'admin' list.
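(For reference, the dedupe step amounts to something along these lines -
a sketch only, assuming `ref` is the primary key; the real cleanup job
works in batches:)

```sql
-- Keep the row with the lowest ref per h_message_id, delete the rest.
-- Sketch only: assumes ref is the primary key of mail_969.
DELETE FROM mail_969
WHERE ref NOT IN (
    SELECT MIN(ref)
    FROM mail_969
    GROUP BY h_message_id
);
```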
Perhaps this is due to the dataset still being so big (70GB) on
effectively one disk, but I just thought I'd check with you guys.
Hardware:
1GB RAM, SMP 1GHz P3, SvrWks OSB4 chipset, Adaptec aic7899 with 2
SCSI-160 disks split between the DB and pg_xlog. (I know the disks should
be laid out better for a busy DB, but this hasn't been my decision :)
Config:
max_fsm_relations = 1000
max_fsm_pages = 20000
vacuum_mem = 65536
effective_cache_size = 95694
random_page_cost = 2
sort_mem=65536
max_connections = 128
shared_buffers = 15732
wal_buffers = 64 # need to determine
wal_files = 64 # range 0-64
wal_sync_method = fsync # the default varies across platforms:
wal_debug = 0 # range 0-16
# hopefully this should see fewer LogFlushes per LogInsert - uses more WAL though.
commit_delay = 10000 # range 0-100000
commit_siblings = 2 # range 1-1000
checkpoint_segments = 32 # in logfile segments (16MB each), min 1
checkpoint_timeout = 600 # in seconds, range 30-3600
fsync = false
#fsync = true
vmstat whilst running (indicating no swapping):
procs                  memory      swap          io     system         cpu
 r  b  w   swpd   free   buff   cache  si  so    bi    bo    in    cs  us sy id
 1  2  1  45592  10868  11028  853700   0   0     7     3     1     0   5  5  7
 2  2  0  45592  10288  11236  849312   8   0   732   888  1980  3516  63 12 26
 8  2  0  45592  11208  11304  849224   0   0  4438   286  2696  3758  66 15 19
10  2  3  45592  10284  11332  848872   0   0  6344   664  2888  3614  71 18 11
 2  8  1  45592  10408  11388  845140   0   0  4622   402  2216  2306  70 11 19
 3  7  2  45592  10416  11440  845972   0   0  3538    68  2052  2079  66  9 25
10  5  1  45592  10916  11496  846676   0   0  4428   444  2968  4385  75 17  8
 2  4  0  45592  10380  11592  848348   0   0  5940   184  2609  3421  69 15 16
Cheers,
--
Rob Fielding
rob@dsvr.net
www.dsvr.co.uk Development Designer Servers Ltd