I'm wondering how to speed up sorting which is slowing a query run regularly
on a linux postgresql 7.3.3 system.
The box is a dual PIII with 1Gb ram. The database is located on a 20Gb SCSI
disk, with WAL on a separate disk.
The only changes I've made to postgresql.conf so far are:
shared_buffers=4000
sort_mem=4000
The results of explain analyze are below. The largest tables in the join,
genotypes and ebv, contain 440681 and 3060781 rows respectively, the others
contain perhaps a couple of hundred each.
I've reduced the runtime of the query to 2 minutes down from 10
minutes by eliminating a left join and ORDER BY, but it would be nice to
speed it up further.
The time goes up to nearly 6 minutes if I put the ORDER BY back into the
query.
Any suggestions gratefully received.
Regards,
Chris Hutchinson
----------------------------------------------------------------------
'fast query' without sort
explain analyze
SELECT A.genid,A.name,B.familyid,B.name,B.mumid,B.mumtype,B.dadid,
B.dadtype,CBA.title,CA.sctrait,C.nvalue,C.accuracy
FROM genotypes A , ebv C , sctraitdefinition CA ,
tpr CB , tpsystems CBA , geneticfamily B
WHERE (C.genid=A.genid) and ((A.speciesid='2') and (B.familyid=A.familyid))
and
((C.runid='72') and (CA.sctraitid=C.sctraitid)
and (CB.runid=C.runid) and (CBA.systemid=CB.systemid));
QUERY PLAN
-------------------------------------------------------
Hash Join (cost=18114.84..287310.15 rows=678223 width=130) (actual
time=11824.24..139737.82 rows=1460290 loops=1)
Hash Cond: ("outer".sctraitid = "inner".sctraitid)
-> Hash Join (cost=18113.32..273744.17 rows=678223 width=110) (actual
time=11813.27..124934.74 rows=1460290 loops=1)
Hash Cond: ("outer".runid = "inner".runid)
-> Hash Join (cost=18106.69..260173.07 rows=678223 width=73)
(actual time=11782.12..107005.61 rows=1460290 loops=1)
Hash Cond: ("outer".genid = "inner".genid)
-> Seq Scan on ebv c (cost=0.00..195640.76 rows=1392655
width=20) (actual time=4684.71..68728.26 rows=1460290 loops=1)
Filter: (runid = 72)
-> Hash (cost=15474.16..15474.16 rows=214612 width=53)
(actual time=7089.57..7089.57 rows=0 loops=1)
-> Hash Join (cost=147.83..15474.16 rows=214612
width=53) (actual time=58.74..6597.01 rows=226561 loops=1)
Hash Cond: ("outer".familyid = "inner".familyid)
-> Seq Scan on genotypes a (cost=0.00..9424.51
rows=214612 width=17) (actual time=10.47..3914.89 rows=226561 loops=1)
Filter: (speciesid = 2)
-> Hash (cost=132.46..132.46 rows=6146
width=36) (actual time=48.06..48.06 rows=0 loops=1)
-> Seq Scan on geneticfamily b
(cost=0.00..132.46 rows=6146 width=36) (actual time=2.90..36.74 rows=6146
loops=1)
-> Hash (cost=6.60..6.60 rows=13 width=37) (actual
time=31.06..31.06 rows=0 loops=1)
-> Hash Join (cost=5.16..6.60 rows=13 width=37) (actual
time=30.93..31.04 rows=13 loops=1)
Hash Cond: ("outer".systemid = "inner".systemid)
-> Seq Scan on tpsystems cba (cost=0.00..1.16 rows=16
width=29) (actual time=6.68..6.72 rows=16 loops=1)
-> Hash (cost=5.13..5.13 rows=13 width=8) (actual
time=24.17..24.17 rows=0 loops=1)
-> Seq Scan on tpr cb (cost=0.00..5.13 rows=13
width=8) (actual time=4.75..24.15 rows=13 loops=1)
-> Hash (cost=1.41..1.41 rows=41 width=20) (actual time=10.90..10.90
rows=0 loops=1)
-> Seq Scan on sctraitdefinition ca (cost=0.00..1.41 rows=41
width=20) (actual time=10.72..10.82 rows=41 loops=1)
Total runtime: 140736.98 msec
(24 rows)
'slow query' with sort
explain analyze
SELECT A.genid,A.name,B.familyid,B.name,B.mumid,B.mumtype,B.dadid,
B.dadtype,CBA.title,CA.sctrait,C.nvalue,C.accuracy
FROM genotypes A , ebv C , sctraitdefinition CA ,
tpr CB , tpsystems CBA , geneticfamily B
WHERE (C.genid=A.genid) and ((A.speciesid='2') and (B.familyid=A.familyid))
and
((C.runid='72') and (CA.sctraitid=C.sctraitid)
and (CB.runid=C.runid) and (CBA.systemid=CB.systemid))
ORDER BY A.genid ASC,B.familyid ASC,CA.sctrait ASC;
QUERY PLAN
-------------------------------------------------------
Sort (cost=540740.81..542436.37 rows=678223 width=130) (actual
time=322602.06..346710.43 rows=1460290 loops=1)
Sort Key: a.genid, b.familyid, ca.sctrait
-> Hash Join (cost=18114.84..287310.15 rows=678223 width=130) (actual
time=10398.55..144991.62 rows=1460290 loops=1)
Hash Cond: ("outer".sctraitid = "inner".sctraitid)
-> Hash Join (cost=18113.32..273744.17 rows=678223 width=110)
(actual time=10384.84..129637.23 rows=1460290 loops=1)
Hash Cond: ("outer".runid = "inner".runid)
-> Hash Join (cost=18106.69..260173.07 rows=678223
width=73) (actual time=10353.69..111239.94 rows=1460290 loops=1)
Hash Cond: ("outer".genid = "inner".genid)
-> Seq Scan on ebv c (cost=0.00..195640.76
rows=1392655 width=20) (actual time=4499.94..74509.34 rows=1460290 loops=1)
Filter: (runid = 72)
-> Hash (cost=15474.16..15474.16 rows=214612
width=53) (actual time=5845.85..5845.85 rows=0 loops=1)
-> Hash Join (cost=147.83..15474.16 rows=214612
width=53) (actual time=58.75..5346.04 rows=226561 loops=1)
Hash Cond: ("outer".familyid =
"inner".familyid)
-> Seq Scan on genotypes a
(cost=0.00..9424.51 rows=214612 width=17) (actual time=7.00..2799.43
rows=226561 loops=1)
Filter: (speciesid = 2)
-> Hash (cost=132.46..132.46 rows=6146
width=36) (actual time=51.54..51.54 rows=0 loops=1)
-> Seq Scan on geneticfamily b
(cost=0.00..132.46 rows=6146 width=36) (actual time=2.88..39.66 rows=6146
loops=1)
-> Hash (cost=6.60..6.60 rows=13 width=37) (actual
time=31.05..31.05 rows=0 loops=1)
-> Hash Join (cost=5.16..6.60 rows=13 width=37)
(actual time=30.92..31.03 rows=13 loops=1)
Hash Cond: ("outer".systemid = "inner".systemid)
-> Seq Scan on tpsystems cba (cost=0.00..1.16
rows=16 width=29) (actual time=6.67..6.72 rows=16 loops=1)
-> Hash (cost=5.13..5.13 rows=13 width=8)
(actual time=24.17..24.17 rows=0 loops=1)
-> Seq Scan on tpr cb (cost=0.00..5.13
rows=13 width=8) (actual time=4.72..24.14 rows=13 loops=1)
-> Hash (cost=1.41..1.41 rows=41 width=20) (actual
time=13.62..13.62 rows=0 loops=1)
-> Seq Scan on sctraitdefinition ca (cost=0.00..1.41
rows=41 width=20) (actual time=13.43..13.54 rows=41 loops=1)
Total runtime: 347780.04 msec
(26 rows)
The tables look like this:
Table "public.ebv"
Column | Type | Modifiers
-----------+---------+-----------
runid | integer | not null
genid | integer | not null
sctraitid | integer | not null
nvalue | real | not null
accuracy | real |
Indexes: idx_ebg btree (genid),
idx_ebv btree (runid),
idx_ebvr btree (runid)
Table "public.genotypes"
Column | Type |
Modifiers
----------------+--------------------------+--------------------------------
-------------------------
genid | integer | not null default
nextval('"genotypes_genid_seq"'::text)
speciesid | integer | not null
familyid | integer | not null
created | timestamp with time zone | not null
batchid | integer | not null
lastmodifiedby | integer | not null
lastmodified | timestamp with time zone | not null
name | character varying(100) |
Indexes: genotypes_pkey primary key btree (genid),
idx_gnm unique btree (speciesid, name),
idx_gb btree (batchid),
idxc btree (created),
idxgf btree (familyid),
idxgsp btree (speciesid)
Table "public.tpr"
Column | Type |
Modifiers
---------------+--------------------------+---------------------------------
--------------------------
runid | integer | not null default
nextval('"tpr_runid_seq"'::text)
description | text | not null
systemid | integer | not null
runlog | text |
ranby | integer | not null
whenrun | timestamp with time zone | not null
valid_p | character(1) | not null default 'f'
whenvalidated | timestamp with time zone |
whovalidated | integer |
notes | text |
Indexes: tpr_pkey primary key btree (runid),
idx_tprv btree (valid_p)
Table "public.geneticfamily"
Column | Type |
Modifiers
----------------+--------------------------+--------------------------------
--------------------------------
familyid | integer | not null default
nextval('"geneticfamily_familyid_seq"'::text)
speciesid | integer | not null
mumid | integer | not null
mumtype | character(1) | not null
dadid | integer | not null
dadtype | character(1) | not null
batchid | integer | not null
lastmodifiedby | integer | not null
lastmodified | timestamp with time zone |
name | character varying(100) |
Indexes: geneticfamily_pkey primary key btree (familyid),
idu_gfam unique btree (mumid, dadid, mumtype, dadtype),
idx_gfnm unique btree (speciesid, name),
idx_gfb btree (batchid),
idx_gfun btree (upper(name)),
idxgfd btree (dadid),
idxgff btree (mumid, dadid),
idxgfm btree (mumid),
idxpsp btree (speciesid)
Table "public.tpsystems"
Column | Type |
Modifiers
------------------+--------------------------+------------------------------
------------------------------
systemid | integer | not null default
nextval('"tpsystems_systemid_seq"'::text)
speciesid | integer | not null
title | character varying(250) | not null
description | text |
lastmodifiedby | integer | not null
lastmodified | timestamp with time zone | not null
variancechecksum | character varying(32) |
Indexes: tpsystems_pkey primary key btree (systemid)
Table "public.sctraitdefinition"
Column | Type |
Modifiers
--------------------+--------------------------+----------------------------
-----------------------------------------
sctraitid | integer | not null default
nextval('"sctraitdefinition_sctraitid_seq"'::text)
speciesid | integer | not null
sctrait | character varying(32) | not null
lastmodifiedby | integer | not null
lastmodified | timestamp with time zone | not null
datatype | character varying(32) |
datatransformation | character varying(32) |
Indexes: sctraitdefinition_pkey primary key btree (sctraitid),
idu_sctd unique btree (speciesid, sctrait)
--------------------------------------------------------------------------
'top' looks like this most of the time.. (except when queries are running)
8:40pm up 54 days, 7:45, 2 users, load average: 0.00, 0.19, 0.32
44 processes: 43 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.1% user, 0.0% system, 0.0% nice, 99.4% idle
CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: 1028428K av, 846776K used, 181652K free, 0K shrd, 1580K
buff
Swap: 530104K av, 53540K used, 476564K free 804148K
cached