Thread: 'best practises' to speed up sorting? tuning postgresql.conf

'best practises' to speed up sorting? tuning postgresql.conf

From

"Chris Hutchinson"

Date:

28 June 2003, 08:32:04

I'm wondering how to speed up sorting which is slowing a query run regularly
on a linux postgresql 7.3.3 system.

The box is a dual PIII with 1Gb ram. The database is located on a 20Gb SCSI
disk, with WAL on a separate disk.

The only changes I've made to postgresql.conf so far are:
shared_buffers=4000
sort_mem=4000

The results of explain analyze are below. The largest tables in the join,
genotypes and ebv, contain 440681 and 3060781 rows respectively, the others
contain perhaps a couple of hundred each.

I've reduced the runtime of the query to 2 minutes down from 10
minutes by eliminating a left join and ORDER BY, but it would be nice to
speed it up further.
The time goes up to nearly 6 minutes if I put the ORDER BY back into the
query.

Any suggestions gratefully received.

Regards,
Chris Hutchinson

----------------------------------------------------------------------

'fast query' without sort

explain analyze
SELECT A.genid,A.name,B.familyid,B.name,B.mumid,B.mumtype,B.dadid,
    B.dadtype,CBA.title,CA.sctrait,C.nvalue,C.accuracy
FROM genotypes A , ebv C , sctraitdefinition CA ,
    tpr CB , tpsystems CBA , geneticfamily B
WHERE (C.genid=A.genid) and ((A.speciesid='2') and (B.familyid=A.familyid))
and
    ((C.runid='72') and (CA.sctraitid=C.sctraitid)
    and (CB.runid=C.runid) and (CBA.systemid=CB.systemid));

QUERY PLAN
-------------------------------------------------------
 Hash Join  (cost=18114.84..287310.15 rows=678223 width=130) (actual
time=11824.24..139737.82 rows=1460290 loops=1)
   Hash Cond: ("outer".sctraitid = "inner".sctraitid)
   ->  Hash Join  (cost=18113.32..273744.17 rows=678223 width=110) (actual
time=11813.27..124934.74 rows=1460290 loops=1)
         Hash Cond: ("outer".runid = "inner".runid)
         ->  Hash Join  (cost=18106.69..260173.07 rows=678223 width=73)
(actual time=11782.12..107005.61 rows=1460290 loops=1)
               Hash Cond: ("outer".genid = "inner".genid)
               ->  Seq Scan on ebv c  (cost=0.00..195640.76 rows=1392655
width=20) (actual time=4684.71..68728.26 rows=1460290 loops=1)
                     Filter: (runid = 72)
               ->  Hash  (cost=15474.16..15474.16 rows=214612 width=53)
(actual time=7089.57..7089.57 rows=0 loops=1)
                     ->  Hash Join  (cost=147.83..15474.16 rows=214612
width=53) (actual time=58.74..6597.01 rows=226561 loops=1)
                           Hash Cond: ("outer".familyid = "inner".familyid)
                           ->  Seq Scan on genotypes a  (cost=0.00..9424.51
rows=214612 width=17) (actual time=10.47..3914.89 rows=226561 loops=1)
                                 Filter: (speciesid = 2)
                           ->  Hash  (cost=132.46..132.46 rows=6146
width=36) (actual time=48.06..48.06 rows=0 loops=1)
                                 ->  Seq Scan on geneticfamily b
(cost=0.00..132.46 rows=6146 width=36) (actual time=2.90..36.74 rows=6146
loops=1)
         ->  Hash  (cost=6.60..6.60 rows=13 width=37) (actual
time=31.06..31.06 rows=0 loops=1)
               ->  Hash Join  (cost=5.16..6.60 rows=13 width=37) (actual
time=30.93..31.04 rows=13 loops=1)
                     Hash Cond: ("outer".systemid = "inner".systemid)
                     ->  Seq Scan on tpsystems cba  (cost=0.00..1.16 rows=16
width=29) (actual time=6.68..6.72 rows=16 loops=1)
                     ->  Hash  (cost=5.13..5.13 rows=13 width=8) (actual
time=24.17..24.17 rows=0 loops=1)
                           ->  Seq Scan on tpr cb  (cost=0.00..5.13 rows=13
width=8) (actual time=4.75..24.15 rows=13 loops=1)
   ->  Hash  (cost=1.41..1.41 rows=41 width=20) (actual time=10.90..10.90
rows=0 loops=1)
         ->  Seq Scan on sctraitdefinition ca  (cost=0.00..1.41 rows=41
width=20) (actual time=10.72..10.82 rows=41 loops=1)
 Total runtime: 140736.98 msec
(24 rows)


'slow query' with sort

explain analyze
SELECT A.genid,A.name,B.familyid,B.name,B.mumid,B.mumtype,B.dadid,
    B.dadtype,CBA.title,CA.sctrait,C.nvalue,C.accuracy
FROM genotypes A , ebv C , sctraitdefinition CA ,
    tpr CB , tpsystems CBA , geneticfamily B
WHERE (C.genid=A.genid) and ((A.speciesid='2') and (B.familyid=A.familyid))
and
    ((C.runid='72') and (CA.sctraitid=C.sctraitid)
    and (CB.runid=C.runid) and (CBA.systemid=CB.systemid))
ORDER BY A.genid ASC,B.familyid ASC,CA.sctrait ASC;

QUERY PLAN
-------------------------------------------------------
 Sort  (cost=540740.81..542436.37 rows=678223 width=130) (actual
time=322602.06..346710.43 rows=1460290 loops=1)
   Sort Key: a.genid, b.familyid, ca.sctrait
   ->  Hash Join  (cost=18114.84..287310.15 rows=678223 width=130) (actual
time=10398.55..144991.62 rows=1460290 loops=1)
         Hash Cond: ("outer".sctraitid = "inner".sctraitid)
         ->  Hash Join  (cost=18113.32..273744.17 rows=678223 width=110)
(actual time=10384.84..129637.23 rows=1460290 loops=1)
               Hash Cond: ("outer".runid = "inner".runid)
               ->  Hash Join  (cost=18106.69..260173.07 rows=678223
width=73) (actual time=10353.69..111239.94 rows=1460290 loops=1)
                     Hash Cond: ("outer".genid = "inner".genid)
                     ->  Seq Scan on ebv c  (cost=0.00..195640.76
rows=1392655 width=20) (actual time=4499.94..74509.34 rows=1460290 loops=1)
                           Filter: (runid = 72)
                     ->  Hash  (cost=15474.16..15474.16 rows=214612
width=53) (actual time=5845.85..5845.85 rows=0 loops=1)
                           ->  Hash Join  (cost=147.83..15474.16 rows=214612
width=53) (actual time=58.75..5346.04 rows=226561 loops=1)
                                 Hash Cond: ("outer".familyid =
"inner".familyid)
                                 ->  Seq Scan on genotypes a
(cost=0.00..9424.51 rows=214612 width=17) (actual time=7.00..2799.43
rows=226561 loops=1)
                                       Filter: (speciesid = 2)
                                 ->  Hash  (cost=132.46..132.46 rows=6146
width=36) (actual time=51.54..51.54 rows=0 loops=1)
                                       ->  Seq Scan on geneticfamily b
(cost=0.00..132.46 rows=6146 width=36) (actual time=2.88..39.66 rows=6146
loops=1)
               ->  Hash  (cost=6.60..6.60 rows=13 width=37) (actual
time=31.05..31.05 rows=0 loops=1)
                     ->  Hash Join  (cost=5.16..6.60 rows=13 width=37)
(actual time=30.92..31.03 rows=13 loops=1)
                           Hash Cond: ("outer".systemid = "inner".systemid)
                           ->  Seq Scan on tpsystems cba  (cost=0.00..1.16
rows=16 width=29) (actual time=6.67..6.72 rows=16 loops=1)
                           ->  Hash  (cost=5.13..5.13 rows=13 width=8)
(actual time=24.17..24.17 rows=0 loops=1)
                                 ->  Seq Scan on tpr cb  (cost=0.00..5.13
rows=13 width=8) (actual time=4.72..24.14 rows=13 loops=1)
         ->  Hash  (cost=1.41..1.41 rows=41 width=20) (actual
time=13.62..13.62 rows=0 loops=1)
               ->  Seq Scan on sctraitdefinition ca  (cost=0.00..1.41
rows=41 width=20) (actual time=13.43..13.54 rows=41 loops=1)
 Total runtime: 347780.04 msec
(26 rows)

The tables look like this:

Table "public.ebv"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 runid     | integer | not null
 genid     | integer | not null
 sctraitid | integer | not null
 nvalue    | real    | not null
 accuracy  | real    |
Indexes: idx_ebg btree (genid),
         idx_ebv btree (runid),
         idx_ebvr btree (runid)

Table "public.genotypes"
     Column     |           Type           |
Modifiers
----------------+--------------------------+--------------------------------
-------------------------
 genid          | integer                  | not null default
nextval('"genotypes_genid_seq"'::text)
 speciesid      | integer                  | not null
 familyid       | integer                  | not null
 created        | timestamp with time zone | not null
 batchid        | integer                  | not null
 lastmodifiedby | integer                  | not null
 lastmodified   | timestamp with time zone | not null
 name           | character varying(100)   |
Indexes: genotypes_pkey primary key btree (genid),
         idx_gnm unique btree (speciesid, name),
         idx_gb btree (batchid),
         idxc btree (created),
         idxgf btree (familyid),
         idxgsp btree (speciesid)

Table "public.tpr"
    Column     |           Type           |
Modifiers
---------------+--------------------------+---------------------------------
--------------------------
 runid         | integer                  | not null default
nextval('"tpr_runid_seq"'::text)
 description   | text                     | not null
 systemid      | integer                  | not null
 runlog        | text                     |
 ranby         | integer                  | not null
 whenrun       | timestamp with time zone | not null
 valid_p       | character(1)             | not null default 'f'
 whenvalidated | timestamp with time zone |
 whovalidated  | integer                  |
 notes         | text                     |
Indexes: tpr_pkey primary key btree (runid),
         idx_tprv btree (valid_p)

Table "public.geneticfamily"
     Column     |           Type           |
Modifiers
----------------+--------------------------+--------------------------------
--------------------------------
 familyid       | integer                  | not null default
nextval('"geneticfamily_familyid_seq"'::text)
 speciesid      | integer                  | not null
 mumid          | integer                  | not null
 mumtype        | character(1)             | not null
 dadid          | integer                  | not null
 dadtype        | character(1)             | not null
 batchid        | integer                  | not null
 lastmodifiedby | integer                  | not null
 lastmodified   | timestamp with time zone |
 name           | character varying(100)   |
Indexes: geneticfamily_pkey primary key btree (familyid),
         idu_gfam unique btree (mumid, dadid, mumtype, dadtype),
         idx_gfnm unique btree (speciesid, name),
         idx_gfb btree (batchid),
         idx_gfun btree (upper(name)),
         idxgfd btree (dadid),
         idxgff btree (mumid, dadid),
         idxgfm btree (mumid),
         idxpsp btree (speciesid)

Table "public.tpsystems"
      Column      |           Type           |
Modifiers
------------------+--------------------------+------------------------------
------------------------------
 systemid         | integer                  | not null default
nextval('"tpsystems_systemid_seq"'::text)
 speciesid        | integer                  | not null
 title            | character varying(250)   | not null
 description      | text                     |
 lastmodifiedby   | integer                  | not null
 lastmodified     | timestamp with time zone | not null
 variancechecksum | character varying(32)    |
Indexes: tpsystems_pkey primary key btree (systemid)

Table "public.sctraitdefinition"
       Column       |           Type           |
Modifiers
--------------------+--------------------------+----------------------------
-----------------------------------------
 sctraitid          | integer                  | not null default
nextval('"sctraitdefinition_sctraitid_seq"'::text)
 speciesid          | integer                  | not null
 sctrait            | character varying(32)    | not null
 lastmodifiedby     | integer                  | not null
 lastmodified       | timestamp with time zone | not null
 datatype           | character varying(32)    |
 datatransformation | character varying(32)    |
Indexes: sctraitdefinition_pkey primary key btree (sctraitid),
         idu_sctd unique btree (speciesid, sctrait)


--------------------------------------------------------------------------
'top' looks like this most of the time.. (except when queries are running)

 8:40pm  up 54 days,  7:45,  2 users,  load average: 0.00, 0.19, 0.32
 44 processes: 43 sleeping, 1 running, 0 zombie, 0 stopped
 CPU0 states:  0.1% user,  0.0% system,  0.0% nice, 99.4% idle
 CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 Mem:  1028428K av,  846776K used,  181652K free,       0K shrd,    1580K
buff
 Swap:  530104K av,   53540K used,  476564K free                  804148K
cached

Re: 'best practises' to speed up sorting? tuning postgresql.conf

From

Shridhar Daithankar

Date:

28 June 2003, 08:50:30

On Saturday 28 June 2003 17:01, Chris Hutchinson wrote:
> I'm wondering how to speed up sorting which is slowing a query run
> regularly on a linux postgresql 7.3.3 system.
>
> The box is a dual PIII with 1Gb ram. The database is located on a 20Gb SCSI
> disk, with WAL on a separate disk.
>
> The only changes I've made to postgresql.conf so far are:
> shared_buffers=4000
> sort_mem=4000

sort_mem is in kbs. So you are setting it roughyl 4MB. Given that you don't
seem to have any shortage of RAM, how about 32/64/128MB RAM till things work
as expected?

Of course, setting it so in postgresql.conf would be a bad idea. Just set for
the session that makes this query and reset it back.

Keep us posted on updates..

 Shridhar

Re: 'best practises' to speed up sorting? tuning postgresql.conf

From

"Arjen van der Meijden"

Date:

28 June 2003, 08:58:32

> Chris Hutchinson wrote:

> I'm wondering how to speed up sorting which is slowing a
> query run regularly on a linux postgresql 7.3.3 system.
I see a lot of seq scans in your explain and there are no index scans,
have you done a 'vacuum analyze' lately?

Arjen