Thread: Hash join on int takes 8..114 seconds
Query below seems to use indexes everywhere in the most optimal way.
dokumnr column is of type int.

Speed of this query varies wildly:

In live db the fastest response I have got is 8 seconds.
Re-running the same query after 10 seconds may take 60 seconds.
Re-running it again after 10 seconds may take 114 seconds.

Any idea how to speed it up?

Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
add more RAM, disk or CPU speed?

The real query contains a column list instead of the sum(1) used in the
test below.

Andrus.

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
LEFT JOIN artliik USING (grupp, liik)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

Longest response time:

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=114479.933..114479.936 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=100435.523..114403.293 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=100405.258..114207.387 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=18.312..18.325 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=30.220..30.220 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)
Total runtime: 114480.373 ms

Same query in other runs:

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=62164.496..62164.500 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=46988.005..62088.379 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=46957.750..61893.613 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=146.530..146.543 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=46811.194..61595.560 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=1870.209..55864.237 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=1863.713..1863.713 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1650.823..1650.823 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.091..1190.962 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=30.210..30.210 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.069..29.836 rows=84 loops=1)
Total runtime: 62164.789 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=40185.499..40185.503 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=32646.761..40109.470 rows=20585 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=32645.933..39944.242 rows=20585 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=0.072..0.085 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=32645.839..39793.180 rows=20585 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=1823.542..36318.419 rows=277291 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=1817.053..1817.053 rows=280596 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1242.785..1242.785 rows=105195 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.088..788.399 rows=105195 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.786..0.786 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.019..0.419 rows=84 loops=1)
Total runtime: 40186.102 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=29650.398..29650.402 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=23513.713..29569.448 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=23512.808..29388.712 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=61.813..61.825 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=23450.970..29163.774 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=2098.630..24362.104 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=2086.772..2086.772 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=2860.088..2860.088 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.088..2365.063 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.861..0.861 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.039..0.458 rows=84 loops=1)
Total runtime: 29650.696 ms

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual time=11131.392..11131.396 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual time=9179.703..11043.906 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19) (actual time=9178.827..10858.383 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=43) (actual time=10.252..10.264 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=9168.552..10675.424 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=2129.814..7152.618 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=2123.223..2123.223 rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1414.254..1414.254 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.092..895.533 rows=105202 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual time=0.833..0.833 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=0.043..0.465 rows=84 loops=1)
Total runtime: 11131.694 ms

Environment: 3-8 concurrent users

PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc
(GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)

# free
             total       used       free     shared    buffers     cached
Mem:       2075828    2008228      67600          0          0    1904552
-/+ buffers/cache:     103676    1972152
Swap:      3911816         76    3911740
> Query below seems to use indexes everywhere in the most optimal way.
> dokumnr column is of type int.
>
> Speed of this query varies wildly:
>
> In live db the fastest response I have got is 8 seconds.
> Re-running the same query after 10 seconds may take 60 seconds.
> Re-running it again after 10 seconds may take 114 seconds.
>
> Any idea how to speed it up?
>
> Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
> add more RAM, disk or CPU speed?
>
> The real query contains a column list instead of the sum(1) used in the
> test below.
>
> Andrus.

Just a question: what are you doing with the 20,000 result rows?
> Just a question: what are you doing with the 20,000 result rows?

Those rows represent the monthly sales data of one item. They are used as
follows:

1. Detailed sales report for a month. This report can be browsed on screen
for monthly sales and ordering analysis.

2. Total reports. In those reports, sum(sales) and sum(quantity) are used
to get the total sales per day, week, month, and time for an item, and the
resulting rows are summed.

Andrus.
Andrus wrote:
> Query below seems to use indexes everywhere in the most optimal way.
> dokumnr column is of type int.
>
> Speed of this query varies wildly:
>
> In live db the fastest response I have got is 8 seconds.
> Re-running the same query after 10 seconds may take 60 seconds.
> Re-running it again after 10 seconds may take 114 seconds.
>
> Any idea how to speed it up?
>
> Is it possible to optimize it, will upgrading to 8.3.5 help, or should I
> add more RAM, disk or CPU speed?

At a quick glance, the plans look the same to me. The overall costs are
certainly identical. That means whatever is affecting the query times, it
isn't the query plan.

> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=62164.496..62164.500 rows=1 loops=1)
> Total runtime: 62164.789 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=40185.499..40185.503 rows=1 loops=1)
> Total runtime: 40186.102 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=29650.398..29650.402 rows=1 loops=1)
> Total runtime: 29650.696 ms
>
> Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual
> time=11131.392..11131.396 rows=1 loops=1)
> Total runtime: 11131.694 ms

So - what other activity is happening on this machine? Either other
queries are taking up noticeable resources, or some other process is (it
might be disk activity from checkpointing, logging, or some other
application).

-- 
Richard Huxton
Archonet Ltd
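One quick way to see whether other queries are running at the moment of a
slow run is pg_stat_activity; a minimal sketch for 8.1 (current_query is
only populated when stats_command_string = on in postgresql.conf):

-- List what every backend is running right now (on 8.1 the pid column
-- is called procpid).
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;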
Richard,

> At a quick glance, the plans look the same to me. The overall costs are
> certainly identical. That means whatever is affecting the query times,
> it isn't the query plan.
>
> So - what other activity is happening on this machine? Either other
> queries are taking up noticeable resources, or some other process is (it
> might be disk activity from checkpointing, logging, or some other
> application).

Thank you.
This is a dedicated server running only PostgreSQL, which serves approx
6 points of sale at this time.

Maybe those other clients make queries which invalidate a lot of data
loaded into the server cache. The next time, the server must read it again
from disk, which causes those performance differences.

top output is currently:

top - 13:13:10 up 22 days, 18:25,  1 user,  load average: 0.19, 0.12, 0.19
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.7% us,  2.0% sy,  0.0% ni, 78.3% id,  6.0% wa,  0.0% hi,  0.0% si
Mem:   2075828k total,  2022808k used,    53020k free,        0k buffers
Swap:  3911816k total,       88k used,  3911728k free,  1908536k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5382 postgres  15   0  144m  43m  40m S 15.0  2.2   0:00.45 postmaster
 5358 postgres  15   0  152m  87m  75m S  0.3  4.3   0:00.97 postmaster
    1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.42 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   2:03.91 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0   8:49.67 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0   0:10.32 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0   0:39.96 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0   0:17.54 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.68 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0   0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0   0:04.99 syslog-ng
 7128 postgres  16   0  140m  10m 9900 S  0.0  0.5   0:05.60 postmaster

A few seconds later:

top - 13:14:01 up 22 days, 18:26,  1 user,  load average: 1.72, 0.53, 0.32
Tasks:  52 total,   2 running,  50 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.3% us,  3.0% sy,  0.0% ni,  0.0% id, 91.0% wa,  0.0% hi,  0.7% si
Mem:   2075828k total,  2022692k used,    53136k free,        0k buffers
Swap:  3911816k total,       88k used,  3911728k free,  1905028k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1179 postgres  18   0  155m 136m 122m D  6.7  6.7   1:32.52 postmaster
 4748 postgres  15   0  145m 126m 122m D  1.3  6.2   0:14.38 postmaster
 5358 postgres  16   0  160m  98m  81m D  0.7  4.9   0:01.21 postmaster
    1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.42 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   2:03.97 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0   8:49.79 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0   0:10.32 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0   0:39.96 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0   0:17.54 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.68 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0   0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0   0:04.99 syslog-ng

Andrus.
Andrus wrote:
> Richard,
>
>> At a quick glance, the plans look the same to me. The overall costs are
>> certainly identical. That means whatever is affecting the query times,
>> it isn't the query plan.
>>
>> So - what other activity is happening on this machine? Either other
>> queries are taking up noticeable resources, or some other process is (it
>> might be disk activity from checkpointing, logging, or some other
>> application).
>
> Thank you.
> This is a dedicated server running only PostgreSQL, which serves approx
> 6 points of sale at this time.
>
> Maybe those other clients make queries which invalidate a lot of data
> loaded into the server cache. The next time, the server must read it
> again from disk, which causes those performance differences.

In addition to "top" below, you'll probably find "vmstat 5" useful.

> top output is currently:
>
> top - 13:13:10 up 22 days, 18:25,  1 user,  load average: 0.19, 0.12, 0.19
> Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
> Cpu(s): 13.7% us,  2.0% sy,  0.0% ni, 78.3% id,  6.0% wa,  0.0% hi,
>  0.0% si
> Mem:   2075828k total,  2022808k used,    53020k free,        0k buffers
> Swap:  3911816k total,       88k used,  3911728k free,  1908536k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5382 postgres  15   0  144m  43m  40m S 15.0  2.2   0:00.45 postmaster
>  5358 postgres  15   0  152m  87m  75m S  0.3  4.3   0:00.97 postmaster
>     1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init

Looks pretty quiet.

> A few seconds later:
>
> top - 13:14:01 up 22 days, 18:26,  1 user,  load average: 1.72, 0.53, 0.32
> Tasks:  52 total,   2 running,  50 sleeping,   0 stopped,   0 zombie
> Cpu(s):  5.3% us,  3.0% sy,  0.0% ni,  0.0% id, 91.0% wa,  0.0% hi,
>  0.7% si
> Mem:   2075828k total,  2022692k used,    53136k free,        0k buffers
> Swap:  3911816k total,       88k used,  3911728k free,  1905028k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1179 postgres  18   0  155m 136m 122m D  6.7  6.7   1:32.52 postmaster
>  4748 postgres  15   0  145m 126m 122m D  1.3  6.2   0:14.38 postmaster
>  5358 postgres  16   0  160m  98m  81m D  0.7  4.9   0:01.21 postmaster
>     1 root      16   0  1480  508  444 S  0.0  0.0   0:01.35 init

Here you're stuck waiting for disks (91.0% wa). Check out vmstat and
iostat to see what's happening.

-- 
Richard Huxton
Archonet Ltd
OK, so vmstat says you are IO-bound; this seems logical if the same plan
has widely varying timings...

Let's look at the usual suspects:

- how many dead rows in your tables? are your tables data, or bloat?
  (check vacuum verbose, etc)
- what's the size of the dataset relative to the RAM?

Now let's look more closely at the query:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
LEFT JOIN artliik USING (grupp, liik)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

OK, so artliik is a very small table (84 rows):

Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)

I presume doing the query without artliik changes nothing to the runtime,
yes?

Let's look at the main part of the query:

FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

Postgres's plan is logical. It starts by joining rid and dok since your
WHERE is on those:

->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)
      Hash Cond: ("outer".dokumnr = "inner".dokumnr)
      ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)
            Recheck Cond: (toode = 'X05'::bpchar)
            ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)
                  Index Cond: (toode = 'X05'::bpchar)
      ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)
            ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)
                  Index Cond: (kuupaev >= '2008-09-01'::date)

Your problem here is that, no matter what, postgres will have to examine
- all rows where dok.kuupaev >= '2008-09-01',
- and all rows where rid.toode = 'X05'.

If you use dok.kuupaev >= '2007-09-01' (note: 2007) it will probably have
to scan many, many more rows.

If you perform this query often you could CLUSTER rid on (toode) and dok
on (kuupaev), but this can screw other queries. What is the meaning of the
columns?

To make this type of query faster I would tend to think about:

- materialized views
- denormalization (i.e. adding a column in one of your tables and a
  multicolumn index)
- materialized summary tables (i.e. summary of sales for last month, for
  instance)
Richard,

> In addition to "top" below, you'll probably find "vmstat 5" useful.

Thank you. During this query run (65 sec), vmstat 5 shows big values in
the bi, cs, and wa columns:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache  si  so    bi    bo    in    cs us sy  id wa
 1  1     88  51444      0 1854404   0   0    17     6    15    13  5  1  83 10
 0  1     88  52140      0 1854304   0   0  3626    95   562   784 15 38   0 47
 0  1     92  51608      0 1855668   0   0 14116   103  1382  2294  4  8   0 88
 0  1     92  51620      0 1857256   0   1 15258    31  1210  1975  4  8   0 88
 0  2     92  50448      0 1859188   0   0 13118    19  1227  1982  3  7   0 90
 0  1     92  51272      0 1859088   0   0  7691    53   828  1284 14  4   0 82
 0  1     92  52472      0 1858792   0   0 10691     9   758   968  3  7   0 89
 0  2     92  51240      0 1858596   0   0  8204  7407   717  1063  2  5   0 93
 0  1     92  51648      0 1860388   0   0 20622   121  1118  2229 12  9   0 79
 2  1     92  50564      0 1861396   0   0 20994  3277   969  1681 15  8   0 76
 1  0     92  52180      0 1860192   0   0 18542    36   802  1276 36 12   0 51
 0  0     92  91872      0 1820948   0   1 15285    47   588   774  9 12  32 47
 0  0     92  91904      0 1820948   0   0     0     4   251    18  0  0 100  0
 0  0     92  92044      0 1820948   0   0     0     0   250    17  0  0 100  0
 0  0     92  91668      0 1821156   0   0    27    93   272    66  5  0  92  3
 0  0     92  91668      0 1821156   0   0     0    64   260    38  0  0 100  0
 0  0     92  91636      0 1821156   0   0     0   226   277    71  0  0 100  0
 0  0     92  91676      0 1821156   0   0     0    26   255    23  0  0 100  0

> Here you're stuck waiting for disks (91.0% wa). Check out vmstat and
> iostat to see what's happening.

Typing iostat returns

bash: iostat: command not found

It seems that it is not installed on this Gentoo box.

Andrus.
PFC,

thank you.

> OK, so vmstat says you are IO-bound; this seems logical if the same plan
> has widely varying timings...
>
> Let's look at the usual suspects:
>
> - how many dead rows in your tables? are your tables data, or bloat?
>   (check vacuum verbose, etc)

set search_path to firma2,public;
vacuum verbose dok; vacuum verbose rid

INFO:  vacuuming "firma2.dok"
INFO:  index "dok_pkey" now contains 1235086 row versions in 9454 pages
DETAIL:  100 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.16s/0.38u sec elapsed 0.77 sec.
INFO:  index "dok_dokumnr_idx" now contains 1235086 row versions in 9454 pages
DETAIL:  100 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.14s/0.40u sec elapsed 0.78 sec.
INFO:  index "dok_klient_idx" now contains 1235086 row versions in 18147 pages
DETAIL:  887 index row versions were removed.
3265 index pages have been deleted, 3033 are currently reusable.
CPU 0.36s/0.46u sec elapsed 31.87 sec.
INFO:  index "dok_krdokumnr_idx" now contains 1235086 row versions in 11387 pages
DETAIL:  119436 index row versions were removed.
1716 index pages have been deleted, 1582 are currently reusable.
CPU 0.47s/0.55u sec elapsed 63.38 sec.
INFO:  index "dok_kuupaev_idx" now contains 1235101 row versions in 10766 pages
DETAIL:  119436 index row versions were removed.
659 index pages have been deleted, 625 are currently reusable.
CPU 0.62s/0.53u sec elapsed 40.20 sec.
INFO:  index "dok_tasudok_idx" now contains 1235104 row versions in 31348 pages
DETAIL:  119436 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.18s/1.08u sec elapsed 118.97 sec.
INFO:  index "dok_tasudok_unique_idx" now contains 99 row versions in 97 pages
DETAIL:  98 index row versions were removed.
80 index pages have been deleted, 80 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.48 sec.
INFO:  index "dok_tasumata_idx" now contains 1235116 row versions in 11663 pages
DETAIL:  119436 index row versions were removed.
5340 index pages have been deleted, 5131 are currently reusable.
CPU 0.43s/0.56u sec elapsed 53.96 sec.
INFO:  index "dok_tellimus_idx" now contains 1235122 row versions in 11442 pages
DETAIL:  119436 index row versions were removed.
1704 index pages have been deleted, 1569 are currently reusable.
CPU 0.45s/0.59u sec elapsed 76.91 sec.
INFO:  index "dok_yksus_pattern_idx" now contains 1235143 row versions in 5549 pages
DETAIL:  119436 index row versions were removed.
529 index pages have been deleted, 129 are currently reusable.
CPU 0.19s/0.46u sec elapsed 2.72 sec.
INFO:  index "dok_doktyyp" now contains 1235143 row versions in 3899 pages
DETAIL:  119436 index row versions were removed.
188 index pages have been deleted, 13 are currently reusable.
CPU 0.14s/0.44u sec elapsed 1.40 sec.
INFO:  index "dok_sihtyksus_pattern_idx" now contains 1235143 row versions in 5353 pages
DETAIL:  119436 index row versions were removed.
286 index pages have been deleted, 5 are currently reusable.
CPU 0.13s/0.45u sec elapsed 3.01 sec.
INFO:  "dok": removed 119436 row versions in 13707 pages
DETAIL:  CPU 0.80s/0.37u sec elapsed 14.15 sec.
INFO:  "dok": found 119436 removable, 1235085 nonremovable row versions in 171641 pages
DETAIL:  2 dead row versions cannot be removed yet.
There were 1834279 unused item pointers.
0 pages are entirely empty.
CPU 6.56s/6.88u sec elapsed 450.54 sec.
INFO:  vacuuming "pg_toast.pg_toast_40595"
INFO:  index "pg_toast_40595_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_40595": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "firma2.rid"
INFO:  index "rid_pkey" now contains 3275197 row versions in 13959 pages
DETAIL:  38331 index row versions were removed.
262 index pages have been deleted, 262 are currently reusable.
CPU 0.42s/1.05u sec elapsed 58.56 sec.
INFO:  index "rid_dokumnr_idx" now contains 3275200 row versions in 14125 pages
DETAIL:  38331 index row versions were removed.
572 index pages have been deleted, 571 are currently reusable.
CPU 0.49s/1.14u sec elapsed 71.57 sec.
INFO:  index "rid_inpdokumnr_idx" now contains 3275200 row versions in 15103 pages
DETAIL:  38331 index row versions were removed.
579 index pages have been deleted, 579 are currently reusable.
CPU 0.66s/1.03u sec elapsed 68.38 sec.
INFO:  index "rid_toode_idx" now contains 3275224 row versions in 31094 pages
DETAIL:  38331 index row versions were removed.
2290 index pages have been deleted, 2290 are currently reusable.
CPU 1.39s/1.58u sec elapsed 333.82 sec.
INFO:  index "rid_rtellimus_idx" now contains 3275230 row versions in 7390 pages
DETAIL:  18591 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.18s/0.66u sec elapsed 1.78 sec.
INFO:  index "rid_toode_pattern_idx" now contains 3275230 row versions in 16310 pages
DETAIL:  17800 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.44s/1.04u sec elapsed 6.55 sec.
INFO:  "rid": removed 38331 row versions in 3090 pages
DETAIL:  CPU 0.20s/0.10u sec elapsed 5.49 sec.
INFO:  "rid": found 38331 removable, 3275189 nonremovable row versions in 165282 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 1878923 unused item pointers.
0 pages are entirely empty.
CPU 5.06s/7.27u sec elapsed 607.59 sec.

Query returned successfully with no result in 1058319 ms.

> - what's the size of the dataset relative to the RAM?

Db size is 7417 MB.
Relevant table sizes, in descending order by size:

 1  40595     dok                          2345 MB
 2  1214      pg_shdepend                  2259 MB
 3  40926     rid                          2057 MB
 6  1232      pg_shdepend_depender_index    795 MB
 7  1233      pg_shdepend_reference_index   438 MB
 8  44286     dok_tasudok_idx               245 MB
 9  44299     rid_toode_idx                 243 MB
10  44283     dok_klient_idx                142 MB
11  19103791  rid_toode_pattern_idx         127 MB
14  44298     rid_inpdokumnr_idx            118 MB
15  44297     rid_dokumnr_idx               110 MB
16  43573     rid_pkey                      109 MB
18  44288     dok_tasumata_idx               91 MB
19  44289     dok_tellimus_idx               89 MB
20  44284     dok_krdokumnr_idx              89 MB
21  44285     dok_kuupaev_idx                84 MB
23  43479     dok_pkey                       74 MB
24  44282     dok_dokumnr_idx                74 MB
25  19076304  rid_rtellimus_idx              58 MB
26  18663923  dok_yksus_pattern_idx          43 MB
27  18801591  dok_sihtyksus_pattern_idx      42 MB
29  18774881  dok_doktyyp                    30 MB
46  40967     toode                          13 MB

Server is an HP Proliant DL320 G3
http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML
CPU is a 2.93GHz Celeron with 256kb cache.

Server has 2 GB RAM.
It has a SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ
mirrored disks.
> Now let's look more closely at the query:
>
> explain analyze
> SELECT sum(1)
> FROM dok JOIN rid USING (dokumnr)
> JOIN toode USING (toode)
> LEFT JOIN artliik USING (grupp, liik)
> WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'
>
> I presume doing the query without artliik changes nothing to the runtime,
> yes?

Yes. After removing artliik from the join I got response times of 65 and
50 seconds, so this does not make a difference.

> Your problem here is that, no matter what, postgres will have to examine
> - all rows where dok.kuupaev >= '2008-09-01',
> - and all rows where rid.toode = 'X05'.
> If you use dok.kuupaev >= '2007-09-01' (note: 2007) it will probably have
> to scan many, many more rows.

Probably yes, since then it reads one more year of sales data.

> If you perform this query often you could CLUSTER rid on (toode) and dok
> on (kuupaev), but this can screw other queries.

Some reports are by sales date (dok.kuupaev) and customers; CLUSTERing rid
on (toode) slows them down. Also, autovacuum cannot do clustering.

> What is the meaning of the columns?

This is typical sales data:

-- Receipt headers:
DOK ( dokumnr SERIAL PRIMARY KEY,
      kuupaev DATE -- sales date
)

-- Receipt details
RID ( dokumnr INT,
      toode CHAR(20), -- item code
      CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr)
          REFERENCES dok (dokumnr),
      CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
          REFERENCES firma2.toode (toode)
)

-- Products
TOODE (
      toode CHAR(20) PRIMARY KEY
)

> To make this type of query faster I would tend to think about:
>
> - denormalization (i.e. adding a column in one of your tables and a
>   multicolumn index)

For this query it is possible to duplicate the kuupaev column into the rid
table. However, most of the time seems to go to scanning the rid table, so
I suspect that this will help.

> - materialized views
> - materialized summary tables (i.e. summary of sales for last month, for
>   instance)

There are about 1000 items and the reports are different.

Andrus.
Just the most important points:

1) The "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means about 1.4GB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted).

2) The "rid" table contains 3275189 rows in 165282 pages (with 8kB pages
this means about 1.3GB of data), but there are 1878923 unused item
pointers (i.e. about 30% of the space is wasted).

3) Don't forget to execute analyze after vacuuming (or vacuum analyze).

4) I'm not sure about the sizes reported by you (for example 2.3GB vs
1.5GB for the "dok" table) - the difference seems too large to me.

Anyway, the amount of wasted space seems significant to me - I'd try to
solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
lock the table exclusively, but the results may be better (when sorting by
a well chosen index). Don't forget to run ANALYZE afterwards.

Several other things to consider:

1) Regarding the toode column - why are you using CHAR(20) when the values
are actually shorter? This may significantly increase the amount of space
required.

2) I've noticed the CPU used is a Celeron, which may negatively affect the
speed of hash computation. I'd try to replace it by something faster -
say, an INTEGER as an artificial primary key of the "toode" table, used as
a FK in the other tables. This might improve the "Bitmap Heap Scan on rid"
part, but yes - it's just a minor improvement compared to the "Hash Join"
part of the query.

Materialized views seem like a good idea to me, but maybe I'm not seeing
something. What do you mean by "reports are different"? If there are a lot
of rows for a given product / day, then creating an aggregated table with
(product code / day) as a primary key is quite simple. It may require a
lot of disk space, but it'll remove the hash join overhead. But if the
queries are very different, then it may be difficult to build such
materialized view(s).

regards
Tomas
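A sketch of what point 2 would look like in practice (all new names here
are invented for illustration; a real migration would also have to touch
every query that joins on toode):

-- Add an artificial integer key alongside the natural CHAR(20) key.
CREATE SEQUENCE toode_id_seq;
ALTER TABLE toode ADD COLUMN toode_id INTEGER;
UPDATE toode SET toode_id = nextval('toode_id_seq');
ALTER TABLE toode ALTER COLUMN toode_id SET NOT NULL;
ALTER TABLE toode ADD CONSTRAINT toode_toode_id_key UNIQUE (toode_id);

-- Backfill a matching column in rid and index it, so joins and filters
-- can hash and compare 4-byte integers instead of padded CHAR(20).
ALTER TABLE rid ADD COLUMN toode_id INTEGER;
UPDATE rid SET toode_id = t.toode_id FROM toode t WHERE rid.toode = t.toode;
CREATE INDEX rid_toode_id_idx ON rid (toode_id);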
Andrus wrote:
>> - what's the size of the dataset relative to the RAM?
>
> Db size is 7417 MB.
> Relevant table sizes, in descending order by size:
>
>  1  40595  dok                          2345 MB
>  2  1214   pg_shdepend                  2259 MB
>  6  1232   pg_shdepend_depender_index    795 MB
>  7  1233   pg_shdepend_reference_index   438 MB

These three are highly suspicious. They track dependencies between system
objects (so you can't drop function F because trigger T depends on it).

http://www.postgresql.org/docs/8.3/static/catalog-pg-shdepend.html

You've got 3.5GB of data there, which is a *lot* of dependencies.

Try "SELECT count(*) FROM pg_shdepend".

If it's not a million rows, then the table is bloated. Try (as postgres or
some other db superuser) a "vacuum full pg_shdepend" and a "reindex
pg_shdepend".

If it is a million rows, you'll need to find out why. Do you have a lot of
temporary tables that aren't being dropped, or something similar?

-- 
Richard Huxton
Archonet Ltd
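A rough way to confirm the bloat without counting all the rows is to
compare the planner's statistics for the table and its indexes (a sketch;
reltuples and relpages are only estimates from the last VACUUM/ANALYZE):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('pg_shdepend',
                  'pg_shdepend_depender_index',
                  'pg_shdepend_reference_index');
-- A few thousand live rows spread over hundreds of thousands of 8kB
-- pages means nearly all of the space is dead.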
Richard,

Thank you.

> Try "SELECT count(*) FROM pg_shdepend".

This query returns 3625 and takes 35 seconds to run.

> If it's not a million rows, then the table is bloated. Try (as postgres
> or some other db superuser) a "vacuum full pg_shdepend" and a "reindex
> pg_shdepend".

vacuum full verbose pg_shdepend

INFO:  vacuuming "pg_catalog.pg_shdepend"
INFO:  "pg_shdepend": found 16103561 removable, 3629 nonremovable row versions in 131425 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 49 to 49 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1009387632 bytes.
131363 pages are or will become empty, including 0 at the end of the table.
131425 pages containing 1009387632 free bytes are potential move destinations.
CPU 2.12s/1.69u sec elapsed 52.66 sec.
INFO:  index "pg_shdepend_depender_index" now contains 3629 row versions in 101794 pages
DETAIL:  16103561 index row versions were removed.
101311 index pages have been deleted, 20000 are currently reusable.
CPU 20.12s/14.52u sec elapsed 220.66 sec.

After 400 seconds of running I got phone calls that the server was not
responding to other clients, so I was forced to cancel the "vacuum full
verbose pg_shdepend" command.

How can I run it so that other users can use the database at the same
time?

> If it is a million rows, you'll need to find out why. Do you have a lot
> of temporary tables that aren't being dropped, or something similar?

The application creates temporary tables in many places. Every sales
operation probably creates some temporary tables.

Should I change something in the configuration, change the application
(only a single POS application is used to access this db), or is the only
solution to manually run

vacuum full pg_shdepend;
reindex table pg_shdepend;

periodically?

How can I vacuum full pg_shdepend automatically so that other users can
work at the same time?

Hopefully this table size does not affect query speed.

Andrus.
> How can I vacuum full pg_shdepend automatically so that other users can
> work at the same time?

Your table is horribly bloated. You must use VACUUM FULL + REINDEX (as
superuser) on it; unfortunately, it is blocking, so you should wait for
Sunday night to do this, when no one will notice.

Meanwhile, you can always VACUUM it (as superuser) and REINDEX it.

And while you're at it, VACUUM FULL + REINDEX the entire database.

To avoid such annoyances in the future, you should ensure that autovacuum
runs properly; you should investigate this. If you use a cron'ed VACUUM
that does not run as superuser, then it will not be able to VACUUM the
system catalogs, and the problem will come back.
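The Sunday-night maintenance described above would look something like
this (run as superuser; both statements take an exclusive lock on the
table for their whole duration):

VACUUM FULL VERBOSE pg_shdepend;
REINDEX TABLE pg_shdepend;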
> Server has 2 GB RAM.
> It has a SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG
> HD160JJ mirrored disks.

You could perhaps run a little check on the performance of the RAID: is it
better than Linux software RAID? Does it leverage NCQ appropriately when
running queries in parallel?

> -- Receipt headers:
> DOK ( dokumnr SERIAL PRIMARY KEY,
>       kuupaev DATE -- sales date
> )
> -- Receipt details
> RID ( dokumnr INT,
>       toode CHAR(20), -- item code
>       CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr)
>           REFERENCES dok (dokumnr),
>       CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
>           REFERENCES firma2.toode (toode)
> )
> -- Products
> TOODE (
>       toode CHAR(20) PRIMARY KEY
> )

OK, so pretty straightforward:

dok <-(dokumnr)-> rid <-(toode)-> toode

toode.toode should really be an INT though.

>> explain analyze
>> SELECT sum(1)
>> FROM dok JOIN rid USING (dokumnr)
>> JOIN toode USING (toode)
>> LEFT JOIN artliik USING (grupp, liik)
>> WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

By the way, note that the presence of the toode table in the query above
is not required at all, unless you use columns of toode in your
aggregates.

Let's play with that; after all, it's Friday night.

BEGIN;

CREATE TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TABLE products (product_id INTEGER NOT NULL, product_name TEXT NOT NULL);
CREATE TABLE orders_products (order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL, padding1 TEXT, padding2 TEXT);

INSERT INTO products
SELECT n, 'product number ' || n::TEXT FROM generate_series(1,40000) AS n;

INSERT INTO orders
SELECT n, '2000-01-01'::date + (n/1000 * '1 DAY'::interval)
FROM generate_series(1,1000000) AS n;

SET work_mem TO '1GB';

INSERT INTO orders_products
SELECT a, b, 'aibaifbaurgbyioubyfazierugybfoaybofauez',
       'hfohbdsqbhjhqsvdfiuazvfgiurvgazrhbazboifhaoifh'
FROM (SELECT DISTINCT (1+(n/10))::INTEGER AS a,
             (1+(random()*39999))::INTEGER AS b
      FROM generate_series(1,9999999) AS n) AS x;

DELETE FROM orders_products
WHERE product_id NOT IN (SELECT product_id FROM products);
DELETE FROM orders_products
WHERE order_id NOT IN (SELECT order_id FROM orders);

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (order_id, product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id)
    REFERENCES products (product_id) ON DELETE CASCADE;
ALTER TABLE orders_products ADD FOREIGN KEY (order_id)
    REFERENCES orders (order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders (order_date);

COMMIT;

SET work_mem TO DEFAULT;
ANALYZE;

With the following query:

EXPLAIN ANALYZE
SELECT sum(1)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01'
AND products.product_id = 12345;

I get the following results:

orders_products has a PK index on (order_id, product_id). I dropped it.
No index on orders_products:
=> Big seq scan (16 seconds)

Index on orders_products (product_id):

Aggregate  (cost=2227.22..2227.23 rows=1 width=0) (actual time=108.204..108.205 rows=1 loops=1)
  ->  Nested Loop  (cost=1312.30..2227.20 rows=7 width=0) (actual time=105.929..108.191 rows=6 loops=1)
        ->  Index Scan using products_pkey on products  (cost=0.00..8.27 rows=1 width=4) (actual time=0.010..0.014 rows=1 loops=1)
              Index Cond: (product_id = 12345)
        ->  Hash Join  (cost=1312.30..2218.85 rows=7 width=4) (actual time=105.914..108.167 rows=6 loops=1)
              Hash Cond: (orders_products.order_id = orders.order_id)
              ->  Bitmap Heap Scan on orders_products  (cost=6.93..910.80 rows=232 width=8) (actual time=0.194..2.175 rows=246 loops=1)
                    Recheck Cond: (product_id = 12345)
                    ->  Bitmap Index Scan on orders_products_product_id  (cost=0.00..6.87 rows=232 width=0) (actual time=0.129..0.129 rows=246 loops=1)
                          Index Cond: (product_id = 12345)
              ->  Hash  (cost=949.98..949.98 rows=28432 width=4) (actual time=105.696..105.696 rows=31999 loops=1)
                    ->  Index Scan using orders_date on orders  (cost=0.00..949.98 rows=28432 width=4) (actual time=0.059..64.443 rows=31999 loops=1)
                          Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))
Total runtime: 108.357 ms

(don't trust this timing, it's a bit cached; this is the same plan as you
get)

Index on orders_products (product_id) and orders_products (order_id):
=> Same plan

Note that in this case, a smarter planner would use the new index to
perform a BitmapAnd before hitting the heap to get the rows.

Index on (order_id, product_id) plus orders_products (product_id), and
index on (order_id, product_id) alone:
=> Different plan, slower (especially in the second case).

If an "order_date" column is added to the "orders_products" table to make
it into some kind of materialized view:

CREATE TABLE orders_products2 AS
SELECT orders.order_id, orders.order_date, product_id
FROM orders JOIN orders_products USING (order_id);

And an index is created on (product_id, order_date), we get this:

Aggregate  (cost=100.44..100.45 rows=1 width=0) (actual time=0.176..0.177 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..100.42 rows=7 width=0) (actual time=0.083..0.168 rows=6 loops=1)
        ->  Index Scan using products_pkey on products  (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
              Index Cond: (product_id = 12345)
        ->  Nested Loop  (cost=0.00..92.08 rows=7 width=4) (actual time=0.068..0.147 rows=6 loops=1)
              ->  Index Scan using orders_products2_pid_date on orders_products2  (cost=0.00..33.50 rows=7 width=8) (actual time=0.053..0.076 rows=6 loops=1)
                    Index Cond: ((product_id = 12345) AND (order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))
              ->  Index Scan using orders_pkey on orders  (cost=0.00..8.36 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=6)
                    Index Cond: (orders.order_id = orders_products2.order_id)
Total runtime: 0.246 ms

An index on (order_date, product_id) produces the same effect; the index
scan is slower, but the heap scan uses the same amount of IO.

Two indexes, (order_date) and (product_id), strangely do not produce a
BitmapAnd; instead a plan with more IO is chosen.

>>> - denormalization (i.e. adding a column in one of your tables and a
>>> multicolumn index)
>
> For this query it is possible to duplicate the kuupaev column into the
> rid table. However, most of the time seems to go to scanning the rid
> table, so I suspect that this will help.

Yes, most of the time goes to scanning the rid table, and this is the time
that should be reduced.
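The transcript does not show the statement that created the index used in
the orders_products2 plan above, but from the index name and the
surrounding description it was presumably:

CREATE INDEX orders_products2_pid_date
    ON orders_products2 (product_id, order_date);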
Adding a date column in "rid" would allow you to create a multicolumn
index on rid (dokumnr, date) which would massively speed up the particular
query above. If you don't create a multicolumn index, this denormalization
is useless. Basically, instead of scanning all rows in "rid" where
toode = 'X05', only the rows in the date range of interest would be
visited.

>>> - materialized views
>>> - materialized summary tables (i.e. summary of sales for last month,
>>> for instance)
>
> There are about 1000 items and the reports are different.

It all depends on what you put in your summary table...
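A sketch of that denormalization against Andrus's schema (column and index
names assumed; for this particular query the useful multicolumn index is
on (toode, kuupaev)):

-- Copy the sales date down into the detail table...
ALTER TABLE rid ADD COLUMN kuupaev DATE;
UPDATE rid SET kuupaev = dok.kuupaev
FROM dok WHERE rid.dokumnr = dok.dokumnr;

-- ...and index item code + date together, so the scan only touches the
-- requested date range for the requested item.
CREATE INDEX rid_toode_kuupaev_idx ON rid (toode, kuupaev);

-- The application (or a trigger) must keep rid.kuupaev in sync with
-- dok.kuupaev from here on.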
Tomas,

Thank you.

> Just the most important points:
>
> 1) The "dok" table contains 1235086 row versions in 171641 pages (with
> 8kB pages this means about 1.4GB of data), but there are 1834279 unused
> item pointers (i.e. about 60% of the space is wasted).
>
> 2) The "rid" table contains 3275189 rows in 165282 pages (with 8kB pages
> this means about 1.3GB of data), but there are 1878923 unused item
> pointers (i.e. about 30% of the space is wasted).
>
> 3) Don't forget to execute analyze after vacuuming (or vacuum analyze).

autovacuum is running. So if I understand properly, I must run

VACUUM FULL ANALYZE dok;
VACUUM FULL ANALYZE rid;

Those commands would probably cause the server to stop responding to other
clients, like vacuum full pg_shdepend did.

Should vacuum_cost_delay = 2000 allow other users to work while those
commands are running?

> 4) I'm not sure about the sizes reported by you (for example 2.3GB vs
> 1.5GB for the "dok" table) - the difference seems too large to me.

I used pg_total_relation_size(), so the 2.3 GB includes indexes as well:

 8  44286     dok_tasudok_idx            245 MB
10  44283     dok_klient_idx             142 MB
18  44288     dok_tasumata_idx            91 MB
19  44289     dok_tellimus_idx            89 MB
20  44284     dok_krdokumnr_idx           89 MB
21  44285     dok_kuupaev_idx             84 MB
22  43531     makse_pkey                  77 MB
23  43479     dok_pkey                    74 MB
24  44282     dok_dokumnr_idx             74 MB
26  18663923  dok_yksus_pattern_idx       43 MB
27  18801591  dok_sihtyksus_pattern_idx   42 MB

> Anyway, the amount of wasted space seems significant to me - I'd try to
> solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
> lock the table exclusively, but the results may be better (when sorting
> by a well chosen index). Don't forget to run ANALYZE afterwards.

How can I invoke those commands so that other clients can continue
working? I'm using 8.1.4. Log files show that autovacuum is running.

I'm planning the following solution:

1. Set vacuum_cost_delay = 2000.

2. Run the following commands periodically, in this order:

VACUUM FULL;
VACUUM FULL pg_shdepend;
CLUSTER rid_toode_idx ON rid;
CLUSTER dok_kuupaev_idx ON dok;
REINDEX DATABASE mydb;
REINDEX SYSTEM mydb;
ANALYZE;

Are all those commands required, or can something be left out?

> Several other things to consider:
>
> 1) Regarding the toode column - why are you using CHAR(20) when the
> values are actually shorter? This may significantly increase the amount
> of space required.

There may be some products whose codes are up to 20 characters long.
PostgreSQL does not store trailing spaces in the db, so this does *not*
affect space.

> 2) I've noticed the CPU used is a Celeron, which may negatively affect
> the speed of hash computation. I'd try to replace it by something faster
> - say, an INTEGER as an artificial primary key of the "toode" table,
> used as a FK in the other tables. This might improve the "Bitmap Heap
> Scan on rid" part, but yes - it's just a minor improvement compared to
> the "Hash Join" part of the query.

The natural key toode CHAR(20) is used widely in different queries.
Replacing it with an INT surrogate key requires a major application
rewrite. Should I add surrogate INT key columns to the toode and rid
tables and measure the test query speed in that case?

> Materialized views seem like a good idea to me, but maybe I'm not seeing
> something. What do you mean by "reports are different"? If there are a
> lot of rows for a given product / day, then creating an aggregated table
> with (product code / day) as a primary key is quite simple. It may
> require a lot of disk space, but it'll remove the hash join overhead.
> But if the queries are very different, then it may be difficult to build
> such materialized view(s).

The log file shows that mostly only queries of this shape are slow:

SELECT ...
FROM dok JOIN rid USING (dokumnr)
JOIN ProductId USING (ProductId)
WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2

:p1 and :p2 are parameters, different for different queries.

dok contains several years of data. :p2 usually covers only the few previous months, or at most about a year back.
The SELECT column list contains a fixed list of known columns from all tables.

How can I create an index or materialized view to optimize these types of queries ?

Andrus.
> The log file shows that mostly only queries of this shape are slow:
>
> SELECT ...
> FROM dok JOIN rid USING (dokumnr)
> JOIN ProductId USING (ProductId)
> WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2
>
> :p1 and :p2 are parameters, different for different queries.
>
> dok contains several years of data. :p2 usually covers only the few
> previous months, or at most about a year back.
> The SELECT column list contains a fixed list of known columns from all tables.
>
> How can I create an index or materialized view to optimize these types of
> queries ?

I would remove some granularity, for instance create a summary table (materialized view) by month :

- date (contains the first day of the month)
- product_id
- total quantity, total price sold in given month

You get the idea.

If your products belong to categories, and you make queries on all the products in a category, it could be worth making a summary table for categories also.
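A sketch of such a monthly summary table; the quantity and price column names on rid are assumptions here, since the real column names are not given in the thread:

CREATE TABLE sales_by_month AS
SELECT date_trunc('month', dok.SaleDate)::date AS month,
       rid.ProductId,
       sum(rid.quantity)             AS total_qty,
       sum(rid.quantity * rid.price) AS total_value
FROM dok JOIN rid USING (dokumnr)
GROUP BY 1, 2;

CREATE UNIQUE INDEX sales_by_month_pk ON sales_by_month (ProductId, month);
ANALYZE sales_by_month;

-- refreshed periodically, e.g. nightly: TRUNCATE and re-run the INSERT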
>> How to vacuum full pg_shdepend automatically so that other users can
>> work at same time ?
>
> Your table is horribly bloated.
> You must use VACUUM FULL + REINDEX (as superuser) on it, however
> unfortunately, it is blocking.
> Therefore, you should wait for sunday night to do this, when noone will
> notice.

Shops are closed late at night only for a short time, including Sunday night.
That window may be shorter than the time required to complete the VACUUM command.

I discovered the vacuum_cost_delay=2000 option. Will this remove the blocking issue and allow vacuum full to work ?

> Meanwhile, you can always VACUUM it (as superuser) and REINDEX it.

I expect that autovacuum does this automatically.

> And while you're at it, VACUUM FULL + reindex the entire database.
> To avoid such annoyances in the future, you should ensure that autovacuum
> runs properly ; you should investigate this. If you use a cron'ed VACUUM
> that does not run as superuser, then it will not be able to VACUUM the
> system catalogs, and the problem will come back.

autovacuum is turned on in the postgresql.conf file.
The log file shows a lot of messages every day saying the databases are vacuumed. I assume it is running as user postgres.

I don't understand how autovacuum can avoid this: it does not perform vacuum full, so pg_shdepend and my tables become bloated again and again.

Andrus.
On Friday 21 November 2008, "Andrus" <kobruleht2@hot.ee> wrote:
> Those commands will probably cause the server to stop responding to other
> clients, like vacuum full pg_shdepend did.
>
> Should vacuum_cost_delay = 2000 allow other users to work when running
> those commands ?

Any vacuum full or cluster will lock out other clients. A high vacuum_cost_delay will just make the vacuum run slower.

-- Corporations will ingest natural resources and defecate garbage until all resources are depleted, debt can no longer be repaid and our money becomes worthless - Jay Hanson
Andrus wrote: > I discovered vacuum_cost_delay=2000 option. Will this remove blocking > issue and allow vacuum full to work ? No. Are you really using vacuum_cost_delay=2000? If so, therein lies your problem. That's a silly value to use for that variable. Useful values are in the 20-40 range probably, or maybe 10-100 being extremely generous. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
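For reference, the value is in milliseconds, so a setting in the range Alvaro suggests would look like this in postgresql.conf:

vacuum_cost_delay = 20    # ms to sleep when the vacuum cost limit is reached; 0 disables the delay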
Alvaro,

> Are you really using vacuum_cost_delay=2000? If so, therein lies your
> problem. That's a silly value to use for that variable. Useful values
> are in the 20-40 range probably, or maybe 10-100 being extremely
> generous.

Thank you.
My 8.1.4 postgresql.conf does not contain such an option, so vacuum_cost_delay is probably off.
Since the docs do not recommend any value, I had planned to use 2000.

Will a value of 30 allow other clients to work when VACUUM FULL is running ?

Uncommented relevant values in the postgresql.conf file are:

shared_buffers = 15000
work_mem = 512
maintenance_work_mem = 131072
fsync = on
effective_cache_size = 70000
log_min_duration_statement = 30000

Andrus.
Andrus wrote:
> Will a value of 30 allow other clients to work when VACUUM FULL is running ?

1. vacuum_cost_delay does not affect vacuum full
2. vacuum full is always blocking, regardless of settings

So I gather you're not doing any vacuuming, eh?

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
PFC <lists@peufeu.com> writes: > Index on orders_products( product_id ) and orders_products( order_id ): > => Same plan > Note that in this case, a smarter planner would use the new index to > perform a BitmapAnd before hitting the heap to get the rows. Considering that the query has no constraint on orders_products.order_id, I'm not sure what you think the extra index is supposed to be used *for*. (Well, we could put orders as the outside of a nestloop and then we'd have such a constraint, but with 30000 orders rows to process that plan would lose big.) (And yes, the planner did consider such a plan along the way. See choose_bitmap_and.) regards, tom lane
Alvaro,

> 1. vacuum_cost_delay does not affect vacuum full
> 2. vacuum full is always blocking, regardless of settings

So the only way is to disable other database access while vacuum full is running.

> So I gather you're not doing any vacuuming, eh?

Log files are full of messages like those below every day, so I hope vacuum is running well, isn't it ?

Andrus.

2008-11-19 00:00:48 EET 11728 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:01:48 EET 11729 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:02:48 EET 11730 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:03:48 EET 11731 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:04:48 EET 11732 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:05:48 EET 11733 1 LOG: autovacuum: processing database "mydb3"
2008-11-19 00:06:48 EET 11734 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:07:48 EET 11735 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:08:48 EET 11736 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:09:48 EET 11737 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:10:48 EET 11750 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:11:48 EET 11751 1 LOG: autovacuum: processing database "mydb3"
2008-11-19 00:12:48 EET 11752 1 LOG: autovacuum: processing database "postgres"
2008-11-19 00:13:48 EET 11753 1 LOG: autovacuum: processing database "mydb1"
2008-11-19 00:14:48 EET 11754 1 LOG: autovacuum: processing database "emydb1"
2008-11-19 00:15:48 EET 11755 1 LOG: autovacuum: processing database "template1"
2008-11-19 00:16:48 EET 11756 1 LOG: autovacuum: processing database "testmydb1"
2008-11-19 00:17:48 EET 11757 1 LOG: autovacuum: processing database "mydb3"
...
Andrus wrote:
>> So I gather you're not doing any vacuuming, eh?
>
> Log files are full of messages like those below every day,
> so I hope vacuum is running well, isn't it ?

This does not really mean that autovacuum has done anything in the databases. If the times are consistently separated by 1 min, then it's possible that it always exits without doing anything. In such an old release we didn't have any decent logging mechanism in autovacuum :-( You can change log_min_messages to debug2 to see if it actually does anything or not.

I suggest you connect to the problem database (and then to all others, just to be sure) and run "vacuum" (no full).

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
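That is, set log_min_messages = debug2 in postgresql.conf (and reload), then connect to each database and run a plain, non-blocking vacuum:

VACUUM VERBOSE;   -- lazy vacuum: records free space in the FSM without locking out other clients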
> Thank you.
> My 8.1.4 postgresql.conf does not contain such an option, so
> vacuum_cost_delay is probably off.
> Since the docs do not recommend any value, I had planned to use 2000.
>
> Will a value of 30 allow other clients to work when VACUUM FULL is running ?

No. As someone already noted, VACUUM FULL is blocking anyway (and does not use this value at all).

> Uncommented relevant values in the postgresql.conf file are:
>
> shared_buffers = 15000
> work_mem = 512

I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily).

> maintenance_work_mem = 131072
> fsync = on
> effective_cache_size = 70000

Well, your server has 2GB of RAM and usually it's recommended to set this value to about 60-70% of your RAM, so using 540MB (about 25%) seems quite low. Anyway, this is just a hint to PostgreSQL; it does not increase memory consumption - it's just an estimate of how much data is cached by the kernel.

Anyway, I don't expect these values to have a significant effect on the issue discussed in this thread.

regards
Tomas
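Translated into 8.1 units (work_mem in kB, effective_cache_size in 8kB pages), that advice would look roughly like the following in postgresql.conf - the exact numbers here are illustrative, not tuned:

work_mem = 4096                # ~4 MB per sort/hash operation, instead of 0.5 MB
effective_cache_size = 160000  # ~1.25 GB, i.e. roughly 60% of 2 GB RAM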
> 2. Run the following commands periodically in this order:
>
> VACUUM FULL;
> vacuum full pg_shdepend;
> CLUSTER rid on (toode);
> CLUSTER dok on (kuupaev);
> REINDEX DATABASE mydb;
> REINDEX SYSTEM mydb;
> ANALYZE;
>
> Are all those commands required, or can something be left out ?

Running CLUSTER after VACUUM FULL is just a waste of time. In my experience CLUSTER is actually faster in the case of such heavily bloated tables - I think this is caused by the fact that it creates the indexes from scratch instead of updating them (as VACUUM FULL does). So CLUSTER effectively performs a REINDEX as well, and I'd just run

CLUSTER rid ON rid_pkey;
CLUSTER dok ON dok_pkey;
ANALYZE rid;
ANALYZE dok;

Clustering by other indexes might give better performance; using primary keys is just a safe guess here. This should improve the performance of your query, and it seems these two tables are the most bloated ones. I wouldn't do the same maintenance on the other tables now - it's just a waste of time.

>> Several other things to consider:
>>
>> 1) Regarding the toode column - why are you using CHAR(20) when the values
>> are actually shorter? This may significantly increase the amount of space
>> required.
>
> There may be some products whose codes may be up to 20 characters.
> PostgreSQL does not hold trailing spaces in the db, so this does *not*
> affect space.

OK, I hadn't realized this. You're right.

>> 2) I've noticed the CPU used is Celeron, which may negatively affect the
>> speed of hash computation. I'd try to replace it by something faster - say
>> INTEGER as an artificial primary key of the "toode" table and using it as
>> a FK in other tables. This might improve the "Bitmap Heap Scan on rid"
>> part, but yes - it's just a minor improvement compared to the "Hash Join"
>> part of the query.
>
> The natural key Toode CHAR(20) is used widely in different queries.
> Replacing it with an INT surrogate key requires a major application rewrite.
>
> Should I add surrogate INT key columns to the toode and rid tables and
> measure test query speed in that case?

Test it. Create tables with fake data, and compare the performance with and without the surrogate keys. Using a simple data type instead of text gave me a huge performance boost. For example, one of my colleagues used VARCHAR(15) to store IP addresses and then used them to join tables (and suffered from the poor performance). Replacing them with INET improved the speed by several orders of magnitude.

>> Materialized views seem like a good idea to me, but maybe I'm not seeing
>> something. What do you mean by "reports are different"? If there is a lot
>> of rows for a given product / day, then creating an aggregated table with
>> (product code / day) as a primary key is quite simple. It may require a
>> lot of disk space, but it'll remove the hash join overhead. But if the
>> queries are very different, then it may be difficult to build such
>> materialized view(s).
>
> The log file shows that mostly only queries of this shape are slow:
>
> SELECT ...
> FROM dok JOIN rid USING (dokumnr)
> JOIN ProductId USING (ProductId)
> WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2
>
> :p1 and :p2 are parameters, different for different queries.
>
> dok contains several years of data. :p2 usually covers only the few
> previous months, or at most about a year back.
> The SELECT column list contains a fixed list of known columns from all tables.
>
> How can I create an index or materialized view to optimize these types of
> queries ?
Well, difficult to answer without detailed information about the queries you want to run, aggregated values, etc. Materialized views are a world of their own, and the best solution depends on (for example):

1) what aggregated values you are interested in (additive values are the most primitive ones, while VARIANCE etc. make it difficult)

2) do you need current data, or is it OK that today's data are not available till midnight (for example)?

Another thing you have to consider is whether you want to create a materialized view with final or intermediate data and then compute the final data somehow (for example monthly totals from daily totals).

The most primitive (but often sufficient) solution is recreating the materialized view periodically (for example every midnight). In your case it'd mean running something like

CREATE TABLE materialized_view AS SELECT ... your query here ...
GROUP BY productId, saleDate

This gives you daily totals for each product - the clients can then run another query to compute the final data.

But of course, if you need to maintain 'current' data, you may create a set of triggers to update the materialized view, either after each modification or (more sophisticated) when it's needed.

See for example a great presentation from this year's PGCon:

http://www.pgcon.org/2008/schedule/events/69.en.html

regards
Tomas
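A minimal sketch of the trigger-maintained variant, assuming a detail table order_rows with sale_date and quantity columns (all names here are hypothetical, and concurrent inserts of the same key could race - good enough for an experiment, not for production):

CREATE TABLE sales_daily (
    product_id char(20) NOT NULL,
    sale_date  date     NOT NULL,
    total_qty  numeric  NOT NULL DEFAULT 0,
    PRIMARY KEY (product_id, sale_date)
);

CREATE OR REPLACE FUNCTION sales_daily_add() RETURNS trigger AS $$
BEGIN
    -- try to bump the existing daily total first
    UPDATE sales_daily
       SET total_qty = total_qty + NEW.quantity
     WHERE product_id = NEW.product_id AND sale_date = NEW.sale_date;
    IF NOT FOUND THEN
        INSERT INTO sales_daily (product_id, sale_date, total_qty)
        VALUES (NEW.product_id, NEW.sale_date, NEW.quantity);
    END IF;
    RETURN NULL;  -- return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_daily_trg
    AFTER INSERT ON order_rows
    FOR EACH ROW EXECUTE PROCEDURE sales_daily_add();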
> If it's not a million rows, then the table is bloated. Try (as postgres
> or some other db superuser) "vacuum full pg_shdepend" and a "reindex
> pg_shdepend".

reindex table pg_shdepend causes the error

ERROR: shared table "pg_shdepend" can only be reindexed in stand-alone mode

vacuum full verbose pg_shdepend seems to work, but the indexes are still bloated.
How to remove the index bloat ? Sizes after vacuum full are below; the pg_shdepend size of 1234 MB includes its index sizes, so the indexes are almost entirely bloat.

4 1214 pg_catalog.pg_shdepend 1234 MB
6 1232 pg_catalog.pg_shdepend_depender_index 795 MB
7 1233 pg_catalog.pg_shdepend_reference_index 439 MB

Andrus.

vacuum full verbose pg_shdepend;

INFO: vacuuming "pg_catalog.pg_shdepend"
INFO: "pg_shdepend": found 254 removable, 3625 nonremovable row versions in 131517 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 49 to 49 bytes long.
There were 16115259 unused item pointers.
Total free space (including removable row versions) is 1010091872 bytes.
131456 pages are or will become empty, including 8 at the end of the table.
131509 pages containing 1010029072 free bytes are potential move destinations.
CPU 2.08s/0.92u sec elapsed 63.51 sec.
INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages
DETAIL: 254 index row versions were removed.
101611 index pages have been deleted, 20000 are currently reusable.
CPU 0.87s/0.28u sec elapsed 25.44 sec.
INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages
DETAIL: 254 index row versions were removed.
56076 index pages have been deleted, 20000 are currently reusable.
CPU 0.51s/0.15u sec elapsed 23.10 sec.
INFO: "pg_shdepend": moved 1518 row versions, truncated 131517 to 25 pages
DETAIL: CPU 5.26s/2.39u sec elapsed 89.93 sec.
INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages
DETAIL: 1518 index row versions were removed.
101609 index pages have been deleted, 20000 are currently reusable.
CPU 0.94s/0.28u sec elapsed 24.61 sec.
INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages
DETAIL: 1518 index row versions were removed.
56088 index pages have been deleted, 20000 are currently reusable.
CPU 0.54s/0.14u sec elapsed 21.11 sec.

Query returned successfully with no result in 253356 ms
If autovacuum was working and your tables still got very bloated, it may be because your free space map is not configured large enough. What is your value for max_fsm_pages? The effect of having max_fsm_pages or max_fsm_relations too small is bloating of tables and indexes.

Setting it larger than needed will not use a lot of memory. Since you had over 130000 free pages in just one table in your previous message, make sure it is set to at least 150000. This may be overkill with regular vacuum or autovacuum; however, it's much better to have this too large than too small.

Your server has 2GB of RAM? You should make sure your shared_buffers is between 100MB and 400MB if this is a dedicated server, or more in some cases.

If you can, plan to migrate to 8.3 (or 8.4 early next year). Vacuum, bloating, and the configuration related to these have improved a great deal since 8.1.

Although your indexes remain bloated, the fact that they have been vacuumed, combined with a large enough value set in max_fsm_pages, means that they should not get any larger. I am not sure how to force these to be smaller. The larger size, after a vacuum and with a large enough max_fsm_pages value, will also not cause a performance problem. It will waste some disk space and be a bit slower to access, but only very slightly.
On Fri, 21 Nov 2008 21:07:02 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote: > PFC <lists@peufeu.com> writes: >> Index on orders_products( product_id ) and orders_products( order_id ): >> => Same plan > >> Note that in this case, a smarter planner would use the new index to >> perform a BitmapAnd before hitting the heap to get the rows. > > Considering that the query has no constraint on > orders_products.order_id, I'm not sure what you think the extra index is > supposed to be used *for*. > > (Well, we could put orders as the outside of a nestloop and then we'd > have such a constraint, but with 30000 orders rows to process that plan > would lose big.) > > (And yes, the planner did consider such a plan along the way. > See choose_bitmap_and.) > > regards, tom lane I think I didn't express myself correctly... Here the indexes are small (therefore well cached) but the orders_products table is large (and not cached). To reproduce this, I put this table on a crummy slow external USB drive. Between each of the following queries, pg was stopped, the USB drive unmounted, remounted, and pg restarted, to purge orders_products table out of all caches. I also modified the statistical distribution (see init script at bottom of message). EXPLAIN ANALYZE SELECT count(*) FROM orders JOIN orders_products USING (order_id) WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01' AND orders_products.product_id = 2345; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=5431.93..5431.94 rows=1 width=0) (actual time=5176.382..5176.382 rows=1 loops=1) -> Hash Join (cost=1575.13..5431.84 rows=35 width=0) (actual time=62.634..5176.332 rows=36 loops=1) Hash Cond: (orders_products.order_id = orders.order_id) -> Bitmap Heap Scan on orders_products (cost=21.27..3864.85 rows=1023 width=4) (actual time=7.041..5118.512 rows=1004 loops=1) Recheck Cond: (product_id = 2345) -> Bitmap Index Scan on orders_products_product_order (cost=0.00..21.02 rows=1023 width=0) (actual time=0.531..0.531 rows=1004 loops=1) Index Cond: (product_id = 2345) -> Hash (cost=1130.58..1130.58 rows=33862 width=4) (actual time=55.526..55.526 rows=31999 loops=1) -> Index Scan using orders_date on orders (cost=0.00..1130.58 rows=33862 width=4) (actual time=0.139..33.466 rows=31999 loops=1) Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date)) Total runtime: 5176.659 ms This is the original query ; what I don't like about it is that it bitmapscans orders_products way too much, because it reads all orders for the specified product, not just orders in the date period we want. However, since Postgres scanned all order_id's corresponding to the date range already, to build the hash, the list of order_ids of interest is known at no extra cost. 
In this case, additionally, correlation is 100% between order_id and date, so I can do :

test=> SELECT max(order_id), min(order_id) FROM orders WHERE order_date BETWEEN '2000-01-01' AND '2000-02-01';
  max  | min
-------+-----
 31999 |   1

And I can add an extra condition to the query, like this :

EXPLAIN ANALYZE SELECT count(*)
FROM orders
JOIN orders_products USING (order_id)
WHERE orders.order_date BETWEEN '2000-01-01' AND '2000-02-01'
AND orders_products.order_id BETWEEN 1 AND 31999
AND orders_products.product_id = 2345;

QUERY PLAN
Aggregate (cost=426.80..426.81 rows=1 width=0) (actual time=179.233..179.233 rows=1 loops=1)
  -> Nested Loop (cost=0.00..426.79 rows=1 width=0) (actual time=6.667..179.190 rows=36 loops=1)
     -> Index Scan using orders_products_product_order on orders_products (cost=0.00..142.11 rows=34 width=4) (actual time=6.559..177.597 rows=36 loops=1)
        Index Cond: ((product_id = 2345) AND (order_id >= 1) AND (order_id <= 31999))
     -> Index Scan using orders_pkey on orders (cost=0.00..8.36 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=36)
        Index Cond: (orders.order_id = orders_products.order_id)
        Filter: ((orders.order_date >= '2000-01-01'::date) AND (orders.order_date <= '2000-02-01'::date))
Total runtime: 179.392 ms

This is with no cache on the orders_products table. About 30X faster. Interestingly, when everything is cached, it's even faster (about 100X)...

The plan I was thinking about was not a nested loop with 30K loops... this would be bad as you said. It would have been something like this :

- There is an index on (product_id, order_id)
- Build the hash from orders table (can't avoid it)

-> Hash
   -> Index Scan using orders_date on orders
      Index Cond: ((order_date >= '2000-01-01'::date) AND (order_date <= '2000-02-01'::date))

- A slightly twisted bitmap scan form :

-> Bitmap Heap Scan on orders_products
   Recheck Cond: (product_id = 2345) AND order_id IN (hash created above))
   -> Bitmap Index Scan on orders_products_product_order
      Index Cond: (product_id = 2345 AND order_id IN (hash created above))

The Bitmap Index Scan sees the order_ids in the index it is scanning... they could be checked before checking the visibility in the heap for the big table.
Test script:

BEGIN;

CREATE TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TABLE products (product_id INTEGER NOT NULL, product_name TEXT NOT NULL);
CREATE TABLE orders_products (order_id INTEGER NOT NULL, product_id INTEGER NOT NULL,
    padding1 TEXT, padding2 TEXT) TABLESPACE usb;

INSERT INTO products SELECT n, 'product number ' || n::TEXT FROM generate_series(1,10001) AS n;

INSERT INTO orders SELECT n, '2000-01-01'::date + (n/1000 * '1 DAY'::interval) FROM generate_series(1,1000000) AS n;

SET work_mem TO '1GB';

INSERT INTO orders_products SELECT
    a, b, 'aibaifbaurgbyioubyfazierugybfoaybofauez', 'hfohbdsqbhjhqsvdfiuazvfgiurvgazrhbazboifhaoifh'
FROM (SELECT DISTINCT (1+(n/10))::INTEGER AS a, (1+(random()*10000))::INTEGER AS b
      FROM generate_series( 1,9999999 ) AS n) AS x;

DELETE FROM orders_products WHERE product_id NOT IN (SELECT product_id FROM products);
DELETE FROM orders_products WHERE order_id NOT IN (SELECT order_id FROM orders);

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (order_id,product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products( product_id ) ON DELETE CASCADE;
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders( order_id ) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX orders_products_product_order ON orders_products( product_id, order_id );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;
> You could perhaps run a little check on the performance of the RAID, is
> it better than linux software RAID ?
> Does it leverage NCQ appropriately when running queries in parallel ?

I was told that this RAID is software RAID. I have no idea what to check.
This HP server was installed 3 years ago and has never been a high-performance server.

>>> explain analyze
>>> SELECT sum(1)
>>> FROM dok JOIN rid USING (dokumnr)
>>> JOIN toode USING (toode)
>>> LEFT JOIN artliik using(grupp,liik)
>>> WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
>
> By the way, note that the presence of the toode table in the query above
> is not required at all, unless you use columns of toode in your
> aggregates.

In the real query, the SELECT column list contains data from the sales table dok (sale date and time) and the sales detail table rid (quantity, price).
The WHERE clause may contain additional filters from the product table (product category, supplier).

> Let's play with that, after all, it's friday night.

Thank you very much for the great sample.
I tried to create a test case from this to match the production db:

1.2 million orders
3.5 million order details
13400 products with char(20) primary keys, mostly containing EAN-13 codes
last 3 years of data
every order has usually 1..3 detail lines
same product can appear multiple times in an order
products are queried by start of code

This sample does not distribute products randomly between orders.
How to change this so that every order contains 3 (or 1..6 ) random products?
I tried to use the random row sample from

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i

but in this case a constant product is always returned. It seems that a query containing random() is evaluated only once.

Andrus.

begin;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id CHAR(20),
    padding1 char(70), id serial, price numeric(12,2) default 1);

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20), 'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders SELECT n, '2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval) FROM generate_series(0,3500000/3) AS n;

SET work_mem TO 2097151; -- 1048576;

INSERT INTO orders_products SELECT generate_series/3 AS order_id,
    ( ((generate_series/3500000.0)*13410.0)::int*power( 10,13))::INT8::CHAR(20)
FROM generate_series(1,3500000) WHERE generate_series/3>0;

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);
ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id bpchar_pattern_ops );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

SELECT sum(quantity*price)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>='2008-01-17' and orders_products.product_id like '130%'
> Thank you very much for the great sample.
> I tried to create a test case from this to match the production db:
>
> 1.2 million orders
> 3.5 million order details
> 13400 products with char(20) primary keys, mostly containing EAN-13 codes
> last 3 years of data
> every order has usually 1..3 detail lines
> same product can appear multiple times in an order
> products are queried by start of code
>
> This sample does not distribute products randomly between orders.
> How to change this so that every order contains 3 (or 1..6 ) random
> products?
> I tried to use the random row sample from
> http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i
>
> but in this case a constant product is always returned. It seems that a
> query containing random() is evaluated only once.

You could try writing a plpgsql function which would generate the data set. Or you could use your existing data set.

By the way, a simple way to de-bloat your big table without blocking would be this :

- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by perhaps)
- create indexes
- rename new into old
- commit

If this is just a reporting database where you insert a batch of new data every day, for instance, that's very easy to do. If it's OLTP, then, no.
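A sketch of such a generator function: the loop forces a fresh random() evaluation per order, which avoids the constant-product problem, but as noted in the next message it is slow for millions of rows (table and column names are those of the test script above):

CREATE OR REPLACE FUNCTION fill_orders_products() RETURNS void AS $$
DECLARE
    o integer;
    n integer;
BEGIN
    FOR o IN SELECT order_id FROM orders LOOP
        n := 1 + (random() * 5)::int;          -- roughly 1..6 detail lines per order
        INSERT INTO orders_products (order_id, product_id)
        SELECT o, product_id
        FROM products
        ORDER BY random()                      -- pick random products for this order
        LIMIT n;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT fill_orders_products();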
> You could try writing a plpgsql function which would generate the data
> set.
> Or you could use your existing data set.

Creating 3.5 mln rows using a stored proc is probably slow.
It would probably be better and faster to use some random() and generate_series() trick. In that case others could try it too, and dataset generation would be faster.

> By the way, a simple way to de-bloat your big table without blocking
> would be this :
>
> - stop all inserts and updates
> - begin
> - create table new like old table
> - insert into new select * from old (order by perhaps)
> - create indexes
> - rename new into old
> - commit
>
> If this is just a reporting database where you insert a batch of new data
> every day, for instance, that's very easy to do. If it's OLTP, then, no.

Those are the orders and order_products tables.

I ran vacuum full analyze verbose last night.
Now the database has a size of 4832 MB, including 1 GB of bloated pg_shdepend indexes.
I added max_fsm_pages = 150000 and rebooted.

The query below and other queries are still too slow:

set search_path to firma2,public;
explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)"
" -> Nested Loop (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)"
" -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)"
" Index Cond: ('X05'::bpchar = toode)"
" -> Hash Join (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)"
" Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
" -> Bitmap Heap Scan on rid (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)"
" Recheck Cond: (toode = 'X05'::bpchar)"
" -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)"
" Index Cond: (toode = 'X05'::bpchar)"
" -> Hash (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)"
" -> Bitmap Heap Scan on dok (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)"
" Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 23678.790 ms"

Here is a list of untried recommendations from this thread:

1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx - in 8.1.4 this form of CLUSTER causes a syntax error, and I have no idea what syntax to use. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

2. Change the CHAR(20) product index to an int index by adding an update trigger. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

3. Denormalization of the sale date into the order_products table by adding an update trigger. Risky to try in the prod server. Requires creating a randomly distributed product_id test case to measure the difference.

4. Check on the performance of the RAID: does it leverage NCQ appropriately when running queries in parallel? No idea how.

5. Materialized views. I need date granularity, so it must be possible to sum only one day's sales.
http://www.pgcon.org/2008/schedule/events/69.en.html

Seems to be a major application rewrite; no idea how.

Approaches which probably will not change performance:

6. Upgrade to 8.4 or to 8.3.5

7. Run the server in standalone mode and recover the 1 GB of bloated pg_shdepend indexes.

8. Tune some conf file parameters:

> work_mem = 512
I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily).

> effective_cache_size = 70000
Well, your server has 2GB of RAM and usually it's recommended to set this value to about 60-70% of your RAM, so using 540MB (about 25%) seems quite low.

Data size is nearly the same as RAM size. It is an unpleasant surprise that queries take such a long time.

What should I do next?

Andrus.

1 40926 firma2.rid 1737 MB
2 40595 firma2.dok 1632 MB
3 1214 pg_catalog.pg_shdepend 1235 MB
4 1232 pg_catalog.pg_shdepend_depender_index 795 MB
7 1233 pg_catalog.pg_shdepend_reference_index 439 MB
8 44299 firma2.rid_toode_idx 298 MB
9 44286 firma2.dok_tasudok_idx 245 MB
10 19103791 firma2.rid_toode_pattern_idx 202 MB
11 44283 firma2.dok_klient_idx 160 MB
12 44298 firma2.rid_inpdokumnr_idx 148 MB
13 44297 firma2.rid_dokumnr_idx 132 MB
14 43573 firma2.rid_pkey 130 MB
17 40556 pg_toast.pg_toast_40552 112 MB
18 44288 firma2.dok_tasumata_idx 103 MB
19 44289 firma2.dok_tellimus_idx 101 MB
20 44284 firma2.dok_krdokumnr_idx 101 MB
21 44285 firma2.dok_kuupaev_idx 94 MB
22 19076304 firma2.rid_rtellimus_idx 90 MB
24 44282 firma2.dok_dokumnr_idx 74 MB
25 43479 firma2.dok_pkey 74 MB
26 18663923 firma2.dok_yksus_pattern_idx 65 MB
27 18801591 firma2.dok_sihtyksus_pattern_idx 64 MB
32 18774881 firma2.dok_doktyyp 47 MB

output from vacuum full:

INFO: free space map contains 14353 pages in 314 relations
DETAIL: A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE: number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT: Consider increasing the configuration parameter "max_fsm_pages" to a value over 89664.

Query returned successfully with no result in 10513335 ms.
> Risky to try in the prod server. Requires creating a randomly distributed
> product_id test case to measure the difference.
>
> What should I do next?

I guess you have backups - take them, restore the database on a different machine (preferably with the same / similar hw config) and tune the queries on it.

After restoring, all the tables / indexes will be 'clean' (not bloated), so you'll see if performing VACUUM FULL / CLUSTER is the right solution or if you have to change the application internals.

Sure, the times will be slightly different, but the performance problems should remain the same.

regards
Tomas
> Approaches which probably will not change performance:
> 6. Upgrade to 8.4 or to 8.3.5

Both of these will improve performance a little, even with the same query plan and same data. I would expect about a 10% improvement for 8.3.x on most memory-bound select queries. 8.4 won't be out for a few months.

> 7. Run the server in standalone mode and recover the 1 GB of bloated pg_shdepend indexes.
> 8. Tune some conf file parameters:
> > work_mem = 512
> I'd consider increasing this value a little - 0.5 MB seems too low to me
> (but not necessarily).

This is very easy to try. You can change work_mem for just a single session, and this can in some cases help performance quite a bit, and in others not at all.

I would not recommend having it lower than at least 4MB on a server like that unless you have a lot of concurrently active queries / connections.

To try it, simply use the SET command. To try out 32MB, just do:

SET work_mem = '32MB';

and the value will be changed locally for that session only. See if it affects your test query or not.

http://www.postgresql.org/docs/8.3/interactive/sql-set.html

> > effective_cache_size = 70000
> Well, your server has 2GB of RAM and usually it's recommended to set
> this value to about 60-70% of your RAM, so using 540MB (about 25%) seems
> quite low.
> Data size is nearly the same as RAM size. It is an unpleasant surprise
> that queries take such a long time.
> What should I do next?

First, demonstrate that it is all or mostly in memory -- use iostat or other tools to ensure that there is not much disk activity during the query. If your system doesn't have iostat installed, it should be installed. It is a very useful tool.

If it is all cached in memory, you may want to ensure that your shared_buffers is a reasonable size so that there is less shuffling of data from the kernel to postgres and back. Generally, shared_buffers works best between 5% and 25% of system memory.

If it is completely CPU bound then the work done for the query has to be reduced by altering the plan to a more optimal one or making the work it has to do at each step easier. Most of the ideas in this thread revolve around those things.

Based on the time it took to do the vacuum, I suspect your disk subsystem is a bit slow. If it can be determined that there is much disk I/O in your use cases, there are generally several things that can be done to tune Linux I/O. The main ones in my experience are the 'readahead' value for each disk, which helps sequential reads significantly, and trying out the Linux 'deadline' scheduler and comparing it to the more commonly used 'cfq' scheduler. If the system is configured with the anticipatory scheduler, absolutely switch to cfq or deadline, as the anticipatory scheduler will perform horribly poorly for a database.

> Andrus.
> I guess you have backups - take them, restore the database on a different
> machine (preferably with the same / similar hw config) and tune the
> queries on it.
>
> After restoring, all the tables / indexes will be 'clean' (not bloated), so
> you'll see if performing VACUUM FULL / CLUSTER is the right solution or if
> you have to change the application internals.
>
> Sure, the times will be slightly different, but the performance problems
> should remain the same.

My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, and an Intel X2 Extreme CPU, so it is much faster than this prod server. No idea how to emulate the prod environment.

I can create a new db in the prod server as a copy of the old one, but this can be done late at night only.
Where can I find a script which clones a database inside the server? Something like

CREATE DATABASE newdb AS SELECT * FROM olddb;

It would be more convenient to run a db cloning script from the pgAdmin command window.
The only way I found is to use SSH with pg_dump/pg_restore. This requires SSH access to the server and opening the SSH port to the public internet.

Or probably try to run the CLUSTER command in the prod server. Hopefully clustering by product id will not slow things down too much.

Andrus.
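For what it's worth, a server-side clone is possible via the TEMPLATE clause of CREATE DATABASE (the source database must have no active connections while it runs), and the 8.1 CLUSTER syntax puts the index name first - a sketch using index names from earlier in the thread:

CREATE DATABASE newdb TEMPLATE olddb;

-- 8.1 syntax is CLUSTER indexname ON tablename:
CLUSTER rid_toode_idx ON rid;
CLUSTER dok_kuupaev_idx ON dok;
ANALYZE rid;
ANALYZE dok;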
Scott, thank you. >> > work_mem = 512 >This is very easy to try. You can change work_mem for just a single >session, and this can in some cases help performance quite a bit, and in >others not at all. >I would not recommend having it lower than at least 4MB on a server like >that unless you have a lot of concurrently active queries / connections. >To try it, simply use the SET command. To try out 32MB, just do: >SET work_mem = '32MB'; 8.1.4 requires int value. SET work_mem = 33554432; causes: ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. 2097151) So max allowed value seems to be 2 MB I tested it when this server was idle by running both queries several times after VACUUM FULL was running SET work_mem = 2097151; set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' "Aggregate (cost=177291.36..177291.37 rows=1 width=0) (actual time=5153.856..5153.859 rows=1 loops=1)" " -> Nested Loop (cost=73607.45..177229.96 rows=24561 width=0) (actual time=1395.935..5071.247 rows=21541 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.078..0.087 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=73607.45..176978.33 rows=24561 width=24) (actual time=1395.836..4911.425 rows=21541 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868 rows=278248 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1146.081..1146.081 rows=110170 loops=1)" " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)" " Recheck Cond: (kuupaev >= '2008-09-01'::date)" " -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938 rows=110232 loops=1)" " Index Cond: (kuupaev >= '2008-09-01'::date)" "Total runtime: 5154.911 ms" SET work_mem to default; set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' "Aggregate (cost=181869.36..181869.37 rows=1 width=0) (actual time=7807.867..7807.871 rows=1 loops=1)" " -> Nested Loop (cost=74048.45..181807.96 rows=24561 width=0) (actual time=2607.429..7728.138 rows=21541 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.079..0.091 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=74048.45..181556.33 rows=24561 width=24) (actual time=2607.332..7569.612 rows=21541 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4083.10 rows=270316 width=0) (actual time=1140.337..1140.337 rows=278248 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1240.988..1240.988 rows=110170 loops=1)" " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 
rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)"
" Recheck Cond: (kuupaev >= '2008-09-01'::date)"
" -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358 rows=110232 loops=1)"
" Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 7808.174 ms"

In both cases vmstat 2 shows only cpu activity when the queries are running:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free   buff   cache   si  so  bi   bo   in  cs  us sy  id wa
...
 0  0  232  123692  0  1891044  0  0  0    0  252  17   0  0  100  0
 0  0  232  123692  0  1891044  0  0  0    0  252  17   0  0  100  0
 0  0  232  123684  0  1891044  0  0  0  162  254  22   0  0  100  0
 0  0  232  123684  0  1891044  0  0  0    0  252  18   0  0  100  0
 1  0  232  123056  0  1891444  0  0  0   13  254  21  62  5   34  0  <---- start of slower query
 1  0  232  102968  0  1911060  0  0  0   16  252  18  26 75    0  0
 1  0  232   77424  0  1936996  0  0  0    0  252  18  37 63    0  0
 1  0  232   71464  0  1941928  0  0  0   73  260  34  38 62    0  0
 0  0  232  123420  0  1891044  0  0  0   32  257  31   8 15   77  0  <-------- end of slower query
 0  0  232  123420  0  1891044  0  0  0   25  255  24   0  0  100  0
 0  0  232  123420  0  1891044  0  0  0   28  255  27   0  0  100  0

Is it safe to set

work_mem = 2097151

in the postgresql.conf file ?

> First, demonstrate that it is all or mostly in memory -- use iostat or
> other tools to ensure that there is not much disk activity during the
> query. If your system doesn't have iostat installed, it should be
> installed. It is a very useful tool.

# iostat
bash: iostat: command not found
# locate iostat
/usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt

I have little experience with Linux; no idea how to install or compile iostat on this system.

> If it is all cached in memory, you may want to ensure that your
> shared_buffers is a reasonable size so that there is less shuffling of data
> from the kernel to postgres and back. Generally, shared_buffers works best
> between 5% and 25% of system memory.

currently shared_buffers = 15000

> If it is completely CPU bound then the work done for the query has to be
> reduced by altering the plan to a more optimal one or making the work it
> has to do at each step easier. Most of the ideas in this thread revolve
> around those things.

When running on a loaded server, even after VACUUM FULL, response time with the original work_mem is longer, probably because blocks must be fetched from disk.

> Based on the time it took to do the vacuum, I suspect your disk subsystem
> is a bit slow. If it can be determined that there is much disk I/O in your
> use cases, there are generally several things that can be done to tune
> Linux I/O. The main ones in my experience are the 'readahead' value for
> each disk, which helps sequential reads significantly, and trying out the
> Linux 'deadline' scheduler and comparing it to the more commonly used 'cfq'
> scheduler. If the system is configured with the anticipatory scheduler,
> absolutely switch to cfq or deadline, as the anticipatory scheduler will
> perform horribly poorly for a database.

This is a 3-year-old cheap server; no idea what to configure.
There is also another similar server which has 1.2 GB more usable memory. No idea if it is worth switching to it. After a few years the sales data will exceed even that additional memory.

Andrus.
> My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, and an Intel X2
> Extreme CPU, so it is much faster than this prod server.
> No idea how to emulate the prod environment.
> I can create a new db in the prod server as a copy of the old one, but this
> can be done late at night only.

Well, a faster but comparable system may not be a problem - the query might run 10 times faster, but it will still be slow (say 40 seconds instead of 8 minutes). What is a problem is a different I/O system - SSD instead of traditional drives in this case. I have no direct experience with SSD yet, but AFAIK the seek time is much better compared to regular drives (say 0.1ms instead of 10ms, that is 100 times faster). So can't you just put an old SATA drive into the test machine?

> Where can I find a script which clones a database inside the server?
> Something like
>
> CREATE DATABASE newdb AS SELECT * FROM olddb;
>
> It would be more convenient to run a db cloning script from the pgAdmin
> command window.
> The only way I found is to use SSH with pg_dump/pg_restore. This requires
> SSH access to the server and opening the SSH port to the public internet.

Yes, using pg_dump | pg_restore is probably the way to clone a database. But it will slow down the system, as it has to do a lot of I/O (and as that seems to be a bottleneck already, I don't think this is a good idea).

> Or probably try to run the CLUSTER command in the prod server. Hopefully
> clustering by product id will not slow things down too much.

As already noted, the CLUSTER command takes an exclusive lock on the table. So this is not an operation you'd want to run on a production server during working hours...

Tomas
> Scott, > > thank you. > >>> > work_mem = 512 >> This is very easy to try. You can change work_mem for just a single >> session, and this can in some cases help performance quite a bit, and in >> others not at all. >> I would not recommend having it lower than at least 4MB on a server like >> that unless you have a lot of concurrently active queries / connections. >> To try it, simply use the SET command. To try out 32MB, just do: >> SET work_mem = '32MB'; > > 8.1.4 requires int value. > SET work_mem = 33554432; > causes: > ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. > 2097151) > > So max allowed value seems to be 2 MB > I tested it when this server was idle by running both queries several times > after VACUUM FULL was running > > SET work_mem = 2097151; No, not really. The work_mem value is specified in kilobytes, so you've set it to 2GB :-) > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=177291.36..177291.37 rows=1 width=0) (actual > time=5153.856..5153.859 rows=1 loops=1)" > " -> Nested Loop (cost=73607.45..177229.96 rows=24561 width=0) (actual > time=1395.935..5071.247 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.078..0.087 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=73607.45..176978.33 rows=24561 width=24) > (actual time=1395.836..4911.425 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868 > rows=278248 loops=1)" > " Index Cond: (toode = 'X05'::bpchar)" > " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) > (actual time=1146.081..1146.081 rows=110170 loops=1)" > " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 > rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)" > " Recheck Cond: (kuupaev >= '2008-09-01'::date)" > " -> Bitmap Index Scan on dok_kuupaev_idx > (cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938 > rows=110232 loops=1)" > " Index Cond: (kuupaev >= > '2008-09-01'::date)" > "Total runtime: 5154.911 ms" > > > SET work_mem to default; > set search_path to firma2,public; > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > "Aggregate (cost=181869.36..181869.37 rows=1 width=0) (actual > time=7807.867..7807.871 rows=1 loops=1)" > " -> Nested Loop (cost=74048.45..181807.96 rows=24561 width=0) (actual > time=2607.429..7728.138 rows=21541 loops=1)" > " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 > width=24) (actual time=0.079..0.091 rows=1 loops=1)" > " Index Cond: ('X05'::bpchar = toode)" > " -> Hash Join (cost=74048.45..181556.33 rows=24561 width=24) > (actual time=2607.332..7569.612 rows=21541 loops=1)" > " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" > " -> Bitmap Heap Scan on rid (cost=4083.10..101802.05 > rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)" > " Recheck Cond: (toode = 'X05'::bpchar)" > " -> Bitmap Index Scan on rid_toode_idx > (cost=0.00..4083.10 rows=270316 width=0) 
> (actual time=1140.337..1140.337 rows=278248 loops=1)"
> " Index Cond: (toode = 'X05'::bpchar)"
> " -> Hash (cost=69242.72..69242.72 rows=112651 width=4) (actual time=1240.988..1240.988 rows=110170 loops=1)"
> " -> Bitmap Heap Scan on dok (cost=1492.28..69242.72 rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)"
> " Recheck Cond: (kuupaev >= '2008-09-01'::date)"
> " -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358 rows=110232 loops=1)"
> " Index Cond: (kuupaev >= '2008-09-01'::date)"
> "Total runtime: 7808.174 ms"
>
> In both cases vmstat 2 shows only cpu activity when the queries are running:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> ...
> 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 100 0
> 0 0 232 123692 0 1891044 0 0 0 0 252 17 0 0 100 0
> 0 0 232 123684 0 1891044 0 0 0 162 254 22 0 0 100 0
> 0 0 232 123684 0 1891044 0 0 0 0 252 18 0 0 100 0
> 1 0 232 123056 0 1891444 0 0 0 13 254 21 62 5 34 0 <---- start of slower query
> 1 0 232 102968 0 1911060 0 0 0 16 252 18 26 75 0 0
> 1 0 232 77424 0 1936996 0 0 0 0 252 18 37 63 0 0
> 1 0 232 71464 0 1941928 0 0 0 73 260 34 38 62 0 0
> 0 0 232 123420 0 1891044 0 0 0 32 257 31 8 15 77 0 <-------- end of slower query
> 0 0 232 123420 0 1891044 0 0 0 25 255 24 0 0 100 0
> 0 0 232 123420 0 1891044 0 0 0 28 255 27 0 0 100 0

Well, this parameter specifies how much memory may be used for in-memory sorting and hash tables. If more memory is needed, the sort / hash is performed on disk. I guess the difference in speed is not huge, so in this case the hash table (used for the join) probably fits into the 1024kB (which is the default value).

> Is it safe to set
>
> work_mem = 2097151
>
> in the postgresql.conf file ?

Each sort in a query uses a separate area in memory (up to work_mem). So if you have 10 sessions running a query with 2 sorts, that may occupy 10 * 2 * work_mem of memory.

Let's suppose you set a reasonable value (say 8096) instead of 2GB. That gives about 160MB.

Anyway this depends - if you have a lot of slow queries caused by on-disk sorts / hash tables, use a higher value. Otherwise leave it as it is.

>> First, demonstrate that it is all or mostly in memory -- use iostat or
>> other tools to ensure that there is not much disk activity during the
>> query. If your system doesn't have iostat installed, it should be
>> installed. It is a very useful tool.
>
> # iostat
> bash: iostat: command not found
> # locate iostat
> /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt
>
> I have little experience with Linux; no idea how to install or compile
> iostat on this system.
>
>> If it is all cached in memory, you may want to ensure that your
>> shared_buffers is a reasonable size so that there is less shuffling of
>> data from the kernel to postgres and back. Generally, shared_buffers works
>> best between 5% and 25% of system memory.
>
> currently shared_buffers = 15000

That's 120MB, i.e. about 6% of the memory. Might be a little higher, but seems reasonable.

>> If it is completely CPU bound then the work done for the query has to be
>> reduced by altering the plan to a more optimal one or making the work it
>> has to do at each step easier. Most of the ideas in this thread revolve
>> around those things.
>> First, demonstrate that it is all or mostly in memory -- use iostat or
>> other tools to ensure that there is not much disk activity during the
>> query. If your system doesn't have iostat installed, it should be
>> installed. It is a very useful tool.
>
> # iostat
> bash: iostat: command not found
> # locate iostat
> /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt
>
> I have little experience with Linux. No idea how to install or compile
> iostat on this system.

>> If it is all cached in memory, you may want to ensure that your
>> shared_buffers is a reasonable size so that there is less shuffling of
>> data from the kernel to postgres and back. Generally, shared_buffers
>> works best between 5% and 25% of system memory.
>
> currently shared_buffers = 15000

That's 120MB, i.e. about 6% of the memory. Might be a little higher, but it seems reasonable.

>> If it is completely CPU bound then the work done for the query has to be
>> reduced by altering the plan to a more optimal one or making the work it
>> has to do at each step easier. Most of the ideas in this thread revolve
>> around those things.
>
> When running on a loaded server, even after VACUUM FULL, the response time
> with the original work_mem is longer, probably because blocks must be
> fetched from disk.

>> Based on the time it took to do the vacuum, I suspect your disk subsystem
>> is a bit slow. If it can be determined that there is much disk I/O in
>> your use cases, there are generally several things that can be done to
>> tune Linux I/O. The main ones in my experience are the 'readahead' value
>> for each disk, which helps sequential reads significantly, and trying out
>> the Linux 'deadline' scheduler and comparing it to the more commonly used
>> 'cfq' scheduler. If the system is configured with the anticipatory
>> scheduler, absolutely switch to cfq or deadline, as the anticipatory
>> scheduler will perform horribly poorly for a database.
>
> This is a 3 year old cheap server.
> No idea what to configure.

Well, how could we know that if you don't? Anyway, the options mentioned by Scott Carey are Linux kernel settings, so it shouldn't be a problem to change them.

> There is also another similar server which has 1.2 GB more usable memory.
> No idea whether it is worth switching to it.
> After some years the sales data will exceed that extra memory anyway.

Given that the performance issues are caused by bloated tables and / or a slow I/O subsystem, moving to a similar system won't help, I guess.

regards
Tomas
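For reference, Scott's 5%-25% guideline above can be translated into shared_buffers values (counted in 8 kB buffers on a stock build) for this 2 GB machine; the figures are illustrative arithmetic, not recommendations from the thread:

SELECT (0.05 * 2 * 1024 * 1024 / 8)::int AS buffers_at_5pct,   -- about 13107
       (0.25 * 2 * 1024 * 1024 / 8)::int AS buffers_at_25pct;  -- about 65536

By that yardstick the current 15000 sits near the low end of the range.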
Tomas,

> Let's suppose you set a reasonable value (say 8096) instead of 2GB. That
> gives about 160MB.
> Anyway this depends - if you have a lot of slow queries caused by on-disk
> sorts / hash tables, use a higher value. Otherwise leave it as it is.

Probably the product orders table is frequently joined with the product
table. Currently the conf file had

work_mem = 512

I changed it to

work_mem = 8096

>>> If it is all cached in memory, you may want to ensure that your
>>> shared_buffers is a reasonable size so that there is less shuffling of
>>> data from the kernel to postgres and back. Generally, shared_buffers
>>> works best between 5% and 25% of system memory.
>>
>> currently shared_buffers = 15000
>
> That's 120MB, i.e. about 6% of the memory. Might be a little higher, but
> it seems reasonable.

I changed it to 20000.

> Given that the performance issues are caused by bloated tables and / or a
> slow I/O subsystem, moving to a similar system won't help, I guess.

I have run VACUUM FULL ANALYZE VERBOSE and set

max_fsm_pages = 150000

So there is no bloat any more except in the pg_shdepend indexes, which
should not affect query speed.

Andrus.
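Whether the edited settings actually took effect can be confirmed from any session; note that in 8.1 work_mem applies after a config reload, while shared_buffers and max_fsm_pages need a full server restart:

SHOW work_mem;        -- expect 8096
SHOW shared_buffers;  -- expect 20000
SHOW max_fsm_pages;   -- expect 150000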
>> Given that the performance issues are caused by bloated tables and / or a
>> slow I/O subsystem, moving to a similar system won't help, I guess.
>
> I have run VACUUM FULL ANALYZE VERBOSE and set
>
> max_fsm_pages = 150000
>
> So there is no bloat any more except in the pg_shdepend indexes, which
> should not affect query speed.

OK, what was the number of unused item pointers in the VACUUM output?

Is the query performance still the same as when the tables were bloated?

What are the outputs of iostat/vmstat/dstat/top when running the query?

regards
Tomas
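A per-table run keeps that output manageable; a sketch for the two large tables in this thread (the figure Tomas asks about appears in the DETAIL lines, e.g. "There were N unused item pointers."):

VACUUM VERBOSE rid;  -- the product orders table
VACUUM VERBOSE dok;  -- the documents table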
Tomas,

> OK, what was the number of unused item pointers in the VACUUM output?

I posted it in this thread:

VACUUM FULL ANALYZE VERBOSE;
...
INFO: free space map contains 14353 pages in 314 relations
DETAIL: A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are: 20000 page slots, 1000 relations, using 182 KB.
NOTICE: number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT: Consider increasing the configuration parameter "max_fsm_pages" to a
value over 89664.

Query returned successfully with no result in 10513335 ms (roughly three hours).

> Is the query performance still the same as when the tables were bloated?

It seems to be the same. However, after the previous message I changed
other queries yesterday so that they no longer scan the whole product
orders (rid) table. Now there is only a little disk activity every 5-300
seconds, which does not seem to affect those query results. So the issue
in this thread has gone away.

Now this query runs in a constant 8 seconds:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = 'X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=182210.77..182210.78 rows=1 width=0) (actual time=8031.600..8031.604 rows=1 loops=1)"
"  -> Nested Loop (cost=74226.74..182149.27 rows=24598 width=0) (actual time=2602.474..7948.121 rows=21711 loops=1)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.077..0.089 rows=1 loops=1)"
"              Index Cond: ('X05'::bpchar = toode)"
"        -> Hash Join (cost=74226.74..181897.28 rows=24598 width=24) (actual time=2602.378..7785.315 rows=21711 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              -> Bitmap Heap Scan on rid (cost=4084.54..101951.60 rows=270725 width=28) (actual time=1129.925..4686.601 rows=278417 loops=1)"
"                    Recheck Cond: (toode = 'X05'::bpchar)"
"                    -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4084.54 rows=270725 width=0) (actual time=1123.202..1123.202 rows=278426 loops=1)"
"                          Index Cond: (toode = 'X05'::bpchar)"
"              -> Hash (cost=69419.29..69419.29 rows=112766 width=4) (actual time=1251.496..1251.496 rows=111088 loops=1)"
"                    -> Bitmap Heap Scan on dok (cost=1492.68..69419.29 rows=112766 width=4) (actual time=70.837..776.249 rows=111088 loops=1)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.68 rows=112766 width=0) (actual time=64.177..64.177 rows=111343 loops=1)"
"                                Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 8031.905 ms"

Interestingly, using LIKE is 80 times faster:

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like 'X05' AND dok.kuupaev>='2008-09-01'

"Aggregate (cost=88178.69..88178.70 rows=1 width=0) (actual time=115.335..115.339 rows=1 loops=1)"
"  -> Hash Join (cost=71136.22..88117.36 rows=24532 width=0) (actual time=115.322..115.322 rows=0 loops=1)"
"        Hash Cond: ("outer".toode = "inner".toode)"
"        -> Hash Join (cost=70163.36..86253.20 rows=24598 width=24) (actual time=0.046..0.046 rows=0 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              -> Bitmap Heap Scan on rid (cost=21.16..6307.52 rows=270725 width=28) (actual time=0.037..0.037 rows=0 loops=1)"
"                    Filter: (toode ~~ 'X05'::text)"
"                    -> Bitmap Index Scan on rid_toode_pattern_idx (cost=0.00..21.16 rows=1760 width=0) (actual time=0.028..0.028 rows=0 loops=1)"
"                          Index Cond: (toode ~=~ 'X05'::bpchar)"
"              -> Hash (cost=69419.29..69419.29 rows=112766 width=4) (never executed)"
"                    -> Bitmap Heap Scan on dok (cost=1492.68..69419.29 rows=112766 width=4) (never executed)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.68 rows=112766 width=0) (never executed)"
"                                Index Cond: (kuupaev >= '2008-09-01'::date)"
"        -> Hash (cost=853.29..853.29 rows=13429 width=24) (actual time=114.757..114.757 rows=13412 loops=1)"
"              -> Seq Scan on toode (cost=0.00..853.29 rows=13429 width=24) (actual time=0.014..58.319 rows=13412 loops=1)"
"Total runtime: 115.505 ms"
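The fast plan above uses rid_toode_pattern_idx, a pattern-ops index. Since toode appears to be char(n) (note the ::bpchar casts in the plans), such an index would presumably have been created along these lines; this is a reconstruction, not the thread's actual DDL:

CREATE INDEX rid_toode_pattern_idx ON rid (toode bpchar_pattern_ops);
ANALYZE rid;  -- refresh statistics so the planner can cost the new index sensibly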
I also posted a test script in another thread which shows that LIKE is a
magnitude faster than the equality check.

rid.toode = 'X05' and rid.toode like 'X05' are exactly the same condition,
and there are indexes for both. So I don't understand why the results are
so different.

In another sample, which I posted in the thread "Increasing pattern index
query speed", LIKE is 4 times slower:

SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = '99000010'
AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21'
AND dok.yksus LIKE 'ORISSAARE%'

"Aggregate (cost=43.09..43.10 rows=1 width=0) (actual time=12674.675..12674.679 rows=1 loops=1)"
"  -> Nested Loop (cost=29.57..43.08 rows=1 width=0) (actual time=2002.045..12673.645 rows=19 loops=1)"
"        -> Nested Loop (cost=29.57..37.06 rows=1 width=24) (actual time=2001.922..12672.344 rows=19 loops=1)"
"              -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..3.47 rows=1 width=4) (actual time=342.812..9810.627 rows=319 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"                    Filter: (yksus ~~ 'ORISSAARE%'::text)"
"              -> Bitmap Heap Scan on rid (cost=29.57..33.58 rows=1 width=28) (actual time=8.948..8.949 rows=0 loops=319)"
"                    Recheck Cond: (("outer".dokumnr = rid.dokumnr) AND (rid.toode = '99000010'::bpchar))"
"                    -> BitmapAnd (cost=29.57..29.57 rows=1 width=0) (actual time=8.930..8.930 rows=0 loops=319)"
"                          -> Bitmap Index Scan on rid_dokumnr_idx (cost=0.00..2.52 rows=149 width=0) (actual time=0.273..0.273 rows=2 loops=319)"
"                                Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                          -> Bitmap Index Scan on rid_toode_idx (cost=0.00..26.79 rows=1941 width=0) (actual time=8.596..8.596 rows=15236 loops=319)"
"                                Index Cond: (toode = '99000010'::bpchar)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.043..0.048 rows=1 loops=19)"
"              Index Cond: ('99000010'::bpchar = toode)"
"Total runtime: 12675.191 ms"

explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21'
AND dok.yksus LIKE 'ORISSAARE%'

"Aggregate (cost=17.99..18.00 rows=1 width=0) (actual time=46465.609..46465.613 rows=1 loops=1)"
"  -> Nested Loop (cost=0.00..17.98 rows=1 width=0) (actual time=3033.947..46465.462 rows=19 loops=1)"
"        -> Nested Loop (cost=0.00..11.96 rows=1 width=24) (actual time=3033.890..46464.310 rows=19 loops=1)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.93 rows=1 width=4) (actual time=0.044..5.419 rows=319 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"                    Filter: (yksus ~~ 'ORISSAARE%'::text)"
"              -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28) (actual time=0.025..88.046 rows=15322 loops=319)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode ~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.034..0.039 rows=1 loops=19)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 46465.833 ms"
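Both slow plans hinge on a bad row estimate: the planner expects rows=1 from the toode conditions but gets back 15236 and 15322 rows per loop, so it keeps choosing nested loops around those scans. A standard experiment for such misestimates (not something tried in the thread) is to raise the statistics target for the skewed column and re-analyze:

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;  -- 1000 is the maximum in 8.1
ANALYZE rid;  -- rebuild the column statistics at the higher target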
> What are the outputs of iostat/vmstat/dstat/top when running the query?

iostat and dstat are not installed on this Gentoo box.

For the last LIKE query, which consistently takes 46 seconds, vmstat 5 shows:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free   buff   cache   si  so  bi  bo  in  cs  us sy id wa
 0  0  216 124780      0 1899236    0   0  10  13  277  78   3  1 95  0
 0  0  216 124656      0 1899248    0   0   2   8  267  59   2  0 98  0
 1  0  216 124664      0 1899244    0   0   0   7  266  62   2  0 97  0
 0  0  216 124788      0 1899248    0   0   0   8  273  73   3  1 96  0
 0  0  216 124656      0 1899252    0   0   0   2  265  54   2  0 97  0
 1  0  216 124656      0 1899252    0   0   0  22  268  63  14 39 48  0  <---- start of query
 1  0  216 124656      0 1899252    0   0   0  62  267  61  25 75  0  0
 1  0  216 124656      0 1899252    0   0   0   2  266  55  28 72  0  0
 1  0  216 124664      0 1899256    0   0   0   5  265  56  26 74  0  0
 1  0  216 124788      0 1899256    0   0   0  10  271  67  25 75  0  0
 1  0  216 124664      0 1899256    0   0   0   3  265  54  25 75  0  0
 1  0  216 124160      0 1899260    0   0   0   1  265  57  28 72  0  0
 1  0  216 125020      0 1899260    0   0   0  21  272  60  28 72  0  0
 1  0  216 124020      0 1899264    0   0   0   0  271  73  29 71  0  0
 0  0  216 124260      0 1899268    0   0   0   3  266  61  19 59 22  0  <---- end of query
 1  0  216 125268      0 1899260    0   0   0   9  268  59   2  0 97  0
 0  0  216 124912      0 1899268    0   0   0   5  270  59   3  0 96  0
 0  0  216 124656      0 1899272    0   0   0   5  267  64   2  0 98  0
 0  0  216 124664      0 1899272    0   0   0   0  263  50   2  0 98  0

top shows activity from a single postmaster process:

top - 23:07:49 up 27 days, 4:20, 1 user, load average: 0.25, 0.22, 0.12
Tasks: 50 total, 3 running, 47 sleeping, 0 stopped, 0 zombie
Cpu(s): 29.3% us, 70.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2075828k total, 1951604k used, 124224k free, 0k buffers
Swap: 3911816k total, 216k used, 3911600k free, 1899348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
29687 postgres  25   0  144m 124m 121m R 96.3  6.2  8:18.04 postmaster
 8147 root      16   0  4812 1628 1316 S  1.0  0.1  0:00.03 sshd
 8141 root      15   0  5080 1892 1528 S  0.7  0.1  0:00.02 sshd
 8145 sshd      15   0  4816 1220  912 S  0.3  0.1  0:00.01 sshd
 8151 sshd      15   0  4812 1120  816 S  0.3  0.1  0:00.01 sshd
    1 root      16   0  1480  508  444 S  0.0  0.0  0:01.89 init
    2 root      34  19     0    0    0 S  0.0  0.0  0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0  0:00.63 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0  2:21.73 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0  0:00.00 kacpid
  115 root      13  -5     0    0    0 S  0.0  0.0  0:00.00 aio/0
  114 root      15   0     0    0    0 S  0.0  0.0  9:21.41 kswapd0
  116 root      10  -5     0    0    0 S  0.0  0.0  0:12.06 xfslogd/0
  117 root      10  -5     0    0    0 S  0.0  0.0  1:36.43 xfsdatad/0
  706 root      10  -5     0    0    0 S  0.0  0.0  0:00.00 kseriod
  723 root      13  -5     0    0    0 S  0.0  0.0  0:00.00 kpsmoused
  738 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 ata/0
  740 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_0
  741 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_1
  742 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_2
  743 root      11  -5     0    0    0 S  0.0  0.0  0:00.00 scsi_eh_3
  762 root      10  -5     0    0    0 S  0.0  0.0  0:18.64 xfsbufd
  763 root      10  -5     0    0    0 S  0.0  0.0  0:00.79 xfssyncd
  963 root      16  -4  1712  528  336 S  0.0  0.0  0:00.24 udevd
 6677 root      15   0  1728  572  400 S  0.0  0.0  0:10.14 syslog-ng
 7183 root      15   0  3472  828  672 S  0.0  0.0  0:08.50 sshd
 7222 root      16   0  1736  672  556 S  0.0  0.0  0:00.03 cron
 7237 root      16   0  1620  712  608 S  0.0  0.0  0:00.00 agetty
 7238 root      17   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
 7239 root      16   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
 7240 root      16   0  1616  708  608 S  0.0  0.0  0:00.00 agetty
 7241 root      16   0  1616  708  608 S  0.0  0.0  0:00.00 agetty
31873 root      15   0  1616  712  608 S  0.0  0.0  0:00.00 agetty
14908 postgres  16   0  141m  10m 9936 S  0.0  0.5  0:01.44 postmaster
14915 postgres  16   0  8468 1360  896 S  0.0  0.1  0:00.36 postmaster

Andrus.
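To tie the busy PID from the top output (29687 here) back to the SQL it is running, 8.1's statistics view can be queried from a second session; this assumes stats_command_string is enabled, otherwise current_query is not populated:

SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE procpid = 29687;  -- the backend top reports at ~96% CPU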