Interpreting vmstat - Mailing list pgsql-performance

From Doug Y
Subject Interpreting vmstat
Date
Msg-id 6.0.1.1.2.20040518133210.01eb9ec0@mail.traderonline.com
Responses Re: Interpreting vmstat
Re: Interpreting vmstat
List pgsql-performance
Hello,
  (note best viewed in fixed-width font)

   I'm still trying to find where my performance bottleneck is...
I have 4G ram, PG 7.3.4
shared_buffers = 75000
effective_cache_size = 75000

Run a query I've been having trouble with and watch the output of vmstat
(linux):

$ vmstat 1
   procs                       memory    swap          io     system        cpu
 r  b  w   swpd   free   buff   cache  si  so    bi    bo   in    cs  us sy id
 0  0  0    148   8732 193652 2786668   0   0     0     0  292   151   0  2 98
 2  0  2    148   7040 193652 2786668   0   0     0   208  459   697  45 10 45
 0  0  0    148   9028 193652 2786684   0   0    16   644  318   613  25  4 71
 1  0  0    148   5092 193676 2780196   0   0    12   184  441   491  37  5 58
 0  1  0    148   5212 193684 2772512   0   0   112  9740  682  1063  45 12 43
 1  0  0    148   5444 193684 2771584   0   0   120  4216  464  1303  44  3 52
 1  0  0    148  12232 193660 2771620   0   0   244   628  340   681  43 20 38
 1  0  0    148  12168 193664 2771832   0   0   196   552  332   956  42  2 56
 1  0  0    148  12080 193664 2772248   0   0   272   204  371   201  40  1 59
 1  1  0    148  12024 193664 2772624   0   0   368     0  259   127  42  3 55

That's the first 10 lines or so; the query takes 60 seconds to run.
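To isolate just the two columns in question while the query runs, something like this works (a sketch: the field positions $10/$11 match the bi/bo layout in the header above, but they shift between vmstat versions, so check your header first):

```shell
# Print only the bi/bo columns from vmstat's output.
# NR > 2 skips the two header lines; $10 and $11 are the bi and bo
# fields in this vmstat's column layout (verify against your header).
vmstat 1 | awk 'NR > 2 { printf "bi=%s bo=%s\n", $10, $11 }'
```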

I'm confused by the bi & bo parts of the io section. The documentation I have says:
    IO
        bi: Blocks sent to a block device (blocks/s).
        bo: Blocks received from a block device (blocks/s).

yet it seems to be the opposite of that: bi only increases when I'm doing a
largish query, while bo also goes up. I typically see periodic bo numbers
in the low 100s, which I'd guess are log writes.

I would think my entire DB should end up cached, since a raw pg_dump file
is only about 1G in size, yet my performance doesn't indicate that is the
case: running the same query a few minutes later, I see no significant
improvement.

Here's a sample from iostat while the query is running:

$ iostat -x -d 1

Device:  rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz    avgqu-sz   await  svctm  %util
sda        0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949552.96    0.00   0.00 100.00
sda1       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949662.96    0.00   0.00 100.00
sda2       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949642.96    0.00   0.00 100.00
sdb        0.00 428.00  0.00 116.00    0.00 4368.00    37.66     2844.40  296.55  86.21 100.00
sdb1       0.00 428.00  0.00 116.00    0.00 4368.00    37.66     6874.40  296.55  86.21 100.00

Device:  rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz    avgqu-sz   await  svctm  %util
sda        0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949552.96    0.00   0.00 100.00
sda1       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949662.96    0.00   0.00 100.00
sda2       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949642.96    0.00   0.00 100.00
sdb        4.00 182.00  6.00  77.00   80.00 2072.00    25.93     2814.50   54.22 120.48 100.00
sdb1       4.00 182.00  6.00  77.00   80.00 2072.00    25.93     6844.50   54.22 120.48 100.00

Device:  rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz    avgqu-sz   await  svctm  %util
sda        0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949552.96    0.00   0.00 100.00
sda1       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949662.96    0.00   0.00 100.00
sda2       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949642.96    0.00   0.00 100.00
sdb        0.00  43.00  0.00  11.00    0.00  432.00    39.27     2810.40   36.36 909.09 100.00
sdb1       0.00  43.00  0.00  11.00    0.00  432.00    39.27     6840.40   36.36 909.09 100.00

Device:  rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz    avgqu-sz   await  svctm  %util
sda        0.00  15.84  0.00  17.82    0.00  269.31    15.11 42524309.47   44.44 561.11 100.00
sda1       0.00  15.84  0.00  17.82    0.00  269.31    15.11 42524419.47   44.44 561.11 100.00
sda2       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42524398.67    0.00   0.00 100.00
sdb        0.99 222.77  0.99 114.85   15.84 2700.99    23.45     2814.16   35.90  86.32 100.00
sdb1       0.99 222.77  0.99 114.85   15.84 2700.99    23.45     6844.16   35.90  86.32 100.00

Device:  rrqm/s wrqm/s   r/s    w/s  rsec/s  wsec/s avgrq-sz    avgqu-sz   await  svctm  %util
sda        0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949551.76    0.00   0.00 101.00
sda1       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949662.86    0.00   0.00 101.00
sda2       0.00   0.00  0.00   0.00    0.00    0.00     0.00 42949642.66    0.00   0.00 101.00
sdb        1.00  91.00  1.00  28.00   16.00  960.00    33.66     2838.40   10.34 348.28 101.00
sdb1       1.00  91.00  1.00  28.00   16.00  960.00    33.66     6908.70   10.34 348.28 101.00

The DB files and logs are on sdb1.
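Since sdb1 is the device that matters here, one way to cut the output down while watching is a simple filter (a sketch; await in ms and %util are the quickest signs of a saturated disk):

```shell
# Follow only the device holding the DB files (sdb1 here), plus the
# header row; --line-buffered keeps rows flowing through the pipe.
iostat -x -d 1 | grep --line-buffered -E '^(Device|sdb1)'
```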

Can someone point me in the direction of some documentation on how to
interpret these numbers?

Also, I've tried to figure out what's being cached by PostgreSQL by
looking at pg_statio_all_tables. What kind of ratio should I be seeing
between heap_blks_read and heap_blks_hit?
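On the pg_statio question, the convention is usually hit / (hit + read) rather than the raw read/hit quotient. A sketch of the query (the database name mydb is a placeholder, and it assumes block-level statistics collection is enabled):

```shell
# Per-table buffer hit ratio from pg_statio_all_tables.
# hit / (hit + read) approaches 1.0 when most heap fetches are served
# from shared_buffers; the "+ 1" guards against dividing by zero on
# never-touched tables.
psql -d mydb -c "
  SELECT relname,
         heap_blks_hit::float / (heap_blks_hit + heap_blks_read + 1) AS hit_ratio
    FROM pg_statio_all_tables
   ORDER BY heap_blks_read DESC
   LIMIT 10;"
```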

Thanks.

