Thread: High load and iowait but no disk access

High load and iowait but no disk access

From
Rémy Beaumont
Date:
<fontfamily><param>Courier</param>We have been trying to pinpoint what
originally seem to be a I/O bottleneck but which now seems to be an
issue with either Postgresql or RHES 3.


We have the following test environment on which we can reproduce the
problem:


1) Test System A

Dell 6650 Quad Xeon Pentium 4

8 Gig of RAM

OS: RHES 3 update 2

Storage: NetApp FAS270 connected using an FC card using 10 disks


2) Test System B

Dell Dual Xeon Pentium III

2 Gig o RAM

OS: RHES 3 update 2

Storage: NetApp FAS920 connected using an FC card using 28 disks


Our Database size is around 30G.


The behavior we see is that when running queries that do random reads
on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000
kB/s on sequential read operations on the netapps)


The stats of the NetApp do confirm that it is sitting idle. Doing an
strace on the Postgresql process shows that is it doing seeks and
reads.


So my question is where is this iowait time spent ?

Is there a way to pinpoint the problem in more details ?

We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3


I have included the output of top,vmstat,strace and systat from the
Netapp from System B while running a single query that generates this
behavior.


Rémy


top output:

 06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01

72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped

CPU states:  cpu    user    nice  system    irq  softirq  iowait
idle

           total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%
49.5%

           cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%
97.2%

           cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%
1.9%

Mem:  2061696k av, 2043936k used,   17760k free,       0k shrd,
3916k buff

                   1566332k actv,  296648k in_d,   30504k in_c

Swap: 16771584k av,   21552k used, 16750032k free
1933772k cached


  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU
COMMAND

30960 postgres  15   0 13424  10M  9908 D     2.7  0.5   2:00   1
postmaster

30538 root      15   0  1080  764   524 S     0.7  0.0   0:43   0 sshd

    1 root      15   0   496  456   436 S     0.0  0.0   0:08   0 init

    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0
migration/0

    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1
migration/1

    4 root      15   0     0    0     0 SW    0.0  0.0   0:01   0
keventd

    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0
ksoftirqd/0

    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1
ksoftirqd/1

    9 root      15   0     0    0     0 SW    0.0  0.0   0:24   1
bdflush

    7 root      15   0     0    0     0 SW    0.0  0.0   6:53   1
kswapd

    8 root      15   0     0    0     0 SW    0.0  0.0   8:44   1
kscand

   10 root      15   0     0    0     0 SW    0.0  0.0   0:13   0
kupdated

   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0
mdrecoveryd

   17 root      15   0     0    0     0 SW    0.0  0.0   0:00   0
ahc_dv_0



vmstat output

procs                      memory      swap          io     system
cpu

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa

 0  1  21552  17796   4872 1931928    2    3     3     1   27     6  2
1  7  3

 0  1  21552  18044   4880 1931652    0    0  1652     0  397   512  1
2 50 47

 0  1  21552  17976   4896 1931664    0    0  2468     0  407   552  2
2 50 47

 1  0  21552  17984   4896 1931608    0    0  2124     0  418   538  3
3 48 46

 0  1  21552  18028   4900 1931536    0    0  1592     0  385   509  1
3 50 46

 0  1  21552  18040   4916 1931488    0    0  1620   820  419   581  2
2 50 46

 0  1  21552  17968   4916 1931536    0    4  1708     4  402   554  3
1 50 46

 1  1  21552  18052   4916 1931388    0    0  1772     0  409   531  3
1 49 47

 0  1  21552  17912   4924 1931492    0    0  1772     0  408   565  3
1 48 48

 0  1  21552  17932   4932 1931440    0    4  1356     4  391   545  5
0 49 46

 0  1  21552  18320   4944 1931016    0    4  1500   840  414   571  1
1 48 50

 0  1  21552  17872   4944 1931440    0    0  2116     0  392   496  1
5 46 48

 0  1  21552  18060   4944 1931232    0    0  2232     0  423   597  1
2 48 49

 1  1  21552  17684   4944 1931584    0    0  1752     0  395   537  1
1 50 48

 0  1  21552  18000   4944 1931240    0    0  1576     0  401   549  0
1 50 49



NetApp stats:

 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s
Cache Cache  CP   CP Disk   DAFS   FCP iSCSI   FCP  kB/s

                                  in   out   read  write  read write
age   hit time  ty util                       in   out

  2%     0     0     0     139     0     0   2788      0     0     0
3   96%   0%  -   15%      0   139     0     3  2277

  2%     0     0     0     144     0     0   2504      0     0     0
3   96%   0%  -   18%      0   144     0     3  2150

  2%     0     0     0     130     0     0   2212      0     0     0
3   96%   0%  -   13%      0   130     0     3  1879

  3%     0     0     0     169     0     0   2937     80     0     0
3   96%   0%  -   13%      0   169     0     4  2718

  2%     0     0     0     139     0     0   2448      0     0     0
3   96%   0%  -   12%      0   139     0     3  2096

  2%     0     0     0     137     0     0   2116      0     0     0
3   96%   0%  -   10%      0   137     0     3  1892

  3%     0     0     0     107     0     0   2660    812     0     0
3   96%  24%  T   20%      0   107     0     3  1739

  2%     0     0     0     118     0     0   1788      0     0     0
3   96%   0%  -   13%      0   118     0     3  1608

  2%     0     0     0     136     0     0   2228      0     0     0
3   96%   0%  -   11%      0   136     0     3  2018

  2%     0     0     0     119     0     0   1940      0     0     0
3   96%   0%  -   13%      0   119     0     3  1998

  2%     0     0     0     136     0     0   2175      0     0     0
3   96%   0%  -   14%      0   136     0     3  1929

  2%     0     0     0     133     0     0   1924      0     0     0
3   96%   0%  -   19%      0   133     0     3  2292

  2%     0     0     0     115     0     0   2044      0     0     0
3   96%   0%  -   11%      0   115     0     3  1682

  2%     0     0     0     134     0     0   2256      0     0     0
3   96%   0%  -   12%      0   134     0     3  2096

  2%     0     0     0     112     0     0   2184      0     0     0
3   96%   0%  -   12%      0   112     0     3  1633

  2%     0     0     0     163     0     0   2348      0     0     0
3   96%   0%  -   13%      0   163     0     4  2421

  2%     0     0     0     120     0     0   2056    184     0     0
3   96%   8%  T   14%      0   120     0     3  1703


strace output:

read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"...,
8192) = 8192

_llseek(55, 857997312, [857997312], SEEK_SET) = 0

read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"...,
8192) = 8192

_llseek(55, 883220480, [883220480], SEEK_SET) = 0

read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"...,
8192) = 8192

_llseek(55, 858005504, [858005504], SEEK_SET) = 0

read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2
\260\236\240"..., 8192) = 8192

_llseek(55, 857964544, [857964544], SEEK_SET) = 0

read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<<\1\0 \2 \300\236\200\2p\235"...,
8192) = 8192

_llseek(55, 857956352, [857956352], SEEK_SET) = 0

read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"...,
8192) = 8192

_llseek(55, 910802944, [910802944], SEEK_SET) = 0

read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"...,
8192) = 8192

_llseek(55, 857948160, [857948160], SEEK_SET) = 0

read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"...,
8192) = 8192

_llseek(56, 80371712, [80371712], SEEK_SET) = 0

read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"...,
8192) = 8192

read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"...,
8192) = 8192

_llseek(55, 857939968, [857939968], SEEK_SET) = 0

read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2
\244\236\270"..., 8192) = 8192

_llseek(55, 857923584, [857923584], SEEK_SET) = 0

read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2
\234\236\310\002"..., 8192) = 8192

_llseek(55, 57270272, [57270272], SEEK_SET) = 0

read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2
\310\236j\2\214\235"..., 8192) = 8192

_llseek(55, 870727680, [870727680], SEEK_SET) = 0

read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"...,
8192) = 8192

_llseek(55, 1014734848, [1014734848], SEEK_SET) = 0

read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"...,
8192) = 8192

_llseek(55, 857874432, [857874432], SEEK_SET) = 0

read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2
\224\236\330"..., 8192) = 8192

_llseek(55, 760872960, [760872960], SEEK_SET) = 0

read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"...,
8192) = 8192

_llseek(55, 891715584, [891715584], SEEK_SET) = 0

read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"...,
8192) = 8192

_llseek(55, 857858048, [857858048], SEEK_SET) = 0

read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2
\254\236\250"..., 8192) = 8192

_llseek(55, 666910720, [666910720], SEEK_SET) = 0

read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2
\254\236\242\2P\235"..., 8192) = 8192

_llseek(55, 857841664, [857841664], SEEK_SET) = 0

read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"...,
8192) = 8192




</fontfamily>We have been trying to pinpoint what originally seem to be a I/O
bottleneck but which now seems to be an issue with either Postgresql or
RHES 3.

We have the following test environment on which we can reproduce the
problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig o RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our Database size is around 30G.

The behavior we see is that when running queries that do random reads
on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000 kB/s
on sequential read operations on the netapps)

The stats of the NetApp do confirm that it is sitting idle. Doing an
strace on the Postgresql process shows that is it doing seeks and
reads.

So my question is where is this iowait time spent ?
Is there a way to pinpoint the problem in more details ?
We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3

I have included the output of top,vmstat,strace and systat from the
Netapp from System B while running a single query that generates this
behavior.

Rémy

top output:
  06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
            total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%   49.5%
            cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%   97.2%
            cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%    1.9%
Mem:  2061696k av, 2043936k used,   17760k free,       0k shrd,
3916k buff
                    1566332k actv,  296648k in_d,   30504k in_c
Swap: 16771584k av,   21552k used, 16750032k free
1933772k cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU
COMMAND
30960 postgres  15   0 13424  10M  9908 D     2.7  0.5   2:00   1
postmaster
30538 root      15   0  1080  764   524 S     0.7  0.0   0:43   0 sshd
     1 root      15   0   496  456   436 S     0.0  0.0   0:08   0 init
     2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0
migration/0
     3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1
migration/1
     4 root      15   0     0    0     0 SW    0.0  0.0   0:01   0
keventd
     5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0
ksoftirqd/0
     6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1
ksoftirqd/1
     9 root      15   0     0    0     0 SW    0.0  0.0   0:24   1
bdflush
     7 root      15   0     0    0     0 SW    0.0  0.0   6:53   1 kswapd
     8 root      15   0     0    0     0 SW    0.0  0.0   8:44   1 kscand
    10 root      15   0     0    0     0 SW    0.0  0.0   0:13   0
kupdated
    11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0
mdrecoveryd
    17 root      15   0     0    0     0 SW    0.0  0.0   0:00   0
ahc_dv_0


vmstat output
procs                      memory      swap          io     system
    cpu
  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us
sy id wa
  0  1  21552  17796   4872 1931928    2    3     3     1   27     6  2
1  7  3
  0  1  21552  18044   4880 1931652    0    0  1652     0  397   512  1
2 50 47
  0  1  21552  17976   4896 1931664    0    0  2468     0  407   552  2
2 50 47
  1  0  21552  17984   4896 1931608    0    0  2124     0  418   538  3
3 48 46
  0  1  21552  18028   4900 1931536    0    0  1592     0  385   509  1
3 50 46
  0  1  21552  18040   4916 1931488    0    0  1620   820  419   581  2
2 50 46
  0  1  21552  17968   4916 1931536    0    4  1708     4  402   554  3
1 50 46
  1  1  21552  18052   4916 1931388    0    0  1772     0  409   531  3
1 49 47
  0  1  21552  17912   4924 1931492    0    0  1772     0  408   565  3
1 48 48
  0  1  21552  17932   4932 1931440    0    4  1356     4  391   545  5
0 49 46
  0  1  21552  18320   4944 1931016    0    4  1500   840  414   571  1
1 48 50
  0  1  21552  17872   4944 1931440    0    0  2116     0  392   496  1
5 46 48
  0  1  21552  18060   4944 1931232    0    0  2232     0  423   597  1
2 48 49
  1  1  21552  17684   4944 1931584    0    0  1752     0  395   537  1
1 50 48
  0  1  21552  18000   4944 1931240    0    0  1576     0  401   549  0
1 50 49


NetApp stats:
  CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s
Cache Cache  CP   CP Disk   DAFS   FCP iSCSI   FCP  kB/s
                                   in   out   read  write  read write
age   hit time  ty util                       in   out
   2%     0     0     0     139     0     0   2788      0     0     0
  3   96%   0%  -   15%      0   139     0     3  2277
   2%     0     0     0     144     0     0   2504      0     0     0
  3   96%   0%  -   18%      0   144     0     3  2150
   2%     0     0     0     130     0     0   2212      0     0     0
  3   96%   0%  -   13%      0   130     0     3  1879
   3%     0     0     0     169     0     0   2937     80     0     0
  3   96%   0%  -   13%      0   169     0     4  2718
   2%     0     0     0     139     0     0   2448      0     0     0
  3   96%   0%  -   12%      0   139     0     3  2096
   2%     0     0     0     137     0     0   2116      0     0     0
  3   96%   0%  -   10%      0   137     0     3  1892
   3%     0     0     0     107     0     0   2660    812     0     0
  3   96%  24%  T   20%      0   107     0     3  1739
   2%     0     0     0     118     0     0   1788      0     0     0
  3   96%   0%  -   13%      0   118     0     3  1608
   2%     0     0     0     136     0     0   2228      0     0     0
  3   96%   0%  -   11%      0   136     0     3  2018
   2%     0     0     0     119     0     0   1940      0     0     0
  3   96%   0%  -   13%      0   119     0     3  1998
   2%     0     0     0     136     0     0   2175      0     0     0
  3   96%   0%  -   14%      0   136     0     3  1929
   2%     0     0     0     133     0     0   1924      0     0     0
  3   96%   0%  -   19%      0   133     0     3  2292
   2%     0     0     0     115     0     0   2044      0     0     0
  3   96%   0%  -   11%      0   115     0     3  1682
   2%     0     0     0     134     0     0   2256      0     0     0
  3   96%   0%  -   12%      0   134     0     3  2096
   2%     0     0     0     112     0     0   2184      0     0     0
  3   96%   0%  -   12%      0   112     0     3  1633
   2%     0     0     0     163     0     0   2348      0     0     0
  3   96%   0%  -   13%      0   163     0     4  2421
   2%     0     0     0     120     0     0   2056    184     0     0
  3   96%   8%  T   14%      0   120     0     3  1703

strace output:
read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"...,
8192) = 8192
_llseek(55, 857997312, [857997312], SEEK_SET) = 0
read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"...,
8192) = 8192
_llseek(55, 883220480, [883220480], SEEK_SET) = 0
read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"...,
8192) = 8192
_llseek(55, 858005504, [858005504], SEEK_SET) = 0
read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2 \260\236\240"...,
8192) = 8192
_llseek(55, 857964544, [857964544], SEEK_SET) = 0
read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"...,
8192) = 8192
_llseek(55, 857956352, [857956352], SEEK_SET) = 0
read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"...,
8192) = 8192
_llseek(55, 910802944, [910802944], SEEK_SET) = 0
read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"...,
8192) = 8192
_llseek(55, 857948160, [857948160], SEEK_SET) = 0
read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"...,
8192) = 8192
_llseek(56, 80371712, [80371712], SEEK_SET) = 0
read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"...,
8192) = 8192
read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"...,
8192) = 8192
_llseek(55, 857939968, [857939968], SEEK_SET) = 0
read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2 \244\236\270"...,
8192) = 8192
_llseek(55, 857923584, [857923584], SEEK_SET) = 0
read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2
\234\236\310\002"..., 8192) = 8192
_llseek(55, 57270272, [57270272], SEEK_SET) = 0
read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2
\310\236j\2\214\235"..., 8192) = 8192
_llseek(55, 870727680, [870727680], SEEK_SET) = 0
read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"...,
8192) = 8192
_llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"...,
8192) = 8192
_llseek(55, 857874432, [857874432], SEEK_SET) = 0
read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2 \224\236\330"...,
8192) = 8192
_llseek(55, 760872960, [760872960], SEEK_SET) = 0
read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"...,
8192) = 8192
_llseek(55, 891715584, [891715584], SEEK_SET) = 0
read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"...,
8192) = 8192
_llseek(55, 857858048, [857858048], SEEK_SET) = 0
read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2 \254\236\250"...,
8192) = 8192
_llseek(55, 666910720, [666910720], SEEK_SET) = 0
read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2
\254\236\242\2P\235"..., 8192) = 8192
_llseek(55, 857841664, [857841664], SEEK_SET) = 0
read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"...,
8192) = 8192



Re: High load and iowait but no disk access

From
Rémy Beaumont
Date:
On 30-Aug-05, at 12:15, Tom Lane wrote:

> =?ISO-8859-1?Q?R=E9my_Beaumont?= <remyb@medrium.com> writes:
>> The stats of the NetApp do confirm that it is sitting idle.
>
> Really?


>
>>   CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s
>> Cache Cache  CP   CP Disk   DAFS   FCP iSCSI   FCP  kB/s
>>                                    in   out   read  write  read write
>> age   hit time  ty util                       in   out
>>    2%     0     0     0     139     0     0   2788      0     0     0
>>   3   96%   0%  -   15%      0   139     0     3  2277
>>    2%     0     0     0     144     0     0   2504      0     0     0
>>   3   96%   0%  -   18%      0   144     0     3  2150
>>    2%     0     0     0     130     0     0   2212      0     0     0
>>   3   96%   0%  -   13%      0   130     0     3  1879
>>    3%     0     0     0     169     0     0   2937     80     0     0
>>   3   96%   0%  -   13%      0   169     0     4  2718
>>    2%     0     0     0     139     0     0   2448      0     0     0
>>   3   96%   0%  -   12%      0   139     0     3  2096
>
> I know zip about NetApps, but doesn't the 8th column indicate pretty
> steady disk reads?
Yes, but they are very low.
At 15% usage, it's pretty much sitting idle if you consider that the OS
reports that one of the processor is spending more then 80% of it's
time in IOwait.

Rémy
>
>             regards, tom lane


Re: High load and iowait but no disk access

From
Michael Stone
Date:
On Mon, Aug 29, 2005 at 09:42:46AM -0400, Rémy Beaumont wrote:
>We have been trying to pinpoint what originally seem to be a I/O
>bottleneck but which now seems to be an issue with either Postgresql or
>RHES 3.

Nope, it's an IO bottleneck.

>The behavior we see is that when running queries that do random reads
>on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
>throughput bellow 3000kB/s

That's the sign of an IO bottleneck.

>The stats of the NetApp do confirm that it is sitting idle. Doing an
>strace on the Postgresql process shows that is it doing seeks and
>reads.
>
>So my question is where is this iowait time spent ?

Waiting for the seeks. postgres doesn't do async io, so it requests a
block, waits for it to come in, then requests another block, etc. The
utilization on the netapp isn't going to be high because it doesn't have
a queue of requests and can't do readahead because the IO is random. The
only way to improve the situation would be to reduce the latency of the
seeks. If I read the numbers right you're only getting about 130
seeks/s, which ain't great. I don't know how much latency the netapp
adds in the this configuration; have you tried benchmarking
direct-attach disks?

Mike Stone

Re: High load and iowait but no disk access

From
Tom Lane
Date:
=?ISO-8859-1?Q?R=E9my_Beaumont?= <remyb@medrium.com> writes:
> The stats of the NetApp do confirm that it is sitting idle.

Really?

>   CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s
> Cache Cache  CP   CP Disk   DAFS   FCP iSCSI   FCP  kB/s
>                                    in   out   read  write  read write
> age   hit time  ty util                       in   out
>    2%     0     0     0     139     0     0   2788      0     0     0
>   3   96%   0%  -   15%      0   139     0     3  2277
>    2%     0     0     0     144     0     0   2504      0     0     0
>   3   96%   0%  -   18%      0   144     0     3  2150
>    2%     0     0     0     130     0     0   2212      0     0     0
>   3   96%   0%  -   13%      0   130     0     3  1879
>    3%     0     0     0     169     0     0   2937     80     0     0
>   3   96%   0%  -   13%      0   169     0     4  2718
>    2%     0     0     0     139     0     0   2448      0     0     0
>   3   96%   0%  -   12%      0   139     0     3  2096

I know zip about NetApps, but doesn't the 8th column indicate pretty
steady disk reads?

            regards, tom lane

Re: High load and iowait but no disk access

From
Tom Lane
Date:
=?ISO-8859-1?Q?R=E9my_Beaumont?= <remyb@medrium.com> writes:
> On 30-Aug-05, at 12:15, Tom Lane wrote:
>> I know zip about NetApps, but doesn't the 8th column indicate pretty
>> steady disk reads?

> Yes, but they are very low.

Sure, but that's more or less what you'd expect if the thing is randomly
seeking all over the disk :-(.  Just because it's a NetApp doesn't mean
it's got zero seek time.

You did not say what sort of query this is, but I gather that it's doing
an indexscan on a table that is not at all in index order.  Possible
solutions involve reverting to a seqscan (have you forced the planner to
choose an indexscan here, either directly or by lowering random_page_cost?)
or CLUSTERing the table by the index (which would need to be repeated
periodically, so it's not a great answer).

            regards, tom lane

Re: High load and iowait but no disk access

From
Rémy Beaumont
Date:
On 30-Aug-05, at 12:29, Tom Lane wrote:

> =?ISO-8859-1?Q?R=E9my_Beaumont?= <remyb@medrium.com> writes:
>> On 30-Aug-05, at 12:15, Tom Lane wrote:
>>> I know zip about NetApps, but doesn't the 8th column indicate pretty
>>> steady disk reads?
>
>> Yes, but they are very low.
>
> Sure, but that's more or less what you'd expect if the thing is
> randomly
> seeking all over the disk :-(.  Just because it's a NetApp doesn't mean
> it's got zero seek time.
Per NetApp, the disk utilization percentage they report does include
seek time, not just read/write operations.
NetApp has been involved in trying to figure out what is going on and
their claim is that the NetApp filer is not IO bound.

>
> You did not say what sort of query this is, but I gather that it's
> doing
> an indexscan on a table that is not at all in index order.
Yes, most of those queries are doing an  indexscan.  It's a fresh
restore of our production database that we have vacuumed/analyzed.

> Possible
> solutions involve reverting to a seqscan (have you forced the planner
> to
> choose an indexscan here, either directly or by lowering
> random_page_cost?)
No.
> or CLUSTERing the table by the index (which would need to be repeated
> periodically, so it's not a great answer).
Will try to cluster the tables and see if it changes anything. Still
doesn't explain what is going on with those seeks.

Thanks,

Rémy

>
>             regards, tom lane


Re: High load and iowait but no disk access

From
Josh Berkus
Date:
Remy,

> The behavior we see is that when running queries that do random reads
> on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
> throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000 kB/s
> on sequential read operations on the netapps)

This seems pretty low for a NetApp -- you should be able to manage up to
180mb/s, if not higher.   Are you sure it's configured correctly?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: High load and iowait but no disk access

From
"Woody Woodring"
Date:
Have you tried a different kernel?  We run with a netapp over NFS without any issues, but we have seen high IO-wait on other Dell boxes (running  and not running postgres) and RHES 3.  We have replaced a Dell PowerEdge 350 running RH 7.3  with a PE750 with more memory running RHES3 and it be bogged down with IO waits due to syslog messages writing to the disk, the old slower server could handle it fine.  I don't know if it is a Dell thing or a RH kernel, but we try different kernels on our boxes to try to find one that works better.  We have not found one that stands out over another consistently but we have been moving away from Update 2 kernel (2.4.21-15.ELsmp) due to server lockup issues.  Unfortunately we get the best disk throughput on our few remaining 7.3 boxes.
 
Woody
 
IGLASS Networks


From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rémy Beaumont
Sent: Monday, August 29, 2005 9:43 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] High load and iowait but no disk access

We have been trying to pinpoint what originally seem to be a I/O bottleneck but which now seems to be an issue with either Postgresql or RHES 3.

We have the following test environment on which we can reproduce the problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig o RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our Database size is around 30G.

The behavior we see is that when running queries that do random reads on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000 kB/s on sequential read operations on the netapps)

The stats of the NetApp do confirm that it is sitting idle. Doing an strace on the Postgresql process shows that is it doing seeks and reads.

So my question is where is this iowait time spent ?
Is there a way to pinpoint the problem in more details ?
We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3

I have included the output of top,vmstat,strace and systat from the Netapp from System B while running a single query that generates this behavior.

Rémy

top output:
06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
1566332k actv, 296648k in_d, 30504k in_c
Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
30960 postgres 15 0 13424 10M 9908 D 2.7 0.5 2:00 1 postmaster
30538 root 15 0 1080 764 524 S 0.7 0.0 0:43 0 sshd
1 root 15 0 496 456 436 S 0.0 0.0 0:08 0 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
9 root 15 0 0 0 0 SW 0.0 0.0 0:24 1 bdflush
7 root 15 0 0 0 0 SW 0.0 0.0 6:53 1 kswapd
8 root 15 0 0 0 0 SW 0.0 0.0 8:44 1 kscand
10 root 15 0 0 0 0 SW 0.0 0.0 0:13 0 kupdated
11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
17 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 ahc_dv_0


vmstat output
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 21552 17796 4872 1931928 2 3 3 1 27 6 2 1 7 3
0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47
0 1 21552 17976 4896 1931664 0 0 2468 0 407 552 2 2 50 47
1 0 21552 17984 4896 1931608 0 0 2124 0 418 538 3 3 48 46
0 1 21552 18028 4900 1931536 0 0 1592 0 385 509 1 3 50 46
0 1 21552 18040 4916 1931488 0 0 1620 820 419 581 2 2 50 46
0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46
1 1 21552 18052 4916 1931388 0 0 1772 0 409 531 3 1 49 47
0 1 21552 17912 4924 1931492 0 0 1772 0 408 565 3 1 48 48
0 1 21552 17932 4932 1931440 0 4 1356 4 391 545 5 0 49 46
0 1 21552 18320 4944 1931016 0 4 1500 840 414 571 1 1 48 50
0 1 21552 17872 4944 1931440 0 0 2116 0 392 496 1 5 46 48
0 1 21552 18060 4944 1931232 0 0 2232 0 423 597 1 2 48 49
1 1 21552 17684 4944 1931584 0 0 1752 0 395 537 1 1 50 48
0 1 21552 18000 4944 1931240 0 0 1576 0 401 549 0 1 50 49


NetApp stats:
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk DAFS FCP iSCSI FCP kB/s
in out read write read write age hit time ty util in out
2% 0 0 0 139 0 0 2788 0 0 0 3 96% 0% - 15% 0 139 0 3 2277
2% 0 0 0 144 0 0 2504 0 0 0 3 96% 0% - 18% 0 144 0 3 2150
2% 0 0 0 130 0 0 2212 0 0 0 3 96% 0% - 13% 0 130 0 3 1879
3% 0 0 0 169 0 0 2937 80 0 0 3 96% 0% - 13% 0 169 0 4 2718
2% 0 0 0 139 0 0 2448 0 0 0 3 96% 0% - 12% 0 139 0 3 2096
2% 0 0 0 137 0 0 2116 0 0 0 3 96% 0% - 10% 0 137 0 3 1892
3% 0 0 0 107 0 0 2660 812 0 0 3 96% 24% T 20% 0 107 0 3 1739
2% 0 0 0 118 0 0 1788 0 0 0 3 96% 0% - 13% 0 118 0 3 1608
2% 0 0 0 136 0 0 2228 0 0 0 3 96% 0% - 11% 0 136 0 3 2018
2% 0 0 0 119 0 0 1940 0 0 0 3 96% 0% - 13% 0 119 0 3 1998
2% 0 0 0 136 0 0 2175 0 0 0 3 96% 0% - 14% 0 136 0 3 1929
2% 0 0 0 133 0 0 1924 0 0 0 3 96% 0% - 19% 0 133 0 3 2292
2% 0 0 0 115 0 0 2044 0 0 0 3 96% 0% - 11% 0 115 0 3 1682
2% 0 0 0 134 0 0 2256 0 0 0 3 96% 0% - 12% 0 134 0 3 2096
2% 0 0 0 112 0 0 2184 0 0 0 3 96% 0% - 12% 0 112 0 3 1633
2% 0 0 0 163 0 0 2348 0 0 0 3 96% 0% - 13% 0 163 0 4 2421
2% 0 0 0 120 0 0 2056 184 0 0 3 96% 8% T 14% 0 120 0 3 1703

strace output:
read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"..., 8192) = 8192
_llseek(55, 857997312, [857997312], SEEK_SET) = 0
read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"..., 8192) = 8192
_llseek(55, 883220480, [883220480], SEEK_SET) = 0
read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"..., 8192) = 8192
_llseek(55, 858005504, [858005504], SEEK_SET) = 0
read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2 \260\236\240"..., 8192) = 8192
_llseek(55, 857964544, [857964544], SEEK_SET) = 0
read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"..., 8192) = 8192
_llseek(55, 857956352, [857956352], SEEK_SET) = 0
read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"..., 8192) = 8192
_llseek(55, 910802944, [910802944], SEEK_SET) = 0
read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"..., 8192) = 8192
_llseek(55, 857948160, [857948160], SEEK_SET) = 0
read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"..., 8192) = 8192
_llseek(56, 80371712, [80371712], SEEK_SET) = 0
read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"..., 8192) = 8192
read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"..., 8192) = 8192
_llseek(55, 857939968, [857939968], SEEK_SET) = 0
read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2 \244\236\270"..., 8192) = 8192
_llseek(55, 857923584, [857923584], SEEK_SET) = 0
read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2 \234\236\310\002"..., 8192) = 8192
_llseek(55, 57270272, [57270272], SEEK_SET) = 0
read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2 \310\236j\2\214\235"..., 8192) = 8192
_llseek(55, 870727680, [870727680], SEEK_SET) = 0
read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"..., 8192) = 8192
_llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"..., 8192) = 8192
_llseek(55, 857874432, [857874432], SEEK_SET) = 0
read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2 \224\236\330"..., 8192) = 8192
_llseek(55, 760872960, [760872960], SEEK_SET) = 0
read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"..., 8192) = 8192
_llseek(55, 891715584, [891715584], SEEK_SET) = 0
read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"..., 8192) = 8192
_llseek(55, 857858048, [857858048], SEEK_SET) = 0
read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2 \254\236\250"..., 8192) = 8192
_llseek(55, 666910720, [666910720], SEEK_SET) = 0
read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2 \254\236\242\2P\235"..., 8192) = 8192
_llseek(55, 857841664, [857841664], SEEK_SET) = 0
read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"..., 8192) = 8192


Re: High load and iowait but no disk access

From
Rémy Beaumont
Date:
On 30-Aug-05, at 14:32, Josh Berkus wrote:

> Remy,
>
>> The behavior we see is that when running queries that do random reads
>> on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
>> throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000
>> kB/s
>> on sequential read operations on the netapps)
>
> This seems pretty low for a NetApp -- you should be able to manage up
> to
> 180mb/s, if not higher.   Are you sure it's configured correctly?
Hi Josh,

The config has been reviewed by NetApp. We do get rates higher then
80mb/s, but on average, that's what we get.

Do you have NetApp filers deployed ?
How many spindles do you have in your volume ?
On which OS are you running Postgres ?

Thanks,

Rémy

>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match


Re: High load and iowait but no disk access

From
Rémy Beaumont
Date:
On 30-Aug-05, at 14:46, Anjan Dave wrote:

> I have seen references of changing the kernel io scheduler at boot
> time…not sure if it applies to RHEL3.0, or will help, but try setting
> ‘elevator=deadline’ during boot time or via grub.conf.
That's only for RHEL 4.0.

> Have you tried running a simple ‘dd’ on the LUN?
We get amazing performance using dd.
> The drives are in RAID10 configuration, right?
NetApp has their own type of raid format (RAID4 aka WAFL)

Rémy
>  
> Thanks,
> Anjan
>
> From: Woody Woodring [mailto:george.woodring@iglass.net]
> Sent: Tuesday, August 30, 2005 2:30 PM
> To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High load and iowait but no disk access
>  
> Have you tried a different kernel?  We run with a netapp over NFS
> without any issues, but we have seen high IO-wait on other Dell boxes
> (running  and not running postgres) and RHES 3.  We have replaced a
> Dell PowerEdge 350 running RH 7.3  with a PE750 with more memory
> running RHES3 and it be bogged down with IO waits due to syslog
> messages writing to the disk, the old slower server could handle it
> fine.  I don't know if it is a Dell thing or a RH kernel, but we try
> different kernels on our boxes to try to find one that works better. 
> We have not found one that stands out over another consistently but we
> have been moving away from Update 2 kernel (2.4.21-15.ELsmp) due to
> server lockup issues.  Unfortunately we get the best disk throughput
> on our few remaining 7.3 boxes.
>  
> Woody
>  
> IGLASS Networks
> www.iglass.net
>  
>
>
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rémy
> Beaumont
> Sent: Monday, August 29, 2005 9:43 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] High load and iowait but no disk access
> We have been trying to pinpoint what originally seem to be a I/O
> bottleneck but which now seems to be an issue with either Postgresql
> or RHES 3.
>
>  We have the following test environment on which we can reproduce the
> problem:
>
>  1) Test System A
>  Dell 6650 Quad Xeon Pentium 4
>  8 Gig of RAM
>  OS: RHES 3 update 2
>  Storage: NetApp FAS270 connected using an FC card using 10 disks
>
>  2) Test System B
>  Dell Dual Xeon Pentium III
>  2 Gig o RAM
>  OS: RHES 3 update 2
>  Storage: NetApp FAS920 connected using an FC card using 28 disks
>
>  Our Database size is around 30G.
>
>  The behavior we see is that when running queries that do random reads
> on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
> throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000
> kB/s on sequential read operations on the netapps)
>
>  The stats of the NetApp do confirm that it is sitting idle. Doing an
> strace on the Postgresql process shows that is it doing seeks and
> reads.
>
>  So my question is where is this iowait time spent ?
>  Is there a way to pinpoint the problem in more details ?
>  We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3
>
>  I have included the output of top,vmstat,strace and systat from the
> Netapp from System B while running a single query that generates this
> behavior.
>
>  Rémy
>
>  top output:
>  06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
>  72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
>  CPU states: cpu user nice system irq softirq iowait idle
>  total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
>  cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
>  cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
>  Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
>  1566332k actv, 296648k in_d, 30504k in_c
>  Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached
>
>  PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
>  30960 postgres 15 0 13424 10M 9908 D 2.7 0.5 2:00 1 postmaster
>  30538 root 15 0 1080 764 524 S 0.7 0.0 0:43 0 sshd
>  1 root 15 0 496 456 436 S 0.0 0.0 0:08 0 init
>  2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
>  3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
>  4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 keventd
>  5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
>  6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
>  9 root 15 0 0 0 0 SW 0.0 0.0 0:24 1 bdflush
>  7 root 15 0 0 0 0 SW 0.0 0.0 6:53 1 kswapd
>  8 root 15 0 0 0 0 SW 0.0 0.0 8:44 1 kscand
>  10 root 15 0 0 0 0 SW 0.0 0.0 0:13 0 kupdated
>  11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
>  17 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 ahc_dv_0
>
>
>  vmstat output
>  procs memory swap io system cpu
>  r b swpd free buff cache si so bi bo in cs us sy id wa
>  0 1 21552 17796 4872 1931928 2 3 3 1 27 6 2 1 7 3
>  0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47
>  0 1 21552 17976 4896 1931664 0 0 2468 0 407 552 2 2 50 47
>  1 0 21552 17984 4896 1931608 0 0 2124 0 418 538 3 3 48 46
>  0 1 21552 18028 4900 1931536 0 0 1592 0 385 509 1 3 50 46
>  0 1 21552 18040 4916 1931488 0 0 1620 820 419 581 2 2 50 46
>  0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46
>  1 1 21552 18052 4916 1931388 0 0 1772 0 409 531 3 1 49 47
>  0 1 21552 17912 4924 1931492 0 0 1772 0 408 565 3 1 48 48
>  0 1 21552 17932 4932 1931440 0 4 1356 4 391 545 5 0 49 46
>  0 1 21552 18320 4944 1931016 0 4 1500 840 414 571 1 1 48 50
>  0 1 21552 17872 4944 1931440 0 0 2116 0 392 496 1 5 46 48
>  0 1 21552 18060 4944 1931232 0 0 2232 0 423 597 1 2 48 49
>  1 1 21552 17684 4944 1931584 0 0 1752 0 395 537 1 1 50 48
>  0 1 21552 18000 4944 1931240 0 0 1576 0 401 549 0 1 50 49
>
>
>  NetApp stats:
>  CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP
> CP Disk DAFS FCP iSCSI FCP kB/s
>  in out read write read write age hit time ty util in out
>  2% 0 0 0 139 0 0 2788 0 0 0 3 96% 0% - 15% 0 139 0 3 2277
>  2% 0 0 0 144 0 0 2504 0 0 0 3 96% 0% - 18% 0 144 0 3 2150
>  2% 0 0 0 130 0 0 2212 0 0 0 3 96% 0% - 13% 0 130 0 3 1879
>  3% 0 0 0 169 0 0 2937 80 0 0 3 96% 0% - 13% 0 169 0 4 2718
>  2% 0 0 0 139 0 0 2448 0 0 0 3 96% 0% - 12% 0 139 0 3 2096
>  2% 0 0 0 137 0 0 2116 0 0 0 3 96% 0% - 10% 0 137 0 3 1892
>  3% 0 0 0 107 0 0 2660 812 0 0 3 96% 24% T 20% 0 107 0 3 1739
>  2% 0 0 0 118 0 0 1788 0 0 0 3 96% 0% - 13% 0 118 0 3 1608
>  2% 0 0 0 136 0 0 2228 0 0 0 3 96% 0% - 11% 0 136 0 3 2018
>  2% 0 0 0 119 0 0 1940 0 0 0 3 96% 0% - 13% 0 119 0 3 1998
>  2% 0 0 0 136 0 0 2175 0 0 0 3 96% 0% - 14% 0 136 0 3 1929
>  2% 0 0 0 133 0 0 1924 0 0 0 3 96% 0% - 19% 0 133 0 3 2292
>  2% 0 0 0 115 0 0 2044 0 0 0 3 96% 0% - 11% 0 115 0 3 1682
>  2% 0 0 0 134 0 0 2256 0 0 0 3 96% 0% - 12% 0 134 0 3 2096
>  2% 0 0 0 112 0 0 2184 0 0 0 3 96% 0% - 12% 0 112 0 3 1633
>  2% 0 0 0 163 0 0 2348 0 0 0 3 96% 0% - 13% 0 163 0 4 2421
>  2% 0 0 0 120 0 0 2056 184 0 0 3 96% 8% T 14% 0 120 0 3 1703
>
>  strace output:
>  read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"...,
> 8192) = 8192
>  _llseek(55, 857997312, [857997312], SEEK_SET) = 0
>  read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"...,
> 8192) = 8192
>  _llseek(55, 883220480, [883220480], SEEK_SET) = 0
>  read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2
> \270\236\220\2D\235"..., 8192) = 8192
>  _llseek(55, 858005504, [858005504], SEEK_SET) = 0
>  read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2
> \260\236\240"..., 8192) = 8192
>  _llseek(55, 857964544, [857964544], SEEK_SET) = 0
>  read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"...,
> 8192) = 8192
>  _llseek(55, 857956352, [857956352], SEEK_SET) = 0
>  read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2
> \260\236\240\2\\"..., 8192) = 8192
>  _llseek(55, 910802944, [910802944], SEEK_SET) = 0
>  read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"...,
> 8192) = 8192
>  _llseek(55, 857948160, [857948160], SEEK_SET) = 0
>  read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"...,
> 8192) = 8192
>  _llseek(56, 80371712, [80371712], SEEK_SET) = 0
>  read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2
> \250\236\260\2T\235"..., 8192) = 8192
>  read(102,
> "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"..., 8192) =
> 8192
>  _llseek(55, 857939968, [857939968], SEEK_SET) = 0
>  read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2
> \244\236\270"..., 8192) = 8192
>  _llseek(55, 857923584, [857923584], SEEK_SET) = 0
>  read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2
> \234\236\310\002"..., 8192) = 8192
>  _llseek(55, 57270272, [57270272], SEEK_SET) = 0
>  read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2
> \310\236j\2\214\235"..., 8192) = 8192
>  _llseek(55, 870727680, [870727680], SEEK_SET) = 0
>  read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"...,
> 8192) = 8192
>  _llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
>  read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2
> \264\236\230"..., 8192) = 8192
>  _llseek(55, 857874432, [857874432], SEEK_SET) = 0
>  read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2
> \224\236\330"..., 8192) = 8192
>  _llseek(55, 760872960, [760872960], SEEK_SET) = 0
>  read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2
> \234\236\310"..., 8192) = 8192
>  _llseek(55, 891715584, [891715584], SEEK_SET) = 0
>  read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2
> \230\236\320\2"..., 8192) = 8192
>  _llseek(55, 857858048, [857858048], SEEK_SET) = 0
>  read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2
> \254\236\250"..., 8192) = 8192
>  _llseek(55, 666910720, [666910720], SEEK_SET) = 0
>  read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2
> \254\236\242\2P\235"..., 8192) = 8192
>  _llseek(55, 857841664, [857841664], SEEK_SET) = 0
>  read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2
> \214\236\350\2\30"..., 8192) = 8192
>
>
>


Re: High load and iowait but no disk access

From
"Anjan Dave"
Date:

I have seen references of changing the kernel io scheduler at boot time…not sure if it applies to RHEL3.0, or will help, but try setting ‘elevator=deadline’ during boot time or via grub.conf. Have you tried running a simple ‘dd’ on the LUN? The drives are in RAID10 configuration, right?

 

Thanks,

Anjan


From: Woody Woodring [mailto:george.woodring@iglass.net]
Sent: Tuesday, August 30, 2005 2:30 PM
To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High load and iowait but no disk access

 

Have you tried a different kernel?  We run with a netapp over NFS without any issues, but we have seen high IO-wait on other Dell boxes (running  and not running postgres) and RHES 3.  We have replaced a Dell PowerEdge 350 running RH 7.3  with a PE750 with more memory running RHES3 and it be bogged down with IO waits due to syslog messages writing to the disk, the old slower server could handle it fine.  I don't know if it is a Dell thing or a RH kernel, but we try different kernels on our boxes to try to find one that works better.  We have not found one that stands out over another consistently but we have been moving away from Update 2 kernel (2.4.21-15.ELsmp) due to server lockup issues.  Unfortunately we get the best disk throughput on our few remaining 7.3 boxes.

 

Woody

 

IGLASS Networks

www.iglass.net

 


From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rémy Beaumont
Sent: Monday, August 29, 2005 9:43 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] High load and iowait but no disk access

We have been trying to pinpoint what originally seem to be a I/O bottleneck but which now seems to be an issue with either Postgresql or RHES 3.

We have the following test environment on which we can reproduce the problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig o RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our Database size is around 30G.

The behavior we see is that when running queries that do random reads on disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000 kB/s on sequential read operations on the netapps)

The stats of the NetApp do confirm that it is sitting idle. Doing an strace on the Postgresql process shows that is it doing seeks and reads.

So my question is where is this iowait time spent ?
Is there a way to pinpoint the problem in more details ?
We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3

I have included the output of top,vmstat,strace and systat from the Netapp from System B while running a single query that generates this behavior.

Rémy

top output:
06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
1566332k actv, 296648k in_d, 30504k in_c
Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
30960 postgres 15 0 13424 10M 9908 D 2.7 0.5 2:00 1 postmaster
30538 root 15 0 1080 764 524 S 0.7 0.0 0:43 0 sshd
1 root 15 0 496 456 436 S 0.0 0.0 0:08 0 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
9 root 15 0 0 0 0 SW 0.0 0.0 0:24 1 bdflush
7 root 15 0 0 0 0 SW 0.0 0.0 6:53 1 kswapd
8 root 15 0 0 0 0 SW 0.0 0.0 8:44 1 kscand
10 root 15 0 0 0 0 SW 0.0 0.0 0:13 0 kupdated
11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
17 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 ahc_dv_0


vmstat output
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 21552 17796 4872 1931928 2 3 3 1 27 6 2 1 7 3
0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47
0 1 21552 17976 4896 1931664 0 0 2468 0 407 552 2 2 50 47
1 0 21552 17984 4896 1931608 0 0 2124 0 418 538 3 3 48 46
0 1 21552 18028 4900 1931536 0 0 1592 0 385 509 1 3 50 46
0 1 21552 18040 4916 1931488 0 0 1620 820 419 581 2 2 50 46
0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46
1 1 21552 18052 4916 1931388 0 0 1772 0 409 531 3 1 49 47
0 1 21552 17912 4924 1931492 0 0 1772 0 408 565 3 1 48 48
0 1 21552 17932 4932 1931440 0 4 1356 4 391 545 5 0 49 46
0 1 21552 18320 4944 1931016 0 4 1500 840 414 571 1 1 48 50
0 1 21552 17872 4944 1931440 0 0 2116 0 392 496 1 5 46 48
0 1 21552 18060 4944 1931232 0 0 2232 0 423 597 1 2 48 49
1 1 21552 17684 4944 1931584 0 0 1752 0 395 537 1 1 50 48
0 1 21552 18000 4944 1931240 0 0 1576 0 401 549 0 1 50 49


NetApp stats:
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk DAFS FCP iSCSI FCP kB/s
in out read write read write age hit time ty util in out
2% 0 0 0 139 0 0 2788 0 0 0 3 96% 0% - 15% 0 139 0 3 2277
2% 0 0 0 144 0 0 2504 0 0 0 3 96% 0% - 18% 0 144 0 3 2150
2% 0 0 0 130 0 0 2212 0 0 0 3 96% 0% - 13% 0 130 0 3 1879
3% 0 0 0 169 0 0 2937 80 0 0 3 96% 0% - 13% 0 169 0 4 2718
2% 0 0 0 139 0 0 2448 0 0 0 3 96% 0% - 12% 0 139 0 3 2096
2% 0 0 0 137 0 0 2116 0 0 0 3 96% 0% - 10% 0 137 0 3 1892
3% 0 0 0 107 0 0 2660 812 0 0 3 96% 24% T 20% 0 107 0 3 1739
2% 0 0 0 118 0 0 1788 0 0 0 3 96% 0% - 13% 0 118 0 3 1608
2% 0 0 0 136 0 0 2228 0 0 0 3 96% 0% - 11% 0 136 0 3 2018
2% 0 0 0 119 0 0 1940 0 0 0 3 96% 0% - 13% 0 119 0 3 1998
2% 0 0 0 136 0 0 2175 0 0 0 3 96% 0% - 14% 0 136 0 3 1929
2% 0 0 0 133 0 0 1924 0 0 0 3 96% 0% - 19% 0 133 0 3 2292
2% 0 0 0 115 0 0 2044 0 0 0 3 96% 0% - 11% 0 115 0 3 1682
2% 0 0 0 134 0 0 2256 0 0 0 3 96% 0% - 12% 0 134 0 3 2096
2% 0 0 0 112 0 0 2184 0 0 0 3 96% 0% - 12% 0 112 0 3 1633
2% 0 0 0 163 0 0 2348 0 0 0 3 96% 0% - 13% 0 163 0 4 2421
2% 0 0 0 120 0 0 2056 184 0 0 3 96% 8% T 14% 0 120 0 3 1703

strace output:
read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"..., 8192) = 8192
_llseek(55, 857997312, [857997312], SEEK_SET) = 0
read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"..., 8192) = 8192
_llseek(55, 883220480, [883220480], SEEK_SET) = 0
read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"..., 8192) = 8192
_llseek(55, 858005504, [858005504], SEEK_SET) = 0
read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2 \260\236\240"..., 8192) = 8192
_llseek(55, 857964544, [857964544], SEEK_SET) = 0
read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"..., 8192) = 8192
_llseek(55, 857956352, [857956352], SEEK_SET) = 0
read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"..., 8192) = 8192
_llseek(55, 910802944, [910802944], SEEK_SET) = 0
read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"..., 8192) = 8192
_llseek(55, 857948160, [857948160], SEEK_SET) = 0
read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"..., 8192) = 8192
_llseek(56, 80371712, [80371712], SEEK_SET) = 0
read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"..., 8192) = 8192
read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"..., 8192) = 8192
_llseek(55, 857939968, [857939968], SEEK_SET) = 0
read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2 \244\236\270"..., 8192) = 8192
_llseek(55, 857923584, [857923584], SEEK_SET) = 0
read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2 \234\236\310\002"..., 8192) = 8192
_llseek(55, 57270272, [57270272], SEEK_SET) = 0
read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2 \310\236j\2\214\235"..., 8192) = 8192
_llseek(55, 870727680, [870727680], SEEK_SET) = 0
read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"..., 8192) = 8192
_llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"..., 8192) = 8192
_llseek(55, 857874432, [857874432], SEEK_SET) = 0
read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2 \224\236\330"..., 8192) = 8192
_llseek(55, 760872960, [760872960], SEEK_SET) = 0
read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"..., 8192) = 8192
_llseek(55, 891715584, [891715584], SEEK_SET) = 0
read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"..., 8192) = 8192
_llseek(55, 857858048, [857858048], SEEK_SET) = 0
read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2 \254\236\250"..., 8192) = 8192
_llseek(55, 666910720, [666910720], SEEK_SET) = 0
read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2 \254\236\242\2P\235"..., 8192) = 8192
_llseek(55, 857841664, [857841664], SEEK_SET) = 0
read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"..., 8192) = 8192



Re: High load and iowait but no disk access

From
mudfoot@rawbw.com
Date:
This might be optimal behavior from the hardware.  Random reads are hard to
optimize for--except if you have enough physical memory to hold the entire
dataset.  Cached reads (either in array controller or OS buffer cache) should
return nearly immediately.  But random reads probably aren't cached.  And any
read-ahead alogorithms or other types of performance enhancements in the
hardware or OS go out the window--because the behavior isn't predictable.

Each time a drive spindle needs to move to a new track, it requires at least a
couple of miliseconds.  Sequential reads only require this movement
infrequently.  But random reads may be forcing this movement for every IO operation.

Since the bottleneck in random reads is the physical hard drives themselves,
everything else stands around waiting.  Fancy hardware can optimize everything
else -- writes with write cache, sequential reads with read-ahead and read
cache.  But there's no real solution to a purely random read workload except
perhaps creating different disk groups to help avoid spindle contention.

I like this tool:  http://www.soliddata.com/products/iotest.html
It allows you to select pure workloads (read/write/sequential/random), and it
runs against raw devices, so you bypass the OS buffer cache.  When I've run it
I've always seen sequential activity get much much higher throughput than random.

Quoting Anjan Dave <adave@vantage.com>:

> I have seen references of changing the kernel io scheduler at boot time...not
> sure if it applies to RHEL3.0, or will help, but try setting
> 'elevator=deadline' during boot time or via grub.conf. Have you tried running
> a simple 'dd' on the LUN? The drives are in RAID10 configuration, right?
>
>
>
> Thanks,
>
> Anjan
>
>   _____
>
> From: Woody Woodring [mailto:george.woodring@iglass.net]
> Sent: Tuesday, August 30, 2005 2:30 PM
> To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High load and iowait but no disk access
>
>
>
> Have you tried a different kernel?  We run with a netapp over NFS without any
> issues, but we have seen high IO-wait on other Dell boxes (running  and not
> running postgres) and RHES 3.  We have replaced a Dell PowerEdge 350 running
> RH 7.3  with a PE750 with more memory running RHES3 and it be bogged down
> with IO waits due to syslog messages writing to the disk, the old slower
> server could handle it fine.  I don't know if it is a Dell thing or a RH
> kernel, but we try different kernels on our boxes to try to find one that
> works better.  We have not found one that stands out over another
> consistently but we have been moving away from Update 2 kernel
> (2.4.21-15.ELsmp) due to server lockup issues.  Unfortunately we get the best
> disk throughput on our few remaining 7.3 boxes.
>
>
>
> Woody
>
>
>
> IGLASS Networks
>
> www.iglass.net
>
>
>
>   _____
>
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Rémy Beaumont
> Sent: Monday, August 29, 2005 9:43 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] High load and iowait but no disk access
>
> We have been trying to pinpoint what originally seem to be a I/O bottleneck
> but which now seems to be an issue with either Postgresql or RHES 3.
>
> We have the following test environment on which we can reproduce the
> problem:
>
> 1) Test System A
> Dell 6650 Quad Xeon Pentium 4
> 8 Gig of RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS270 connected using an FC card using 10 disks
>
> 2) Test System B
> Dell Dual Xeon Pentium III
> 2 Gig o RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS920 connected using an FC card using 28 disks
>
> Our Database size is around 30G.
>
> The behavior we see is that when running queries that do random reads on
> disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
> throughput bellow 3000kB/s (We usually average 40000 kB/s to 80000 kB/s on
> sequential read operations on the netapps)
>
> The stats of the NetApp do confirm that it is sitting idle. Doing an strace
> on the Postgresql process shows that is it doing seeks and reads.
>
> So my question is where is this iowait time spent ?
> Is there a way to pinpoint the problem in more details ?
> We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3
>
> I have included the output of top,vmstat,strace and systat from the Netapp
> from System B while running a single query that generates this behavior.
>
> Rémy
>
> top output:
> 06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
> 72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states: cpu user nice system irq softirq iowait idle
> total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
> cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
> cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
> Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
> 1566332k actv, 296648k in_d, 30504k in_c
> Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
> 30960 postgres 15 0 13424 10M 9908 D 2.7 0.5 2:00 1 postmaster
> 30538 root 15 0 1080 764 524 S 0.7 0.0 0:43 0 sshd
> 1 root 15 0 496 456 436 S 0.0 0.0 0:08 0 init
> 2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
> 3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
> 4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 keventd
> 5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
> 6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
> 9 root 15 0 0 0 0 SW 0.0 0.0 0:24 1 bdflush
> 7 root 15 0 0 0 0 SW 0.0 0.0 6:53 1 kswapd
> 8 root 15 0 0 0 0 SW 0.0 0.0 8:44 1 kscand
> 10 root 15 0 0 0 0 SW 0.0 0.0 0:13 0 kupdated
> 11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
> 17 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 ahc_dv_0
>
>
> vmstat output
> procs memory swap io system cpu
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 1 21552 17796 4872 1931928 2 3 3 1 27 6 2 1 7 3
> 0 1 21552 18044 4880 1931652 0 0 1652 0 397 512 1 2 50 47
> 0 1 21552 17976 4896 1931664 0 0 2468 0 407 552 2 2 50 47
> 1 0 21552 17984 4896 1931608 0 0 2124 0 418 538 3 3 48 46
> 0 1 21552 18028 4900 1931536 0 0 1592 0 385 509 1 3 50 46
> 0 1 21552 18040 4916 1931488 0 0 1620 820 419 581 2 2 50 46
> 0 1 21552 17968 4916 1931536 0 4 1708 4 402 554 3 1 50 46
> 1 1 21552 18052 4916 1931388 0 0 1772 0 409 531 3 1 49 47
> 0 1 21552 17912 4924 1931492 0 0 1772 0 408 565 3 1 48 48
> 0 1 21552 17932 4932 1931440 0 4 1356 4 391 545 5 0 49 46
> 0 1 21552 18320 4944 1931016 0 4 1500 840 414 571 1 1 48 50
> 0 1 21552 17872 4944 1931440 0 0 2116 0 392 496 1 5 46 48
> 0 1 21552 18060 4944 1931232 0 0 2232 0 423 597 1 2 48 49
> 1 1 21552 17684 4944 1931584 0 0 1752 0 395 537 1 1 50 48
> 0 1 21552 18000 4944 1931240 0 0 1576 0 401 549 0 1 50 49
>
>
> NetApp stats:
> CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> DAFS FCP iSCSI FCP kB/s
> in out read write read write age hit time ty util in out
> 2% 0 0 0 139 0 0 2788 0 0 0 3 96% 0% - 15% 0 139 0 3 2277
> 2% 0 0 0 144 0 0 2504 0 0 0 3 96% 0% - 18% 0 144 0 3 2150
> 2% 0 0 0 130 0 0 2212 0 0 0 3 96% 0% - 13% 0 130 0 3 1879
> 3% 0 0 0 169 0 0 2937 80 0 0 3 96% 0% - 13% 0 169 0 4 2718
> 2% 0 0 0 139 0 0 2448 0 0 0 3 96% 0% - 12% 0 139 0 3 2096
> 2% 0 0 0 137 0 0 2116 0 0 0 3 96% 0% - 10% 0 137 0 3 1892
> 3% 0 0 0 107 0 0 2660 812 0 0 3 96% 24% T 20% 0 107 0 3 1739
> 2% 0 0 0 118 0 0 1788 0 0 0 3 96% 0% - 13% 0 118 0 3 1608
> 2% 0 0 0 136 0 0 2228 0 0 0 3 96% 0% - 11% 0 136 0 3 2018
> 2% 0 0 0 119 0 0 1940 0 0 0 3 96% 0% - 13% 0 119 0 3 1998
> 2% 0 0 0 136 0 0 2175 0 0 0 3 96% 0% - 14% 0 136 0 3 1929
> 2% 0 0 0 133 0 0 1924 0 0 0 3 96% 0% - 19% 0 133 0 3 2292
> 2% 0 0 0 115 0 0 2044 0 0 0 3 96% 0% - 11% 0 115 0 3 1682
> 2% 0 0 0 134 0 0 2256 0 0 0 3 96% 0% - 12% 0 134 0 3 2096
> 2% 0 0 0 112 0 0 2184 0 0 0 3 96% 0% - 12% 0 112 0 3 1633
> 2% 0 0 0 163 0 0 2348 0 0 0 3 96% 0% - 13% 0 163 0 4 2421
> 2% 0 0 0 120 0 0 2056 184 0 0 3 96% 8% T 14% 0 120 0 3 1703
>
> strace output:
> read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"..., 8192) =
> 8192
> _llseek(55, 857997312, [857997312], SEEK_SET) = 0
> read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"..., 8192) =
> 8192
> _llseek(55, 883220480, [883220480], SEEK_SET) = 0
> read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"..., 8192)
> = 8192
> _llseek(55, 858005504, [858005504], SEEK_SET) = 0
> read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2 \260\236\240"..., 8192)
> = 8192
> _llseek(55, 857964544, [857964544], SEEK_SET) = 0
> read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"..., 8192) =
> 8192
> _llseek(55, 857956352, [857956352], SEEK_SET) = 0
> read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"..., 8192)
> = 8192
> _llseek(55, 910802944, [910802944], SEEK_SET) = 0
> read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"..., 8192) =
> 8192
> _llseek(55, 857948160, [857948160], SEEK_SET) = 0
> read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"..., 8192) =
> 8192
> _llseek(56, 80371712, [80371712], SEEK_SET) = 0
> read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"..., 8192)
> = 8192
> read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"..., 8192)
> = 8192
> _llseek(55, 857939968, [857939968], SEEK_SET) = 0
> read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2 \244\236\270"..., 8192)
> = 8192
> _llseek(55, 857923584, [857923584], SEEK_SET) = 0
> read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2 \234\236\310\002"...,
> 8192) = 8192
> _llseek(55, 57270272, [57270272], SEEK_SET) = 0
> read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2 \310\236j\2\214\235"...,
> 8192) = 8192
> _llseek(55, 870727680, [870727680], SEEK_SET) = 0
> read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"..., 8192) =
> 8192
> _llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
> read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"..., 8192)
> = 8192
> _llseek(55, 857874432, [857874432], SEEK_SET) = 0
> read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2 \224\236\330"..., 8192)
> = 8192
> _llseek(55, 760872960, [760872960], SEEK_SET) = 0
> read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"..., 8192)
> = 8192
> _llseek(55, 891715584, [891715584], SEEK_SET) = 0
> read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"..., 8192)
> = 8192
> _llseek(55, 857858048, [857858048], SEEK_SET) = 0
> read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2 \254\236\250"..., 8192)
> = 8192
> _llseek(55, 666910720, [666910720], SEEK_SET) = 0
> read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2 \254\236\242\2P\235"...,
> 8192) = 8192
> _llseek(55, 857841664, [857841664], SEEK_SET) = 0
> read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"..., 8192)
> = 8192
>
>
>
>
>
>