High load and iowait but no disk access - Mailing list pgsql-performance

We have been trying to pinpoint what originally seemed to be an I/O
bottleneck but which now seems to be an issue with either PostgreSQL or
RHES 3.

We have the following test environment on which we can reproduce the
problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our database size is around 30 GB.

The behavior we see is that when running queries that do random reads
on disk, iowait goes over 80% and actual disk I/O falls to a crawl,
with throughput below 3000 kB/s. (We usually average 40000 kB/s to
80000 kB/s on sequential read operations on the NetApps.)

The NetApp's own stats confirm that it is sitting essentially idle.
Doing an strace on the PostgreSQL process shows that it is doing seeks
and reads.
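
One way to take PostgreSQL out of the picture entirely is to replay the
same access pattern by hand: seek to a random 8 kB-aligned offset, read
one 8 kB block, repeat. If a plain C program hits the same wall on the
same mount, the problem is below PostgreSQL. A minimal sketch (the
default file path is a placeholder; point it at a multi-gigabyte file
on the NetApp volume):

/*
 * randread.c -- replay the seek+read pattern visible in the strace
 * output below: random 8 kB-aligned seeks, each followed by an 8 kB
 * read, timed overall.
 */
#define _FILE_OFFSET_BITS 64    /* allow test files larger than 2 GB */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define BLKSZ  8192             /* PostgreSQL block size, as in the strace */
#define NREADS 10000

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/mnt/netapp/testfile";
    static char buf[BLKSZ];
    struct timeval t0, t1;
    off_t nblocks, off;
    double secs;
    long i;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    nblocks = lseek(fd, 0, SEEK_END) / BLKSZ;   /* file size in 8 kB blocks */
    if (nblocks <= 0) { fprintf(stderr, "file too small\n"); return 1; }

    srand(12345);                               /* fixed seed, repeatable runs */
    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++) {
        off = (off_t)(rand() % nblocks) * BLKSZ;
        if (lseek(fd, off, SEEK_SET) < 0) { perror("lseek"); return 1; }
        if (read(fd, buf, BLKSZ) != BLKSZ)  { perror("read");  return 1; }
    }
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d random reads in %.2f s: %.0f kB/s, %.2f ms/read\n",
           NREADS, secs, NREADS * (BLKSZ / 1024.0) / secs,
           secs * 1000.0 / NREADS);
    close(fd);
    return 0;
}

Build with "gcc -O2 -o randread randread.c" and compare the reported
kB/s against the 40000 to 80000 kB/s we see for sequential reads.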

So my question is: where is this iowait time spent?
Is there a way to pinpoint the problem in more detail?
We are able to reproduce this behavior with PostgreSQL 7.4.8 and 8.0.3.
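
One further data point that might help localize it: run the same loop
with O_DIRECT, so the kernel page cache is bypassed entirely. If the
O_DIRECT version runs at sane speed while the buffered one crawls, the
time is being lost in the kernel's cache/VM path rather than on the
NetApp side, which would at least be consistent with the filer sitting
idle and with the CPU time kswapd and kscand have racked up in the top
output below. This is only a sketch; it assumes the RHEL 3 kernel
honors O_DIRECT on this filesystem, and O_DIRECT requires aligned
buffers and offsets:

/*
 * directread.c -- same random 8 kB reads as randread.c above, but with
 * O_DIRECT so the kernel page cache is bypassed.  The default path is
 * again only a placeholder.
 */
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define BLKSZ  8192
#define NREADS 10000

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/mnt/netapp/testfile";
    struct timeval t0, t1;
    off_t nblocks;
    double secs;
    void *buf;
    long i;
    int fd;

    if (posix_memalign(&buf, 4096, BLKSZ) != 0) {  /* O_DIRECT wants aligned I/O */
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open O_DIRECT"); return 1; }
    nblocks = lseek(fd, 0, SEEK_END) / BLKSZ;
    if (nblocks <= 0) { fprintf(stderr, "file too small\n"); return 1; }

    srand(12345);
    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++) {
        off_t off = (off_t)(rand() % nblocks) * BLKSZ;
        if (pread(fd, buf, BLKSZ, off) != BLKSZ) { perror("pread"); return 1; }
    }
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d O_DIRECT reads in %.2f s: %.2f ms/read\n",
           NREADS, secs, secs * 1000.0 / NREADS);
    close(fd);
    return 0;
}

If both versions are equally slow, the storage path itself is suspect;
if only the buffered one is slow, the kernel is the place to dig.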

I have included the output of top, vmstat, and strace, along with
sysstat output from the NetApp, all captured on System B while running
a single query that generates this behavior.

Rémy

top output:
  06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
            total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%   49.5%
            cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%   97.2%
            cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%    1.9%
Mem:  2061696k av, 2043936k used,   17760k free,       0k shrd,    3916k buff
                   1566332k actv,  296648k in_d,   30504k in_c
Swap: 16771584k av,   21552k used, 16750032k free                 1933772k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
30960 postgres  15   0 13424  10M  9908 D     2.7  0.5   2:00   1 postmaster
30538 root      15   0  1080  764   524 S     0.7  0.0   0:43   0 sshd
    1 root      15   0   496  456   436 S     0.0  0.0   0:08   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:01   0 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:24   1 bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   6:53   1 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   8:44   1 kscand
   10 root      15   0     0    0     0 SW    0.0  0.0   0:13   0 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 ahc_dv_0


vmstat output
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1  21552  17796   4872 1931928    2    3     3     1   27     6  2  1  7  3
 0  1  21552  18044   4880 1931652    0    0  1652     0  397   512  1  2 50 47
 0  1  21552  17976   4896 1931664    0    0  2468     0  407   552  2  2 50 47
 1  0  21552  17984   4896 1931608    0    0  2124     0  418   538  3  3 48 46
 0  1  21552  18028   4900 1931536    0    0  1592     0  385   509  1  3 50 46
 0  1  21552  18040   4916 1931488    0    0  1620   820  419   581  2  2 50 46
 0  1  21552  17968   4916 1931536    0    4  1708     4  402   554  3  1 50 46
 1  1  21552  18052   4916 1931388    0    0  1772     0  409   531  3  1 49 47
 0  1  21552  17912   4924 1931492    0    0  1772     0  408   565  3  1 48 48
 0  1  21552  17932   4932 1931440    0    4  1356     4  391   545  5  0 49 46
 0  1  21552  18320   4944 1931016    0    4  1500   840  414   571  1  1 48 50
 0  1  21552  17872   4944 1931440    0    0  2116     0  392   496  1  5 46 48
 0  1  21552  18060   4944 1931232    0    0  2232     0  423   597  1  2 48 49
 1  1  21552  17684   4944 1931584    0    0  1752     0  395   537  1  1 50 48
 0  1  21552  18000   4944 1931240    0    0  1576     0  401   549  0  1 50 49


NetApp stats:
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s  Cache Cache  CP   CP Disk   DAFS   FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                       in   out
  2%     0     0     0     139     0     0   2788      0     0     0     3   96%   0%  -   15%      0   139     0     3  2277
  2%     0     0     0     144     0     0   2504      0     0     0     3   96%   0%  -   18%      0   144     0     3  2150
  2%     0     0     0     130     0     0   2212      0     0     0     3   96%   0%  -   13%      0   130     0     3  1879
  3%     0     0     0     169     0     0   2937     80     0     0     3   96%   0%  -   13%      0   169     0     4  2718
  2%     0     0     0     139     0     0   2448      0     0     0     3   96%   0%  -   12%      0   139     0     3  2096
  2%     0     0     0     137     0     0   2116      0     0     0     3   96%   0%  -   10%      0   137     0     3  1892
  3%     0     0     0     107     0     0   2660    812     0     0     3   96%  24%  T   20%      0   107     0     3  1739
  2%     0     0     0     118     0     0   1788      0     0     0     3   96%   0%  -   13%      0   118     0     3  1608
  2%     0     0     0     136     0     0   2228      0     0     0     3   96%   0%  -   11%      0   136     0     3  2018
  2%     0     0     0     119     0     0   1940      0     0     0     3   96%   0%  -   13%      0   119     0     3  1998
  2%     0     0     0     136     0     0   2175      0     0     0     3   96%   0%  -   14%      0   136     0     3  1929
  2%     0     0     0     133     0     0   1924      0     0     0     3   96%   0%  -   19%      0   133     0     3  2292
  2%     0     0     0     115     0     0   2044      0     0     0     3   96%   0%  -   11%      0   115     0     3  1682
  2%     0     0     0     134     0     0   2256      0     0     0     3   96%   0%  -   12%      0   134     0     3  2096
  2%     0     0     0     112     0     0   2184      0     0     0     3   96%   0%  -   12%      0   112     0     3  1633
  2%     0     0     0     163     0     0   2348      0     0     0     3   96%   0%  -   13%      0   163     0     4  2421
  2%     0     0     0     120     0     0   2056    184     0     0     3   96%   8%  T   14%      0   120     0     3  1703

strace output:
read(55, "\4\0\0\0\10fm}\1\0\0\0p\0\264\0\0 \2 \230\236\320\0020"...,
8192) = 8192
_llseek(55, 857997312, [857997312], SEEK_SET) = 0
read(55, "\4\0\0\0\\\315\321|\1\0\0\0p\0\354\0\0 \2 \250\236\260"...,
8192) = 8192
_llseek(55, 883220480, [883220480], SEEK_SET) = 0
read(55, "\4\0\0\0T\17a~\1\0\0\0p\0\20\1\0 \2 \270\236\220\2D\235"...,
8192) = 8192
_llseek(55, 858005504, [858005504], SEEK_SET) = 0
read(55, "\4\0\0\0\300\356\321|\1\0\0\0p\0\330\0\0 \2 \260\236\240"...,
8192) = 8192
_llseek(55, 857964544, [857964544], SEEK_SET) = 0
read(55, "\4\0\0\0lH\321|\1\0\0\0p\0<\1\0 \2 \300\236\200\2p\235"...,
8192) = 8192
_llseek(55, 857956352, [857956352], SEEK_SET) = 0
read(55, "\4\0\0\0l\'\321|\1\0\0\0p\0\320\0\0 \2 \260\236\240\2\\"...,
8192) = 8192
_llseek(55, 910802944, [910802944], SEEK_SET) = 0
read(55, "\4\0\0\0\10}\25\200\1\0\0\0l\0\274\1\0 \2 \250\236\260"...,
8192) = 8192
_llseek(55, 857948160, [857948160], SEEK_SET) = 0
read(55, "\4\0\0\0\370\5\321|\1\0\0\0p\0\350\0\0 \2 \230\236\320"...,
8192) = 8192
_llseek(56, 80371712, [80371712], SEEK_SET) = 0
read(56, "\4\0\0\0Lf \217\1\0\0\0p\0\f\1\0 \2 \250\236\260\2T\235"...,
8192) = 8192
read(102, "\2\0\34\0001\236\0\0\1\0\0\0\t\0\0\00020670\0\0\0B\6\0"...,
8192) = 8192
_llseek(55, 857939968, [857939968], SEEK_SET) = 0
read(55, "\4\0\0\0\244\344\320|\1\0\0\0l\0\230\1\0 \2 \244\236\270"...,
8192) = 8192
_llseek(55, 857923584, [857923584], SEEK_SET) = 0
read(55, "\4\0\0\0\224\242\320|\1\0\0\0p\0|\0\0 \2
\234\236\310\002"..., 8192) = 8192
_llseek(55, 57270272, [57270272], SEEK_SET) = 0
read(55, "\4\0\0\0\3204FK\1\0\0\0t\0\340\0\0 \2
\310\236j\2\214\235"..., 8192) = 8192
_llseek(55, 870727680, [870727680], SEEK_SET) = 0
read(55, "\4\0\0\0x>\233}\1\0\0\0p\0@\1\0 \2 \250\236\260\2X\235"...,
8192) = 8192
_llseek(55, 1014734848, [1014734848], SEEK_SET) = 0
read(55, "\4\0\0\0\34\354\201\206\1\0\0\0p\0p\0\0 \2 \264\236\230"...,
8192) = 8192
_llseek(55, 857874432, [857874432], SEEK_SET) = 0
read(55, "\4\0\0\0\214\331\317|\1\0\0\0l\0\324\1\0 \2 \224\236\330"...,
8192) = 8192
_llseek(55, 760872960, [760872960], SEEK_SET) = 0
read(55, "\4\0\0\0\30\257\321v\1\0\0\0p\0\230\0\0 \2 \234\236\310"...,
8192) = 8192
_llseek(55, 891715584, [891715584], SEEK_SET) = 0
read(55, "\4\0\0\0\370\220\347~\1\0\0\0p\0P\1\0 \2 \230\236\320\2"...,
8192) = 8192
_llseek(55, 857858048, [857858048], SEEK_SET) = 0
read(55, "\4\0\0\0\250\227\317|\1\0\0\0p\0\264\0\0 \2 \254\236\250"...,
8192) = 8192
_llseek(55, 666910720, [666910720], SEEK_SET) = 0
read(55, "\4\0\0\0x\206\3q\1\0\0\0p\0004\1\0 \2
\254\236\242\2P\235"..., 8192) = 8192
_llseek(55, 857841664, [857841664], SEEK_SET) = 0
read(55, "\4\0\0\0dT\317|\1\0\0\0p\0\224\0\0 \2 \214\236\350\2\30"...,
8192) = 8192


