Thread: Investigating IO Saturation

Investigating IO Saturation

From: Brad Nicholson
Date:
I'm investigating a potential IO issue.  We're running 7.4 on AIX 5.1.
During periods of high activity (reads, writes, and vacuums), we are
seeing iostat reporting 100% disk usage.  I have a feeling that the
iostat numbers are misleading.  I can make iostat usage jump from less
than 10% to greater than 95% by running a single vacuum against a
moderate sized table (no noticeable change in the other activity).
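
(For context on "moderate sized": one way to get a rough figure for how
much heap such a vacuum has to scan is to look at pg_class.  A sketch
only -- "accounts" stands in for the real table name, and the default
8 kB block size is assumed:

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname = 'accounts';
  -- relpages * 8 kB roughly approximates the heap VACUUM will read;
  -- the figure is only as fresh as the last VACUUM or ANALYZE of the table
)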

Do I actually have a problem with IO?  Whether I do or not, at what
point should I start to be concerned about IO problems?  If my
understanding is correct, the judgment should be based on the I/O
wait time.  Here's the output of vmstat during heavy load (reads,
writes, several daily vacuums and a nightly pg_dump).  Wait times
appear to be all right, but my understanding of when to start being
concerned about IO starvation is foggy at best.

vmstat 5
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
2  2 1548418 67130   0   0   0 141   99   0 1295 22784 13128 11  4 71 14
3  3 1548422 66754   0   0   0 2127 2965   0 2836 29981 25091 26  4 39 31
2  3 1548423 66908   0   0   0 2369 3221   0 3130 34725 28424 25  7 38 30
3  5 1548423 67029   0   0   0 2223 3097   0 2722 31885 25929 26  9 33 32
3  3 1548423 67066   0   0   0 2366 3194   0 2824 43546 36226 30  5 35 31
2  4 1548423 67004   0   0   0 2123 3236   0 2662 25756 21841 22  4 39 35
2  4 1548957 66277   0   0   0 1928 10322   0 2941 36340 29906 28  6 34 33
3  5 1549245 66024   0   0   0 2324 14291   0 2872 39413 25615 25  4 34 37
2  6 1549282 66107   0   0   0 1930 11189   0 2832 72062 32311 26  5 32 38
2  4 1549526 65855   0   0   0 2375 9278   0 2822 40368 32156 29  5 37 29
2  3 1548984 66227   0   0   0 1732 5065   0 2825 39240 30788 26  5 40 30
3  4 1549341 66027   0   0   0 2325 6453   0 2790 37567 30509 28  5 37 30
2  4 1549377 65789   0   0   0 1633 2731   0 2648 35533 27395 20  5 39 36
1  5 1549765 65666   0   0   0 2272 3340   0 2792 43002 34090 26  5 29 40
2  3 1549787 65646   0   0   0 1779 2679   0 2596 37446 29184 22  5 37 36
2  5 1548985 66263   0   0   0 2077 3086   0 2778 49579 39940 26  9 35 30
2  4 1548985 66473   0   0   0 2078 3093   0 2682 23274 18460 22  3 41 34
4  3 1548985 66263   0   0   0 2177 3344   0 2734 43029 35536 29  5 38 28
1  4 1548985 66491   0   0   0 1978 3215   0 2739 28291 22672 23  4 41 32
3  3 1548985 66422   0   0   0 1732 2469   0 2852 71865 30850 28  5 38 29

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: Investigating IO Saturation

From: "Joshua D. Drake"
Date:
Brad Nicholson wrote:
> I'm investigating a potential IO issue.  We're running 7.4 on AIX
> 5.1.  During periods of high activity (reads, writes, and vacuums), we
> are seeing iostat reporting 100% disk usage.  I have a feeling that
> the iostat numbers are misleading.  I can make iostat usage jump from
> less than 10% to greater than 95% by running a single vacuum against a
> moderate sized table (no noticeable change in the other activity).
>
Well, that isn't surprising. Vacuum is brutal, especially on 7.4, as
that is pre-background-writer. What type of IO do you have available
(RAID, SCSI)?



Re: Investigating IO Saturation

From: Brad Nicholson
Date:
Joshua D. Drake wrote:

> Brad Nicholson wrote:
>
>> I'm investigating a potential IO issue.  We're running 7.4 on AIX
>> 5.1.  During periods of high activity (reads, writes, and vacuums),
>> we are seeing iostat reporting 100% disk usage.  I have a feeling
>> that the iostat numbers are misleading.  I can make iostat usage jump
>> from less than 10% to greater than 95% by running a single vacuum
>> against a moderate sized table (no noticeable change in the other
>> activity).
>>
> Well, that isn't surprising. Vacuum is brutal, especially on 7.4, as
> that is pre-background-writer. What type of IO do you have available
> (RAID, SCSI)?
>
Data LUN is RAID 10, WAL LUN is RAID 1.


--
Brad Nicholson  416-673-4106    bnichols@ca.afilias.info
Database Administrator, Afilias Canada Corp.



Re: Investigating IO Saturation

From: Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> I'm investigating a potential IO issue.  We're running 7.4 on AIX 5.1.
> During periods of high activity (reads, writes, and vacuums), we are
> seeing iostat reporting 100% disk usage.  I have a feeling that the
> iostat numbers are misleading.  I can make iostat usage jump from less
> than 10% to greater than 95% by running a single vacuum against a
> moderate sized table (no noticeable change in the other activity).

That's not particularly surprising, and I see no reason to think that
iostat is lying to you.

More recent versions of PG include parameters that you can use to
"throttle" vacuum's I/O demand ... but unthrottled, it's definitely
an I/O hog.
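
For anyone following along, this is roughly what that throttling looks
like on 8.0 or later.  A minimal sketch only: the values are
illustrative rather than recommendations, and "accounts" is a
hypothetical table name.

  SET vacuum_cost_delay = 10;    -- sleep 10 ms each time the cost limit is hit
  SET vacuum_cost_limit = 200;   -- accumulated page-cost budget between sleeps
  VACUUM ANALYZE accounts;       -- this run is throttled for this session only
  RESET vacuum_cost_delay;       -- back to the default (0 = no throttling)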

The vmstat numbers suggest that vacuum is not completely killing you,
but you could probably get some improvement in foreground query
performance by throttling it back.  There are other good reasons to
consider an update, anyway.

            regards, tom lane

Re: Investigating IO Saturation

From: "Joshua D. Drake"
Date:
Brad Nicholson wrote:
> Joshua D. Drake wrote:
>
>> Brad Nicholson wrote:
>>
>>> I'm investigating a potential IO issue.  We're running 7.4 on AIX
>>> 5.1.  During periods of high activity (reads, writes, and vacuums),
>>> we are seeing iostat reporting 100% disk usage.  I have a feeling
>>> that the iostat numbers are misleading.  I can make iostat usage
>>> jump from less than 10% to greater than 95% by running a single
>>> vacuum against a moderate sized table (no noticeable change in the
>>> other activity).
>>>
>> Well, that isn't surprising. Vacuum is brutal, especially on 7.4, as
>> that is pre-background-writer. What type of IO do you have available
>> (RAID, SCSI)?
>>
> Data LUN is RAID 10, WAL LUN is RAID 1.
How many disks?



Re: Investigating IO Saturation

From: Chris Browne
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:

> Brad Nicholson <bnichols@ca.afilias.info> writes:
>> I'm investigating a potential IO issue.  We're running 7.4 on AIX 5.1.
>> During periods of high activity (reads, writes, and vacuums), we are
>> seeing iostat reporting 100% disk usage.  I have a feeling that the
>> iostat numbers are misleading.  I can make iostat usage jump from less
>> than 10% to greater than 95% by running a single vacuum against a
>> moderate sized table (no noticeable change in the other activity).
>
> That's not particularly surprising, and I see no reason to think that
> iostat is lying to you.
>
> More recent versions of PG include parameters that you can use to
> "throttle" vacuum's I/O demand ... but unthrottled, it's definitely
> an I/O hog.

I believe it was 7.4 where the cost-based vacuum parameters came in,
so that would, in principle, already be an option.

[rummaging around...]

Hmm.... There was a patch for 7.4, but it's only "standard" as of
8.0...
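
(As an aside, a quick way to tell whether a given server has the
cost-based settings at all -- patched 7.4 versus stock -- is simply to
ask for one of them:

  SHOW vacuum_cost_delay;
  -- a patched 7.4 or 8.0+ returns a value (0 means throttling is off);
  -- an unpatched server should error with "unrecognized configuration
  -- parameter"
)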

> The vmstat numbers suggest that vacuum is not completely killing you,
> but you could probably get some improvement in foreground query
> performance by throttling it back.  There are other good reasons to
> consider an update, anyway.

I'd have reservations about "throttling it back" because that would
lead to VACUUMs running, and holding transactions open, for 6 hours
instead of 2.

That is consistent with benchmarking; there was a report of the
default policy cutting I/O load by ~80% at the cost of vacuums taking
3x as long to complete.

The "real" answer is to move to 8.x, where VACUUM doesn't chew up
shared memory cache as it does in 7.4 and earlier.

But in the interim, we need to make sure we tilt at the right
windmills, or something of the sort :-).
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"Women and cats will do as  they please, and men and dogs should relax
and get used to the idea." -- Robert A. Heinlein

Re: Investigating IO Saturation

From: Andrew Sullivan
Date:
On Tue, Jan 24, 2006 at 02:43:59PM -0500, Chris Browne wrote:
> I believe it's 7.4 where the cost-based vacuum parameters entered in,
> so that would, in principle, already be an option.
>
> [rummaging around...]
>
> Hmm.... There was a patch for 7.4, but it's only "standard" as of
> 8.0...

And it doesn't work very well without changes to buffering.  You need
both pieces to get it to work.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland