Thread: Ever Increasing IOWAIT

Ever Increasing IOWAIT

From

"Ralph Mason"

Date:

17 May 2007, 19:45:56

We have a database running on a 4 processor machine. As time goes by the IO wait gets worse and worse, peaking at about 200% as the machine loads up.

The weird thing is that if we restart postgres it’s fine for hours but over time it goes bad again.

(CPU usage graph here http://www.flickr.com/photos/8347741@N02/502596262/ ) You can clearly see where the restart happens in the IO area

This is Postgres 8.1.4 64bit.

Anyone have any ideas?

Thanks

Ralph

--
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006 4:07 p.m.

Re: Ever Increasing IOWAIT

From

"Joshua D. Drake"

Date:

17 May 2007, 20:02:06

Ralph Mason wrote:
> We have a database running on a 4 processor machine.  As time goes by
> the IO gets worse and worse peaking at about 200% as the machine loads up.
>
>
>
> The weird thing is that if we restart postgres it’s fine for hours but
> over time it goes bad again.
>
>
>
> (CPU usage graph here
> http://www.flickr.com/photos/8347741@N02/502596262/ )  You can clearly
> see where the restart happens in the IO area
>
>
>
> This is Postgres  8.1.4 64bit.

1. Upgrade to 8.1.9. There is a bug with autovac that is fixed that is
pretty important.

>
>
>
> Anyone have any ideas?
>

Sure... you aren't analyzing enough. You are using prepared queries that
have plans that get stale... you are not running autovac... You are
cursed (kidding)..

Joshua D. Drake

>
>
> Thanks
>
> Ralph
>
>
>
>


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

Re: Ever Increasing IOWAIT

From

"Ralph Mason"

Date:

17 May 2007, 20:19:33

Hi Josh - thanks for your thoughts.

>
> This is Postgres  8.1.4 64bit.

>1. Upgrade to 8.1.9. There is a bug with autovac that is fixed that is
>pretty important.

We don't use pg_autovac - we have our own process that runs very often,
vacuuming tables that are dirty. It works well and vacuums when activity is
happening.  During busy times, active tables are vacuumed about once a minute.
The 'slack' space on busy tables sits at about 100% (e.g. the table has 2x the
number of pages it would have after a CLUSTER).  We use rows updated and
deleted to decide what to vacuum.  Those busy tables are reasonably small and
take less than a second to vacuum.
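
As a rough illustration of the selection step described above (not the actual process, which is not shown in this thread), the sketch below picks tables to vacuum from per-table update and delete counters such as n_tup_upd and n_tup_del in pg_stat_user_tables. The TableStats type, the threshold of 1000 rows and the table names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Per-table counters, e.g. as read from pg_stat_user_tables."""
    name: str
    rows_updated: int  # n_tup_upd since the last stats reset
    rows_deleted: int  # n_tup_del since the last stats reset


def tables_to_vacuum(stats, min_dead_rows=1000):
    """Return names of tables whose updated + deleted row count reaches
    the threshold, busiest first. The default threshold is hypothetical."""
    dirty = [s for s in stats if s.rows_updated + s.rows_deleted >= min_dead_rows]
    dirty.sort(key=lambda s: s.rows_updated + s.rows_deleted, reverse=True)
    return [s.name for s in dirty]


if __name__ == "__main__":
    # Hypothetical table names and counts, for illustration only.
    sample = [
        TableStats("positions", rows_updated=5000, rows_deleted=200),
        TableStats("customers", rows_updated=3, rows_deleted=0),
        TableStats("jobs", rows_updated=900, rows_deleted=400),
    ]
    for name in tables_to_vacuum(sample):
        print(f"VACUUM {name};")
```

A real process would issue the VACUUM statements over a database connection and then clear the counters, which is left out here.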

Also, if it were a vacuuming problem, why would a restart of the engine fix
it fully?

>
> Anyone have any ideas?
>

>Sure... you aren't analyzing enough. You are using prepared queries that
>have plans that get stale... you are not running autovac... You are
>cursed (kidding)..

The shape of the data never changes and we don't reanalyze on start-up, so I
suspect analyzing won't do much (although we do it every so often).

We don't use prepared queries - just lots of functions - but like I said
above the shape of the data doesn't change. So even if postgres stores plans
for those (does it?) it seems like it should be just fine.


Thanks
Ralph





Re: Ever Increasing IOWAIT

From

Richard Huxton

Date:

18 May 2007, 06:12:15

Ralph Mason wrote:
> We have a database running on a 4 processor machine.  As time goes by the IO
> gets worse and worse peaking at about 200% as the machine loads up.
>
> The weird thing is that if we restart postgres it’s fine for hours but over
> time it goes bad again.
>
> (CPU usage graph here
> http://www.flickr.com/photos/8347741@N02/502596262/ )  You can clearly
> see where the restart happens in the IO area

I'm assuming here we're talking about that big block of iowait at about
4-6am?

I take it vmstat/iostat show a corresponding increase in disk activity
at that time.

The question is - what?
Does the number of PG processes increase at that time? If that's not
intentional then you might need to see what your applications are up to.

Do you have a vacuum/backup scheduled for that time? Do you have some
other process doing a lot of file I/O at that time?
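
For anyone wanting to measure the iowait figure directly rather than read it off a graph, here is a minimal sketch that computes it from two samples of the aggregate "cpu" line in Linux's /proc/stat. It assumes a Linux host; the 5-second sampling interval is arbitrary, and the n_cpus rescaling is only a guess at how a graph on a 4 processor machine could show values above 100%.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints.

    Only the first eight counters are kept (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already part of user.
    """
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:9]]


def iowait_percent(before, after, n_cpus=1):
    """Share of elapsed CPU time spent in iowait between two samples.

    With n_cpus=1 the result is on a 0-100 scale for the whole machine.
    Passing the CPU count rescales it to a per-CPU sum, where a
    4-CPU box would top out at 400.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * n_cpus * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # arbitrary sampling interval
    second = read_cpu_line()
    print(f"iowait: {iowait_percent(first, second):.1f}% of total CPU time")
```

Logging this alongside the iostat output makes it easier to see whether a rise in iowait lines up with a rise in actual disk traffic.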

> This is Postgres  8.1.4 64bit.

You'll want to upgrade to the latest patch release - you're missing 5
lots of bug-fixes there.

--
   Richard Huxton
   Archonet Ltd

Re: Ever Increasing IOWAIT

From

Mark Lewis

Date:

18 May 2007, 10:58:10

You're not swapping are you?  One explanation could be that PG is
configured to think it has access to a little more memory than the box
can really provide, which forces it to swap once it's been running for
long enough to fill up its shared buffers or after a certain number of
concurrent connections are opened.
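
A quick way to rule swapping in or out on Linux is to compare SwapTotal and SwapFree in /proc/meminfo. The sketch below is one possible check, assuming a Linux host; the function names are made up for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {name: value} dict.

    Values are in kB for the memory and swap lines.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (0 if the box has no swap at all)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"swap configured: {info.get('SwapTotal', 0)} kB")
    print(f"swap in use:     {swap_used_kb(info)} kB")
```

A non-zero "in use" figure only shows that something was swapped out at some point; the si/so columns in vmstat show whether swapping is happening right now.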

-- Mark Lewis


Re: Ever Increasing IOWAIT

From

"Ralph Mason"

Date:

20 May 2007, 17:41:12

Ralph Mason wrote:
> We have a database running on a 4 processor machine.  As time goes by
> the IO gets worse and worse peaking at about 200% as the machine loads up.
>
> The weird thing is that if we restart postgres it’s fine for hours but
> over time it goes bad again.
>
> (CPU usage graph here
> http://www.flickr.com/photos/8347741@N02/502596262/ )  You can clearly
> see where the restart happens in the IO area

>I'm assuming here we're talking about that big block of iowait at about
>4-6am?

Actually no - that is a vacuum of the whole database, run to double-check that
it's not a vacuuming problem (I am sure it's not).  The restart is at 22:00,
where you see the IO drop to nothing; the database is still doing the same
work.

>>I take it vmstat/iostat show a corresponding increase in disk activity
>>at that time.

I didn't know you could have iowait without disk activity - I will check
that out.

>>The question is - what?
>>Does the number of PG processes increase at that time? If that's not
>>intentional then you might need to see what your applications are up to.

No, the number of connections is stable and the jobs they do stay the same;
it's just this deterioration of iowait over time.

>>Do you have a vacuum/backup scheduled for that time? Do you have some
>>other process doing a lot of file I/O at that time?

> This is Postgres  8.1.4 64bit.

>You'll want to upgrade to the latest patch release - you're missing 5
>lots of bug-fixes there.

Thanks - will try that.



Re: Ever Increasing IOWAIT

From

"Ralph Mason"

Date:

20 May 2007, 17:42:02


>You're not swapping are you?  One explanation could be that PG is
>configured to think it has access to a little more memory than the box
>can really provide, which forces it to swap once it's been running for
>long enough to fill up its shared buffers or after a certain number of
>concurrent connections are opened.
>
>-- Mark Lewis

No - no swap on this machine. The number of connections is stable.

Ralph




Re: Ever Increasing IOWAIT

From

Tom Lane

Date:

20 May 2007, 18:58:15

"Ralph Mason" <ralph.mason@telogis.com> writes:
> Ralph Mason wrote:
>> We have a database running on a 4 processor machine.  As time goes by
>> the IO gets worse and worse peaking at about 200% as the machine loads up.
>>
>> The weird thing is that if we restart postgres it's fine for hours but
>> over time it goes bad again.

Do you by any chance have stats collection enabled and
stats_reset_on_server_start set to true? If so, maybe this is explained
by growth in the size of the stats file over time.  It'd be interesting
to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-
slow cycle.
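
One simple way to follow this suggestion is to log the file size at a fixed interval and compare the numbers across a fast-to-slow cycle. The sketch below does that; the 60-second interval and the fallback data directory are assumptions, and the script has to run as a user who can read $PGDATA.

```python
import os
import time


def sample_file_size(path):
    """Return (unix_timestamp, size_in_bytes).

    The size is None if the file cannot be stat'ed at that moment.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        size = None
    return time.time(), size


def watch(path, interval_s=60, samples=None):
    """Print one 'timestamp size' line per interval.

    Runs until interrupted when samples is None.
    """
    taken = 0
    while samples is None or taken < samples:
        ts, size = sample_file_size(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
        print(f"{stamp} {size}")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)


if __name__ == "__main__":
    # The fallback path is an assumption; set PGDATA for a real run.
    pgdata = os.environ.get("PGDATA", "/var/lib/pgsql/data")
    watch(os.path.join(pgdata, "global", "pgstat.stat"))
```

Redirecting the output to a file gives a series that can be lined up against the iowait graph.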

            regards, tom lane

Re: Ever Increasing IOWAIT

From

"Ralph Mason"

Date:

21 May 2007, 02:18:06

"Ralph Mason" <ralph.mason@telogis.com> writes:
> Ralph Mason wrote:
>> We have a database running on a 4 processor machine.  As time goes by
>> the IO gets worse and worse peaking at about 200% as the machine loads up.
>>
>> The weird thing is that if we restart postgres it's fine for hours but
>> over time it goes bad again.

>Do you by any chance have stats collection enabled and
>stats_reset_on_server_start set to true? If so, maybe this is explained
>by growth in the size of the stats file over time.  It'd be interesting
>to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-
>slow cycle.

We do, because we use the stats to decide when to vacuum.  Our vacuum
process reads those stats and then clears them with pg_stat_reset() each time
it runs (about every 60 seconds when the db is very busy).
stats_reset_on_server_start is off.

Interestingly, after a suggestion here I went and looked at the iostat output
for the same period.  It shows the writes as expected, picking up exactly where
they were before the reset, but the reads drop dramatically - as if it's
reading far less data after the reset.
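
To put numbers on the read/write split described above, read and write throughput can be derived from two snapshots of Linux's /proc/diskstats. This is only a sketch under assumptions: it expects the whole-device line format (at least ten fields), and the device name "sda" and the 10-second interval are placeholders.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written).

    Lines with fewer than ten fields are skipped.
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10:
            stats[f[2]] = (int(f[5]), int(f[9]))
    return stats


def throughput(before, after, device, seconds):
    """Return (read_bytes_per_s, write_bytes_per_s) for one device."""
    r0, w0 = before[device]
    r1, w1 = after[device]
    return ((r1 - r0) * SECTOR_BYTES / seconds,
            (w1 - w0) * SECTOR_BYTES / seconds)


def snapshot(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__":
    device, interval = "sda", 10  # placeholders; use the data disk here
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    reads, writes = throughput(first, second, device, interval)
    print(f"{device}: read {reads:.0f} B/s, write {writes:.0f} B/s")
```

Taking one measurement shortly after a restart and another once the machine has slowed down would show whether the extra IO is all on the read side.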

I will watch the size of the pgstat.stat file.

Ralph
