Thread: Ever Increasing IOWAIT
We have a database running on a 4-processor machine. As time goes by, the IO gets worse and worse, peaking at about 200% as the machine loads up.
The weird thing is that if we restart postgres it’s fine for hours but over time it goes bad again.
(CPU usage graph here: http://www.flickr.com/photos/8347741@N02/502596262/ ) You can clearly see where the restart happens in the IO area.
This is Postgres 8.1.4 64bit.
Anyone have any ideas?
Thanks
Ralph
--
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006 4:07 p.m.
Ralph Mason wrote:
> This is Postgres 8.1.4 64bit.

1. Upgrade to 8.1.9. It fixes an autovacuum bug that is pretty important.

> Anyone have any ideas?

Sure... you aren't analyzing enough. You are using prepared queries whose plans get stale... you are not running autovacuum... You are cursed (kidding).

Joshua D. Drake
=== The PostgreSQL Company: Command Prompt, Inc. ===
Hi Josh - thanks for your thoughts.

>> This is Postgres 8.1.4 64bit.
> 1. Upgrade to 8.1.9. There is a bug with autovac that is fixed that is
> pretty important.

We don't use autovacuum; we have our own process that runs very often, vacuuming tables that are dirty. It works well and vacuums while activity is happening. During busy periods, active tables are vacuumed about once a minute. The "slack" space on busy tables sits at about 100% (e.g. the table has 2X the number of pages it would have after a CLUSTER). We use rows updated and deleted to decide what to vacuum. Those busy tables are reasonably small and take less than a second to vacuum.

Also, if it were a vacuuming problem, why would a restart of the engine fix it fully?

>> Anyone have any ideas?
> Sure... you aren't analyzing enough. You are using prepared queries that
> have plans that get stale... you are not running autovac... You are
> cursed (kidding)..

The shape of the data never changes, and we don't reanalyze on start-up, so I suspect analyzing won't do much (although we do it every so often). We don't use prepared queries, just lots of functions, but as I said above, the shape of the data doesn't change. So even if Postgres stores plans for those (does it?), it seems like it should be just fine.

Thanks
Ralph
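Ralph's policy of picking tables by rows updated and deleted, and his slack figure, can be sketched as below. This is a minimal illustration, not his actual process: it assumes per-table counters like the n_tup_upd and n_tup_del columns of pg_stat_user_tables, and applies an autovacuum-style rule (vacuum when churn exceeds a base threshold plus a fraction of the table's row count) with arbitrary threshold values. The table names used in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    n_tup_upd: int    # rows updated since the counters were last reset
    n_tup_del: int    # rows deleted since the counters were last reset
    reltuples: float  # estimated live rows in the table


def tables_to_vacuum(stats, base_threshold=500, scale_factor=0.2):
    """Names of tables whose update+delete count exceeds
    base_threshold + scale_factor * reltuples (an autovacuum-style rule).
    The default thresholds are arbitrary illustration values."""
    return [t.name for t in stats
            if t.n_tup_upd + t.n_tup_del > base_threshold + scale_factor * t.reltuples]


def slack_percent(current_pages, compact_pages):
    """Extra pages relative to the same table freshly CLUSTERed, in percent."""
    if compact_pages <= 0:
        raise ValueError("compact_pages must be positive")
    return 100.0 * (current_pages - compact_pages) / compact_pages
```

With these defaults, a hypothetical `positions` table with 5,200 updated or deleted rows out of 10,000 is selected (5,200 > 500 + 2,000), while a small `config` table with 3 changes is not; and a table at twice its post-CLUSTER page count reports 100% slack, matching the figure above.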
Ralph Mason wrote:
> (CPU usage graph here http://www.flickr.com/photos/8347741@N02/502596262/ )
> You can clearly see where the restart happens in the IO area

I'm assuming here we're talking about that big block of iowait at about 4-6am?

I take it vmstat/iostat show a corresponding increase in disk activity at that time. The question is - what? Does the number of PG processes increase at that time? If that's not intentional, then you might need to see what your applications are up to.

Do you have a vacuum/backup scheduled for that time? Do you have some other process doing a lot of file I/O at that time?

> This is Postgres 8.1.4 64bit.

You'll want to upgrade to the latest patch release - you're missing 5 lots of bug-fixes there.

--
Richard Huxton
Archonet Ltd
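To check Richard's question against raw counters, iowait can be computed from two snapshots of Linux's /proc/stat, whose per-CPU lines list cumulative ticks in the order user, nice, system, idle, iowait, and so on. The sketch below assumes a Linux host and sums the per-CPU percentages. A monitoring tool that reports figures this way would top out at 400% on a 4-processor machine, which may be how the graph reaches 200%; the tool behind the graph isn't named, so that part is an assumption.

```python
def parse_cpu_lines(text):
    """Map each per-CPU line of /proc/stat (cpu0, cpu1, ...) to its tick counters."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep cpu0, cpu1, ...; skip the aggregate "cpu" line and lines like "intr".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(x) for x in fields[1:]]
    return cpus


def summed_iowait_percent(before, after):
    """Sum of per-CPU iowait percentages between two /proc/stat snapshots.

    Field order is user, nice, system, idle, iowait, irq, softirq, steal, ...
    so iowait is index 4. Only the first eight fields are totalled, because
    later kernels append guest time that is already counted in user time.
    """
    start, end = parse_cpu_lines(before), parse_cpu_lines(after)
    total = 0.0
    for cpu, now in end.items():
        then = start.get(cpu)
        if then is None:
            continue  # CPU missing from the first snapshot
        elapsed = sum(now[:8]) - sum(then[:8])
        if elapsed > 0:
            total += 100.0 * (now[4] - then[4]) / elapsed
    return total
```

On a live host, read /proc/stat twice a few seconds apart and pass both strings in; comparing the result with iostat's per-device activity over the same window shows whether the waiting lines up with real disk traffic.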
You're not swapping, are you? One explanation could be that PG is configured to think it has access to a little more memory than the box can really provide, which forces it to swap once it's been running long enough to fill up its shared buffers, or after a certain number of concurrent connections are opened.

-- Mark Lewis

On Fri, 2007-05-18 at 10:45 +1200, Ralph Mason wrote:
> We have a database running on a 4 processor machine. As time goes by
> the IO gets worse and worse peeking at about 200% as the machine loads
> up.
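Mark's hypothesis comes down to arithmetic: shared memory plus per-backend memory versus physical RAM. The sketch below is a deliberately crude estimate under stated assumptions: it uses 8.1-style settings (shared_buffers as a count of 8 KB buffers, work_mem and maintenance_work_mem in KB), counts one work_mem allocation per connection, and ignores the OS and other processes. A single query can use several work_mem allocations (one per sort or hash step), so the real peak can be higher. The setting values in the example are hypothetical, not Ralph's.

```python
def estimate_postgres_memory_bytes(shared_buffers_pages, max_connections,
                                   work_mem_kb, maintenance_work_mem_kb=0,
                                   page_size_bytes=8192):
    """Crude estimate: shared buffers + one work_mem per connection
    + one maintenance_work_mem (e.g. for a running VACUUM).
    Real usage can exceed this, since one query may allocate work_mem
    several times."""
    shared = shared_buffers_pages * page_size_bytes
    per_connection = max_connections * work_mem_kb * 1024
    maintenance = maintenance_work_mem_kb * 1024
    return shared + per_connection + maintenance


def may_swap(estimate_bytes, physical_ram_bytes):
    """True if the estimate alone already exceeds physical RAM."""
    return estimate_bytes > physical_ram_bytes
```

With 50,000 buffers, 100 connections and 16 MB of work_mem, the estimate is 2,087,321,600 bytes (about 2.09 GB), which leaves very little headroom on a 2 GB box before the OS and its file cache are counted.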
> I'm assuming here we're talking about that big block of iowait at about
> 4-6am?

Actually, no - that is a vacuum of the whole database, run to double-check that it's not a vacuuming problem (I am sure it's not). The restart is at 22:00, where you see the IO drop to nothing while the database is still doing the same work.

> I take it vmstat/iostat show a corresponding increase in disk activity
> at that time.

I didn't know you could have IO wait without disk activity - I will check that out.

> The question is - what?
> Does the number of PG processes increase at that time? If that's not
> intentional then you might need to see what your applications are up to.

No, the number of connections is stable and the jobs they do stay the same; it's just this deterioration of IO wait over time.

> Do you have a vacuum/backup scheduled for that time? Do you have some
> other process doing a lot of file I/O at that time?
>> This is Postgres 8.1.4 64bit.
> You'll want to upgrade to the latest patch release - you're missing 5
> lots of bug-fixes there.

Thanks - will try that.
> You're not swapping are you? One explanation could be that PG is
> configured to think it has access to a little more memory than the box
> can really provide, which forces it to swap once it's been running for
> long enough to fill up its shared buffers or after a certain number of
> concurrent connections are opened.
>
> -- Mark Lewis

No - no swap on this machine. The number of connections is stable.

Ralph
"Ralph Mason" <ralph.mason@telogis.com> writes: > Ralph Mason wrote: >> We have a database running on a 4 processor machine. As time goes by >> the IO gets worse and worse peeking at about 200% as the machine loads up. >> >> The weird thing is that if we restart postgres it's fine for hours but >> over time it goes bad again. Do you by any chance have stats collection enabled and stats_reset_on_server_start set to true? If so, maybe this is explained by growth in the size of the stats file over time. It'd be interesting to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to- slow cycle. regards, tom lane
"Ralph Mason" <ralph.mason@telogis.com> writes: > Ralph Mason wrote: >> We have a database running on a 4 processor machine. As time goes by >> the IO gets worse and worse peeking at about 200% as the machine loads up. >> >> The weird thing is that if we restart postgres it's fine for hours but >> over time it goes bad again. >Do you by any chance have stats collection enabled and >stats_reset_on_server_start set to true? If so, maybe this is explained >by growth in the size of the stats file over time. It'd be interesting >to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to- >slow cycle. We do because we use the stats to figure out when we will vacuum. Our vacuum process reads that table and when it runs resets it using pg_stat_reset() to clear it down each time it runs (about ever 60 seconds when the db is very busy), stats_reset_on_server_restart is off. Interestingly after a suggestion here I went and looked at the IO stat at the same time. It shows the writes as expected and picking up exactly where they were before the reset, but the reads drop dramatically - like it's reading far less data after the reset. I will watch the size of the pgstat.stat table. Ralph -- Internal Virus Database is out-of-date. Checked by AVG Free Edition. Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006 4:07 p.m.