Thread: significant vacuum issues - looking for suggestions
Hi List;

I've just started working with a client that has been running Postgres (with
no DBA) for a few years. They're running version 8.1.4 on 4-way Dell boxes
with 4Gig of memory on each box, attached to RAID-10 disk arrays.

Some of their key config settings are here:
  shared_buffers = 20480
  work_mem = 16384
  maintenance_work_mem = 32758
  wal_buffers = 24
  checkpoint_segments = 32
  checkpoint_timeout = 300
  checkpoint_warning = 30
  effective_cache_size = 524288
  autovacuum = on
  autovacuum_naptime = 60
  autovacuum_vacuum_threshold = 500
  autovacuum_analyze_threshold = 250
  autovacuum_vacuum_scale_factor = 0.2
  autovacuum_analyze_scale_factor = 0.1
  autovacuum_vacuum_cost_delay = -1
  autovacuum_vacuum_cost_limit = -1

Currently I've found that they have approx 17 tables that show a significant
amount of bloat, the worst one showing over 5 million pages worth of dead
space. One of the problems is that their system is so busy with activity
during the day and massive data load processes at night that they often kill
the pid of vacuum processes kicked off by autovacuum, because the overall
load impact prevents users from logging into the app: the login process
includes at least one db query, which then seems to hang because there are
anywhere from 100 - 300 queries ahead of it at any given time. Normally a
user gets logged in with an avg wait of 5 - 10 seconds, but when a
long-running vacuum is going (sometimes alongside a long-running update
process that's trying to sort/update > 40 million rows) the system gets to a
state where the login queries never get executed until the vacuum process is
killed.

As a result of this I believe that the biggest table (the one with > 5
million pages worth of dead space) has never been vacuumed to completion. I
suspect this is the case for a few of the other top dead-space tables as
well, but I can't be sure.

My first priority was to get this vacuum scenario cleaned up. First off I
added the biggest table into pg_autovacuum and set the enabled column to
false ('f'). Then I set vacuum_cost_delay to 10 and in the same session ran
"vacuum analyze verbose big_table". This ran for 7.5 hours before we had to
kill it due to system load - and to make matters worse, the high system load
was forcing many of the nightly batch queries that load, update, etc. the
data to stack up to the point where the system was at less than 2% idle
(CPU) for the next 4 hours and barely responding to the command line.

To make matters worse, I found out this morning that the db has used 85% of
its transaction IDs - again, because a vacuum of the entire db has never
been completed.

As far as I can tell, the overall db size is currently 199G, of which approx
104G seems to be valid data.

Here are my thoughts on how to proceed:
=====================================
1) Fix the big table ASAP (probably over the weekend), since it's not only
   the biggest table but the most active, like this:

   a) run a pg_dump of this table

   b) restore this dump into a new table (i.e. new_big_table)

   c) lock the original big_table, then sync any changes, inserts, and
      deletes made since the dump from big_table into new_big_table

   d) drop big_table

   e) rename new_big_table to big_table

* I may run through this for a few of the other large, highly active tables
that have minimal page density as well.
=====================================

The development folks that have been here a while tell me that it seems like
when they have a query (not limited to vacuum processes) that has been
running for a long time (i.e. > 5 or 6 hours), the query sort of "goes
crazy" and the entire system gets pegged until they kill that process. I've
not heard of this, but I suspect upgrading to 8.2.4 is probably a good plan
at this point as well, so for step 2 I'll do this:

=====================================
2) (Obviously I'll do this in dev first, then in QA and finally in prod.)

   a) install version 8.2.4 from source, leaving 8.1.4 in place

   b) create the new 8.2.4 cluster on a new port

   c) set up WAL archiving on the 8.1.4 cluster

   d) do a full dump of the 8.1.4 cluster and restore it to the new 8.2.4
      cluster

   e) stop the 8.2.4 cluster and bring it up in recovery mode, pointing it
      to the directory where we're archiving the 8.1.4 cluster's WAL
      segments

   f) once caught up, bring both clusters down

   g) copy any final files from the 8.1.4 cluster's pg_xlog directory into
      the new 8.2.4 pg_xlog dir (is this ok, since I'm moving 8.1.4-version
      tx logs into an 8.2.4 xlog dir?)

   h) change the port on the 8.2.4 cluster to what the original 8.1.4
      cluster port was

   i) bring up the new 8.2.4 system, and actively manage the vacuum needs
      moving forward via a combination of autovacuum, cron processes for
      specified table vacuums (daily, hourly, 15min, 5min, etc.), and
      as-needed interactive session vacuums
=====================================

The source-based install will allow me to set up a robust upgrade CM process
capable of supporting multiple concurrent versions on a server if needed,
the ability to quickly revert to a previous version, etc. However, this is a
discussion for another day - I only mention it in case the question "why not
just use RPMs?" arises...

So here are my questions:

1) Does this sound like a good plan?

2) Are there other steps I should be taking, or other issues I should be
   concerned about short-term?

3) Does anyone have any additional advice for managing either this initial
   mess, or the system(s) long term?

Thanks in advance...

/Kevin
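The 85% transaction-ID figure above can be double-checked from the catalogs.
A minimal sketch, assuming the database is named proddb (the hard wraparound
limit is roughly 2 billion XIDs, so 85% used corresponds to an age of about
1.7 billion):

    # per-database transaction ID age on the 8.1 cluster
    psql -d proddb -c "SELECT datname, age(datfrozenxid) AS xid_age
                         FROM pg_database
                        ORDER BY age(datfrozenxid) DESC;"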
In response to Kevin Kempter <kevin@kevinkempterllc.com>:

> Hi List;
>
> I've just started working with a client that has been running Postgres
> (with no DBA) for a few years. They're running version 8.1.4 on 4-way dell
> boxes with 4Gig of memory on each box attached to RAID-10 disk arrays.
>
> Some of their key config settings are here:
> shared_buffers = 20480
> work_mem = 16384
> maintenance_work_mem = 32758

Before you do any of those other things, bump shared_buffers to about 120000
and maintenance_work_mem to 250000 or so -- unless this box has other
applications on it using significant amounts of those 4G of RAM. You may
find that these changes alone are enough to get vacuum to complete. You'll
need to restart the server for the shared_buffers setting to take effect.

Can you do a pg_relation_size() on the tables in question?

> [remainder of original message snipped]
--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
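For reference, a rough sketch of what that suggestion amounts to on a 4 GB
box (assuming nothing else on it needs much memory; "proddb" is a
placeholder database name):

    # postgresql.conf (shared_buffers requires a restart):
    #   shared_buffers       = 120000   # 8 kB buffers, roughly 940 MB
    #   maintenance_work_mem = 250000   # in kB, roughly 244 MB

    # relation sizes, largest first, to answer the pg_relation_size() question
    psql -d proddb -c "SELECT relname, pg_size_pretty(pg_relation_size(oid))
                         FROM pg_class
                        WHERE relkind = 'r'
                        ORDER BY pg_relation_size(oid) DESC
                        LIMIT 20;"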
>>> On Fri, Aug 24, 2007 at 2:57 PM, in message
<200708241357.23629.kevin@kevinkempterllc.com>, Kevin Kempter
<kevin@kevinkempterllc.com> wrote:

> c) set up WAL archiving on the 8.1.4 cluster
>
> d) do a full dump of the 8.1.4 cluster and restore it to the new 8.2.4
> cluster
>
> e) stop the 8.2.4 cluster and bring it up in recovery mode, pointing it to
> the directory where we're archiving the 8.1.4 cluster's WAL segments.

You can't use these techniques for a major version upgrade.
Use pg_dump piped to psql. That will also eliminate all bloat.

-Kevin
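A minimal sketch of the pg_dump-piped-to-psql upgrade (the port numbers and
database name are assumptions; the usual advice is to run the newer 8.2.4
client binaries against the old server):

    # old 8.1.4 cluster on port 5432, new 8.2.4 cluster on port 5433
    pg_dumpall -p 5432 -g | psql -p 5433 -d postgres     # roles and globals
    createdb -p 5433 proddb
    pg_dump -p 5432 proddb | psql -p 5433 -d proddb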
In response to "Kevin Grittner" <Kevin.Grittner@wicourts.gov>:

> >>> On Fri, Aug 24, 2007 at 2:57 PM, in message
> <200708241357.23629.kevin@kevinkempterllc.com>, Kevin Kempter
> <kevin@kevinkempterllc.com> wrote:
> > c) set up WAL archiving on the 8.1.4 cluster
> >
> > d) do a full dump of the 8.1.4 cluster and restore it to the new 8.2.4
> > cluster
> >
> > e) stop the 8.2.4 cluster and bring it up in recovery mode, pointing it
> > to the directory where we're archiving the 8.1.4 cluster's WAL segments.
>
> You can't use these techniques for a major version upgrade.
> Use pg_dump piped to psql. That will also eliminate all bloat.

If you can't afford any downtime, you may be able to use Slony to do your
upgrade. However, Slony adds overhead, and if this system is tapped out
already, it may not tolerate the additional overhead.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Kevin Kempter <kevin@kevinkempterllc.com> writes:
> The development folks that have been here awhile tell me that it seems like
> when they have a query (not limited to vacuum processes) that has been
> running for a long time (i.e. > 5 or 6 hours) that the query sort of "goes
> crazy" and the entire system gets pegged until they kill that
> process. - I've not heard of this

Me either, but I wonder whether their queries are tickling some memory leak.
I could imagine that what they are seeing is the backend process growing
slowly until it starts to swap, and then continuing to grow and needing more
and more swap activity. Once you get over the knee of that curve, things get
real bad real fast. It might not be a bad idea to run the postmaster under a
(carefully chosen) ulimit setting to cut such things off before the system
starts swapping. Other things to look at:

* What exactly gets "pegged" --- is it CPU or I/O bound? Watching "vmstat 1"
is usually a good diagnostic since you can see CPU, swap, and regular disk
I/O activity at once.

* Is there really not any pattern to the queries that cause the problem?
I don't think 8.1.4 has any widespread leakage problem, but they might be
tickling something isolated, in which case 8.2 is not necessarily gonna fix
it. If you can produce a test case showing this behavior it'd be time to
call in pgsql-hackers.

Your other points seem pretty well covered by other replies.

			regards, tom lane
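A sketch of both suggestions, assuming the postmaster is started from a
shell script; the data directory path and the limit value are placeholders
and would need tuning for this workload:

    # in the postmaster start script: cap per-backend address space so a
    # runaway query fails instead of pushing the box into heavy swapping
    ulimit -v 1048576                     # kB, i.e. roughly 1 GB per process
    pg_ctl -D /var/lib/pgsql/data start

    # while a problem query runs, watch CPU, swap, and disk I/O together
    vmstat 1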
On Friday 24 August 2007 15:39:22 Tom Lane wrote:
> Me either, but I wonder whether their queries are tickling some memory
> leak.
> [snip]
> Your other points seem pretty well covered by other replies.

Thanks everyone for the help. I'll first up the memory settings like Bill
suggested and then see where I'm at. Moving forward I'll see if I have a
test case that I can re-create, plus I may try constraining the postmaster
via a ulimit setting, again based on what I see once the cluster is allowed
to use the memory it should have been given up front.

/Kevin
On Fri, Aug 24, 2007 at 04:41:44PM -0400, Bill Moran wrote:
> In response to Kevin Kempter <kevin@kevinkempterllc.com>:
> > Some of their key config settings are here:
> > shared_buffers = 20480
> > work_mem = 16384
> > maintenance_work_mem = 32758
>
> Before you do any of those other things, bump shared_buffers to about
> 120000 and maintenance_work_mem to 250000 or so -- unless this box has
> other applications on it using significant amounts of those 4G of RAM.
> You may find that these changes alone are enough to get vacuum to
> complete. You'll need to restart the server for the shared_buffers
> setting to take effect.

For the really bloated table, you might need to go even higher than 250000
for maintenance_work_mem. IIRC vacuum needs 6 bytes per dead tuple, so that
means 43M rows... with 5M dead pages, that means less than 10 rows per page,
which is unlikely. Keep in mind that if you do a vacuum verbose, you'll be
able to see if vacuum runs out of maintenance_work_mem, because you'll see
multiple passes through all the indexes.

You could also potentially use this to your benefit. Set
maintenance_work_mem low enough so that vacuum will have to start its
cleaning pass after only an hour or so... depending on how big/bloated the
indexes are on the table, it might take another 2-3 hours to clean
everything. I believe that as soon as you see it start on the indexes a
second time you can kill it... you'll have wasted some work, but more
importantly you'll have actually vacuumed part of the table.

But all of that's a moot point if they're running the default free space map
settings, which are way, way, way too conservative in 8.1. If you've got one
table with 5M dead pages, you probably want to set max_fsm_pages to at least
50000000 as a rough guess, at least until this is under control. Keep in
mind that does equate to 286M of memory, though.

As for your pg_dump idea... why not just do a CREATE TABLE AS SELECT * FROM
bloated_table? That would likely be much faster than messing around with
pg_dump.

What kind of disk hardware is this running on? A good raid 10 array with
write caching should be able to handle a 200G database fairly well; at least
better than it is from what I'm hearing.

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
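A sketch of those checks, with placeholder database and table names (the
6-bytes-per-dead-tuple figure is the rule of thumb from this thread):

    # 250000 kB of maintenance_work_mem / 6 bytes per dead tuple is roughly
    # 43M dead tuples per index pass; if each index shows up more than once
    # in the verbose output, vacuum ran out of maintenance_work_mem
    psql -d proddb -c "VACUUM VERBOSE big_table;"

    # the suggested one-table de-bloat alternative to pg_dump
    psql -d proddb -c "CREATE TABLE new_big_table AS SELECT * FROM big_table;"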
>>> Decibel! <decibel@decibel.org> 08/27/07 4:00 PM >>>
> > > They're running version 8.1.4
>
> As for your pg_dump idea... why not just do a CREATE TABLE AS SELECT *
> FROM bloated_table? That would likely be much faster than messing around
> with pg_dump.

He wanted to upgrade to 8.2.4. CREATE TABLE AS won't get him there.

> > > They're running version 8.1.4 on 4-way dell boxes
> > > with 4Gig of memory on each box attached to RAID-10 disk arrays.
>
> What kind of disk hardware is this running on? A good raid 10 array with
> write caching should be able to handle a 200G database fairly well

What other details were you looking for?

-Kevin
On Monday 27 August 2007 15:56:33 Kevin Grittner wrote:
> He wanted to upgrade to 8.2.4. CREATE TABLE AS won't get him there.
> [snip]

I decided to fix this table first - did so by creating a new table, running
an INSERT INTO ... SELECT FROM the original, renaming the original table to
old_XXX, and then renaming the new table to the original table's name. I'll
drop the original table once I'm sure there are no data issues.

I'm planning to set up a new filesystem layout for the box(es) and try to do
a pg_dump | psql for the upgrade.
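A sketch of that swap, with hypothetical object names. Column defaults,
indexes, constraints, and grants from the original table are not carried
over by the copy and have to be recreated on the new table separately, and
the exclusive lock blocks all access to the table while the copy runs:

    # swap the bloated table for a freshly packed copy, in one transaction
    psql -d proddb -c "
        BEGIN;
        LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
        CREATE TABLE new_big_table (LIKE big_table INCLUDING DEFAULTS);
        INSERT INTO new_big_table SELECT * FROM big_table;
        ALTER TABLE big_table RENAME TO old_big_table;
        ALTER TABLE new_big_table RENAME TO big_table;
        COMMIT;"
    # keep old_big_table until the data has been verified, then drop it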
On Mon, Aug 27, 2007 at 04:56:33PM -0500, Kevin Grittner wrote:
> > > > They're running version 8.1.4 on 4-way dell boxes
> > > > with 4Gig of memory on each box attached to RAID-10 disk arrays.
> >
> > What kind of disk hardware is this running on? A good raid 10 array with
> > write caching should be able to handle a 200G database fairly well
>
> What other details were you looking for?

How many drives? Write caching? 200G isn't *that* big for good drive
hardware, *IF* it's performing the way it should. You'd be surprised how
many arrays fall on their face even from a simple dd test.

--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
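A crude version of that dd test (the path and size are placeholders; the
test file should be at least a couple of times RAM, here roughly 8 GB
against 4 GB of memory, so the OS cache can't absorb it):

    # sequential write of 8 kB blocks onto the RAID-10 array, then read back;
    # dd reports throughput when it finishes
    dd if=/dev/zero of=/data/dd_test bs=8k count=1000000 && sync
    dd if=/data/dd_test of=/dev/null bs=8k
    rm /data/dd_test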
On Monday 27 August 2007 15:00:41 you wrote:
> For the really bloated table, you might need to go even higher than 250000
> for maintenance_work_mem.
> [snip]
> What kind of disk hardware is this running on? A good raid 10 array with
> write caching should be able to handle a 200G database fairly well; at
> least better than it is from what I'm hearing.

The memory settings are way low on all their db servers (less than 170Meg
for shared_buffers). I fixed this table by creating a new_** table, running
an INSERT INTO ... SELECT FROM, and renaming. I'm still working through the
memory settings and reviewing their other config settings, the filesystem
type/settings, and eventually a security audit. It's a new client and
they've been running Postgres for a few years on approx 8 db servers with no
DBA. The servers are 4-way Intel boxes (NOT dual-core) with 4G of memory,
running RAID-10 arrays.
On Monday 27 August 2007 16:04:39 Decibel! wrote:
> How many drives? Write caching? 200G isn't *that* big for good drive
> hardware, *IF* it's performing the way it should. You'd be surprised how
> many arrays fall on their face even from a simple dd test.

I haven't gotten that info yet; the key resources are too busy... I'll have
more info next week.

Thanks for the replies...