Thread: More Vacuum questions...


From: Kevin Kempter
Date:
Hi List;

I've recently started cleaning up some Postgres databases which, prior to my
recent arrival, had no DBA to care for them.

I quickly figured out that several tables were grossly full of dead space. One
table in particular had 75G worth of dead pages (or the equivalent in overall
dead rows). So I rebuilt these tables via this process:

1) BEGIN;
2) LOCK TABLE table_old; (the current table)
3) CREATE TABLE table_new (...); (a copy of table_old, without the indexes)
4) INSERT INTO table_new SELECT * FROM table_old;
5) DROP TABLE table_old;
6) ALTER TABLE table_new RENAME TO table_old;
7) CREATE INDEX ... (recreate all the original indexes)
8) COMMIT;
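The steps above, written out as a single transaction (table, column, and index
names are placeholders for illustration):

```sql
BEGIN;
LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
-- New copy of the table, minus indexes; schema matches the original.
CREATE TABLE big_table_new (LIKE big_table);
INSERT INTO big_table_new SELECT * FROM big_table;
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;
-- Recreate each of the original indexes here, e.g.:
CREATE INDEX big_table_id_idx ON big_table (id);
COMMIT;
```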

The biggest table mentioned above did in fact shrink by about 69G overall.

After the table rebuild, as an interim measure (I'm still tuning, and I need to
go through a full test/QA/prod lifecycle to get anything rolled onto the
production servers), I added this table to pg_autovacuum with enabled = 'f'
and set up a daily cron job to vacuum the table during off hours. This was
primarily because vacuuming this table was severely impacting day-to-day
processing. I've since upped maintenance_work_mem to 300,000 and in general
the vacuums no longer impact day-to-day processing - with the exception of
this big table.

I let the cron vacuum run for 14 days. In those 14 days the time it takes to
vacuum the table grew from 1.2 hours directly after the rebuild to more than
8 hours last night.

It's difficult to vacuum this table during the day, as it seems to begin
blocking all the other queries against the database after some time. I plan to
rebuild the table again and see if I can get away with vacuuming it more
often - even during the day. I'm also considering a weekly cron job each Sunday
(minimal processing happens on the weekends) to rebuild the table.
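A weekly rebuild could be driven from cron along these lines (the schedule,
database name, and script path here are illustrative assumptions):

```shell
# /etc/cron.d entry: rebuild big_table every Sunday at 02:00 (hypothetical).
# rebuild_big_table.sql would contain the BEGIN / LOCK / CREATE / INSERT /
# DROP / RENAME / CREATE INDEX / COMMIT sequence described earlier.
0 2 * * 0  postgres  psql -d mydb -f /usr/local/etc/rebuild_big_table.sql
```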

Just curious if anyone has any thoughts on an automated rebuild scenario - or,
better yet, on managing the vacuum of this table more efficiently?

Maybe it's worth upping maintenance_work_mem sky-high for this table (via a
session-specific SET of maintenance_work_mem) and running a vacuum every 3
hours or so. Also, does Postgres allocate maintenance_work_mem from the
overall shared_buffers space available (I think not)?

Is there some method or guideline I could use to determine the memory needs of
the vacuum process on a table-by-table basis? If so, I suspect I could use it
as a guide for setting a session-specific maintenance_work_mem via cron to
vacuum these problem tables on a specified schedule.

Thanks in advance...


/Kevin



Re: More Vacuum questions...

From: "Heikki Linnakangas"
Date:
Kevin Kempter wrote:
> It's difficult to vacuum this table during the day, as it seems to begin
> blocking all the other queries against the database after some time.

Vacuum can generate so much I/O that it overwhelms all other
transactions, but it shouldn't block other queries otherwise. You can
use the vacuum cost delay options to throttle vacuum so that it runs
slower, but doesn't disrupt other operations so much.
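For example, in the session that runs the manual vacuum (the value and table
name below are illustrative starting points, not tuned recommendations):

```sql
-- Sleep 20ms each time the vacuum cost limit is reached, instead of the
-- default of 0 (no throttling) for manually-issued VACUUMs.
SET vacuum_cost_delay = 20;
VACUUM ANALYZE big_table;
```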

> I plan to rebuild the table again and see if I can get away with vacuuming it
> more often - even during the day. I'm also considering a weekly cron job each
> Sunday (minimal processing happens on the weekends) to rebuild the table.
>
> Just curious if anyone has any thoughts on an automated rebuild scenario? or
> better yet managing the vac of this table more efficiently?

CLUSTER is a handy way to rebuild tables.
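On 8.2-era releases the CLUSTER syntax names the index first (the index and
table names here are placeholders):

```sql
-- Rewrites big_table in big_table_pkey order, dropping the dead space,
-- and rebuilds its indexes; holds an exclusive lock while it runs.
CLUSTER big_table_pkey ON big_table;
```

Note that before 8.3, CLUSTER is not MVCC-safe, so it is best run during a
quiet window.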

> Maybe it's worth upping maintenance_work_mem sky-high for this table (via a
> session specific SET of maintenance_work_mem) and running a vacuum every 3
> hours or so.

You only need enough maintenance_work_mem to hold pointers to all dead
tuples in the table. Using more than that won't help.

> Also, does Postgres allocate maintenance_work_mem from the
> overall shared_buffers space available (I think not)?

No.

> Is there some method / guideline I could use to determine the memory needs on
> a table by table basis for the vacuum process ? If so, I suspect I could use
> this as a guide for setting a session specific maintenance_work_mem via cron
> to vacuum these problem tables on a specified schedule.

You need 6 bytes per dead tuple in the table to avoid scanning the
indexes more than once. If you vacuum regularly, you shouldn't need more
than a few hundred MB.
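Back-of-the-envelope, that sizing rule can be computed directly (the
6-bytes-per-dead-tuple figure is taken from the explanation above):

```python
def vacuum_mem_bytes(dead_tuples: int, bytes_per_tid: int = 6) -> int:
    """Memory VACUUM needs to hold every dead-tuple pointer in a single
    pass: one 6-byte tuple identifier per dead row."""
    return dead_tuples * bytes_per_tid

# 50 million dead rows -> 300,000,000 bytes, i.e. roughly 286 MB, so a
# maintenance_work_mem of a few hundred MB covers even a heavily-updated table.
print(vacuum_mem_bytes(50_000_000))
```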

One way is to run VACUUM VERBOSE, which will tell you how many passes it
used. If it used more than one, increase maintenance_work_mem.

I would suggest using autovacuum after all. If it seems to be disrupting
other activity too much, increase autovacuum_vacuum_cost_delay. Or decrease
it if it can't keep up with the updates.

BTW, you didn't mention which version of PostgreSQL you're using.
There have been some performance enhancements to VACUUM in 8.2, as well as
autovacuum changes. You might consider upgrading if you're not on 8.2
already.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: More Vacuum questions...

From: Decibel!
Date:
On Tue, Sep 11, 2007 at 10:24:58AM -0600, Kevin Kempter wrote:
> I let the cron vacuum run for 14 days. In those 14 days the time it takes to
> vacuum the table grew from 1.2 hours directly after the rebuild to more than
> 8 hours last night.

Sounds to me like daily isn't enough, and that your FSM is too small.
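If the free space map is the bottleneck, the 8.2-era settings to check live in
postgresql.conf (the values below are illustrative; the tail of a database-wide
VACUUM VERBOSE reports how many pages and relations are actually needed):

```
# postgresql.conf (pre-8.4 free space map settings)
max_fsm_pages = 2000000     # total pages with free space tracked, cluster-wide
max_fsm_relations = 1000    # number of relations tracked
```

Raising these requires a server restart to take effect.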
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
