Re: pg_autovacuum next steps - Mailing list pgsql-hackers

From Joe Conway
Subject Re: pg_autovacuum next steps
Date
Msg-id 405F7671.7070705@joeconway.com
Whole thread Raw
In response to Re: pg_autovacuum next steps  ("Matthew T. O'Connor" <matthew@zeut.net>)
Responses Re: pg_autovacuum next steps
List pgsql-hackers
Matthew T. O'Connor wrote:
> * Inability to customize thresholds on a per table basis

I ran headlong into this one. IMHO fixing this is critical.

> * Inability to set default thresholds on a per database basis
> * Inability to exclude specific databases / tables from pg_autovacuum
> monitoring

These would be nice to have, but less critical than #1 I think.

> * Inability to schedule vacuums during off-peak times

This would be *really* nice to have. In my recent case, if pg_autovacuum 
could work for say 3 minutes, and then back off for 2 minutes or so 
while the batch transactions hit, it would be ideal.

> I'm not sure how to address all of these concerns, or that they all
> should be addressed right now.  One of my big questions is backend
> integration.  I am leaning towards leaving pg_autovacuum as a client
> application in contrib for one more release.  During this time, I can
> continue to tweak and improve pg_autovacuum so that we will have a very
> good idea what the final product should be before we make it a standard
> backend process.

I really think pg_autovacuum ought to get folded into the backend now, 
for 7.5. I haven't had time yet to read the entire thread, but I saw 
others making the same comment. It would make some of the listed 
problems go away, or at least become far easier to deal with.

> For PostgreSQL 7.5,  I plan to implement these new features:
> 
>  1.Per database defaults and per table thresholds (including total
> exclusion)

Great!

>  2.Persistent data
>  3.Single-Pass Mode (external scheduling from cron etc...)
>  4.Off peak scheduling

Great again!

> 1. Per Database defaults and Per table Thresholds:
> 
>  1.Store config data inside a special pg_autovacuum table inside
> existing databases that wants custom settings.   

A natural if folded into the backend.

> 3.Single-Pass Mode (External Scheduling):
> 
> I have received requests to be able to run pg_autovacuum only on request
> (not as a daemon) making only one pass over all the tables (not looping
> indefinately).  The advantage being that it will operate more like the
> current vacuum command except that it will only vacuum tables that need
> to be vacuumed.  This feature could be useful as long as pg_autovacuum
> exists outside the backend.  If pg_autovacuum gets integrated into the
> backend and gets automatically started as a daemon during startup, then
> this option will no longer make sense.

It still might make sense. You could have a mode where the daemon 
essentially sleeps forever, until explicitly woken up by a signal. When 
woken, it makes one pass, and goes back to infinite sleep. Then provide 
a simple way to signal the autovacuum process -- maybe an extension of 
the current VACUUM syntax.

> 4.Off-Peak Scheduling:
> 
> A fundamental advantage of our vacuum system is that the work required
> to reclaim table space is taken out of the critical path and can be
> moved to and off-peak time when cycles are less precious.  One of the
> drawbacks of the current pg_autovacuum is that it doesn't have any way
> to factor this in.
> 
> In it's simplest form (which I will implement first) I would add the
> ability to add a second set of thresholds that will be active only
> during an “off-peak” time that can be specified in the pg_autovacuum
> database, perhaps in a general_settings table.

I don't know how this would work, but it is for sure important. In the 
recent testing I found that pg_autovacuum (well, lazy vacuum in general, 
but I was using pg_autovacuum to control it) made a huge difference in 
performance of batch transactions. They range from 4-5 seconds without 
vacuum running, to as high as 15 minutes with vacuum running. With the 
vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as 
high as 10 minutes. Backing vacuum off any more than that caused it to 
fall behind the transaction rate unrecoverably. But as I said above, if 
the transactions could complete without vacuum running in 4-5 seconds, 
then vacuuming resumes for the 3-to-4 minutes between batches, all would 
be well.

Joe



pgsql-hackers by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Thoughts about updateable views
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: pg_autovacuum next steps