Re: Autovacuum integration patch - Mailing list pgsql-patches

From Matthew T. O'Connor
Subject Re: Autovacuum integration patch
Date
Msg-id 42C36F00.7070905@zeut.net
Whole thread Raw
In response to Autovacuum integration patch  (Alvaro Herrera <alvherre@surnet.cl>)
List pgsql-patches
Alvaro Herrera wrote:

>There are several things that are painfully evident with this thing on:
>
>- TRUNCATE does not update stats.  It should send a stat message to
>  which we can react.
>
>

How important is this really?  The stats from before the truncate might
be ok, especially since they might represent how the table will look in
the future.  Also, there isn't any free space in a table that was just
truncated, so there is no need to run vacuum to update the FSM.

>- If you empty a whole table using DELETE just after an
>  automatically-issued VACUUM takes place, the new threshold may not be
>  enough to trigger a new VACUUM.  Thus you end up with a bloated table,
>  and it won't get vacuumed until it grows again.  This may be a problem
>  with the cost equations, but those are AFAICT identical to those of
>  pg_autovacuum, so we may need to rethink the equations.
>
>

I'm very open to a better equation if someone has one, but I'm not sure
what the problem is.  If there are 10,000 rows in a table and an
autovacuum takes place, you will have a threshold of 5,000 (assuming you
are using the default threshold parmeters: base = 1000, scaling factor =
0.4).  So now when all the rows are deleted that will be enough activity
to cross the threshold and cause another vacuum.   I guess the problem
is if the table is smaller say, 1,000 rows, now after a vacuum, the
threshold will be 1,400, and deleting all the rows will not cause a
vacuum.  But that is OK because a 1,000 row table is probably not very
big.  The purpose of the base threshold value is so that vacuum commands
don't get run continually on really small tables that are updated a lot,
it's OK to have some slack space.  If the default is deemed to high, we
can always lower it.

>- The default value of on for reset stats on server start is going to be
>  painful with autovacuum, because it reacts badly to losing the info.
>
>

I agree, this is an issue.  Is there any reason not to change
stats_reset_on_restart to default to true?

>- We should make VACUUM and ANALYZE update the pg_autovacuum relation,
>  in order to make the autovacuum daemon behave sanely with manually
>  issued VACUUM/ANALYZE.
>
>

Agree completly.  This way autovacuum can work in harmony with manually
issued or cron isssued vacuum commands.

>- Having an autovacuum process running on a database can be surprising
>  if you want to drop a database, or create a new one using it as a
>  template.  This happenned to me several times.
>
>

Not sure what to do about this.   We could reduce the number of times
autovacuum actually connects to a database by checking the stats flat
file before we connect.  If there hasn't been any activity since the
last time we connected, then don't connect again.  Better ideas anyone?

>- The shutdown sequence is not debugged nor very well tested.  It may be
>  all wrong.
>
>

Ok, I'm testing it now, i'll let you know if I see anything funny.

>- The startup sequence is a mixture from pgarch, normal backend and
>  pgstat.  I find it relatively clean but I can't swear it's bug-free.
>
>

Same as above.

>- There are no docs
>
>

I can help here as long as I don't have to have the docs done before July 1.

>- There are no ALTER TABLE commands to change the pg_autovacuum
>  attributes for a table. (Enable/disable, set thresholds and scaling
>  factor)
>
>

I don't think we need this do we?  Mucking around in the autovacuum
table shouldn't cause the system any serious problems, if you do mess up
your values, it's easy to just reset them all to 0 and start back with
the defaults.

>- I compiled with -DEXEC_BACKEND, but I didn't look to see if it
>  actually worked on that case.
>
>Apart from all these issues, it is completely functional :-)  It can
>survive several "make installcheck" runs without problem, and the
>regression database is vacuumed/analyzed as it runs.
>
>

Cool.

>Some of these issues are trivial to handle.  However I'd like to release
>this right now, so I can go back to "shared dependencies" now that role
>support is in.
>
>Barring any objections I think this should be integrated, so these
>issues can be tackled by interested parties.
>

Couple of other thoughts:
Do the vacuum commands respect the GUC vacuum delay settings?
Should we be able to set per table vacuum delay settings?
This patch doesn't have the "maintenance window" that was discussed a
while ago.  Can that be added after July 1?

Thanks Alvaro for doing the integration work!!!!

Matthew O'Connor



pgsql-patches by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: Dump comments on large objects in text mode
Next
From: Tom Lane
Date:
Subject: Re: Dump comments on large objects in text mode