Re: pg_autovacuum bug and feature request - Mailing list pgsql-hackers

From Matthew T. O'Connor
Subject Re: pg_autovacuum bug and feature request
Date
Msg-id 1057598455.5150.36.camel@zeutrh9
Whole thread Raw
In response to pg_autovacuum bug and feature request  (Vincent van Leeuwen <pgsql.spam@vinz.nl>)
List pgsql-hackers
Sorry for the slow response, I was away for the 4th.

On Fri, 2003-07-04 at 14:53, Christopher Browne wrote:
> Vincent Van Leeuwen wrote:
> > 2411 All DBs checked in: -717533400 usec, will sleep for 30 secs.
> 
> That sure looks like a 32 bit wraparound bug; 

Agreed, please check with pg_autovaccum from CVS and let us know if you
still see this problem.

> > Also, I'd like to see a way to tell pg_autovacuum which tables it
> > should monitor. I understand most setups would like to have all tables
> > monitored, but on our setup pg_autovacuum is wasting most of it's time
> > (and a fair amount of serverload) vacuuming some large tables 

See my other email about your choice of settings.  I believe that you
need to tweak your scaling factor settings to get pg_autovacuum to
behave in the manner you are looking for.  In fact I think the default
values used by pg_autovacuum (CVS version) will work for you.  If not,
please let me know.  In most instances -V < .5 will probably have a net
loss of performance, but it's there for testing purposes.

> The whole point of the architecture of pg_autovacuum is for it to be
> totally unnecessary to give any indication of which tables are to be
> vacuumed, and for the daemon to come up with reasonable answers all by
> itself.

Agreed this was a design principle for pg_autovacuum, and I think it
will get reasonable answers with reasonable settings.

> The FIRST approach I'd take would be to see if there are "tweaks" that
> might be made to the model it uses to determine when to vacuum.  Perhaps
> the formula should take account of table size, and thus vacuum less
> often for larger tables, 

It already does this.  That is what the scaling factor -V does, it is a
percentage of the table size that is added to -v (a base value) to
determine the threshold for vacuums.

> perhaps throwing in a "[-z] Table Pages
> Factor", where the calculation of "how often" would get added into it
> the value:
> 
>    pg_class.relpages * table_pages_factor
> 
> [Jan, Matthew; if you have thoughts on this, feel free to suggest
> further.]

There are lots of things we could try, but the next step I am planning
on taking, is to use the FSM as a guide to what should be vacuumed
when.  It has several benefits over the current setup.  The only thing I
don't know is if we can use it alone, or if we will still need to
monitor other sources of information such as the stats system.

> Of course, there is always the answer:  "Use the Source, Luke!"
> 
> The "local kludge" would be for you to customize pg_autovacuum to
> exclude your "not favorite" tables.  That oughtn't be too difficult to
> do, actually.  If you have several tables you don't want to deal with,
> you could do something like the following:
> 
> 
>   if ((strcmp(tbl->table_name, "table_i_dont_want")) ||
>       (strcmp(tbl->table_name, "another_table_i_dont_want")) ||
>       (strcmp(tbl->table_name, "still_another_table_i_dont_want"))) {
>     /* do nothing */
>   } else {
>     /* proceed with usual logic */
>     if ((tbl->curr_vacuum_count - tbl->CountAtLastVacuum) >=
>         tbl->vacuum_threshold)
> ...

A kludge for sure, but it is open source so...

> To provide the answer that you asked for, in a more general way, would
> require introducing either a data file parser (which is what made pgavd
> a serious pain to deploy) or that pg_autovacuum set up its own
> PostgreSQL tables and store data in them.  

I think a better setup would be to have the config information in the
database itself, or add a new system table that allows it.  A new system
table would require that pg_autovacuum be accecpted as a core component
of postgresql.  I don't see this happening, not as long as it's a libpq
based client app.  Using a table inside of a database makes it easy for
the settings to be tweaked live and reduces complexity.

> There _is_ merit to that; one present shortcoming of pg_autovacuum is
> that it can only talk to one postmaster.  If one were to, for instance,
> have _four_ backends (with 4 separate port numbers) on one server, you
> need four instances of pg_autovacuum, and they would be perfectly happy
> to trample the I/O bus if they each concurrently figure they need to
> vacuum some big tables in the respective instances.

Agreed, this was never built into the design, nor do I think it should
be.




pgsql-hackers by date:

Previous
From: Carlos Guzman Alvarez
Date:
Subject: CREATE DATABASE command in Extended Query mode of 3.0 protocol
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: pg_autovacuum bug and feature request