Thread: new GUC var: autovacuum_process_all_tables

new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Hi,

Right now, when autovacuum is turned on we always assume it's supposed
to process all tables except those that have autovacuum_enabled=false.

Now, sometimes it might make more sense to keep it enabled but have it
only check for certain tables, and leave the majority of them disabled.
For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
wedded to the name), and have the user set autovacuum_enabled=true via
reloptions to enable it.

Opinions?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: new GUC var: autovacuum_process_all_tables

From
"Joshua D. Drake"
Date:
On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote:
> Hi,
> 
> Right now, when autovacuum is turned on we always assume it's supposed
> to process all tables except those that have autovacuum_enabled=false.
> 
> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.
> For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> wedded to the name), and have the user set autovacuum_enabled=true via
> reloptions to enable it.
> 
> Opinions?
> 

So you are inverting the option? What I mean is you are giving the
option of either:

A. Process everything unless false
B. Process nothing unless true

If I am understanding what you wrote correctly I am not sure I like the
idea as a whole. I think we should just always have it on and not have
it be optional. The rule of thumb should be, we autovacuum everything,
unless there is a extremely good reason not to and I think you should
have to explicitly turn off autovacuum for a relation.

Sincerely,

Joshua D. Drake


> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> 
-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote:

> Right now, when autovacuum is turned on we always assume it's supposed
> to process all tables except those that have autovacuum_enabled=false.
> 
> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.
> For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> wedded to the name), and have the user set autovacuum_enabled=true via
> reloptions to enable it.

I would prefer it if that behaviour was enabled by putting a special
entry into pg_autovacuum, e.g.

"ALL TABLES", autovacuum_enabled=false

I don't really want more GUCs for every nuance of AV behaviour.

If you do this we'd want it to be selectable by database and schema as
well. Perhaps by inserting the oid of the relevant database or schema?

e.g. if we want to turn off AV for database X, which has oid x

then we insert into pg_autovacuum(x, false, ....)

or to make the default no-autovacuum for all tables in all databases

insert into pg_autovacuum(0, false, ....)

It would be useful if all but the first two columns were nullable also,
to avoid having to specify -1.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Right now, when autovacuum is turned on we always assume it's supposed
> to process all tables except those that have autovacuum_enabled=false.

> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.

When would that be?  I can follow the use-case for vacuuming a selected
set of tables via cron-driven commands or whatever, and then excluding
those tables from autovacuum's purview.  But there isn't a command to
"vacuum all tables except these".  Without such a command available
to the cron-job, a switch such as you suggest is merely a foot-gun,
because it's dead certain some tables are going to get left out of
both manual and autovacuum processing.

And before anyone suggests it, I don't want to invent "vacuum all tables
except these".  It'd make more sense to put the effort into developing
better scheduling control over autovacuum, such as a concept of
maintenance windows.

(BTW, autovac does vacuum tables to prevent wraparound even if you try
to tell it to skip them, right?)
        regards, tom lane


Re: new GUC var: autovacuum_process_all_tables

From
Robert Haas
Date:
On Thu, Feb 5, 2009 at 3:45 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Right now, when autovacuum is turned on we always assume it's supposed
> to process all tables except those that have autovacuum_enabled=false.
>
> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.
> For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> wedded to the name), and have the user set autovacuum_enabled=true via
> reloptions to enable it.
>
> Opinions?

Sounds horribly confusing.  It's not very difficult to write a script
to set this value for every table in the database, if that's what you
want to do.  Having autovacuum_enabled potentially mean two different
things depending on the value of some GUC sounds like a recipe for
confusion (and no I don't like it any better if we put the global
switch somewhere other than a GUC).

...Robert


Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> 
> On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote:
> 
> > For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> > wedded to the name), and have the user set autovacuum_enabled=true via
> > reloptions to enable it.
> 
> I would prefer it if that behaviour was enabled by putting a special
> entry into pg_autovacuum,

So you're not aware that we're doing away with pg_autovacuum for good?
It's going to be replaced by reloptions, i.e.
ALTER TABLE foo SET (autovacuum_enabled = false);

Obviously there's no way to add a "catchall" setting.

> e.g.
> "ALL TABLES", autovacuum_enabled=false
> 
> I don't really want more GUCs for every nuance of AV behaviour.

In any case I fail to see how is this much different from a new GUC var.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote:

> It'd make more sense to put the effort into developing
> better scheduling control over autovacuum, such as a concept of
> maintenance windows.

We need that as well, not instead of.

People want to be able to specify (as an example)
* autovac these problem tables anytime required
* for all other tables disable AV, except on sundays [non-busy-times]

If we're going to build in scheduling featured for AV, I'd like to make
those scheduling features available to user defined tasks also. No need
to limit ourselves to just AV.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote:

> So you're not aware that we're doing away with pg_autovacuum for good?
> It's going to be replaced by reloptions, i.e.
> ALTER TABLE foo SET (autovacuum_enabled = false);
> 
> Obviously there's no way to add a "catchall" setting.

Seems like a bad plan then. How do you reconcile those conflicting
requirements?

> > e.g.
> > "ALL TABLES", autovacuum_enabled=false
> > 
> > I don't really want more GUCs for every nuance of AV behaviour.
> 
> In any case I fail to see how is this much different from a new GUC var.

Rows in a table v. new parameters. We can allow endless table driven
complexity. Adding my_little_nuance=on|off strains most people's
patience.

How would I specify that database A wants AV turned off, but database B
wants it on?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Euler Taveira de Oliveira
Date:
Alvaro Herrera escreveu:
> Hi,
> 
> Right now, when autovacuum is turned on we always assume it's supposed
> to process all tables except those that have autovacuum_enabled=false.
> 
> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.
> For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> wedded to the name), and have the user set autovacuum_enabled=true via
> reloptions to enable it.
> 
> Opinions?
> 
What about 'autovacuum_mode'? Values could be 'all' and 'reloption'.

If we don't want to add another GUC, I'll go changing the 'autovacuum' GUC.
Values would be: 'on' means enable autovacuum and process all tables, 'off'
means disable autovacuum and 'reloption' (?) means only process those tables
that have reloption autovacuum_enabled=true. The con is that we couldn't
implement a per-{schema,database} switch for autovacuum. So I prefer the first
one.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Tom Lane wrote:

> (BTW, autovac does vacuum tables to prevent wraparound even if you try
> to tell it to skip them, right?)

Yes.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> 
> On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote:
> 
> > So you're not aware that we're doing away with pg_autovacuum for good?
> > It's going to be replaced by reloptions, i.e.
> > ALTER TABLE foo SET (autovacuum_enabled = false);
> > 
> > Obviously there's no way to add a "catchall" setting.
> 
> Seems like a bad plan then. How do you reconcile those conflicting
> requirements?

I don't see them as conflicting; I see yours as a missing feature,
namely the ability to add tables to an autovacuum "group", which could
have settings attached.  Being able to do that is the whole point of
moving settings to reloptions.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: new GUC var: autovacuum_process_all_tables

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote:
>> It'd make more sense to put the effort into developing
>> better scheduling control over autovacuum, such as a concept of
>> maintenance windows.

> We need that as well, not instead of.

I disagree; adding every frammish anyone could ever think of is not
an overall improvement to the system.

My feeling is that we should be trying to eliminate use-cases for
cron-driven vacuuming, not trying to make sure that cron-driven
scripts can do anything autovacuum can.

The main remaining use-case seems to me to make vacuuming work adhere
to some business-determined schedule, hence maintenance windows seem
like the next thing to do.
        regards, tom lane


Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 18:54 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > 
> > On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote:
> > 
> > > So you're not aware that we're doing away with pg_autovacuum for good?
> > > It's going to be replaced by reloptions, i.e.
> > > ALTER TABLE foo SET (autovacuum_enabled = false);
> > > 
> > > Obviously there's no way to add a "catchall" setting.
> > 
> > Seems like a bad plan then. How do you reconcile those conflicting
> > requirements?
> 
> I don't see them as conflicting; I see yours as a missing feature,
> namely the ability to add tables to an autovacuum "group", which could
> have settings attached.  Being able to do that is the whole point of
> moving settings to reloptions.

So your changes will allow these?

ALTER DATABASE foo SET (autovacuum_enabled = false);
ALTER SCHEMA foo SET (autovacuum_enabled = false);

CREATE TABLE GROUP foo_group;
ALTER TABLE foo SET TABLE GROUP foo_group;
ALTER TABLE foo2 SET TABLE GROUP foo_group;
ALTER TABLE GROUP SET (autovacuum_enabled = false);

Hopefully the grouping of tables is not purely related to AV?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
"Joshua D. Drake"
Date:
On Thu, 2009-02-05 at 17:08 -0500, Tom Lane wrote:

> I disagree; adding every frammish anyone could ever think of is not
> an overall improvement to the system.
> 

:)

> My feeling is that we should be trying to eliminate use-cases for
> cron-driven vacuuming, not trying to make sure that cron-driven
> scripts can do anything autovacuum can.

Agreed. IMO, the user should only have to think about "vacuum" in an
abstract sense. With the exception being those "few" tables that need
the customized configuration (thus reloptions).

> 
> The main remaining use-case seems to me to make vacuuming work adhere
> to some business-determined schedule, hence maintenance windows seem
> like the next thing to do.
> 

Also agreed.

Sincerely,

Joshua D. Drake


>             regards, tom lane
> 
-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: new GUC var: autovacuum_process_all_tables

From
Bruce Momjian
Date:
Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote:
> >> It'd make more sense to put the effort into developing
> >> better scheduling control over autovacuum, such as a concept of
> >> maintenance windows.
> 
> > We need that as well, not instead of.
> 
> I disagree; adding every frammish anyone could ever think of is not
> an overall improvement to the system.

Agreed, let's get this capability out in 8.4 and we can always adust it
based on user demand.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 17:57 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> > > On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote:
> > >> It'd make more sense to put the effort into developing
> > >> better scheduling control over autovacuum, such as a concept of
> > >> maintenance windows.
> > 
> > > We need that as well, not instead of.
> > 
> > I disagree; adding every frammish anyone could ever think of is not
> > an overall improvement to the system.
> 
> Agreed, let's get this capability out in 8.4 and we can always adust it
> based on user demand.

Oh, I agree to limiting what we do for 8,4, but we need more later.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 17:08 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote:
> >> It'd make more sense to put the effort into developing
> >> better scheduling control over autovacuum, such as a concept of
> >> maintenance windows.
> 
> > We need that as well, not instead of.
> 
> I disagree; adding every frammish anyone could ever think of is not
> an overall improvement to the system.

I like your word frammish and am watchful of such things myself.

> My feeling is that we should be trying to eliminate use-cases for
> cron-driven vacuuming, 

Agreed.

> not trying to make sure that cron-driven
> scripts can do anything autovacuum can.

I'm not in favour of limiting our capability to internal actions only.
If we add a capability for scheduling work, we can easily make it
capable of scheduling many kinds of work.

Writing an application maintenance utility in PL/pgSQL is much better
than having to write it for all the different servers an application may
need to run on. We can't ignore that many people use Windows.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Greg Stark
Date:
On Thu, Feb 5, 2009 at 11:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> Writing an application maintenance utility in PL/pgSQL is much better
> than having to write it for all the different servers an application may
> need to run on.

Welcome to the suction effect. If your scheduler is in the database
then you're stuck with the interfaces the database provides. When you
use those interfaces you're going to be stuck with whatever tools work
with them. Imagine trying to compose MIME email in plpgsql or do dns
lookups or interface with your C application code. Plpgsql is
singularly unsuited for anything other than database work. Yes we have
other languages but there are still relatively few and having them
running within a PL interface makes integrating with the rest of their
systems more awkward. And more dangerous -- consider what a simple
memory management bug can do if it's in a database backend instead of
a network client.

> We can't ignore that many people use Windows.

I think  that logic is backwards. People choose their development and
server environment based on what works best for them. It makes no
sense to engineer the system around the assumption that they don't
like developing using the best native tools. Our reimplementation of
the OS is always going to be second-rate by comparison and it's doing
nothing for them but imposing the disadvantages of the restrictions
being stuck in a database backend brings.

-- 
greg


Re: new GUC var: autovacuum_process_all_tables

From
Robert Haas
Date:
>> Agreed, let's get this capability out in 8.4 and we can always adust it
>> based on user demand.
>
> Oh, I agree to limiting what we do for 8,4, but we need more later.

Thinking about this a little more, the biggest problem I have with
this feature is that it makes autovacuum_enabled mean two different
things depending on context.  But maybe we should change the name of
the reloption to "autovacuum" and have three values for it:
default|enabled|disabled.

Then we could add a GUC called autovacuum_by_default = on|off, and we
could later insert per-schema or per-database or per-table-group
defaults without introducing any backwards-incompatibility (default
would still mean default, though the default might come from a
different source).

...Robert


Re: new GUC var: autovacuum_process_all_tables

From
Andrew Dunstan
Date:

Simon Riggs wrote:
>> not trying to make sure that cron-driven
>> scripts can do anything autovacuum can.
>>     
>
> I'm not in favour of limiting our capability to internal actions only.
> If we add a capability for scheduling work, we can easily make it
> capable of scheduling many kinds of work.
>
> Writing an application maintenance utility in PL/pgSQL is much better
> than having to write it for all the different servers an application may
> need to run on. We can't ignore that many people use Windows.
>
>   

I'm not sure what you're saying here. Windows has a scheduler (in my 
setup, that's how my buildfarm members run). And there are third party 
cron utilities as well.


cheers

andrew


Re: new GUC var: autovacuum_process_all_tables

From
Josh Berkus
Date:
Alvaro,

First off, with over 200 GUC variables currently active, in general we 
should be looking to *eliminate* variables rather than adding them.  So 
my personal bar for endorsing a new GUC is set pretty high.

> Now, sometimes it might make more sense to keep it enabled but have it
> only check for certain tables, and leave the majority of them disabled.
> For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not
> wedded to the name), and have the user set autovacuum_enabled=true via
> reloptions to enable it.

I can't imagine, nor have I encountered in the 3 years of consulting I 
did since Autovaccum became available, such a use case.

Unless there's a real, critical use case for this which is common, I'm 
opposed to this GUC.

On the other hand, I'd been keen on a runtime suset autovaccum=on/off 
which we could call from a cron job or the pgadmin scheduler in order to 
have maintenance windows.  Unless that's already becoming possible?

--Josh



Re: new GUC var: autovacuum_process_all_tables

From
Robert Haas
Date:
On Thu, Feb 5, 2009 at 7:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Thinking about this a little more, the biggest problem I have with
> this feature is that it makes autovacuum_enabled mean two different
> things depending on context.  But maybe we should change the name of
> the reloption to "autovacuum" and have three values for it:
> default|enabled|disabled.
>
> Then we could add a GUC called autovacuum_by_default = on|off, and we
> could later insert per-schema or per-database or per-table-group
> defaults without introducing any backwards-incompatibility (default
> would still mean default, though the default might come from a
> different source).

In fact (he said to himself), we could take this a step further and
call both the reloption and GUC "autovacuum_policy".  Then we could
have two policies for this release ("always" and "never") plus allow
"default" for the reloption.  Then future releases could allow users
to define additional policies, like "off-hours".

Just thinking out loud here folks...

...Robert


Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Thu, 2009-02-05 at 19:23 -0500, Andrew Dunstan wrote:
> 
> Simon Riggs wrote:
> >> not trying to make sure that cron-driven
> >> scripts can do anything autovacuum can.
> >>     
> >
> > I'm not in favour of limiting our capability to internal actions only.
> > If we add a capability for scheduling work, we can easily make it
> > capable of scheduling many kinds of work.
> >
> > Writing an application maintenance utility in PL/pgSQL is much better
> > than having to write it for all the different servers an application may
> > need to run on. We can't ignore that many people use Windows.
> >
> >   
> 
> I'm not sure what you're saying here. Windows has a scheduler (in my 
> setup, that's how my buildfarm members run). And there are third party 
> cron utilities as well.

All I'm saying is *if* we put scheduling inside Postgres for autovacuum
*then* we should make it general purpose scheduling.

If anybody uses the argument that "we have external schedulers, so don't
put them in the database" then that argument applies equally to
scheduling autovacuum. It's easy to turn autovacuum on/off via an
external scheduler, yet look upthread and see how many people think it
should be in the database.

Whichever way you think the decision should go, the same arguments apply
to scheduling autovacuum and scheduling other database maintenance
tasks.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Simon Riggs
Date:
On Fri, 2009-02-06 at 00:07 +0000, Greg Stark wrote:
> On Thu, Feb 5, 2009 at 11:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > Writing an application maintenance utility in PL/pgSQL is much better
> > than having to write it for all the different servers an application may
> > need to run on.
> 
> Welcome to the suction effect. If your scheduler is in the database
> then you're stuck with the interfaces the database provides. When you
> use those interfaces you're going to be stuck with whatever tools work
> with them. Imagine trying to compose MIME email in plpgsql or do dns
> lookups or interface with your C application code. Plpgsql is
> singularly unsuited for anything other than database work. Yes we have
> other languages but there are still relatively few and having them
> running within a PL interface makes integrating with the rest of their
> systems more awkward. And more dangerous -- consider what a simple
> memory management bug can do if it's in a database backend instead of
> a network client.

You're saying that because it would be wrong for some things, we're not
going to allow it at all. Re-read what you've written and you'll see
that same argument would ban all PLs, on the basis that we have external
language environments. I don't suppose you believe that.

There are many in-database-only actions that people want to schedule,
not just autovacuum. End of month data deletion, partition setup,
pre-joined or summary table creation, FY rollover calculations etc etc.
None of those have anything to do with the world outside the DB.

If writing database functions is possible and sometimes desirable in
Postgres, then scheduling them is also sometimes desirable also. I would
hope that we can give our users the choice.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: new GUC var: autovacuum_process_all_tables

From
Andrew Dunstan
Date:

Simon Riggs wrote:
> All I'm saying is *if* we put scheduling inside Postgres for autovacuum
> *then* we should make it general purpose scheduling.
>
> If anybody uses the argument that "we have external schedulers, so don't
> put them in the database" then that argument applies equally to
> scheduling autovacuum. It's easy to turn autovacuum on/off via an
> external scheduler, yet look upthread and see how many people think it
> should be in the database.
>
> Whichever way you think the decision should go, the same arguments apply
> to scheduling autovacuum and scheduling other database maintenance
> tasks.
>
>   

OK, I agree with that.

cheers

andrew



Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Josh Berkus wrote:

> I can't imagine, nor have I encountered in the 3 years of consulting I  
> did since Autovaccum became available, such a use case.

Ok, so due to popular demand, I'm not implementing this new GUC.

> On the other hand, I'd been keen on a runtime suset autovaccum=on/off  
> which we could call from a cron job or the pgadmin scheduler in order to  
> have maintenance windows.  Unless that's already becoming possible?

autovacuum=on/off is already SIGHUP as of 8.3 (not SUSET, since it makes
no sense to change it in a single connection).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: new GUC var: autovacuum_process_all_tables

From
Ron Mayer
Date:
Joshua D. Drake wrote:
> On Thu, 2009-02-05 at 17:08 -0500, Tom Lane wrote:
>> My feeling is that we should be trying to eliminate use-cases for
>> cron-driven vacuuming, not trying to make sure that cron-driven
>> scripts can do anything autovacuum can.
> Agreed. IMO, the user should only have to think about "vacuum" in an
> abstract sense.

+1

>> The main remaining use-case seems to me to make vacuuming work adhere
>> to some business-determined schedule, hence maintenance windows seem
>> like the next thing to do.
> Also agreed.

Somewhat agreed - since in many cases the business-determined schedule
is just a rough estimate of measurable attributes of the machine. When
we say "vacuum between midnight and 5am" we often actually mean "vacuum
when the I/O subsystem has bandwidth to spare and the machine's otherwise
lightly loaded, and we guess that means late at night".



Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Ron Mayer wrote:
> Joshua D. Drake wrote:

> >> The main remaining use-case seems to me to make vacuuming work adhere
> >> to some business-determined schedule, hence maintenance windows seem
> >> like the next thing to do.
> > Also agreed.
> 
> Somewhat agreed - since in many cases the business-determined schedule
> is just a rough estimate of measurable attributes of the machine. When
> we say "vacuum between midnight and 5am" we often actually mean "vacuum
> when the I/O subsystem has bandwidth to spare and the machine's otherwise
> lightly loaded, and we guess that means late at night".

The current state of the system is not necessarily a good indicator of
the immediately future state.  If we were to collect history (I/O load
versus time of day and day of week) that would be another matter, but
I'm not sure that's a productive use of our development time.

If we could guess what's going to be the load of the server in the near
future, that would help scheduling considerable.  Since we can't, we
have to trust that the user can provide an educated guess, which is what
"maintenance windows" are all about.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: new GUC var: autovacuum_process_all_tables

From
Josh Berkus
Date:
Alvaro,

>> On the other hand, I'd been keen on a runtime suset autovaccum=on/off  
>> which we could call from a cron job or the pgadmin scheduler in order to  
>> have maintenance windows.  Unless that's already becoming possible?
> 
> autovacuum=on/off is already SIGHUP as of 8.3 (not SUSET, since it makes
> no sense to change it in a single connection).

Right.  What I'm saying is that if it *didn't* require a sighup, then 
users could cronjob starting and stopping Autovac themselves.

--Josh



Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> 
> On Thu, 2009-02-05 at 18:54 -0300, Alvaro Herrera wrote:

> > I don't see them as conflicting; I see yours as a missing feature,
> > namely the ability to add tables to an autovacuum "group", which could
> > have settings attached.  Being able to do that is the whole point of
> > moving settings to reloptions.
> 
> So your changes will allow these?
> 
> ALTER DATABASE foo SET (autovacuum_enabled = false);
> ALTER SCHEMA foo SET (autovacuum_enabled = false);

Hmm.  Perhaps being able to turn autovacuum on/off per-database is
desirable and possible, but I can't see doing it per schema.

> CREATE TABLE GROUP foo_group;
> ALTER TABLE foo SET TABLE GROUP foo_group;
> ALTER TABLE foo2 SET TABLE GROUP foo_group;
> ALTER TABLE GROUP SET (autovacuum_enabled = false);

Yes, something like that.

> Hopefully the grouping of tables is not purely related to AV?

Hmm, good question.  I was envisioning it only for autovacuum, but it
hasn't been vetted on pgsql-hackers.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: new GUC var: autovacuum_process_all_tables

From
Alvaro Herrera
Date:
Josh Berkus wrote:

>>> On the other hand, I'd been keen on a runtime suset autovaccum=on/off 
>>>  which we could call from a cron job or the pgadmin scheduler in 
>>> order to  have maintenance windows.  Unless that's already becoming 
>>> possible?
>>
>> autovacuum=on/off is already SIGHUP as of 8.3 (not SUSET, since it makes
>> no sense to change it in a single connection).
>
> Right.  What I'm saying is that if it *didn't* require a sighup, then  
> users could cronjob starting and stopping Autovac themselves.

Hmm, I'm not sure I understand what you're suggesting.  Maybe what you
want is that we have a SQL-accesible function that sends SIGHUP to the
postmaster?


Re: new GUC var: autovacuum_process_all_tables

From
Robert Haas
Date:
>> Hopefully the grouping of tables is not purely related to AV?
>
> Hmm, good question.  I was envisioning it only for autovacuum, but it
> hasn't been vetted on pgsql-hackers.

I think we're in danger of inventing a solution in search of a problem here.

AIUI, the main reason for table groups would be to define different
autovacuum policies for different groups of tables.  Right now, that
would be pretty stupid, because there are only two possible policies:
"yes" and "no".  But if the policy is something very complex, then
you're not going to want to redefine it for each individual table.
Instead, you're going to want to define it once and then point
individual tables at it.  But you could do that just as well by
assigning each policy a name or number and then setting a reloption on
the table to refer to that name or number, which would completely
avoid the need to invent all-new, non-standard syntax.

But if we do decide to invent such a syntax, it's not good enough to
say that we should make it general because it might be useful for a
purpose other than autovacuum.  We should have a pretty specific idea
of what sort of purpose that might be.  Otherwise, we'll likely find
(when the purpose finally arises) that the supposedly-general model we
introduced doesn't fit it as well as we thought.

But right now, we don't even have ONE use case for the general syntax,
let alone two, because the future autovacuum enhancements that would
make use of that syntax haven't been designed yet (or at least haven't
been discussed here yet).

...Robert


Re: new GUC var: autovacuum_process_all_tables

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Josh Berkus wrote:
>> Right.  What I'm saying is that if it *didn't* require a sighup, then  
>> users could cronjob starting and stopping Autovac themselves.

> Hmm, I'm not sure I understand what you're suggesting.  Maybe what you
> want is that we have a SQL-accesible function that sends SIGHUP to the
> postmaster?

Like, say, pg_reload_conf()?  But actually a cron job would almost
certainly find it more pleasant to use pg_ctl reload and never bother
with a database connection at all.
        regards, tom lane


Re: new GUC var: autovacuum_process_all_tables

From
Jaime Casanova
Date:
On Mon, Feb 9, 2009 at 12:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Hopefully the grouping of tables is not purely related to AV?
>>
>> Hmm, good question.  I was envisioning it only for autovacuum, but it
>> hasn't been vetted on pgsql-hackers.
>
> I think we're in danger of inventing a solution in search of a problem here.
>
> AIUI, the main reason for table groups would be to define different
> autovacuum policies for different groups of tables.  Right now, that
> would be pretty stupid, because there are only two possible policies:
> "yes" and "no".

not really... the idea is to let one group to have autovacuum on in
certain periods of time and let them of the rest of the time...

or maybe a group of tables should be autovacuumed every 50 updates
(vac_base_thresh) and some tables every 100, in some hours maybe we
need to have different vac_cost_delay and vac_cost_limit...

actually there are different parameters that could be set...

> Instead, you're going to want to define it once and then point
> individual tables at it.  But you could do that just as well by
> assigning each policy a name or number and then setting a reloption on
> the table to refer to that name or number, which would completely
> avoid the need to invent all-new, non-standard syntax.
>

well the reloptions *is* invented and non-standard syntax

> But if we do decide to invent such a syntax, it's not good enough to
> say that we should make it general because it might be useful for a
> purpose other than autovacuum.  We should have a pretty specific idea
> of what sort of purpose that might be.  Otherwise, we'll likely find
> (when the purpose finally arises) that the supposedly-general model we
> introduced doesn't fit it as well as we thought.
>
> But right now, we don't even have ONE use case for the general syntax,
> let alone two, because the future autovacuum enhancements that would
> make use of that syntax haven't been designed yet (or at least haven't
> been discussed here yet).


--- devil's advocate mode on ---

a general purpose scheduler could be used for:
- REINDEX
- moving data around for OLAP
- periodically execute SP that has to change the status of a process
in a time driven way...
- autovacuum, and programming manual vacuums

--- devil's advocate mode off ---

now, we actually can do that work with external schedulers (cron in
linux, the windows task scheduler, etc)... the only two reasons i can
think to prefer our own sintax for this is: pg_dump support to keep
pilicies alive even in a fresh installed machine and marketing (two
good reasons if you ask me)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: new GUC var: autovacuum_process_all_tables

From
Robert Haas
Date:
>> AIUI, the main reason for table groups would be to define different
>> autovacuum policies for different groups of tables.  Right now, that
>> would be pretty stupid, because there are only two possible policies:
>> "yes" and "no".
>
> not really... the idea is to let one group to have autovacuum on in
> certain periods of time and let them of the rest of the time...

Yes, but that's a future enhancement, we don't have that now.

> or maybe a group of tables should be autovacuumed every 50 updates
> (vac_base_thresh) and some tables every 100, in some hours maybe we
> need to have different vac_cost_delay and vac_cost_limit...
>
> actually there are different parameters that could be set...
>
>> Instead, you're going to want to define it once and then point
>> individual tables at it.  But you could do that just as well by
>> assigning each policy a name or number and then setting a reloption on
>> the table to refer to that name or number, which would completely
>> avoid the need to invent all-new, non-standard syntax.
>
> well the reloptions *is* invented and non-standard syntax

Yes, but we already have that one.  IMO we should try to reuse it and
only invent new stuff if there is a compelling reason - which is so
far absent from this discussion.

>> But if we do decide to invent such a syntax, it's not good enough to
>> say that we should make it general because it might be useful for a
>> purpose other than autovacuum.  We should have a pretty specific idea
>> of what sort of purpose that might be.  Otherwise, we'll likely find
>> (when the purpose finally arises) that the supposedly-general model we
>> introduced doesn't fit it as well as we thought.
>>
>> But right now, we don't even have ONE use case for the general syntax,
>> let alone two, because the future autovacuum enhancements that would
>> make use of that syntax haven't been designed yet (or at least haven't
>> been discussed here yet).
>
>
> --- devil's advocate mode on ---
>
> a general purpose scheduler could be used for:
> - REINDEX
> - moving data around for OLAP
> - periodically execute SP that has to change the status of a process
> in a time driven way...
> - autovacuum, and programming manual vacuums
>
> --- devil's advocate mode off ---

AFAICS, table groups wouldn't help with any of that stuff.  I think
you're proving my point that we have no idea what we're implementing,
so it's a little premature to talk about what else the same
infrastructure can be used for.

> now, we actually can do that work with external schedulers (cron in
> linux, the windows task scheduler, etc)... the only two reasons i can
> think to prefer our own sintax for this is: pg_dump support to keep
> pilicies alive even in a fresh installed machine and marketing (two
> good reasons if you ask me)

Which are all great points, but not what I was talking about.  I am
talking about the table group stuff.

...Robert


Re: new GUC var: autovacuum_process_all_tables

From
Jaime Casanova
Date:
On Mon, Feb 9, 2009 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> AIUI, the main reason for table groups would be to define different
>>> autovacuum policies for different groups of tables.  Right now, that
>>> would be pretty stupid, because there are only two possible policies:
>>> "yes" and "no".
>>
>> not really... the idea is to let one group to have autovacuum on in
>> certain periods of time and let them of the rest of the time...
>
> Yes, but that's a future enhancement, we don't have that now.
>

that was what simon was talking about, IIRC... he was speculating
about a possible future syntax for grouping tables for use with a
possible future postgres scheduler...

>>
>> well the reloptions *is* invented and non-standard syntax
>
> Yes, but we already have that one.  IMO we should try to reuse it and
> only invent new stuff if there is a compelling reason - which is so
> far absent from this discussion.
>

reloptions is what we will use for autovacumm (actually Alvaro already
applied that patch)... no one is touching that... the group syntax is
for a future feature...

>>> But if we do decide to invent such a syntax, it's not good enough to
>>> say that we should make it general because it might be useful for a
>>> purpose other than autovacuum.  We should have a pretty specific idea
>>> of what sort of purpose that might be.  Otherwise, we'll likely find
>>> (when the purpose finally arises) that the supposedly-general model we
>>> introduced doesn't fit it as well as we thought.
>>>
>>> But right now, we don't even have ONE use case for the general syntax,
>>> let alone two, because the future autovacuum enhancements that would
>>> make use of that syntax haven't been designed yet (or at least haven't
>>> been discussed here yet).
>>
>>
>> --- devil's advocate mode on ---
>>
>> a general purpose scheduler could be used for:
>> - REINDEX
>> - moving data around for OLAP
>> - periodically execute SP that has to change the status of a process
>> in a time driven way...
>> - autovacuum, and programming manual vacuums
>>
>> --- devil's advocate mode off ---
>
> AFAICS, table groups wouldn't help with any of that stuff.  I think

table groups are not being implemented now... it was a mere
speculation about a way to apply a policy in a set of tables...
actually, Alvaro's response was: "something like that" so we have to
actually wait for his proposal before start a war on that and before
we think it could be general enough to include other policies (like
the ones for an scheduler)

> you're proving my point that we have no idea what we're implementing,
> so it's a little premature to talk about what else the same
> infrastructure can be used for.
>

that's because we are not implementing that now... it's for the future...

>> now, we actually can do that work with external schedulers (cron in
>> linux, the windows task scheduler, etc)... the only two reasons i can
>> think to prefer our own sintax for this is: pg_dump support to keep
>> pilicies alive even in a fresh installed machine and marketing (two
>> good reasons if you ask me)
>
> Which are all great points, but not what I was talking about.  I am
> talking about the table group stuff.
>

me too


--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157