Thread: pg_autovacuum next steps
Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback. Thanks, Matthew O'Connor
Attachment
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Lately I have been thinking about the next steps for the pg_autovacuum > daemon. I have written up a document that describes what I'm planning > to do next. Please read the attached and response as I would really > like some feedback. > [ rtf document ] Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. regards, tom lane
On Mon, 2004-03-22 at 00:12, Tom Lane wrote:
> Please repost in some less proprietary format. Plain text is generally
> considered the thing to use on this list.

-- 
Markus Bertheau <twanger@bluetwanger.de>
Attachment
Matthew T. O'Connor wrote: > Lately I have been thinking about the next steps for the > pg_autovacuum daemon. I have written up a document that describes > what I'm planning to do next. Please read the attached and response > as I would really like some feedback. I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc.
> I think these configuration issues will become a lot easier if you make > the autovacuum daemon a subprocess of the postmaster (like, say, the > checkpoint process). Then you have access to a host of methods for > storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Chris
On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:
> > I think these configuration issues will become a lot easier if you make
> > the autovacuum daemon a subprocess of the postmaster (like, say, the
> > checkpoint process). Then you have access to a host of methods for
> > storing state, handling configuration, etc.
>
> Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation.

My reasons:
A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed.
B) Perhaps people like the idea of it being a client app (I don't think so.)
C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it.

So the first big question is: Do we want to make it a backend subprocess now? Secondly, are there any other features that people are interested in that were not mentioned in my document?

Matthew O'Connor
On Sun, 2004-03-21 at 18:12, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > [ rtf document ]
>
> Please repost in some less proprietary format. Plain text is generally
> considered the thing to use on this list.

I don't think RTF is proprietary, but I should have just posted inline anyway, so here is a copy:

pg_autovacuum Version 2 Design Document:

Exec Summary:

pg_autovacuum was initially released as a contrib module in PostgreSQL v7.4. The version in 7.4 is by design very simple: no configuration is required, and very little configuration is possible. Despite these limitations it was voted the most popular new feature of PostgreSQL v7.4 in the survey held on postgresql.org (http://www.postgresql.org/survey.php?View=1&SurveyID=23). Despite its popularity there is much room for improvement. This document sets out to define the most important improvements that would help pg_autovacuum become a truly powerful asset in the suite of tools that come with PostgreSQL.

Current Problems & Limitations:

Based on user feedback from people using pg_autovacuum in the field, and my own observations, there are a number of problems and limitations with pg_autovacuum. They are:

* Inability to customize thresholds on a per-table basis
* Inability to set default thresholds on a per-database basis
* Inability to exclude specific databases / tables from pg_autovacuum monitoring
* Inability to schedule vacuums during off-peak times
* Lack of integration related to startup and shutdown
* Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum (requires backend integration? or can LISTEN / NOTIFY be used?)
* Lack of logging options / syslog integration / log rotation options
* Create table fails because template1 is busy

I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration.
I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea of what the final product should be before we make it a standard backend process. For PostgreSQL 7.5, I plan to implement these new features:

1. Per-database defaults and per-table thresholds (including total exclusion)
2. Persistent data
3. Single-pass mode (external scheduling from cron etc...)
4. Off-peak scheduling

1. Per-Database Defaults and Per-Table Thresholds:

There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options:

1. Store config data inside a special pg_autovacuum table inside each existing database that wants custom settings.
2. Use a config file. This would require some additional coding to add the required parsing, but is possible.
3. Create a pg_autovacuum database inside any cluster that wants to customize its settings.

Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3. Since pg_autovacuum will (hopefully) eventually become an integrated part of the backend, it will eventually be able to add the required data to the system catalogs. Given these two premises, as long as pg_autovacuum remains a contrib module it could use its own database to mimic having system tables. If this database exists, it will be used; if it does not exist, then pg_autovacuum will work just as it did in the 7.4 release, with very limited options available to it. The user will be able to specify a non-default database.
Table structure for database-specific defaults and table-specific thresholds:

databases_defaults: (will reference the pg_database system table)
    id                              serial primary key
    exclude_database                boolean
    default_vacuum_scaling_factor   float
    default_vacuum_base_value       int
    default_analyze_scaling_factor  float
    default_analyze_base_value      int
    dboid                           oid references pg_database.oid

table_thresholds:
    id                      serial primary key
    exclude_table           boolean (exclude this table)
    vacuum_scaling_factor   float (equivalent to -v)
    vacuum_base_value       int (equivalent to -V)
    vacuum_threshold        float (if > 0, use this threshold)
    analyze_scaling_factor  float (equivalent to -a)
    analyze_base_value      int (equivalent to -A)
    analyze_threshold       float (if > 0, use this threshold)
    relid                   oid references pg_class.oid

2. Persistent pg_autovacuum Data:

Right now pg_autovacuum has no memory of what was going on the last time it was run. So if significant changes have happened while pg_autovacuum is not running, they will not be counted in the analysis of when to perform a vacuum or analyze operation, which can result in under-vacuuming. So pg_autovacuum should occasionally write down its numbers to the database. The data will be stored in an additional table called table_data:

table_data:
    id                      serial primary key
    CountAtLastAnalyze      long
    CountAtLastVacuum       long
    table_thresholds_id     int references table_thresholds

3. Single-Pass Mode (External Scheduling):

I have received requests to be able to run pg_autovacuum only on request (not as a daemon), making only one pass over all the tables (not looping indefinitely). The advantage is that it will operate more like the current vacuum command, except that it will only vacuum tables that need to be vacuumed. This feature could be useful as long as pg_autovacuum exists outside the backend. If pg_autovacuum gets integrated into the backend and gets automatically started as a daemon during startup, then this option will no longer make sense.
Once we have persistent data (Step 2), we can easily operate in single-pass mode.

4. Off-Peak Scheduling:

A fundamental advantage of our vacuum system is that the work required to reclaim table space is taken out of the critical path and can be moved to an off-peak time when cycles are less precious. One of the drawbacks of the current pg_autovacuum is that it doesn't have any way to factor this in. In its simplest form (which I will implement first), I would add the ability to specify a second set of thresholds that will be active only during an "off-peak" time that can be specified in the pg_autovacuum database, perhaps in a general_settings table.

A few notes about things I'm not planning on working on at the moment:

Better logging options: An additional logging table would be added to the pg_autovacuum database and would log all activity (vacuums and analyzes) along with their corresponding durations.

Syslog support: I'm not sure this is really needed, but a simple patch was submitted by one user, and perhaps that can be reviewed / improved and applied.
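For readers unfamiliar with how the scaling factors and base values above combine, here is a minimal sketch of the decision rule. The names are illustrative only, not the actual pg_autovacuum identifiers; the real daemon gets its tuple counts from the stats collector.

```python
# Sketch of the threshold logic implied by the table design above.
# All names are illustrative; the real pg_autovacuum uses the -v/-V
# (vacuum) and -a/-A (analyze) settings and the stats collector's
# per-table tuple counts.

def effective_threshold(reltuples, base_value, scaling_factor,
                        fixed_threshold=0.0):
    """If a fixed per-table threshold (> 0) is set, use it; otherwise
    compute base + factor * reltuples, as in the tables above."""
    if fixed_threshold > 0:
        return fixed_threshold
    return base_value + scaling_factor * reltuples

def needs_vacuum(n_dead_tuples, reltuples, base_value=1000,
                 scaling_factor=2.0):
    """Vacuum once the dead-tuple count exceeds the threshold."""
    return n_dead_tuples > effective_threshold(
        reltuples, base_value, scaling_factor)

# With these example defaults, a 100,000-row table needs roughly
# 1000 + 2.0 * 100000 = 201,000 dead tuples before a VACUUM fires.
```

The per-table row overrides the per-database default; a fixed vacuum_threshold > 0 overrides both.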
On Sun, 21 Mar 2004, Matthew T. O'Connor wrote:
> On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:
> > > I think these configuration issues will become a lot easier if you make
> > > the autovacuum daemon a subprocess of the postmaster (like, say, the
> > > checkpoint process). Then you have access to a host of methods for
> > > storing state, handling configuration, etc.
> >
> > Yeah - why delay making it a backend process? :)
>
> Ok, well this was part of the reason to have this conversation.
>
> My reasons:
> A) I wasn't sure if people really thought this was ready to be
> integrated. Tom had said a while ago that it was good to keep it as
> a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more intelligent vacuuming: namely, maintaining a map (outside of shared memory) of blocks which have been pushed out of the free space map for VACUUM to visit (which requires a backend process), being aware of load restrictions (ie, allowing the user to only vacuum when the load average is less than X, for example), and some other leveling stuff to ensure that availability is consistent. Whilst this doesn't relate to pg_autovacuum specifically, it'd be great if they could be released at the same time, I think.

> B) Perhaps people like the idea of it being a client app (I don't think
> so.)

I'd like to see it as part of the backend.

> C) Most importantly, I'm not a backend hacker. If someone wants to do the
> initial work of getting it running as a backend process, I can take it
> from there. A while ago, Bruce offered to help me with any backend
> issues I might have, so perhaps with a little help I can take a run at
> it.

I'd be happy to help you out.

Gavin
> > B) Perhaps people like the idea of it being a client app (I don't think
> > so.)
>
> I'd like to see it as part of the backend.
>
> > C) Most importantly, I'm not a backend hacker. If someone wants to do the
> > initial work of getting it running as a backend process, I can take it
> > from there. A while ago, Bruce offered to help me with any backend
> > issues I might have, so perhaps with a little help I can take a run at
> > it.
>
> I'd be happy to help you out.

Agreed.

-- 
Bruce Momjian                    | http://candle.pha.pa.us
pgman@candle.pha.pa.us           | (610) 359-1001
+ If your life is a hard drive,  | 13 Roberts Road
+ Christ can be your backup.     | Newtown Square, Pennsylvania 19073
On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote:
> > > C) Most importantly, I'm not a backend hacker. If someone wants to do the
> > > initial work of getting it running as a backend process, I can take it
> > > from there. A while ago, Bruce offered to help me with any backend
> > > issues I might have, so perhaps with a little help I can take a run at
> > > it.
> >
> > I'd be happy to help you out.
>
> Agreed.

Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a "little" help :-)

I have a few big picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes? That is, currently pg_autovacuum uses libpq to connect to a database and issue queries, including a vacuum / analyze command when needed. After becoming a subprocess, will (should) it still use libpq to connect to the databases? I don't think so; is it even possible to do that? If not, how will it check out the stats of all the different databases? I guess it should fork() a new backend, connect to it somehow, and use it to query the database, but I'm really not sure how this works.

I'm looking through the backend startup code to see how the stats collector and the bgwriter work, since they are probably two semi-close examples of what I'll have to do. I think the checkpoint process does something similar in that it issues a checkpoint command.

Thanks again... Matthew
On Mon, 22 Mar 2004, Matthew T. O'Connor wrote:
> Ok, thanks for the offer to help, but I think I understated things above
> when I said I'll need a "little" help :-)

I haven't looked at the code but...

> I have a few big picture questions. Once pg_autovacuum is launched as a
> postmaster sub-process, what changes? That is, currently pg_autovacuum
> uses libpq to connect to a database and issue queries including a vacuum
> / analyze command when needed. After becoming a subprocess will
> (should) it still use libpq to connect to the databases, I don't think

It could use libpq but most definitely shouldn't.

> so, is it even possible to do that? If not, how will it check out the
> stats of all the different databases? I guess it should fork() a new
> backend, connect to it somehow, and use it to query the database, but
> I'm really not sure how this works.

It can interact with the stats collector (a separate backend) in the same way that existing backends interact: through a domain socket.

> I'm looking through the backend startup code to see how the stats
> collector and the bgwriter work since they are probably two semi-close
> examples of what I'll have to do. I think the checkpoint process does
> something similar in that it issues a checkpoint command.

The vacuum backend will call vacuum() (or something very like it) directly. I imagine that when it gets called, and on which table, will be based upon the existing algorithm.

Thanks, Gavin
> > Ok, thanks for the offer to help, but I think I understated things above
> > when I said I'll need a "little" help :-)
>
> I haven't looked at the code but...
>
> > I have a few big picture questions. Once pg_autovacuum is launched as a
> > postmaster sub-process, what changes? That is, currently pg_autovacuum
> > uses libpq to connect to a database and issue queries including a vacuum
> > / analyze command when needed. After becoming a subprocess will
> > (should) it still use libpq to connect to the databases, I don't think
>
> It could use libpq but most definitely shouldn't.
>
> > so, is it even possible to do that? If not, how will it check out the
> > stats of all the different databases? I guess it should fork() a new
> > backend, connect to it somehow, and use it to query the database, but
> > I'm really not sure how this works.
>
> It can interact with the stats collector (a separate backend) in the same
> way that existing backends interact: through a domain socket.
>
> > I'm looking through the backend startup code to see how the stats
> > collector and the bgwriter work since they are probably two semi-close
> > examples of what I'll have to do. I think the checkpoint process does
> > something similar in that it issues a checkpoint command.
>
> The vacuum backend will call vacuum() (or something very like it)
> directly. I imagine that when it gets called and on which table will be
> based upon the existing algorithm.

One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases. As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and, if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.
Is there a better approach than this? Gavin
On Mon, Mar 22, 2004 at 02:35:37AM -0500, Matthew T. O'Connor wrote:
> On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote:
> > > > C) Most importantly, I'm not a backend hacker. If someone wants to do the
> > > > initial work of getting it running as a backend process, I can take it
> > > > from there. A while ago, Bruce offered to help me with any backend
> > > > issues I might have, so perhaps with a little help I can take a run at
> > > > it.
> > >
> > > I'd be happy to help you out.
> >
> > Agreed.
>
> Ok, thanks for the offer to help, but I think I understated things above
> when I said I'll need a "little" help :-)
>
> I have a few big picture questions. Once pg_autovacuum is launched as a
> postmaster sub-process, what changes?

All. It's important to do it as a backend process, because libpq has very, very limited and slow resources for working with backend stuff. The base should be the standard backend with a different "main loop" that, instead of checking a socket, checks some shared information about tables and calls the vacuum code directly. In this case you can omit the work with connections, the parser, etc.

I thought about it in the last few days, and I found Tom's excellent idea about FSM table usage:

    What I had in the back of my mind was: each backend counts attempted
    insertions and deletions in its relcache entries (an update adds to
    both counts). At transaction commit or abort, we know which of these
    two counts represents the number of dead tuples added to each
    relation, so while we scan the relcache for post-xact cleanup (which
    we will be doing anyway) we can transfer the correct count into the
    shared FSM entry for the relation. This gives us a reasonably
    accurate count in shared memory of all the tuple obsoletions since
    bootup, at least for heavily-used tables. (The FSM might choose to
    forget about lightly-used tables.) The auto vacuumer could look at
    the FSM numbers to decide which tables are highest priority to
    vacuum.
(2002-09-03 08:10:32)

I looked at the code and I think extending the FSM tables will be pretty simple, but I'm unsure how Tom thought the relcache counters would work. Tom?

Karel

-- 
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
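The counter-transfer scheme Tom describes in the quoted passage could be sketched roughly as follows. Everything here is illustrative: the real mechanism would live in the relcache and the shared FSM, not in a Python dict, and the names are made up.

```python
# Toy model of Tom's idea: each backend keeps per-relation insert and
# delete counts, and at transaction end the count that represents
# newly dead tuples is folded into a shared per-table total (standing
# in for the shared FSM entry). Purely illustrative.
from collections import defaultdict

shared_dead_tuples = defaultdict(int)   # stand-in for the shared FSM entries

class Backend:
    def __init__(self):
        self.inserted = defaultdict(int)
        self.deleted = defaultdict(int)

    def insert(self, rel):
        self.inserted[rel] += 1

    def delete(self, rel):
        self.deleted[rel] += 1

    def update(self, rel):
        # An update adds to both counts, as in the quoted description.
        self.insert(rel)
        self.delete(rel)

    def end_xact(self, committed):
        # On commit the deleted tuples become dead; on abort the
        # inserted ones do. Transfer during post-xact cleanup.
        source = self.deleted if committed else self.inserted
        for rel, n in source.items():
            shared_dead_tuples[rel] += n
        self.inserted.clear()
        self.deleted.clear()
```

The auto vacuumer would then scan `shared_dead_tuples` (really the FSM) to pick the highest-priority tables.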
On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote:
> 1. Per-Database Defaults and Per-Table Thresholds:
>
> There are differing opinions as to the best way to provide this
> feature. The primary debate is where to save the configuration data. I
> see three options:
>
> 1. Store config data inside a special pg_autovacuum table inside
> existing databases that want custom settings.
>
> 2. Use a config file. This would require some additional coding to add
> the required parsing, but is possible.
>
> 3. Create a pg_autovacuum database inside any cluster that wants to
> customize its settings.
>
> Since many people do not like tools that clutter their databases by
> adding tables, I think option 1 (adding a pg_autovacuum table to
> existing databases) is right out. Using a config file would be OK, but
> would require additional parsing code. My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3, or rather some setting definable at runtime/build-time that lets you select the database + schema for autovacuum to find its config data?

I might be wrong, but it strikes me as the sort of thing people running shared environments will want to choose for themselves.

-- 
Richard Huxton
Archonet Ltd
Gavin Sherry wrote:
> On Sun, 21 Mar 2004, Matthew T. O'Connor wrote:
> > On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:
> > > > I think these configuration issues will become a lot easier if you make
> > > > the autovacuum daemon a subprocess of the postmaster (like, say, the
> > > > checkpoint process). Then you have access to a host of methods for
> > > > storing state, handling configuration, etc.
> > >
> > > Yeah - why delay making it a backend process? :)
> >
> > Ok, well this was part of the reason to have this conversation.
> >
> > My reasons:
> > A) I wasn't sure if people really thought this was ready to be
> > integrated. Tom had said a while ago that it was good to keep it as
> > a contrib module while it's still actively being developed.
>
> I was talking to Jan about some other work on VACUUM related to more
> intelligent vacuuming. Namely, maintaining a map (outside of shared
> memory) of blocks which have been pushed out of the free space map for
> VACUUM to visit (which requires a backend process) and being aware of load
> restrictions (ie, allowing the user to only vacuum when the load average is
> less than X, for example) and some other leveling stuff to ensure that
> availability is consistent. Whilst this doesn't relate to pg_autovacuum
> specifically, it'd be great if they could be released at the same time, I
> think.

I don't recall the "outside of shared memory" part. Anyhow, the whole story goes like this:

Maintain 2 bits per block that tell if the block has been vacuumed of all dead tuples since the last time it was dirtied, and if all its tuples are completely frozen. If those two conditions are true, there is no need to vacuum that block at all. (Red flag!!! On further thinking I realized that this assumes that the FSM is loss free.) With a default 8K block size, this means 32K per 1GB segment, making 4 additional blocks.
I actually think that these extra blocks should somehow be part of the heap files, so that they are subject to the regular buffer management. To keep the lock contention on them low, vacuum and backends will set/clear new flags in the bufhdr flags member. That way, the bgwriter and checkpointer will be the usual suspects to set/clear these flags in the shared bitmap array stored in the extra blocks. As to where to store these blocks, some block-number arithmetic magic comes to mind. That way a block's relnode and blockno automatically lead to the bits, even in the case of blind writes.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
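The space arithmetic behind the 2-bits-per-block proposal can be checked with a small sketch. The constants are the defaults mentioned above (8K blocks, 1GB segments); the bit-addressing helper is illustrative, not actual backend code.

```python
# Verifying Jan's numbers: 2 status bits per heap block ("vacuumed
# since last dirtied" and "all tuples frozen") with 8K blocks and
# 1GB segments works out to 32K of bitmap, i.e. 4 extra blocks per
# segment. The bit_position() helper is an illustrative sketch of the
# "block number arithmetic magic".
BLCKSZ = 8192                 # default PostgreSQL block size
SEGMENT = 1 << 30             # 1 GB heap segment
BITS_PER_BLOCK = 2

blocks_per_segment = SEGMENT // BLCKSZ                     # 131072 blocks
bitmap_bytes = blocks_per_segment * BITS_PER_BLOCK // 8    # 32768 bytes = 32K
bitmap_blocks = -(-bitmap_bytes // BLCKSZ)                 # ceil -> 4 blocks

def bit_position(blockno):
    """Byte offset and bit shift of the 2-bit entry for a heap block
    within the bitmap area (illustrative only)."""
    byte = (blockno * BITS_PER_BLOCK) // 8
    shift = (blockno * BITS_PER_BLOCK) % 8
    return byte, shift
```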
On Mon, 2004-03-22 at 03:36, Gavin Sherry wrote:
> One point is this: vacuum() assumes that you are running in a fully
> fledged backend. There'd be a fair bit of work involved in allowing a
> single process to call vacuum() against multiple databases.

I can't imagine we want to do that.

> As such, I think that a vacuum backend for a specific database should be
> forked upon the first connect. Also, the backend might like to try and
> work out if there are any active backends for its database every so often
> and, if not, perform a final vacuum (if necessary) and exit, so that we
> don't have lots of idle processes sitting around.

I don't understand; are you saying that upon postmaster startup a vacuum backend should be forked for every database in the cluster?

> Is there a better approach than this?

One thought I had is that there wouldn't be a persistent daemon-type process. Rather, the auto_vacuum process, like the checkpoint process, would be kicked off periodically. It would initially be attached to template1; when it's done there, it would figure out the next database to take a look at and fork, the existing template1 process would exit, and the new process would be connected to the next database. Just repeat this forking loop until done. Thoughts?
"Matthew T. O'Connor" <matthew@zeut.net> writes: > There are differing opinions as to the best way to providing these this > feature. The primary debate is where to save the configuration data. I > see three options: > 1.Store config data inside a special pg_autovacuum table inside > existing databases that wants custom settings. > 2.Use a config file. This would require some additional coding to add > the required parsing, but is possible. > 3.Create a pg_autovacuum database inside any cluster that wants to > customize their settings. > Since many people do not like tools that clutter their databases by > adding tables, I think option 1 (adding a pg_autovacuum table to > existing databases) is right out. Using a config file would be Ok, but > would require additional parsing code. My preference is option 3. Personally I like #2. The claim that this requires extra code seems bogus to me --- when you are working at the C code level, reading a table takes code too. I think both 1 and 3 represent clutter. Also, they will introduce entirely new sets of problems that we already know how to deal with in the case of config files, but have no experience with for config tables. Some examples: how do you get the daemon to notice that you changed the config (SIGHUP for config files, ?? otherwise)? Can you easily get at the config data from a daemon process regardless of which database it's attached to (if any)? If the config data is in a table, what sorts of interesting problems will arise from trying to vacuum that table itself? Basically I see config tables as cutting against the grain of everything we've done to date in Postgres DBA-level configuration, and I do not see a reason to justify that sort of reinvention. regards, tom lane
Gavin Sherry <swm@linuxworld.com.au> writes: > One point is this: vacuum() assumes that you are running in a fully > fledged backend. There'd be a fair bit of work involved in allowing a > single process to call vacuum() against multiple databases. Make that "it isn't going to happen". > As such, I > think that a vacuum backend for a specific database should be forked upon > the first connect. Also, the backend might like to try and workout if > there are any active backends for its database every so often and if not, > perform a final vacuum (if necessary) and exit, so that we don't have lots > of idle processes sitting around. Lots of idle processes sitting around is right out, too. Remember that each one would eat a backend connection slot. I think we are going to have to limit this to *one* process at a time. What that probably means is that we successively launch an autovacuum process against each database, it does whatever seems appropriate in that database and then quits. We could manage this just like checkpoints are presently managed --- the only thing the postmaster has to know is the desired idle period between end of one autovacuum and start of the next. regards, tom lane
On Mon, 2004-03-22 at 10:58, Tom Lane wrote:
> Lots of idle processes sitting around is right out, too. Remember that
> each one would eat a backend connection slot. I think we are going to
> have to limit this to *one* process at a time. What that probably means
> is that we successively launch an autovacuum process against each
> database, it does whatever seems appropriate in that database and then
> quits. We could manage this just like checkpoints are presently managed
> --- the only thing the postmaster has to know is the desired idle period
> between end of one autovacuum and start of the next.

Ok, I was thinking a similar thing (see my response to Gavin). So we could have autovacuum fired off by the postmaster, and it would connect to databases as needed in a serial fashion.

Matthew
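The serialized scheme being agreed on here can be sketched as a toy model: one worker at a time, each database visited in turn, with only the idle period between passes known to the scheduler. `do_database` stands in for "launch a worker, let it vacuum what needs vacuuming in that database, and exit"; all names are illustrative.

```python
# Toy model of the serialized autovacuum scheme: exactly one worker
# active at a time, databases visited in turn, and a single idle
# period between the end of one full pass and the start of the next.
import time

def autovacuum_loop(databases, do_database, idle_seconds, passes=1):
    for _ in range(passes):
        for db in databases:
            do_database(db)          # one worker at a time, serially
        time.sleep(idle_seconds)     # the only knob the scheduler needs

# Example run: record the visit order over two passes.
visited = []
autovacuum_loop(["template1", "app_db"], visited.append,
                idle_seconds=0, passes=2)
# visited is now ["template1", "app_db", "template1", "app_db"]
```

This mirrors how checkpoints are managed: the postmaster tracks only the delay, and each worker is short-lived, so no connection slots are held by idle daemons.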
On Mon, 2004-03-22 at 07:25, Richard Huxton wrote:
> On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote:
> > 1. Store config data inside a special pg_autovacuum table inside
> > existing databases that want custom settings.
> >
> > 2. Use a config file. This would require some additional coding to add
> > the required parsing, but is possible.
> >
> > 3. Create a pg_autovacuum database inside any cluster that wants to
> > customize its settings. [...] My preference is option 3.
>
> I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3,
> or rather some setting definable at runtime/build-time that lets you select
> the database + schema for autovacuum to find its config data?
>
> I might be wrong, but it strikes me as the sort of thing people running shared
> environments will want to choose for themselves.

If pg_autovacuum were being designed to live forever as a client app, then I agree that giving admins a choice would be good. But since we are going to eventually move any auto_vacuum data and settings into the system tables (when autovacuum is part of the system), I don't see the need to expend the extra cycles, especially since people seem to be pushing hard for autovacuum to be a backend function sooner rather than later.
On Mon, 2004-03-22 at 04:23, Karel Zak wrote:
> All. It's important to do it as a backend process, because libpq has very,
> very limited and slow resources for working with backend stuff.

Agreed.

> The base should be the standard backend with a different "main loop" that,
> instead of checking a socket, checks some shared information about tables
> and calls the vacuum code directly. In this case you can omit the work
> with connections, the parser, etc.

So am I to understand that I can start up a postmaster subprocess and then be able to monitor the activity of all the databases? I guess that makes sense, since I would be talking to the stats collector directly via a socket. But that doesn't solve the problem of issuing the vacuum to different databases; I would still create a new backend for every database that needs a vacuum or analyze issued. Also, we don't want to launch multiple simultaneous vacuums, so we want the commands to be serialized (I know some people want to be able to do this if databases are located on different disks, but for now I'm keeping things simple). I have an idea for this that I just mentioned in another message to the list.

> I thought about it in the last few days, and I found Tom's excellent idea
> about FSM table usage:

There has been lots of discussion of incorporating FSM data into the auto_vacuum decision process. I am interested in exploring this, but since I'm already biting off more than I can easily chew, I am going to try and leave the decision-making process the same for now. BTW, I think we need to use both tools (stats and FSM), since not all tables will be in the FSM; an insert-only table still needs to be analyzed periodically, and a lightly updated table will eventually need to be vacuumed.
On Mon, 2004-03-22 at 10:51, Tom Lane wrote:
> > 2. Use a config file. This would require some additional coding to add
> > the required parsing, but is possible.
>
> Personally I like #2. The claim that this requires extra code seems
> bogus to me --- when you are working at the C code level, reading a
> table takes code too. I think both 1 and 3 represent clutter. Also,
> they will introduce entirely new sets of problems that we already know
> how to deal with in the case of config files, but have no experience
> with for config tables. Some examples: how do you get the daemon to
> notice that you changed the config (SIGHUP for config files, ??
> otherwise)? Can you easily get at the config data from a daemon process
> regardless of which database it's attached to (if any)? If the config
> data is in a table, what sorts of interesting problems will arise from
> trying to vacuum that table itself?

I guess what I meant by extra code is extra work, since I am comfortable storing data in tables and would need to do some work to put it in a config file.

Some of this conversation should change if we really are going to integrate autovacuum into the backend. My suggestion for storing config data was really assuming that we would continue to be a client app for at least one more release cycle, but I think I'm hearing that people want it in the backend now.

Assuming that we are being integrated into the backend, and assuming that an autovacuum process is fired off periodically by the postmaster, I think some config items should become GUC options and others should become part of the system tables. For example: enable_autovacuum should be a GUC option, but the vacuum threshold for a given table should probably just be a new column in the pg_class table. Any data stored in the system tables will be checked every time the autovacuum process is run, so there won't be any problems with noticing that the admin changed the threshold for a given table.
> Basically I see config tables as cutting against the grain of everything
> we've done to date in Postgres DBA-level configuration, and I do not see
> a reason to justify that sort of reinvention.

I agree if we are talking about a client app, but I think the scheme laid out above is consistent for something that is part of the backend. Do you disagree?

Thanks,

Matthew O'Connor
On Sun, Mar 21, 2004 at 05:32:59PM -0500, Matthew T. O'Connor wrote:
> Lately I have been thinking about the next steps for the pg_autovacuum
> daemon. I have written up a document that describes what I'm planning
> to do next. Please read the attached and response as I would really
> like some feedback.

Hi, Matthew. For our uses, we found that pg_autovacuum did not behave as expected with vacuum_threshold set to 0. For our purposes, we have a very good idea of how many tuples need to be slurped up over a given interval, and would like the autovacuum daemon to simply go through and vacuum when it hits a, er, threshold.

Obligatory explanation: We were doing this with cron, but when something gets really ugly, cron is not as intuitive as autovacuum.

Having this in the postmaster would be fine by me (and in fact, I would even suggest that it belongs there), but I really need to have more flexibility or we are going to (read: I will be tasked with) be writing our own.

Sorry for the terseness -- and I haven't had a chance to go over the whole thread; we're moving from 7.3.2 to 7.4.2 on "the big database" this week.

Alex

--
alex@posixnap.net Alex J. Avriette, Systems Programmer
"You have a spell checker. It is called a brain. Please use it." - Chris Lyth
> > As such, I
> > think that a vacuum backend for a specific database should be forked upon
> > the first connect. Also, the backend might like to try and work out if
> > there are any active backends for its database every so often and if not,
> > perform a final vacuum (if necessary) and exit, so that we don't have lots
> > of idle processes sitting around.
>
> Lots of idle processes sitting around is right out, too. Remember that
> each one would eat a backend connection slot. I think we are going to
> have to limit this to *one* process at a time. What that probably means
> is that we successively launch an autovacuum process against each
> database, it does whatever seems appropriate in that database and then
> quits. We could manage this just like checkpoints are presently managed
> --- the only thing the postmaster has to know is the desired idle period
> between end of one autovacuum and start of the next.

*slaps hand on forehead* Yes, this is the best approach.

So, do we want a static time, a GUC-controlled time, or a time which is modified by pg_autovacuum's/the stats collector's knowledge of the amount of work which goes on in any given database?

Gavin
Gavin Sherry <swm@linuxworld.com.au> writes:
> So, do we want a static time, a GUC controlled time or some time which is
> modified by pg_autovacuum's/stat's collector's knowledge of the amount of
> work which goes on in any given database?

From the point of view of the postmaster a GUC-controlled delay would seem like the best thing. We could discuss having the autovacuum code try to feed back adjustments in the delay, but remember that one of the golden virtues for the postmaster proper is simplicity; that translates directly to robustness. We don't want the postmaster engaging in anything complicated that could potentially lock it up or crash it due to a bug.

Possibly this point could be finessed by a two-level structure, that is, the postmaster launches an autovacuum daemon (which is not itself a backend) and that in turn launches backends to do the real per-database work. The postmaster's only subsequent involvement is restarting the autovac daemon if it dies. The autovac daemon can be as complex as you want.

This nice-sounding arrangement is probably not directly workable because of the fact that the postmaster has no good way to know about or control backends if they aren't its direct children. Perhaps the autovac daemon *should* use libpq, that is, not fork backends but connect via the postmaster each time it wants to run a backend. Then the backends are ordinary children of the postmaster and everything acts normally. (This could amount to using the existing autovac code, and simply adding a frammish to the postmaster to autospawn the autovac daemon as a non-backend child process.)

regards, tom lane
Tom Lane wrote:
> From the point of view of the postmaster a GUC-controlled delay would
> seem like the best thing. We could discuss having the autovacuum code
> try to feed back adjustments in the delay, but remember that one of the
> golden virtues for the postmaster proper is simplicity; that translates
> directly to robustness. We don't want the postmaster engaging in
> anything complicated that could potentially lock it up or crash it due
> to a bug.

Agreed.

[snip...]

> This nice-sounding arrangement is probably not directly workable because
> of the fact that the postmaster has no good way to know about or control
> backends if they aren't its direct children. Perhaps the autovac daemon
> *should* use libpq, that is, not fork backends but connect via the
> postmaster each time it wants to run a backend. Then the backends are
> ordinary children of the postmaster and everything acts normally.
> (This could amount to using the existing autovac code, and simply adding
> a frammish to the postmaster to autospawn the autovac daemon as a
> non-backend child process.)

Well, this certainly sounds like something that would be easy to do, which appeals to me, at least as a first cut. Question: does this mean that I lose many of the advantages of being "in the backend"? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, a direct pipe connection to the stats collector, etc.?
Alex J. Avriette wrote:
> Hi, Matthew. For our uses, we found that pg_autovacuum did not behave
> as expected with vacuum_threshold set to 0. For our purposes, we have
> a very good idea of how many tuples need to be slurped up over a given
> interval, and would like the autovacuum daemon to simply go through and
> vacuum when it hits a, er, threshold.

Could you please explain this better? I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected?

> Having this in the postmaster would be fine by me (and in fact, I would
> even suggest that it belongs there), but I really need to have more
> flexibility or we are going to (read: I will be tasked with) be writing
> our own.

Everyone agrees it should be in the backend eventually; the questions were just whether it was ready, and how or if it should change when it goes in.

> Sorry for the terseness -- and I haven't had a chance to go over the
> whole thread; we're moving from 7.3.2 to 7.4.2 on "the big database"
> this week.

No problem, but I need more information to diagnose your problem.

Matthew
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Well this certainly sounds like something that would be easy to do,
> which appeals to me at least as a first cut. Question: Does this mean
> that I lose many of the advantages of being "in the backend"? That is,
> would pg_autovacuum still be able to use facilities such as GUC, error
> reporting, access to the FSM, direct pipe connection to the stats
> collector etc?

If you aren't a backend then you couldn't safely access shared memory, including FSM in particular. I see no reason you couldn't use GUC though. There is no "direct pipe connection" to the stats collector, except in the output direction, which is not what you want, so I'm not seeing your point there.

I am not sure that lack of FSM access is a showstopper, though. We could easily imagine inventing backend commands to read out whatever info you want from FSM, so you could request the info from your connected backend.

The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend, which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the "real work" of vacuuming.

regards, tom lane
Tom Lane wrote:
> If you aren't a backend then you couldn't safely access shared memory,
> including FSM in particular. I see no reason you couldn't use GUC
> though. There is no "direct pipe connection" to the stats collector,
> except in the output direction which is not what you want, so I'm not
> seeing your point there.

I probably said that wrong, but how do backends get their stats data? Meaning, when I do a "select * from pg_stat_all_tables", how is the backend getting that data? The reason I'm interested is that if I don't have to fire up a backend just to check the stats, that would reduce the load associated with the autovacuum daemon. Someone earlier in the thread seemed to imply there was a way to do this.

> I am not sure that lack of FSM access is a showstopper, though. We
> could easily imagine inventing backend commands to read out whatever
> info you want from FSM, so you could request the info from your
> connected backend.

Yeah, I agree, and for phase 1 we can just continue working only on stats data.

> The more I think about this the more I like it --- it keeps the autovac
> control code still at arms length from the backend which will surely
> ease development and experimentation. I suppose there is some overhead
> in pushing data back and forth over the FE/BE protocol, but surely that
> would be negligible next to the "real work" of vacuuming.

Right, I think the overhead would be negligible. Since you seem to think this is (or at least might be) a good idea, I will go ahead and try to get the postmaster to fire up the autovacuum daemon, so the first cut will basically be pg_autovacuum exactly as it stands now, just launched by the postmaster.

Also, you didn't mention whether I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure.

Thanks again,

Matthew
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Tom Lane wrote:
>> There is no "direct pipe connection" to the stats collector,

> I probably said that wrong, but how do backends get their stats data?

They read it out of a flat file that the stats collector rewrites every so often.

> Meaning, when I do a "select * from pg_stat_all_tables" how is the
> backend getting that data. The reason I'm interested is that if I don't
> have to fire up a backend just to check the stats that would reduce the
> load associated with the autovacuum daemon.

Is that really worth the loss of independence? I'm not sure one way or the other myself. I suppose the autovac daemon could still be run by hand for testing purposes, but it would have to run as user postgres and on the same machine as the postmaster.

> Right, I think the overhead would be negligible. Since you seem to
> think this is (or at least might be) a good idea, I will go ahead and
> try to get the postmaster to fire-up the autovacuum daemon. So that the
> 1st cut, will basically be pg_autovacuum exactly as it stands now, just
> launched by the postmaster.

> Also, you didn't mention if I will be able to use the backend logging
> functions, I would guess that I can, but I'm not totally sure.

Yes, if you are launched by the postmaster then you'd have access to its stderr etc., so you could just log via elog/ereport. Again, though, this puts you in a situation where the daemon *must* be launched by the postmaster or it won't work. How much arm's-length separation are you willing to give up?

regards, tom lane
On Mon, 22 Mar 2004, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > Well this certainly sounds like something that would be easy to do,
> > which appeals to me at least as a first cut. Question: Does this mean
> > that I lose many of the advantages of being "in the backend"? That is,
> > would pg_autovacuum still be able to use facilities such as GUC, error
> > reporting, access to the FSM, direct pipe connection to the stats
> > collector etc?
> [snip]
> The more I think about this the more I like it --- it keeps the autovac
> control code still at arms length from the backend which will surely
> ease development and experimentation. I suppose there is some overhead
> in pushing data back and forth over the FE/BE protocol, but surely that
> would be negligible next to the "real work" of vacuuming.

I was initially against the idea of using libpq, but it's growing on me too.

I think it would be good if the core functions of pg_autovacuum -- the threshold algorithms, connection handling, and issuing of commands -- could be (re?)designed such that not only the backend can link against them but also a stripped-down pg_autovacuum binary which can be used manually. That way users have a choice and a workaround if there are any issues with the backend model. Also useful for debugging.

Thanks,

Gavin
Matthew T. O'Connor wrote:
> * Inability to customize thresholds on a per table basis

I ran headlong into this one. IMHO fixing this is critical.

> * Inability to set default thresholds on a per database basis
> * Inability to exclude specific databases / tables from pg_autovacuum
>   monitoring

These would be nice to have, but less critical than #1, I think.

> * Inability to schedule vacuums during off-peak times

This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal.

> I'm not sure how to address all of these concerns, or that they all
> should be addressed right now. One of my big questions is backend
> integration. I am leaning towards leaving pg_autovacuum as a client
> application in contrib for one more release. During this time, I can
> continue to tweak and improve pg_autovacuum so that we will have a very
> good idea what the final product should be before we make it a standard
> backend process.

I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with.

> For PostgreSQL 7.5, I plan to implement these new features:
>
> 1. Per database defaults and per table thresholds (including total
>    exclusion)

Great!

> 2. Persistent data
> 3. Single-Pass Mode (external scheduling from cron etc...)
> 4. Off peak scheduling

Great again!

> 1. Per Database defaults and Per table Thresholds:
>
> 1. Store config data inside a special pg_autovacuum table inside
>    existing databases that wants custom settings.

A natural if folded into the backend.
> 3. Single-Pass Mode (External Scheduling):
>
> I have received requests to be able to run pg_autovacuum only on request
> (not as a daemon), making only one pass over all the tables (not looping
> indefinitely). The advantage being that it will operate more like the
> current vacuum command except that it will only vacuum tables that need
> to be vacuumed. This feature could be useful as long as pg_autovacuum
> exists outside the backend. If pg_autovacuum gets integrated into the
> backend and gets automatically started as a daemon during startup, then
> this option will no longer make sense.

It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax.

> 4. Off-Peak Scheduling:
>
> A fundamental advantage of our vacuum system is that the work required
> to reclaim table space is taken out of the critical path and can be
> moved to an off-peak time when cycles are less precious. One of the
> drawbacks of the current pg_autovacuum is that it doesn't have any way
> to factor this in.
>
> In its simplest form (which I will implement first), I would add the
> ability to add a second set of thresholds that will be active only
> during an “off-peak” time that can be specified in the pg_autovacuum
> database, perhaps in a general_settings table.

I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They range from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes.
Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete in 4-5 seconds without vacuum running, and vacuuming then resumed for the 3-to-4 minutes between batches, all would be well.

Joe
Joe Conway wrote:
> Matthew T. O'Connor wrote:
>> * Inability to schedule vacuums during off-peak times
>
> This would be *really* nice to have. In my recent case, if
> pg_autovacuum could work for say 3 minutes, and then back off for 2
> minutes or so while the batch transactions hit, it would be ideal.

I'm not sure what you are suggesting here. As it stands right now, pg_autovacuum just issues a standard vacuum command, so there isn't anything pg_autovacuum can do until that command completes. There has been a lot of work going on trying to reduce the performance impact associated with a vacuum (vacuum delay, ARC, etc.); hopefully that will make a big difference.

> I really think pg_autovacuum ought to get folded into the backend now,
> for 7.5. I haven't had time yet to read the entire thread, but I saw
> others making the same comment. It would make some of the listed
> problems go away, or at least become far easier to deal with.

Yeah, that seems to be the consensus; I am going to work on that next.

>> 3. Single-Pass Mode (External Scheduling):
>
> It still might make sense. You could have a mode where the daemon
> essentially sleeps forever, until explicitly woken up by a signal.
> When woken, it makes one pass, and goes back to infinite sleep. Then
> provide a simple way to signal the autovacuum process -- maybe an
> extension of the current VACUUM syntax.

Well, one thing we talked about was having the autovacuum process not be a daemon that is running all the time but rather a process that is spawned periodically by the postmaster, so you wouldn't have to worry about the signal / wake-up issues. I could see this working much like checkpoints, where the postmaster fires them off on a schedule but there is nothing stopping you from issuing a checkpoint command to force one immediately. So perhaps there could be a new SQL command like "VACUUM AUTO" or something like that.
>> 4. Off-Peak Scheduling:
>>
>> In its simplest form (which I will implement first), I would add the
>> ability to add a second set of thresholds that will be active only
>> during an “off-peak” time that can be specified in the pg_autovacuum
>> database, perhaps in a general_settings table.
>
> I don't know how this would work, but it is for sure important. In the
> recent testing I found that pg_autovacuum (well, lazy vacuum in
> general, but I was using pg_autovacuum to control it) made a huge
> difference in performance of batch transactions. They range from 4-5
> seconds without vacuum running, to as high as 15 minutes with vacuum
> running. With the vacuum delay patch, delay = 1, pagecount = 8, I
> still saw times go as high as 10 minutes. Backing vacuum off any more
> than that caused it to fall behind the transaction rate unrecoverably.
> But as I said above, if the transactions could complete without vacuum
> running in 4-5 seconds, then vacuuming resumes for the 3-to-4 minutes
> between batches, all would be well.

Again, once the vacuum command is issued, it's out of pg_autovacuum's control. There has been some talk of pg_autovacuum looking at the system load to see if it should wait, or passing some different delay settings to vacuum based on system activity, so maybe some of that will help.

What I was talking about with off-peak scheduling is really just setting different thresholds for different times of the day (or perhaps for different system loads), so that if you know your app is typically busy from 8AM - 8PM, you can set more conservative thresholds during that time window and more aggressive thresholds during the off-peak time.

Thanks for the feedback,

Matthew
Gavin Sherry wrote:
> I was initially against the idea of using libpq but its growing on me too.
>
> I think it would be good if the core functions of pg_autovacuum: threshold
> algorithms, connection, issuing commands can be (re?)designed such that
> not only the backend can link against it but also a stripped down
> pg_autovacuum binary which can be used manually. That way users can have a
> choice and a workaround if there are any issues with the backend model.
> Also useful for debugging.

I agree. Initially, it appears that this won't be a problem, since the pg_autovacuum executable will be untouched (or as untouched as possible, anyway); it will just be launched by the backend. Going forward, if I use any of the functionality provided by the backend (error reporting, etc.), I will then have to design it to work in both standalone and postmaster sub-process modes, which I think is doable.

Matthew
Matthew T. O'Connor wrote:
> Could you please explain this better, I don't really understand what the
> problem is. If you want pg_autovacuum to perform a vacuum on a table
> that has had exactly X updates no matter what, you can just run it with
> -V0 -vX (where X is the vacuum threshold) same thing can be done for
> analyze. Are you saying that you did this and it did not work as expected?

I think the problem with that (I initially tried it myself) is that without a per-table setting, it is impractical. For instance, I want the one large table in my app vacuumed every 500,000 dead tuples, but now my smaller tables never reach the threshold.

Joe
Matthew T. O'Connor wrote:
> Joe Conway wrote:
>> This would be *really* nice to have. In my recent case, if
>> pg_autovacuum could work for say 3 minutes, and then back off for 2
>> minutes or so while the batch transactions hit, it would be ideal.
>
> I'm not sure what you are suggesting here. As it stands right now,
> pg_autovacuum just issues a standard vacuum command, so there isn't
> anything pg_autovacuum can do until that command completes. There has
> been a lot of work going on trying to reduce performance impact
> associated with a vacuum (vacuum delay, ARC etc), hopefully that will
> make a big difference.

Hopefully it will. I've not had a chance yet to test against cvs tip.

I guess what I was suggesting would have to be built in to the lazy vacuum process itself, similar to the vacuum delay stuff. I have no idea if it would be practical, but ideally every X number of pages there would be a way to detect "current load on the postgres cluster" or "current load on the server", and then either sleep or go full bore, depending on the result and some threshold setting. Kind of like the vacuum delay patch, but with "don't bother to sleep if the system is idle".

Joe
On Mon, Mar 22, 2004 at 04:50:57PM -0500, Matthew T. O'Connor wrote:
> Could you please explain this better, I don't really understand what the
> problem is. If you want pg_autovacuum to perform a vacuum on a table
> that has had exactly X updates no matter what, you can just run it with
> -V0 -vX (where X is the vacuum threshold) same thing can be done for
> analyze. Are you saying that you did this and it did not work as expected?

My experience was it did not vacuum at all.

> Everyone agrees it should be in the backend eventually the questions
> were just if it was ready, and how or if it should change when it goes in.

It is very simple code; I'm not sure what is "not ready" about it. If people don't want to use it, they don't have to. Many databases can go days to weeks without vacuuming.

alex

--
alex@posixnap.net Alex J. Avriette, Unix Systems Gladiator
"You cannot invade the mainland United States. There would be a rifle behind each blade of grass." - Admiral Isoroku Yamamoto
Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
>> I probably said that wrong, but how do backends get their stats data?
>
> They read it out of a flat file that the stats collector rewrites every
> so often.

Ok, so that would be easy to do (if we decide we want to).

> Is that really worth the loss of independence? I'm not sure one way or
> the other myself. I suppose the autovac daemon could still be run by
> hand for testing purposes, but it would have to run as user postgres and
> on the same machine as the postmaster.

I'm not sure; it would allow autovacuum to check the stats more regularly. I suppose it would be possible for pg_autovacuum to recognize whether it's been launched by the postmaster or not and look up stats as appropriate, but as you say, I'm not sure it's worth it, and my first cut will work exactly as the current pg_autovacuum does.

>> Also, you didn't mention if I will be able to use the backend logging
>> functions, I would guess that I can, but I'm not totally sure.
>
> Yes, if you are launched by the postmaster then you'd have access to its
> stderr etc, so you could just log via elog/ereport. Again though this
> puts you in a situation where the daemon *must* be launched by the
> postmaster or it won't work. How much arm's-length separation are you
> willing to give up?

Well, I think I would be more willing to give up the separation for logging purposes. I would think that an error message issued by a postmaster sub-process should wind up in the same location as an error issued by the postmaster proper. Also, people have requested lots of logging options, such as syslog etc., and I think it would be nice to get all this for free. I would think that pg_autovacuum could determine whether it's a postmaster sub-process and log appropriately; does that sound reasonable?