Thread: RFC: changing autovacuum_naptime semantics
Hackers,

I want to propose some very simple changes to autovacuum in order to move forward (a bit):

1. autovacuum_naptime semantics
2. limiting the number of workers: global, per database, per tablespace?

I still haven't received the magic bullet to solve the hot table problem, but these at least mean we continue doing *something*.

Changing autovacuum_naptime semantics

Are we agreed on changing the autovacuum_naptime semantics?  The idea is to make it per-database instead of the current per-cluster, i.e., a "nap" would be the minimum time that passes between starting one worker in a database and starting another worker in the same database.

Currently, naptime is the time elapsed between two worker runs across all databases.  So if you have 15 databases, autovacuuming each one takes place only every 15*naptime.

Eventually, we could have a per-database naptime defined in pg_database, and do away with the autovacuum_naptime GUC param (or maybe keep it as a default value).  Say for database D1 you want a worker every 60 seconds, but for database D2 you want one every hour.

Question:
Is everybody OK with changing the autovacuum_naptime semantics?

Limiting the number of workers

I was originally proposing a GUC parameter which would limit the cluster-wide maximum number of workers.  Additionally, we could have a per-database limit (stored in a pg_database column), which would be simple to implement.

Josh Drake proposed getting rid of the GUC param, saying that it would confuse users to set the per-database limit to some value higher than the GUC setting and then find the lower limit enforced (presumably because they were unaware of it).  The problem is that we need to set up shared memory for the workers, so we really do need a hard limit, and it must be global.  Thus the GUC param is not optional.

Other people also proposed having a per-tablespace limit.  This would make a lot of sense, tablespaces being the natural I/O units.  However, I'm not sure it's easy to implement, because you can put half of database D1 and half of database D2 in tablespace T1, and the two other halves in tablespace T2.  Then enforcing the limit becomes rather complicated and will probably mean putting a worker to sleep.  I think it makes more sense to skip per-tablespace limits for now, and plan to add per-tablespace I/O throttles in the future.

Questions:
Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits in a pg_database column?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
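To make the proposed semantics concrete, here is a minimal, self-contained C sketch of the kind of per-database check the launcher might make.  The names (DbEntry, may_start_worker) and the structure are illustrative assumptions, not actual launcher code.

#include <stdio.h>
#include <stdbool.h>
#include <time.h>

typedef struct
{
	const char *datname;
	time_t		last_worker_start;	/* when a worker last started in this database */
	int			naptime;			/* hypothetical per-database naptime, in seconds */
} DbEntry;

/*
 * Proposed semantics: a new worker may start in a database as soon as its
 * naptime has elapsed since the previous worker started there, regardless
 * of what is happening in other databases.
 */
static bool
may_start_worker(const DbEntry *db, time_t now)
{
	return difftime(now, db->last_worker_start) >= db->naptime;
}

int
main(void)
{
	time_t		now = time(NULL);
	DbEntry		dbs[] = {
		{"D1", now - 120, 60},		/* wants a worker every 60 seconds */
		{"D2", now - 120, 3600},	/* wants a worker every hour */
	};

	for (int i = 0; i < 2; i++)
		printf("%s: %s\n", dbs[i].datname,
			   may_start_worker(&dbs[i], now) ? "start a worker" : "keep napping");
	return 0;
}

Under the current per-cluster semantics, by contrast, each of those databases would only be visited once every (number of databases * naptime), which is the behaviour the proposal removes.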
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Is everybody OK with changing the autovacuum_naptime semantics?

It seems already different from 8.2, so no objection to further change.

> Is everybody OK with not putting a per-tablespace worker limit?
> Is everybody OK with putting per-database worker limits in a pg_database
> column?

I don't think we need a new pg_database column.  If it's a GUC you can do ALTER DATABASE SET, no?  Or was that what you meant?

			regards, tom lane
Alvaro,

Alvaro Herrera wrote:
> I still haven't received the magic bullet to solve the hot table
> problem, but these at least mean we continue doing *something*.

May I ask what your plan or ideas for autovacuum improvement are for 8.3 now?  And also, what is the roadmap of autovacuum improvement for 8.4?

Thanks,
Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center
On Mar 7, 2007, at 4:00 PM, Alvaro Herrera wrote:
> Is everybody OK with putting per-database worker limits in a pg_database
> column?

I'm worried that we would live to regret such a limit.  I can't really see any reason to limit how many vacuums are occurring in a database, because there's no limiting factor there; you're either going to be I/O bound (per-tablespace), or *maybe* CPU-bound (perhaps the Greenplum folks could enlighten us as to whether they run into vacuum being CPU-bound on thumpers).

Changing the naptime behavior to be database-related makes perfect sense, because the minimum XID you have to worry about is a per-database thing; I just don't see limiting the number of vacuums as being per-database, though.

I'm also skeptical that we'll be able to come up with a good way to limit the number of backends until we get the hot table issue addressed.  Perhaps a decent compromise for now would be to limit how many 'small table' vacuums could run on each tablespace, and then limit how many 'unlimited table size' vacuums could run on each tablespace, where 'small table' would probably have to be configurable.  I don't think it's the best final solution, but it should at least solve the immediate need.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Is everybody OK with not putting a per-tablespace worker limit?
> > Is everybody OK with putting per-database worker limits in a pg_database
> > column?
>
> I don't think we need a new pg_database column.  If it's a GUC you can
> do ALTER DATABASE SET, no?  Or was that what you meant?

No, that doesn't work unless we save the datconfig column to the pg_database flatfile, because it's the launcher (which is not connected to any database) that needs to read it.  The same applies to a hypothetical per-database naptime.  The launcher would also need to parse it, which is not ideal (though not a dealbreaker either).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Galy Lee wrote:

Hi,

> Alvaro Herrera wrote:
> > I still haven't received the magic bullet to solve the hot table
> > problem, but these at least mean we continue doing *something*.
>
> May I ask what your plan or ideas for autovacuum improvement are for
> 8.3 now?  And also, what is the roadmap of autovacuum improvement for 8.4?

Things I want to do for 8.3:

- Make use of the launcher/worker stuff, that is, allow multiple autovacuum
  processes in parallel.  With luck we'll find out how to deal with hot tables.

Things I'm not sure we'll be able to have in 8.3, in which case I'll get to them early in 8.4:

- The maintenance window stuff, i.e., being able to throttle workers depending
  on a user-defined schedule.

8.4 material:

- Per-tablespace throttling, coordinating I/O from multiple workers.

I don't have anything else as detailed as a "plan".  If you have suggestions, I'm all ears.

Now regarding your restartable vacuum work.  I think that stopping a vacuum at some point and being able to restart it later is very cool and may get you some hot chicks, but I'm not sure it's really useful.  I think it makes more sense to do something like throttling an ongoing vacuum down to a reduced I/O rate if the maintenance window closes.  So if you're in the middle of a heap scan and the maintenance window closes, you immediately stop the scan and go to the index cleanup phase, *at a reduced I/O rate*.  This way, the user gets the benefits of the vacuuming in the not-too-distant future, without requiring the maintenance window to open again, but also without the heavy I/O impact that was allowed during the maintenance window.

Does this make sense?

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
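For illustration, a hedged, self-contained C sketch of the behaviour described above; the maintenance-window check, page counts, and delay values are made-up assumptions, not actual VACUUM code.  While the window is open the heap scan runs at full speed; once it closes, the rest of the scan is abandoned and the index cleanup for the dead tuples collected so far proceeds at a reduced I/O rate.

#include <stdio.h>
#include <stdbool.h>

static int	cost_delay_ms = 0;	/* stand-in for vacuum_cost_delay */
static int	pages_scanned = 0;

/* Pretend the maintenance window closes after 100 heap pages. */
static bool
maintenance_window_open(void)
{
	return pages_scanned < 100;
}

/* Pretend the table has 1000 pages; return false when the scan is done. */
static bool
scan_next_heap_page(void)
{
	return ++pages_scanned <= 1000;
}

int
main(void)
{
	while (scan_next_heap_page())
	{
		if (!maintenance_window_open())
		{
			/* window closed: abandon the rest of the heap scan */
			cost_delay_ms = 20;		/* reduced I/O rate from here on */
			break;
		}
	}

	/* index cleanup for the dead tuples collected so far would happen here */
	printf("scanned %d pages, cleaning indexes with %d ms cost delay\n",
		   pages_scanned, cost_delay_ms);
	return 0;
}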
Alvaro Herrera wrote:
> I don't have anything else as detailed as a "plan".  If you have
> suggestions, I'm all ears.

Cool, thanks for the update. :)  We also have some new ideas for improving autovacuum now; I will raise them later.

> Now regarding your restartable vacuum work.
> Does this make sense?

I have also reached a similar conclusion now.  Thank you.

Regards,
Galy
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Now regarding your restartable vacuum work.  I think that stopping a
> vacuum at some point and being able to restart it later is very cool and
> may get you some hot chicks, but I'm not sure it's really useful.

Too true :-(

> I think it makes more sense to do something like throttling an ongoing
> vacuum down to a reduced I/O rate if the maintenance window closes.  So if
> you're in the middle of a heap scan and the maintenance window closes,
> you immediately stop the scan and go to the index cleanup phase, *at a
> reduced I/O rate*.

Er, why not just finish out the scan at the reduced I/O rate?  Any sort of "abort" behavior is going to create net inefficiency, e.g. doing an index scan to remove only a few tuples.  ISTM that the vacuum ought to just continue along its existing path at a slower I/O rate.

			regards, tom lane
Tom Lane wrote:
> Er, why not just finish out the scan at the reduced I/O rate?  Any sort

Sometimes you may need to vacuum a large table in the maintenance window and a hot table during service time.  If the vacuum of the hot table does not eat too much foreground resource, then you can vacuum the large table at a lower I/O rate outside the maintenance window; but if the vacuum of the hot table is eating too much of the system's resources, then the launcher had better stop the long-running vacuum outside the maintenance window.

But I am not insisting on the stop/restart feature at this moment.  Changing the cost delay dynamically sounds more reasonable.  We can use it to balance the total I/O of the workers during service time or maintenance time.  It is not so difficult to achieve this by leveraging the shared memory of autovacuum.

Best Regards,
Galy Lee
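As an illustration of dynamically adjusting the cost settings, here is a hedged, self-contained C sketch of splitting one global I/O budget among the active workers through a shared array; the slot structure, names, and numbers are assumptions for illustration, not actual autovacuum code.

#include <stdio.h>

#define MAX_WORKERS 8

typedef struct
{
	int			in_use;			/* slot occupied by a running worker */
	int			cost_limit;		/* per-worker share of the global vacuum cost limit */
} WorkerSlot;

/* Divide a global cost budget evenly among the active workers, so that
 * starting an extra worker automatically slows the others down. */
static void
balance_cost(WorkerSlot *slots, int total_cost_limit)
{
	int			active = 0;

	for (int i = 0; i < MAX_WORKERS; i++)
		if (slots[i].in_use)
			active++;

	if (active == 0)
		return;

	for (int i = 0; i < MAX_WORKERS; i++)
		if (slots[i].in_use)
			slots[i].cost_limit = total_cost_limit / active;
}

int
main(void)
{
	WorkerSlot	slots[MAX_WORKERS] = {{0}};

	slots[0].in_use = 1;	/* e.g. the long-running warehouse-table vacuum */
	slots[1].in_use = 1;	/* e.g. a hot-table vacuum that just started */

	balance_cost(slots, 200);	/* 200 = hypothetical global cost limit */

	for (int i = 0; i < MAX_WORKERS; i++)
		if (slots[i].in_use)
			printf("worker %d: cost_limit = %d\n", i, slots[i].cost_limit);
	return 0;
}

A real implementation would presumably also scale the cost delay and re-run the calculation whenever a worker starts or exits.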
On Mar 9, 2007, at 6:42 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Now regarding your restartable vacuum work.  I think that stopping a
>> vacuum at some point and being able to restart it later is very cool and
>> may get you some hot chicks, but I'm not sure it's really useful.
>
> Too true :-(

Yeah.  Wouldn't a 'divide and conquer' kind of approach make it better?  I.e., let vacuum work on some part of a table/db, then stop, pick up another part later, vacuum it, and so on.

--
Grzegorz Jaskiewicz
gj@pointblue.com.pl
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Er, why not just finish out the scan at the reduced I/O rate? Any sort > of "abort" behavior is going to create net inefficiency, eg doing an > index scan to remove only a few tuples. ISTM that the vacuum ought to > just continue along its existing path at a slower I/O rate. I think the main motivation to abort a vacuum scan is so we can switch to some more urgent scan. So if in the middle of a 1-hour long vacuum of some big warehouse table we realize that a small hot table is long overdue for a vacuum we want to be able to remove the tuples we've found so far, switch to the hot table, and when we don't have more urgent tables to vacuum resume the large warehouse table vacuum. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Er, why not just finish out the scan at the reduced I/O rate?  Any sort
> > of "abort" behavior is going to create net inefficiency, e.g. doing an
> > index scan to remove only a few tuples.  ISTM that the vacuum ought to
> > just continue along its existing path at a slower I/O rate.
>
> I think the main motivation to abort a vacuum scan is so we can switch to some
> more urgent scan.  So if, in the middle of a one-hour vacuum of some big
> warehouse table, we realize that a small hot table is long overdue for a vacuum,
> we want to be able to remove the tuples we've found so far, switch to the hot
> table, and resume the large warehouse table vacuum once we have no more
> urgent tables to vacuum.

Why not just let another autovac worker do the hot table?

--
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support