Re: [GENERAL] Autovacuum Improvements - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [GENERAL] Autovacuum Improvements
Msg-id 1168639847.3990.158.camel@silverbirch.site
Responses Re: [GENERAL] Autovacuum Improvements  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: [GENERAL] Autovacuum Improvements  ("Pavan Deolasee" <pavan@enterprisedb.com>)
Re: [GENERAL] Autovacuum Improvements  ("Pavan Deolasee" <pavan@enterprisedb.com>)
List pgsql-hackers
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
> Christopher Browne wrote:
>
> > Seems to me that you could get ~80% of the way by having the simplest
> > "2 queue" implementation, where tables with size < some threshold get
> > thrown at the "little table" queue, and tables above that size go to
> > the "big table" queue.
> >
> > That should keep any small tables from getting "vacuum-starved."
>
> Hmm, would it make sense to keep 2 queues, one that goes through the
> tables in smaller-to-larger order, and the other one in the reverse
> direction?
>
> I am currently writing a design on how to create "vacuum queues" but I'm
> thinking that maybe it's getting too complex to handle, and a simple
> idea like yours is enough (given sufficient polish).

Sounds good to me. My colleague Pavan has just suggested multiple
autovacuums, and then prototyped something almost as a side issue while
trying to solve other problems. I'll show him this thread; maybe he has
seen it already? I wasn't following this discussion until now.

The 2-queue implementation seemed to me the most straightforward
approach, mirroring Chris' suggestion. A few aspects that haven't been
mentioned yet:
- if we have more than one VACUUM running, we'll need to watch memory
management. Having different queues based upon table size is a good way
of doing that, since the smaller queues have a naturally limited memory
consumption.
- with different size-based queues, the larger VACUUMs can be delayed so
that they take much longer, while the small tables can go straight
through.
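
To make the memory point concrete, here's a minimal sketch of the
size-based routing. The block threshold and all the names in it are
purely illustrative assumptions of mine, not actual PostgreSQL code:

#include <stdint.h>

#define SMALL_TABLE_BLOCKS 1000     /* hypothetical cut-off, in heap blocks */

typedef enum { QUEUE_SMALL, QUEUE_LARGE } QueueId;

typedef struct VacuumTask
{
    uint32_t    relid;              /* table to vacuum */
    uint32_t    relpages;           /* current size in blocks */
} VacuumTask;

/*
 * Route a table to the "little table" or "big table" queue.  Workers
 * draining the small queue can get by with a modest memory budget,
 * because the dead-tuple state a vacuum accumulates is bounded by the
 * table's size; only the large queue needs the full maintenance memory.
 */
static QueueId
classify_task(const VacuumTask *task)
{
    return (task->relpages < SMALL_TABLE_BLOCKS) ? QUEUE_SMALL : QUEUE_LARGE;
}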

Some feedback from initial testing is that 2 queues probably aren't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
of a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult-to-understand new set of GUC parameters, define them exactly,
and then have them be wrong when databases grow.

The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a
time, though it might move between queues as it grows.
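
For the record, here's a rough sketch of how I imagine those pieces
fitting together. The size boundaries, the 2x "stretch" factor and
every name in it are assumptions for discussion only, not a settled
design:

#include <stdbool.h>
#include <stdint.h>

/* Size-based queues plus the reserved anti-wraparound queue. */
typedef enum
{
    VQ_SMALL,                       /* up to ~1k blocks   */
    VQ_MEDIUM,                      /* up to ~100k blocks */
    VQ_LARGE,                       /* everything bigger  */
    VQ_WRAPAROUND                   /* Xid wraparound avoidance only */
} VacQueue;

/* Upper block-count bound of each size-based queue (illustrative). */
static const uint32_t vq_upper[] = { 1000, 100000, UINT32_MAX };

/*
 * A table is eligible for exactly one queue at a time: wraparound work
 * always wins, otherwise the size range decides.  A growing table simply
 * lands in the next queue the next time it is scheduled.
 */
static VacQueue
assign_queue(uint32_t relpages, bool needs_wraparound)
{
    if (needs_wraparound)
        return VQ_WRAPAROUND;
    if (relpages < vq_upper[VQ_SMALL])
        return VQ_SMALL;
    if (relpages < vq_upper[VQ_MEDIUM])
        return VQ_MEDIUM;
    return VQ_LARGE;
}

/*
 * When a size-based queue runs dry, let it pick up tables somewhat
 * above or below its normal range (here: half the lower bound up to
 * twice the upper bound) instead of exposing exact cut-offs as GUCs.
 * Only meaningful for the size-based queues, not VQ_WRAPAROUND.
 */
static bool
in_stretched_range(VacQueue q, uint32_t relpages)
{
    uint32_t    lo = (q == VQ_SMALL) ? 0 : vq_upper[q - 1];
    uint32_t    hi = vq_upper[q];

    if (hi != UINT32_MAX)
        hi *= 2;
    return relpages >= lo / 2 && relpages <= hi;
}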

Alvaro, have you completed your design?

Pavan, what are your thoughts?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


