Re: Second attempt, roll your own autovacuum - Mailing list pgsql-general

From: Christopher Browne
Subject: Re: Second attempt, roll your own autovacuum
Msg-id: 87ac1kkmn5.fsf@wolfe.cbbrowne.com
In response to: Second attempt, roll your own autovacuum (Glen Parker <glenebob@nwlink.com>)
List: pgsql-general
In an attempt to throw the authorities off his trail, tgl@sss.pgh.pa.us (Tom Lane) transmitted:
> Glen Parker <glenebob@nwlink.com> writes:
>> I am still trying to roll my own auto vacuum thingy.
>
> Um, is this purely for hack value?  What is it that you find inadequate
> about regular autovacuum?  It is configurable through the pg_autovacuum
> catalog --- which I'd be the first to agree is a sucky user interface,
> but we're not going to set the user interface in concrete until we are
> pretty confident it's feature-complete.  So: what do you see missing?

I think that about a year ago I proposed a more sophisticated approach
to autovacuum; one part of it was to set up a "request queue," a table
where vacuum requests would get added.
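
A minimal sketch of such a queue table, set up here via Python and
psycopg2 (all names hypothetical, not from the original proposal):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        # Hypothetical queue schema: one row per vacuum request.
        cur.execute("""
            CREATE TABLE vacuum_queue (
                id         serial PRIMARY KEY,
                table_name text NOT NULL,      -- schema-qualified name
                requested  timestamptz NOT NULL DEFAULT now(),
                done       timestamptz         -- set when a consumer finishes
            )
        """)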

There's some "producer" side stuff (a sketch of an injector follows
this list):

- There could be tables you want to vacuum exceedingly frequently;
  those could get added periodically via something shaped like cron.

- One could ask for all the tables in a given database to be added to
  the queue, so that every table gets vacuumed every so often.

- You might even inject requests 'quasi-manually', asking for the
  queue to do work on particular tables.
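
An injector covering the first two producers might be run from cron; a
sketch, assuming psycopg2 and the hypothetical vacuum_queue table
above:

    import psycopg2

    HOT_TABLES = ["public.sessions", "public.hits"]  # hypothetical names

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        # Producer 1: enqueue the tables we want vacuumed very often.
        for t in HOT_TABLES:
            cur.execute("INSERT INTO vacuum_queue (table_name) VALUES (%s)",
                        (t,))

        # Producer 2: enqueue every user table in the database, so that
        # everything gets vacuumed every so often.
        cur.execute("""
            SELECT schemaname || '.' || tablename
              FROM pg_tables
             WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        """)
        for (t,) in cur.fetchall():
            cur.execute("INSERT INTO vacuum_queue (table_name) VALUES (%s)",
                        (t,))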

There's some "policy" side stuff:

- Rules might be put in place to eliminate certain tables from the
  queue, providing some intelligence as to what oughtn't be vacuumed.
Then there's the "consumer":

- The obvious "dumb" approach is simply to have one connection that
  runs through the queue, pulling the oldest entry, vacuuming that
  table, and marking the entry done (sketched after this list).

- The obvious extension is that if a table is listed multiple times in
  the queue, it need only be processed once.

- There might be time-based exclusions to the effect that large tables
  oughtn't be processed during certain periods (backup time?).

- One might have *two* consumers: one that only processes small
  tables, so that those little, frequently updated tables get handled
  quickly, and another that takes on the larger tables.  Or perhaps a
  scheduler that knows it's fine, between 04:00 and 09:00 UTC, to run
  six consumers and blow through a lot of larger tables
  simultaneously.

  After all, changes in 8.2 mean that concurrent vacuums don't block
  one another from cleaning out dead content.
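
A sketch of the "dumb" consumer, folding in the only-process-once
rule (assuming psycopg2; VACUUM can't run inside a transaction block,
hence the autocommit setting):

    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=mydb")
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()

    while True:
        # Pull the oldest outstanding request.
        cur.execute("""
            SELECT table_name FROM vacuum_queue
             WHERE done IS NULL
             ORDER BY requested
             LIMIT 1
        """)
        row = cur.fetchone()
        if row is None:
            break                # queue drained; go back to sleep
        table = row[0]
        # Table names can't be passed as bound parameters, so this
        # trusts the queue's contents; a real version should validate.
        cur.execute("VACUUM ANALYZE " + table)
        # Mark *all* pending requests for this table done, so that
        # duplicate queue entries are only processed once.
        cur.execute(
            "UPDATE vacuum_queue SET done = now() "
            "WHERE table_name = %s AND done IS NULL", (table,))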

I went as far as scripting up the simplest form of this, with an
"injector," the queue, and the "dumb consumer."  I gave up because it
wasn't that much better than what we already had.
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/
Minds, like parachutes, only function when they are open.
