Vacuum daemon (pgvacuumd ?) - Mailing list pgsql-hackers

From mlw
Subject Vacuum daemon (pgvacuumd ?)
Msg-id 3C857D36.59F017FB@mohawksoft.com
Responses Re: Vacuum daemon (pgvacuumd ?)  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Vacuum daemon (pgvacuumd ?)  (Lincoln Yeoh <lyeoh@pop.jaring.my>)
List pgsql-hackers
(for background, see conversation: "Postgresql backend to perform vacuum
automatically" )

This is phase 1, the idea stage: brainstorming.

Create a table for the defaults in template1
Create a table in each database for state information.

There should be a maximum duty cycle for vacuum vs non-vacuum time on a per
table basis. If a vacuum takes 3 minutes and the duty cycle is capped at 10%,
the next vacuum cannot start until 30 minutes after the previous one began. Is
this a table or database setting? I am thinking table. Anyone have good
arguments for database?
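
To make the duty cycle arithmetic concrete, here is a minimal sketch in Python.
It assumes the cycle is measured from the start of one vacuum to the start of
the next, so the period is simply duration / duty cycle. The function name and
parameters are hypothetical and not part of any existing PostgreSQL code.

```python
from datetime import datetime, timedelta


def next_vacuum_allowed(start: datetime, duration: timedelta,
                        max_duty_cycle: float) -> datetime:
    """Earliest start time for the next vacuum of a table.

    Assumption: the duty cycle is vacuum time divided by the whole period,
    measured from the start of one vacuum to the start of the next.
    """
    if not 0 < max_duty_cycle <= 1:
        raise ValueError("max_duty_cycle must be in the range (0, 1]")
    # A vacuum of length `duration` uses up a period of duration / duty cycle.
    period = duration / max_duty_cycle
    return start + period


if __name__ == "__main__":
    began = datetime.now()
    # A 3 minute vacuum with a 10% cap, as in the example above.
    print(next_vacuum_allowed(began, timedelta(minutes=3), 0.10))
```

If the period were instead counted from the end of the vacuum, the wait would
be shorter by the vacuum's own duration; which definition is better is open.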

There must be a trigger point based on the number of dirty tuples relative to
the total number of tuples. Unfortunately some tuples are more important than
others, but I don't know how to really detect that. We should be able to keep
track of the number of dirty tuples in a table. Is it known how many tuples
are in a table at any point? (If so, on a side note, can we use this for a
count()?) How about dirty tuples?

Is the number of deleted tuples sufficient to decide priority on vacuum? My
thinking is that the table with the most deleted tuples is the table which
most needs a vacuum. Should it be the ratio of deleted tuples to total tuples,
or just the count of deleted tuples? I am thinking ratio, but maybe it needs
to be tunable.
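
A rough sketch of the ratio vs count question, in Python. The TableStats
record, the function names and the threshold values are all hypothetical; it
also assumes the total tuple count is available and includes the deleted
tuples, which is exactly what is being asked above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    name: str
    total_tuples: int    # assumed to include the deleted tuples
    deleted_tuples: int


def vacuum_priority(stats: TableStats, use_ratio: bool = True) -> float:
    """Score a table: deleted/total ratio, or the raw deleted count."""
    if use_ratio:
        if stats.total_tuples == 0:
            return 0.0
        return stats.deleted_tuples / stats.total_tuples
    return float(stats.deleted_tuples)


def rank_tables(tables: List[TableStats], threshold: float,
                use_ratio: bool = True) -> List[TableStats]:
    """Tables at or above the trigger point, most urgent first."""
    due = [t for t in tables if vacuum_priority(t, use_ratio) >= threshold]
    return sorted(due, key=lambda t: vacuum_priority(t, use_ratio),
                  reverse=True)
```

The two modes can disagree: a small table with 60 of 100 tuples deleted wins
on ratio, while a large table with 5000 of 1000000 deleted wins on count. That
is one argument for making the choice tunable.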


Here is the program flow:

(1) Startup (Do this for each database.)
(2) Get all the information from a vacuumd table.
(3) If the table does not exist, perform a vacuum on all tables, and initialize
the table to current state.
(4) Check which tables can be vacuumed based on their duty cycle and current
time.
(5) If the tables eligible to be vacuumed have deleted tuples which exceed
acceptable limits, vacuum them.
(6) Wait a predefined time, loop to (2)
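
The flow above could be sketched roughly like this in Python. It is only an
illustration: the `state` dictionary stands in for the vacuumd state table,
and `dirty_ratio` and `vacuum` are hypothetical callbacks that a real daemon
would implement against the database. Startup and first-time initialization
are left out.

```python
import time


def vacuum_pass(state, now, dirty_ratio, vacuum,
                threshold=0.2, max_duty_cycle=0.10):
    """One pass over the tables of a single database.

    state        maps table name -> earliest time it may be vacuumed again
    dirty_ratio  callback: table name -> deleted/total ratio (hypothetical)
    vacuum       callback: vacuums the table, returns seconds it took
    """
    vacuumed = []
    for table, not_before in state.items():
        if now < not_before:
            continue  # duty cycle says this table must still wait
        if dirty_ratio(table) < threshold:
            continue  # not enough deleted tuples to be worth a vacuum
        elapsed = vacuum(table)
        # Push the next eligible time out so the duty cycle is respected.
        state[table] = now + elapsed / max_duty_cycle
        vacuumed.append(table)
    return vacuumed


def run_daemon(state, dirty_ratio, vacuum, poll_seconds=60, max_passes=None):
    """Repeat the pass, waiting a predefined time between iterations."""
    passes = 0
    while max_passes is None or passes < max_passes:
        vacuum_pass(state, time.time(), dirty_ratio, vacuum)
        time.sleep(poll_seconds)
        passes += 1
```

The threshold of 0.2 and the 60 second poll interval are placeholders, not
recommendations.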

This is my basic idea. What do you all think?

I plan to work on this in the next couple weeks. Any suggestions, notes,
concerns, features would be welcome.

