Re: Vacuum daemon (pgvacuumd ?) - Mailing list pgsql-hackers

From Nicolas Bazin
Subject Re: Vacuum daemon (pgvacuumd ?)
Date
Msg-id 001101c1c4bc$f7667fb0$660d090a@software.ingenico.com.au
In response to Vacuum daemon (pgvacuumd ?)  (mlw <markw@mohawksoft.com>)
List pgsql-hackers
----- Original Message -----
From: "mlw" <markw@mohawksoft.com>
To: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Sent: Wednesday, March 06, 2002 1:21 PM
Subject: [HACKERS] Vacuum daemon (pgvacuumd ?)


> (for background, see conversation: "Postgresql backend to perform vacuum
> automatically" )
>
> In the idea phase 1, brainstorm
>
> Create a table for the defaults in template1
> Create a table in each database for state information.
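
To make that state table concrete, here is a strawman of the per-table record it might hold. This is only a sketch for discussion; every field name below is invented, nothing like it exists yet:

# Hypothetical per-table state record for the vacuum daemon.
# Every field name is made up purely for illustration.
table_state = {
    "table_name": "orders",
    "last_vacuum_at": None,       # when the last vacuum finished, or None
    "last_vacuum_seconds": 0.0,   # how long that vacuum took
    "duty_cycle": 0.10,           # max fraction of time spent vacuuming this table
    "deleted_tuples": 0,          # deleted/dirty tuples seen since the last vacuum
    "total_tuples": 0,            # rough total tuple count
    "priority": 5,                # per-table vacuum profile, 0..9
}
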
>
> Should have a maximum duty cycle for vacuum vs non-vacuum on a per table
> basis. If a vacuum takes 3 minutes, and a duty cycle is no more than 10%,
> the next vacuum can not take place for another 30 minutes. Is this a table
> or database setting? I am thinking table. Anyone have good arguments for
> database?
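
Spelled out as code, that duty-cycle rule might amount to no more than this. A rough Python sketch only; the function names and the state record from the strawman above are invented, not anything that exists:

import time

def next_allowed_vacuum(last_finished_at, last_duration, duty_cycle):
    """Earliest time the next vacuum may start under the duty-cycle rule.

    With last_duration = 180 seconds (3 minutes) and duty_cycle = 0.10,
    the wait is 180 / 0.10 = 1800 seconds, i.e. no new vacuum for
    another 30 minutes.
    """
    return last_finished_at + last_duration / duty_cycle

def may_vacuum_now(state, now=None):
    """state is a per-table record like the strawman sketched above."""
    now = time.time() if now is None else now
    if state["last_vacuum_at"] is None:
        return True                      # never vacuumed yet: always eligible
    return now >= next_allowed_vacuum(state["last_vacuum_at"],
                                       state["last_vacuum_seconds"],
                                       state["duty_cycle"])
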
>
> Must have a trigger point of number of total tuples vs number of dirty
> tuples. Unfortunately some tuples are more important than others, but I
> don't know how to really detect that. We should be able to keep track of
> the number of dirty tuples in a table. Is it known how many tuples are in a
> table at any point? (If so, on a side note, can we use this for a count()?)
> How about dirty tuples?
These parameters are certainly correct in a lot of cases, but why not use a
stored proc to decide when to start a vacuum? The system table can maintain
raw data related to vacuum: last vacuum timestamp, previous vacuum duration,
table priority, .... Then a parameter, say from 0 to 9 (0 for no vacuum),
can be used as a vacuum profile. The stored proc would translate this
profile into thresholds adapted to its algorithm, which can use the
per-table statistics that already exist.
Obviously a standard proc can be installed by default, but this gives the
DBA the possibility to adapt the criteria to their database with no
modification to the code.
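
As a purely illustrative example (written here as Python rather than as the stored proc, and with made-up numbers), the profile-to-threshold translation could be as small as:

def deleted_ratio_trigger(profile):
    """Translate a 0..9 vacuum profile into a deleted/total ratio trigger.

    0 means never vacuum; higher profiles vacuum more eagerly.  The exact
    numbers are arbitrary: the point is that only this function (or the
    stored proc standing behind it) needs changing, not the daemon itself.
    """
    if profile <= 0:
        return None                            # profile 0: never vacuum
    return 0.55 - 0.05 * min(profile, 9)       # 0.50 at profile 1 .. 0.10 at 9

The daemon would only ever see the resulting threshold, so swapping in a smarter proc costs it nothing.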

>
> Is the number of deleted tuples sufficient to decide priority on vacuum? My
> thinking is that the tables with the most deleted tuples are the tables
> which need vacuum most. Should it be the ratio of deleted tuples vs total
> tuples, or just the count of deleted tuples? I am thinking ratio, but maybe
> it needs to be tunable.
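
Either metric is cheap to compute, so making it tunable seems easy enough. Roughly (again an invented sketch, reusing the strawman state record from above):

def vacuum_priority(state, mode="ratio"):
    """Higher value = vacuum sooner.  'mode' picks between the two metrics."""
    if mode == "count":
        return state["deleted_tuples"]
    total = max(state["total_tuples"], 1)      # guard against division by zero
    return state["deleted_tuples"] / total     # ratio of deleted to total tuples
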
>
>
> Here is the program flow:
>
> (1) Startup. (Do this for each database.)
> (2) Get all the information from a vacuumd table.
> (3) If the table does not exist, perform a vacuum on all tables, and
> initialize the table to current state.
> (4) Check which tables can be vacuumed based on their duty cycle and
> current time.
> (5) If the tables eligible to be vacuumed have deleted tuples which exceed
> acceptable limits, vacuum them.
> (6) Wait a predefined time, loop to (2).
>
> This is my basic idea, what do you all think?
>
> I plan to work on this in the next couple weeks. Any suggestions, notes,
> concerns, features would be welcome.
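
For what it's worth, pulling the flow above together with the earlier sketches gives something like the following as a first external prototype. The three database helpers are stubs, every name is invented, and the real thing would presumably live closer to the backend and use the statistics it already keeps:

import time

POLL_INTERVAL = 60    # step (6): seconds to sleep between passes

# The three helpers below stand in for "talk to the database"; nothing
# like them exists, they only keep the sketch self-contained.
def init_state_table_if_missing(db):
    pass              # step (3): create the vacuumd table, vacuum everything once

def load_vacuumd_state(db):
    return []         # step (2): one per-table state record, as sketched earlier

def run_vacuum(db, table_name):
    print("would VACUUM %s.%s here" % (db, table_name))

def vacuumd_loop(databases):
    """Steps (1) through (6) of the flow above, repeated forever."""
    for db in databases:                           # (1) startup, per database
        init_state_table_if_missing(db)            # (3) initialize if needed
    while True:
        for db in databases:
            for state in load_vacuumd_state(db):   # (2) current per-table state
                trigger = deleted_ratio_trigger(state["priority"])
                if trigger is None:                # profile 0: skip this table
                    continue
                if not may_vacuum_now(state):      # (4) duty-cycle check
                    continue
                if vacuum_priority(state) >= trigger:
                    run_vacuum(db, state["table_name"])   # (5) over the limit
        time.sleep(POLL_INTERVAL)                  # (6) wait, then loop to (2)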