Re: possible vacuum improvement? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: possible vacuum improvement?
Date
Msg-id 6236.1031065279@sss.pgh.pa.us
Whole thread Raw
In response to Re: possible vacuum improvement?  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
Responses Re: possible vacuum improvement?  (Rod Taylor <rbt@zort.ca>)
List pgsql-hackers
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> 1)Is this sounds like a workable solution?

Adding a trigger to every tuple update won't do at all.  Storing the
counts in a table won't do either, as the updates on that table will
generate a huge amount of wasted space themselves (not to mention
enough contention to destroy concurrent performance).

> 4)Is use of threads sounds portable enough?

Threads are completely out of the question, at least if you have any
hope of seeing this code get accepted into the core distro.


For vacuum's purposes all that we really care to know about is the
number of obsoleted tuples in each table: committed deletes and updates,
and aborted inserts and updates all count.  Furthermore, we do not need
or want a 100% reliable solution; approximate counts would be plenty
good enough.

What I had in the back of my mind was: each backend counts attempted
insertions and deletions in its relcache entries (an update adds to both
counts).  At transaction commit or abort, we know which of these two
counts represents the number of dead tuples added to each relation, so
while we scan the relcache for post-xact cleanup (which we will be doing
anyway) we can transfer the correct count into the shared FSM entry for
the relation.  This gives us a reasonably accurate count in shared
memory of all the tuple obsoletions since bootup, at least for
heavily-used tables.  (The FSM might choose to forget about lightly-used
tables.)  The auto vacuumer could look at the FSM numbers to decide
which tables are highest priority to vacuum.

This scheme would lose the count info on a database restart, but that
doesn't bother me.  In typical scenarios the same tables will soon get
enough new counts to be highly ranked for vacuuming.  In any case the
auto vacuumer must be designed so that it vacuums every table every so
often anyhow, so the possibility of forgetting that there were some dead
tuples in a given table isn't catastrophic.

I do not think we need or want a control table for this; certainly I see
no need for per-table manual control over this process.  There should
probably be a few knobs in the form of GUC parameters so that the admin
can control how much overall work the auto-vacuumer does.  For instance
you'd probably like to turn it off when under peak interactive load.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Vince Vielhaber
Date:
Subject: Re: Just testing tighgter UCE controls ...
Next
From: "Serguei A. Mokhov"
Date:
Subject: Re: Memory management question