Thread: Is there a way to make VACUUM run completely outside transaction
Hi,

As VACUUM is not something that can be rolled back, could we not make it run completely outside transactions? It already needs to be run outside a transaction block. I'll try to explain the problem more thoroughly below (I'm quite sleepy, so the explanation may not be too clear ;)

My problem is as follows: I have a fairly write-intensive database that has two kinds of tables:

1) some with a small working set (thousands of rows), but they get constant traffic of hundreds of inserts/updates/deletes per second;

2) and some bigger tables (a few million rows), which get a smaller load of updates.

I keep the first kind of tables' filesize small and their dead tuple count low by running vacuum on them in a tight loop (with 15 sec sleeps between vacuuming individual tables), and I keep the 2nd kind balanced by running another loop of vacuums on them.

It worked quite well for a while, but as the tables grow, the vacuums on the 2nd kind of tables run long enough for the first kind of tables to accumulate hundreds of thousands of dead tuples, which can't be freed because the vacuums on the 2nd kind run in their own long transactions, keeping the oldest active transaction id smaller than needed.

And the new provisions for making VACUUM less intrusive by allowing delays during vacuuming make this problem worse, by a kind of priority inversion: the less intrusive vacuum runs longer, thereby keeping its transaction open longer, and thereby being *more* intrusive by not allowing old tuples to be deleted by another vacuum command.

I can see a few ways to solve this problem:

1) If possible, the VACUUM command should release its transaction id early in the command processing, as it does not really need to be in a transaction;

2) the transaction of the VACUUM command could be assigned some really big trx id, so it won't get in the way;

3) VACUUM commits (or aborts) its transaction after some predetermined interval (say 1 min) and starts a new one;
4) VACUUM itself could have some way to check if the "oldest" transaction is also a VACUUM, and be able to free the tuples that are older than the last *data-altering* transaction. At least VACUUM, ANALYSE and TRUNCATE should not be considered *data-altering* here.

The problems caused by non-data-altering but long-running transactions are not unique to VACUUM; similar problems also affect Slony. So any of 1)-3) would be preferable to 4).

Or perhaps it is already solved in 8.0? My rant is about 7.4.6.

-- 
Hannu Krosing <hannu@tm.ee>
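The reported priority inversion can be sketched as a toy model (all names here are invented for illustration; this is not PostgreSQL source): a dead tuple can be reclaimed only once its deleting transaction id precedes the oldest transaction id still running, so a long vacuum's own xid pins the reclamation horizon for every other table.

```python
def oldest_running_xid(running_xids):
    """The reclamation horizon: the smallest xid still in progress."""
    return min(running_xids)

def reclaimable(dead_tuples, running_xids):
    """Dead tuples whose deleting xid precedes every running transaction."""
    horizon = oldest_running_xid(running_xids)
    return [xmax for xmax in dead_tuples if xmax < horizon]

# The long vacuum on a big table started as xid 100 and is still running,
# while the hot small table keeps accumulating tuples deleted by xids
# 150..154 (a newer transaction, xid 200, is also in progress).
dead = [150, 151, 152, 153, 154]

with_long_vacuum = reclaimable(dead, running_xids={100, 200})
without_it       = reclaimable(dead, running_xids={200})

print(with_long_vacuum)  # nothing reclaimable: the horizon is stuck at 100
print(without_it)        # all five reclaimable once the vacuum's xid is gone
```

The vacuum's transaction will never look at those tuples, yet its mere presence in the set of running xids keeps them "visible" by this rule, which is exactly the complaint above.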
Hannu Krosing <hannu@tm.ee> writes:
> As VACUUM is not something that can be rolled back, could we not make it
> run completely outside transactions.

No, because it has to be able to hold a table-level lock on the target table.

Besides, where did you get the idea that it can't be rolled back? The VACUUM FULL case, at least, has to go through huge pushups to be sure it is rollback-safe.

			regards, tom lane
On Monday, 7 February 2005 at 10:51 -0500, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > As VACUUM is not something that can be rolled back, could we not make it
> > run completely outside transactions.
>
> No, because it has to be able to hold a table-level lock on the target
> table.

Does non-FULL VACUUM release this lock when sleeping on vacuum_cost_delay? If not, could it be made to release the lock and end the current transaction when going to sleep, then start a new transaction and acquire the lock again when waking up?

> Besides, where did you get the idea that it can't be rolled back?
> The VACUUM FULL case, at least, has to go through huge pushups to be
> sure it is rollback-safe.

What are the semantics of ROLLBACKing a VACUUM FULL? Will it restore the dead tuples to their original positions? I assumed that it does its data-shuffling behind the scenes, so that the changes it makes are invisible to others at the *logical* level.

If all the changes it makes to internal data storage can be rolled back, then I can't see how VACUUM FULL can work at all without requiring 2x the filesize for the ROLLBACK.

Also, why must it be run outside a transaction block if it can be rolled back?

-- 
Hannu Krosing <hannu@tm.ee>
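The cost-based delay mechanism mentioned above can be sketched roughly like this (the parameter names match the vacuum_cost_delay/vacuum_cost_limit settings under discussion; the rest is an invented toy model, not the server's actual code path): work accumulates a cost balance, and once it crosses the limit the process naps, during which it still holds its lock and its open transaction.

```python
vacuum_cost_limit = 200   # cost units accumulated before napping
vacuum_cost_delay = 0.01  # seconds to nap (0 disables throttling)

def throttled(page_costs, sleep):
    """Yield pages to clean, napping whenever accumulated cost hits the limit."""
    balance = 0
    for page, cost in enumerate(page_costs):
        yield page
        balance += cost
        if vacuum_cost_delay > 0 and balance >= vacuum_cost_limit:
            sleep(vacuum_cost_delay)  # lock and xid are kept across the nap
            balance = 0

# Dry run: 50 pages at cost 10 each hit the 200-unit limit every 20 pages,
# so the work is stretched out by two naps.
naps = []
pages = list(throttled([10] * 50, sleep=naps.append))
print(len(pages), len(naps))
```

This makes the inversion concrete: every nap stretches the total run time, and for the whole stretched run the vacuum's transaction stays open, holding back the horizon of the model in the earlier sketch.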
On Mon, Feb 07, 2005 at 07:16:41PM +0200, Hannu Krosing wrote:
> If all the changes it makes to internal data storage can be rolled back,
> then I can't see how VACUUM FULL can work at all without requiring 2x
> the filesize for the ROLLBACK.

I think the point is that the table is still consistent if the system crashes while vacuum is running. For the internal details, see Xvac in the HeapTupleHeader struct.

> Also, why must it be run outside a transaction block if it can be
> rolled back?

A vacuum actually uses several transactions, so it wouldn't work as the user would expect if run in a transaction. The first one is committed rather early, and new ones are opened and closed. (One per table, IIRC.)

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"There was no reply" (Kernel Traffic)
On Monday, 7 February 2005 at 19:01 -0300, Alvaro Herrera wrote:
> > Also, why must it be run outside a transaction block if it can be
> > rolled back?
>
> A vacuum actually uses several transactions, so it wouldn't work as the
> user would expect if run in a transaction. The first one is committed
> rather early and new ones are opened and closed. (One per table, IIRC.)

So I guess that making it commit and open a new transaction at a regular interval (say, each minute) while vacuuming a single table would not alter its visible behaviour. That would solve my problem of long-running vacuums on large tables polluting unrelated small but heavily updated tables with dead tuples.

I'll take a peek at the code and try to come up with a naive proposal you can shoot down ;)

-- 
Hannu Krosing <hannu@tm.ee>
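The proposal above can be sketched as follows (hypothetical names throughout; this is a toy model, not the actual VACUUM code path): instead of holding one transaction for the whole table, the vacuum commits and begins a fresh transaction every interval, so the xid it advertises keeps moving forward and other vacuums' horizons can advance past it.

```python
import time

INTERVAL = 60.0  # seconds between commit/begin cycles (the "each minute" above)

def vacuum_in_slices(blocks, begin, commit, vacuum_block, now=time.monotonic):
    """Vacuum `blocks`, restarting the surrounding transaction every INTERVAL."""
    begin()
    deadline = now() + INTERVAL
    for block in blocks:
        if now() >= deadline:
            commit()  # release the old xid so the horizon can move past it
            begin()   # continue under a new, younger xid
            deadline = now() + INTERVAL
        vacuum_block(block)
    commit()

# Dry run with a fake clock advancing 25 "seconds" per call: 10 blocks take
# long enough that the transaction gets restarted several times along the way.
log = []
clock = iter(range(0, 10_000, 25))
vacuum_in_slices(range(10),
                 begin=lambda: log.append("BEGIN"),
                 commit=lambda: log.append("COMMIT"),
                 vacuum_block=lambda b: log.append(b),
                 now=lambda: next(clock))
print(log.count("BEGIN"), log.count("COMMIT"))
```

The open question — which the rest of the thread touches on — is whether the restarts are really invisible, since the lock would have to be dropped and re-acquired at each boundary.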
On Tue, Feb 08, 2005 at 01:55:47PM +0200, Hannu Krosing wrote:
> So I guess that making it commit and open a new transaction at a regular
> interval (like each minute) during vacuuming a single table would not
> alter its visible behaviour. That would solve my problem of long-running
> vacuums on large tables polluting unrelated small but heavily updated
> tables with dead tuples.

Interesting. The problem is that a long-running VACUUM on a single table will keep in PGPROC a TransactionId that will last very long, which will "pollute" every concurrent Snapshot; so smaller tables can't be cleaned up, because the tuples are still visible to the transaction running the vacuum -- except that that transaction cannot possibly want to look at them.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Sadness is a wall between two gardens" (Khalil Gibran)
On Tuesday, 8 February 2005 at 13:39 -0300, Alvaro Herrera wrote:
> On Tue, Feb 08, 2005 at 01:55:47PM +0200, Hannu Krosing wrote:
>
> > So I guess that making it commit and open a new transaction at a regular
> > interval (like each minute) during vacuuming a single table would not
> > alter its visible behaviour. That would solve my problem of long-running
> > vacuums on large tables polluting unrelated small but heavily updated
> > tables with dead tuples.
>
> Interesting. The problem is that a long-running VACUUM on a single
> table will keep in PGPROC a TransactionId that will last very long,
> which will "pollute" every concurrent Snapshot; so smaller tables can't
> be cleaned up, because the tuples are still visible to the transaction
> running the vacuum -- except that that transaction cannot possibly want
> to look at them.

Exactly. That's what I was trying to describe in my original post.

-- 
Hannu Krosing <hannu@tm.ee>