Thread: How to make lazy VACUUM of one table run in several transactions?
Hi Tom,

I've got the impression that you have worked most actively on VACUUM, so I'm asking you this directly instead of addressing the pgsql-hackers list in general. Feel free to correct me :)

I have a problem that I think can be solved by splitting the vacuum up to run in several transactions, each running for no more than X minutes.

The problem itself is having a database with small (5-50k rows), fast-changing tables and huge (5-25M rows), slower-changing tables, all running in a 24/7 setup. The small table needs to be kept small by constant vacuuming (a loop doing a vacuum on this table after each 15 sec interval) to keep up with its traffic. The problem appears when the big table needs to be vacuumed, as the vacuum on the big table prevents the vacuum on the small table from freeing up the space used by dead tuples. And the situation is made *worse* by running the vacuum with vacuum_cost_limit to reduce the I/O impact (a kind of priority inversion), as the vacuum then runs even longer and slows down operations on the small table even more.

The fastest fix seems to be changing the VACUUM command to run in several transactions. The plan: change lazy_vacuum_rel(onerel, vacstmt) to check for some time/page-count limit after each heap page (near the start of the main loop in lazy_scan_heap()) and, if the limit is reached, stop, clean up the indexes, and return the blkno of the next page needing to be vacuumed; then replace the call to lazy_vacuum_rel(onerel, vacstmt) in vacuum.c with the following loop. What else would need to be done?

    next_page_to_vacuum = 0;
    while (next_page_to_vacuum < RelationGetNumberOfBlocks(onerel))
    {
        StartTransactionCommand();
        ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

        next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt);

        CommitTransactionCommand();
    }

Must some locks also be released and reacquired inside this loop, or is there something else I should keep in mind when trying to do this? The operations in this loop need not be cheap: I'm happy if I can keep individual transactions below 5 to 10 minutes, though 1-2 minutes would be best.

P.S. One other typical case where long transactions are disastrous is Slony1's use of LISTEN/NOTIFY (the non-indexed table pg_listener, after having grown to 100MB, is not too responsive), though there the real solution for Slony1 would be switching to polling instead of interrupt (notify) mode for high-volume databases.

-- 
Hannu Krosing <hannu@tm.ee>
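For concreteness, here is a minimal sketch of what that limit check inside lazy_scan_heap() might look like. This is an assumption-laden sketch, not patch code: vacuum_page_limit and first_unvacuumed_blkno are invented names, while lazy_vacuum_index() and lazy_vacuum_heap() are the existing vacuumlazy.c routines that flush the accumulated dead-tuple list.

    /*
     * Hypothetical sketch of a page budget in lazy_scan_heap().
     * "vacuum_page_limit" and "first_unvacuumed_blkno" are invented
     * names; lazy_vacuum_index() and lazy_vacuum_heap() are the real
     * routines that push the remembered dead TIDs into the indexes
     * and mark them unused in the heap.
     */
    for (blkno = first_unvacuumed_blkno; blkno < nblocks; blkno++)
    {
        if (blkno - first_unvacuumed_blkno >= vacuum_page_limit)
        {
            /* flush the dead-TID list into each index ... */
            for (i = 0; i < nindexes; i++)
                lazy_vacuum_index(Irel[i], vacrelstats);
            /* ... and mark the dead tuples unused in the heap */
            lazy_vacuum_heap(onerel, vacrelstats);
            return blkno;           /* caller resumes from this block */
        }

        /* ... existing per-page scan-and-prune work, unchanged ... */
    }

    return InvalidBlockNumber;      /* the whole relation is done */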
On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
> change lazy_vacuum_rel(onerel, vacstmt) to check for some
> time/page-count limit after each heap page (near the start of the main
> loop in lazy_scan_heap()) and, if the limit is reached, stop, clean up
> the indexes, and return the blkno of the next page needing to be
> vacuumed; then replace the call to lazy_vacuum_rel(onerel, vacstmt) in
> vacuum.c with the following loop.
>
>     next_page_to_vacuum = 0;
>     while (next_page_to_vacuum < RelationGetNumberOfBlocks(onerel))
>     {
>         StartTransactionCommand();
>         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
>
>         next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt);
>
>         CommitTransactionCommand();
>     }
>
> Must some locks also be released and reacquired inside this loop, or
> is there something else I should keep in mind when trying to do this?

There is a "session lock" on the table. You must release that. However, after releasing and reacquiring that lock, everything you know about the table must be rechecked. In particular, the table can be dropped :-) or truncated, or vacuumed by some other process, etc. So you'd need to start the vacuum "from scratch" (I guess you'd skip the first N pages).

One thing to keep in mind is whether the Xmin calculations are right after such a thing ... e.g. if you truncated the clog with the wrong parameters, you could lose data.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Doing what he did amounts to sticking his fingers under the hood of
the implementation; if he gets his fingers burnt, it's his problem."
(Tom Lane)
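A hedged illustration of the release-and-recheck cycle Alvaro describes. The helper names (UnlockRelationIdForSession / LockRelationIdForSession, try_relation_open) are taken from later PostgreSQL trees and may not match 8.0 exactly; note also that Tom's reply below argues for holding the session lock instead.

    /*
     * Sketch only: give up the session lock between transactions, then
     * revalidate everything after reacquiring it.  Helper names are
     * from newer sources and may differ in 8.0.
     */
    UnlockRelationIdForSession(&onerelid, lmode);
    CommitTransactionCommand();

    StartTransactionCommand();
    LockRelationIdForSession(&onerelid, lmode);

    /* While unlocked, the table may have been dropped, truncated, or
     * vacuumed by someone else, so nothing cached about it is valid. */
    onerel = try_relation_open(onerelid.relId, NoLock);
    if (onerel == NULL)
        return;                      /* table is gone: nothing to do */
    if (next_page_to_vacuum >= RelationGetNumberOfBlocks(onerel))
        next_page_to_vacuum = 0;     /* truncated under us: restart */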
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
>> Must some locks also be released and reacquired inside this loop, or
>> is there something else I should keep in mind when trying to do this?

> There is a "session lock" on the table. You must release that.

Actually, the only hope of making this work is NOT to release that. If you hold the appropriate lock at the session level, then it is reasonable to consider successive transactions within the vacuum as being one big operation.

I think the major issue with this would be memory management, ie, how to prevent CommitTransactionCommand from cleaning up all of vacuum's working state.

			regards, tom lane
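On the memory-management point: vacuum.c already keeps cross-transaction state in a private memory context that survives CommitTransactionCommand(), so one plausible, untested approach is to park the per-relation restart state in such a context as well. A sketch using the existing mcxt API, with vacrelstats standing in for whatever state must survive the commits:

    /* A context hung off PortalContext outlives the transactions
     * started and committed inside the vacuum loop. */
    MemoryContext vac_context;
    MemoryContext oldcxt;

    vac_context = AllocSetContextCreate(PortalContext,
                                        "Vacuum",
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,
                                        ALLOCSET_DEFAULT_MAXSIZE);

    /* anything palloc'd here is not freed at transaction commit */
    oldcxt = MemoryContextSwitchTo(vac_context);
    vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
    MemoryContextSwitchTo(oldcxt);

    /* ... StartTransactionCommand()/CommitTransactionCommand() loop;
     * vacrelstats stays valid across each commit ... */

    MemoryContextDelete(vac_context);   /* once the relation is done */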
On Mon, 2005-04-25 at 11:11 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
> >> Must some locks also be released and reacquired inside this loop, or
> >> is there something else I should keep in mind when trying to do this?
>
> > There is a "session lock" on the table. You must release that.
>
> Actually, the only hope of making this work is NOT to release that.
> If you hold the appropriate lock at the session level, then it is
> reasonable to consider successive transactions within the vacuum
> as being one big operation.
>
> I think the major issue with this would be memory management, ie,
> how to prevent CommitTransactionCommand from cleaning up all of
> vacuum's working state.

Are there any known (easy :) tricks for achieving this?

Now that I have had time to look a little more at this, I have another idea: could I avoid having a transaction at all?

As VACUUM is not "transactional" in the sense that it never changes anything visible to users, can't be undone by rollback, etc., could it be possible to create enough of a "transaction-like" environment for it to really run outside of transactions? Perhaps just advancing oldestXmin at certain intervals?

-- 
Hannu Krosing <hannu@tm.ee>
Hannu Krosing <hannu@tm.ee> writes:
> Could I avoid having a transaction at all?

Not really; too much of the database access infrastructure is tied to transaction stuff ... even facilities as basic as memory management.

> As VACUUM is not "transactional" in the sense that it never changes
> anything visible to users, can't be undone by rollback, etc., could
> it be possible to create enough of a "transaction-like" environment
> for it to really run outside of transactions? Perhaps just advancing
> oldestXmin at certain intervals?

I wonder whether you could avoid advertising the VACUUM's XID in PGPROC. Not sure that this can work, but it would be a lot simpler than stopping and starting transactions ...

			regards, tom lane
On Tue, 2005-04-26 at 17:54 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Could I avoid having a transaction at all?
>
> Not really; too much of the database access infrastructure is tied to
> transaction stuff ... even facilities as basic as memory management.
>
> > As VACUUM is not "transactional" in the sense that it never changes
> > anything visible to users, can't be undone by rollback, etc., could
> > it be possible to create enough of a "transaction-like" environment
> > for it to really run outside of transactions? Perhaps just advancing
> > oldestXmin at certain intervals?
>
> I wonder whether you could avoid advertising the VACUUM's XID in PGPROC.
> Not sure that this can work, but it would be a lot simpler than stopping
> and starting transactions ...

What dreadful things will happen if I just set the xid and xmin of the current transaction (proc->xid, proc->xmin) to some value below FirstNormalTransactionId, so that TransactionIdIsNormal(xid) returning false will make GetOldestXmin ignore my transaction? Will this, for example, confuse WAL or cause something else equally grim to happen?

An alternative would be to have some special "ignore-me" flag in PGPROC.

A more general solution to the "VACUUM does not clean dead tuples fast enough due to an old transaction" problem is keeping the OldestXmin for each table separately, as a list of table OIDs in each PGPROC.

This would be automatically extendable to a long COPY, or in fact any single SQL statement running in its implicit transaction, by examining the query plan and reserving all tables touched by the query and its dependencies.

And this could be done as a user-level command for explicit transactions (LOCK TABLE my_table, othertable, yetanothertable FOR/FROM VACUUM) which would tell postgres that the current query will need these (and only these) tables. If LOCK FOR VACUUM is done inside a transaction, then touching any other table should be an error.

-- 
Hannu Krosing <hannu@tm.ee>
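For what it's worth, a rough sketch of what the proposed per-backend table list might look like; all names here are invented, and Tom's reply below explains why this can't work in the general case.

    /*
     * Invented sketch of the proposal: a small fixed-size list in each
     * PGPROC naming the only tables the current transaction may touch.
     * It must be fixed-size because PGPROC lives in shared memory.
     */
    #define MAX_RESERVED_TABLES 16

    /* new fields in struct PGPROC: */
    int     numReservedTables;               /* 0 means "anything" */
    Oid     reservedTables[MAX_RESERVED_TABLES];

    /*
     * GetOldestXmin(rel) would then skip a backend whose list is
     * non-empty and does not mention rel, and touching a table outside
     * the list would raise an error.
     */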
Hannu Krosing <hannu@tm.ee> writes:
> A more general solution to the "VACUUM does not clean dead tuples
> fast enough due to an old transaction" problem is keeping the
> OldestXmin for each table separately, as a list of table OIDs in
> each PGPROC.

> This would be automatically extendable to a long COPY, or in fact any
> single SQL statement running in its implicit transaction, by examining
> the query plan and reserving all tables touched by the query and its
> dependencies.

This is completely unworkable, since it amounts to assuming you know at the start of a serializable transaction which tables it will touch. In point of fact you can't even know that for the current query, let alone future ones --- consider user-defined functions.

(Not to mention that we can't expect to fit that much info into a fixed amount of shared memory.)

			regards, tom lane
On Mon, 2005-05-02 at 10:38 -0400, Tom Lane wrote:

But what about my question about just changing the xid in PGPROC?

> Hannu Krosing <hannu@tm.ee> writes:
> > A more general solution to the "VACUUM does not clean dead tuples
> > fast enough due to an old transaction" problem is keeping the
> > OldestXmin for each table separately, as a list of table OIDs in
> > each PGPROC.
>
> > This would be automatically extendable to a long COPY, or in fact any
> > single SQL statement running in its implicit transaction, by examining
> > the query plan and reserving all tables touched by the query and its
> > dependencies.
>
> This is completely unworkable, since it amounts to assuming you know at
> the start of a serializable transaction which tables it will touch.

I meant it for "simple" things like VACUUM, and maybe also a simple COPY (tables referenced by foreign keys can be found from pg_depend), but yes, we can't automatically know it in the general case. That's why I proposed making it an error for any transaction with such a list to touch any table not in the list, so the assumption about which tables are touched is simple to enforce.

> In point of fact you can't even know that for the current query, let
> alone future ones --- consider user-defined functions.
>
> (Not to mention that we can't expect to fit that much info into a fixed
> amount of shared memory.)

What should be in shared memory is one list of ints per backend (the OIDs of the tables used by the current transaction), mostly just one :) The general case would be to have no such list at all, meaning any table could be touched.

-- 
Hannu Krosing <hannu@tm.ee>
Something weird is going on with some of my emails - I tried to send this from hannu@tm.ee, then from hannu@skype.net, and it did not appear on the list. Another answer sent after this came through in less than 5 min. Now I have removed Tom Lane from the To: field and moved pgsql-hackers@postgresql.org from CC: to To:.

On Tue, 2005-04-26 at 17:54 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Could I avoid having a transaction at all?
>
> Not really; too much of the database access infrastructure is tied to
> transaction stuff ... even facilities as basic as memory management.
>
> > As VACUUM is not "transactional" in the sense that it never changes
> > anything visible to users, can't be undone by rollback, etc., could
> > it be possible to create enough of a "transaction-like" environment
> > for it to really run outside of transactions? Perhaps just advancing
> > oldestXmin at certain intervals?
>
> I wonder whether you could avoid advertising the VACUUM's XID in PGPROC.
> Not sure that this can work, but it would be a lot simpler than stopping
> and starting transactions ...

What I came up with is adding an extra bool to PGPROC (inVacuum), which is set if the current command is a VACUUM running in its own transaction, and modifying GetOldestXmin() to skip backends where inVacuum == true. I also added this skipping logic where PGPROC->xmin is set.

The resulting code passes 'make check' and also seems to do the right thing with respect to its purpose. I tested it by generating a 2M row table 'bigtable' and then running 'vacuum verbose analyse bigtable;' on it (runtime ~25 sec). At the same time I repeatedly ran, in another connection, 'update smalltable set data=data+1; vacuum verbose smalltable;', which was able to free all 16 old rows the whole time, during both the vacuum and analyse phases.

The only surprise was that it did not work during a plain 'analyse bigtable;'. It seems that plain analyse does not set use_own_xacts and thus can't make use of this optimisation.

Please comment on the attached patch (against version 8.0.2), especially on what could be broken (WAL, some transaction isolation level, ...?) and how to test whether it is. Also, does this have any chance of getting accepted into 8.1?

-- 
Hannu Krosing <hannu@skype.net>
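Since the attachment itself isn't reproduced here, the following is only a sketch of the shape of the change described; the surrounding loop is approximated from memory against the 8.0-era GetOldestXmin(), so the details are assumptions.

    /* new field in struct PGPROC: */
    bool    inVacuum;       /* true while lazy VACUUM runs its own xact */

    /*
     * In GetOldestXmin(), while scanning the backend array, ignore
     * backends flagged as vacuuming: a lazy VACUUM never creates dead
     * tuples that other backends could see, so its xid/xmin need not
     * hold back tuple reclamation elsewhere.
     */
    if (proc->inVacuum)
        continue;           /* skip this backend's xid/xmin entirely */

    /* ... existing xid/xmin comparisons for ordinary backends ... */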