Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From Barry Lind
Subject Re: Plans for solving the VACUUM problem
Date
Msg-id 3B09B04A.5060806@xythos.com
Whole thread Raw
In response to RE: Plans for solving the VACUUM problem  ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>)
List pgsql-hackers

Mikheev, Vadim wrote:

> 
> Ok, last reminder -:))
> 
> On transaction abort, read WAL records and undo (rollback)
> changes made in storage. Would allow:
> 
> 1. Reclaim space allocated by aborted transactions.
> 2. Implement SAVEPOINTs.
>    Just to remind -:) - in the event of error discovered by server
>    - duplicate key, deadlock, command mistyping, etc, - transaction
>    will be rolled back to the nearest implicit savepoint setted
>    just before query execution; - or transaction can be aborted by
>    ROLLBACK TO <savepoint_name> command to some explicit savepoint
>    setted by user. Transaction rolled back to savepoint may be continued.
> 3. Reuse transaction IDs on postmaster restart.
> 4. Split pg_log into small files with ability to remove old ones (which
>    do not hold statuses for any running transactions).
> 
> Vadim

This is probably not a good thread to add my two cents worth, but here 
goes anyway.

The biggest issue I see with the proposed UNDO using WAL is the issue of 
large/long lasting transactions.  While it might be possible to solve 
this problem with some extra work.  However keep in mind that different 
types of transactions (i.e. normal vs bulk loads) require different 
amounts of time and/or UNDO.  To solve this problem, you really need per 
transaction limits which seems difficult to implement.

I have no doubt that UNDO with WAL can be done.  But is there some other 
way of doing UNDO that might be just as good or better?

Part of what I see in this thread reading between the lines is that some 
believe the solution to many problems in the long term is to implement 
an overwriting storage manager.  Implementing UNDO via WAL is a 
necessary step in that direction.  While others seem to believe that the 
non-overwriting storage manager has some life in it yet, and may even be 
the storage manager for many releases to come.  I don't know enough 
about the internals to have any say in that discussion, however the 
grass isn't always greener on the other side of the fence (i.e. an 
overwriting storage manager will come with its own set of problems/issues).

So let me throw out an idea for UNDO using the current storage manager.  
First let me state that UNDO is a bit of a misnomer, since undo for 
transactions is already implemented.  That is what pg_log is all about.  
The part of UNDO that is missing is savepoints (either explicit or 
implicit), because pg_log doesn't capture the information for each 
command in a transaction.  So the question really becomes, how to 
implement savepoints with current storage manager?

I am going to lay out one assumption that I am making:
1) Most transactions are either completely successful or completely 
rolled back (If this weren't true, i.e. you really needed savepoints to partially 
rollback changes, you couldn't be using the existing version of 
postgresql at all)

My proposal is: 1) create a new relation to store 'failed commands' for transactions.   This is similar to pg_log for
transactions,but takes it to the 
 
command level.  And since it records only failed commands (or ranges of 
failed commands), thus most transactions will not have any information 
in this relation per the assumption above.  2) Use the unused pg_log status (3 = unused, 2 = commit, 1 = abort, 0 
= inprocess) to mean that the transaction was commited but some commands 
were rolled back (i.e. partial commit)  Again for the majority of transactions nothing will need to change, 
since they will still be marked as committed or aborted.  3) Code that determines whether or not a tuple is committed
ornot 
 
needs to be aware of this new pg_log status, and look in the new 
relation to see if the particular command was rolled back or not to 
determine the commited status of the tuple.  This subtly changes the 
meaning of HEAP_XMIN_COMMITTED and related flags to reflect the 
transaction and command status instead of just the transaction status.

The runtime cost of this shouldn't be too high, since the committed 
state is cached in HEAP_XMIN_COMMITTED et al, it is only the added cost 
for the pass that needs to set these flags, and then there is only added 
cost in the case that the transaction wasn't completely sucessful (again 
my assumption above).

Now I have know idea if what I am proposing is really doable or not.  I 
am just throwing this out as an alternative to WAL based 
UNDO/savepoints.  The reason I am doing this is that to me it seems to 
leverage much of the existing infrastructure already in place that 
performs undo for rolledback transactions (all the tmin, tmax, cmin, 
cmax stuff as well as vacuum).  Also it doesn't come with the large WAL 
log file problem for large transactions.

Now having said all of this I realize that this doesn't solve the 4 
billion transaction id limit problem, or the large size of the pg_log 
file with large numbers of transactions.

thanks,
--Barry


> 



pgsql-hackers by date:

Previous
From: Tulio Oliveira
Date:
Subject: Prevent CREATE TABLE
Next
From: Bruce Momjian
Date:
Subject: Re: More pgindent follies