Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers
From | Barry Lind |
---|---|
Subject | Re: Plans for solving the VACUUM problem |
Date | |
Msg-id | 3B09B04A.5060806@xythos.com Whole thread Raw |
In response to | RE: Plans for solving the VACUUM problem ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>) |
List | pgsql-hackers |
Mikheev, Vadim wrote: > > Ok, last reminder -:)) > > On transaction abort, read WAL records and undo (rollback) > changes made in storage. Would allow: > > 1. Reclaim space allocated by aborted transactions. > 2. Implement SAVEPOINTs. > Just to remind -:) - in the event of error discovered by server > - duplicate key, deadlock, command mistyping, etc, - transaction > will be rolled back to the nearest implicit savepoint setted > just before query execution; - or transaction can be aborted by > ROLLBACK TO <savepoint_name> command to some explicit savepoint > setted by user. Transaction rolled back to savepoint may be continued. > 3. Reuse transaction IDs on postmaster restart. > 4. Split pg_log into small files with ability to remove old ones (which > do not hold statuses for any running transactions). > > Vadim This is probably not a good thread to add my two cents worth, but here goes anyway. The biggest issue I see with the proposed UNDO using WAL is the issue of large/long lasting transactions. While it might be possible to solve this problem with some extra work. However keep in mind that different types of transactions (i.e. normal vs bulk loads) require different amounts of time and/or UNDO. To solve this problem, you really need per transaction limits which seems difficult to implement. I have no doubt that UNDO with WAL can be done. But is there some other way of doing UNDO that might be just as good or better? Part of what I see in this thread reading between the lines is that some believe the solution to many problems in the long term is to implement an overwriting storage manager. Implementing UNDO via WAL is a necessary step in that direction. While others seem to believe that the non-overwriting storage manager has some life in it yet, and may even be the storage manager for many releases to come. I don't know enough about the internals to have any say in that discussion, however the grass isn't always greener on the other side of the fence (i.e. an overwriting storage manager will come with its own set of problems/issues). So let me throw out an idea for UNDO using the current storage manager. First let me state that UNDO is a bit of a misnomer, since undo for transactions is already implemented. That is what pg_log is all about. The part of UNDO that is missing is savepoints (either explicit or implicit), because pg_log doesn't capture the information for each command in a transaction. So the question really becomes, how to implement savepoints with current storage manager? I am going to lay out one assumption that I am making: 1) Most transactions are either completely successful or completely rolled back (If this weren't true, i.e. you really needed savepoints to partially rollback changes, you couldn't be using the existing version of postgresql at all) My proposal is: 1) create a new relation to store 'failed commands' for transactions. This is similar to pg_log for transactions,but takes it to the command level. And since it records only failed commands (or ranges of failed commands), thus most transactions will not have any information in this relation per the assumption above. 2) Use the unused pg_log status (3 = unused, 2 = commit, 1 = abort, 0 = inprocess) to mean that the transaction was commited but some commands were rolled back (i.e. partial commit) Again for the majority of transactions nothing will need to change, since they will still be marked as committed or aborted. 3) Code that determines whether or not a tuple is committed ornot needs to be aware of this new pg_log status, and look in the new relation to see if the particular command was rolled back or not to determine the commited status of the tuple. This subtly changes the meaning of HEAP_XMIN_COMMITTED and related flags to reflect the transaction and command status instead of just the transaction status. The runtime cost of this shouldn't be too high, since the committed state is cached in HEAP_XMIN_COMMITTED et al, it is only the added cost for the pass that needs to set these flags, and then there is only added cost in the case that the transaction wasn't completely sucessful (again my assumption above). Now I have know idea if what I am proposing is really doable or not. I am just throwing this out as an alternative to WAL based UNDO/savepoints. The reason I am doing this is that to me it seems to leverage much of the existing infrastructure already in place that performs undo for rolledback transactions (all the tmin, tmax, cmin, cmax stuff as well as vacuum). Also it doesn't come with the large WAL log file problem for large transactions. Now having said all of this I realize that this doesn't solve the 4 billion transaction id limit problem, or the large size of the pg_log file with large numbers of transactions. thanks, --Barry >
pgsql-hackers by date: