Re: Point in Time Recovery - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Point in Time Recovery |
Date | |
Msg-id | 1089493611.17493.767.camel@stromboli |
In response to | Re: Point in Time Recovery (Jan Wieck <JanWieck@Yahoo.com>) |
List | pgsql-hackers |
On Sat, 2004-07-10 at 15:17, Jan Wieck wrote:
> On 7/6/2004 3:58 PM, Simon Riggs wrote:
>
> > On Tue, 2004-07-06 at 08:38, Zeugswetter Andreas SB SD wrote:
> >> > - by time - but the time stamp on each xlog record only specifies to the
> >> > second, which could easily be 10 or more commits (we hope....)
> >> >
> >> > Should we use a different datatype than time_t for the commit timestamp,
> >> > one that offers more fine grained differentiation between checkpoints?
> >>
> >> Imho seconds is really sufficient. If you know a more precise position
> >> you will probably know it from backend log or an xlog sniffer. With those
> >> you can easily use the TransactionId way.
>
> TransactionID and timestamp is only sufficient if the transactions are
> selected by their commit order. Especially in read committed mode,
> consider this execution:
>
> xid-1: start
> xid-2: start
> xid-2: update field x
> xid-2: commit
> xid-1: update field y
> xid-1: commit
>
> In this case, the update done by xid-1 depends on the row created by
> xid-2. So logically xid-2 precedes xid-1, because it made its changes
> earlier.
>
> So you have to apply the log until you find the commit record of the
> transaction you want apply last, and then stamp all transactions that
> where in progress at that time as aborted.
>

Agreed.

I've implemented this exactly as you say....

This turns out to be very easy because:
- when looking where to stop we only ever stop at commits or aborts - these are the only records that have timestamps anyway...
- any record that isn't specifically committed is not updated in the clog and therefore not visible. The clog starts in the indeterminate state, 0, and is then updated to either committed or aborted. Aborted and indeterminate are handled similarly in the current code, to allow for crash recovery - PITR doesn't change anything there.

So, PITR doesn't do anything that crash recovery doesn't already do.
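To make the clog point concrete, here is a minimal Python sketch of the visibility rule described above. It is an illustration only, not the PostgreSQL code: the `Clog` class, its method names and the dictionary storage are hypothetical, and only the value 0 for the indeterminate state comes from the text; the other two values are assumed for the example.

```python
from enum import IntEnum


class XidStatus(IntEnum):
    INDETERMINATE = 0  # the state every transaction starts in (from the text)
    COMMITTED = 1      # assumed value, for illustration only
    ABORTED = 2        # assumed value, for illustration only


class Clog:
    """Hypothetical stand-in for the commit log: xid -> status."""

    def __init__(self):
        self._status = {}

    def set_status(self, xid, status):
        self._status[xid] = status

    def get_status(self, xid):
        # An xid that was never updated is still in the indeterminate state.
        return self._status.get(xid, XidStatus.INDETERMINATE)

    def is_visible(self, xid):
        # Only an explicit commit makes changes visible. Aborted and
        # indeterminate are treated alike, so nothing has to go back
        # and mark in-progress transactions as aborted.
        return self.get_status(xid) == XidStatus.COMMITTED


clog = Clog()
clog.set_status(2, XidStatus.COMMITTED)  # xid-2 committed before the stop point
clog.set_status(3, XidStatus.ABORTED)    # xid-3 aborted explicitly
# xid-1 was in progress at the stop point and is never touched
```

In this sketch `clog.is_visible(1)` and `clog.is_visible(3)` are both false without any extra pass over the in-progress transactions.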
Crash recovery makes no attempt to keep track of in-progress transactions and doesn't make a special journey to the clog to specifically mark them as aborted - they just are by default.

So, what we mean by "stop at a transactionId" is "stop applying redo at the commit/abort record for that transactionId." It has to be an exact match, not a "greater than", for exactly the reason you mention. That means that although we stop at the commit record of transactionId X, we may also have applied records for transactions with later transactionIds e.g. X+1, X+2...etc (without limit or restriction).

(I'll even admit that at first, I did think we could get away with the "less than" test that you are warning me about. Overall, I've spent more time on theory/analysis than on coding, on the idea that you can improve poor code, but wrong code just needs to be thrown away).

Timestamps are more vague... When time is used, there might easily be 10+ transactions whose commit/abort records have identical timestamp values. So we either stop at the first or last record depending upon whether we specified inclusive or exclusive on the recovery target value.

The hard bit, IMHO, is what we do with the part of the log that we have chosen not to apply....which has been discussed on list in detail also.

Thanks for keeping an eye out for possible errors - this one is something I'd thought through and catered for (there are comments in my current latest published code to that effect, although I have not yet finished coding the clean-up-after-stopping part). This implies nothing with regard to other possible errors or oversights and so I very much welcome any questioning of this nature - I am as prone to error as the next man. It's important we get this right.

Best regards, Simon Riggs
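The two stopping rules discussed in this message can be sketched in Python as below. This is a rough illustration under stated assumptions, not the patch itself: `Record`, `replay_until_xid`, `replay_until_time` and `committed_xids` are hypothetical names, timestamps are plain integers in whole seconds, and the reading of inclusive as "stop after the last record carrying the target timestamp" and exclusive as "stop before the first one" is an interpretation of the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Record:
    xid: int
    kind: str                        # "update", "commit" or "abort"
    timestamp: Optional[int] = None  # whole seconds; set only on commit/abort


def is_end_record(rec: Record) -> bool:
    # Only commit and abort records are candidate stopping points.
    return rec.kind in ("commit", "abort")


def replay_until_xid(log: List[Record], target_xid: int) -> List[Record]:
    """Apply redo up to and including the commit/abort record of target_xid."""
    applied = []
    for rec in log:
        applied.append(rec)
        # Exact match on the xid, not a "less than" or "greater than" test.
        if is_end_record(rec) and rec.xid == target_xid:
            break
    return applied


def replay_until_time(log: List[Record], target_time: int,
                      inclusive: bool) -> List[Record]:
    """Apply redo up to a timestamp that several records may share."""
    applied = []
    for rec in log:
        if is_end_record(rec):
            if inclusive:
                past_target = rec.timestamp > target_time
            else:
                past_target = rec.timestamp >= target_time
            if past_target:
                break
        applied.append(rec)
    return applied


def committed_xids(applied: List[Record]) -> Set[int]:
    # Anything without an applied commit record stays invisible.
    return {rec.xid for rec in applied if rec.kind == "commit"}


# Based on the quoted execution, plus an xid-3 that commits in the same second.
log = [
    Record(2, "update"),
    Record(2, "commit", timestamp=100),
    Record(1, "update"),
    Record(3, "update"),
    Record(3, "commit", timestamp=100),
    Record(1, "commit", timestamp=101),
]
```

With this log, `replay_until_xid(log, 1)` applies all six records, including those of the later xids 2 and 3. `replay_until_xid(log, 3)` applies xid-1's update but not its commit, so xid-1 remains invisible. `replay_until_time(log, 100, inclusive=True)` stops after the second commit stamped 100, while `inclusive=False` stops before the first.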