Re: Point in Time Recovery - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Point in Time Recovery
Date	July 10, 2004 18:06:57
Msg-id	1089493611.17493.767.camel@stromboli
In response to	Re: Point in Time Recovery (Jan Wieck <JanWieck@Yahoo.com>)
List	pgsql-hackers

On Sat, 2004-07-10 at 15:17, Jan Wieck wrote:
> On 7/6/2004 3:58 PM, Simon Riggs wrote:
> 
> > On Tue, 2004-07-06 at 08:38, Zeugswetter Andreas SB SD wrote:
> >>  > - by time - but the time stamp on each xlog record only specifies to the
> >> > second, which could easily be 10 or more commits (we hope....)
> >> > 
> >> > Should we use a different datatype than time_t for the commit timestamp,
> >> > one that offers more fine grained differentiation between checkpoints?
> >> 
> >> Imho seconds is really sufficient. If you know a more precise position
> >> you will probably know it from backend log or an xlog sniffer. With those
> >> you can easily use the TransactionId way.
> 
> TransactionID and timestamp is only sufficient if the transactions are 
> selected by their commit order. Especially in read committed mode, 
> consider this execution:
> 
>      xid-1: start
>      xid-2: start
>      xid-2: update field x
>      xid-2: commit
>      xid-1: update field y
>      xid-1: commit
> 
> In this case, the update done by xid-1 depends on the row created by 
> xid-2. So logically xid-2 precedes xid-1, because it made its changes 
> earlier.
> 
> So you have to apply the log until you find the commit record of the 
> transaction you want to apply last, and then stamp all transactions that 
> were in progress at that time as aborted.
> 

Agreed.

I've implemented this exactly as you say....

This turns out to be very easy because:
- when looking for where to stop, we only ever stop at commit or abort
records - these are the only records that have timestamps anyway...
- any transaction that isn't specifically committed is not updated in the
clog and is therefore not visible. Each clog entry starts in an
indeterminate state (0) and is then updated to either committed or
aborted. Aborted and indeterminate are handled similarly in the current
code, to allow for crash recovery - PITR doesn't change anything there.
So, PITR doesn't do anything that crash recovery doesn't already do.
Crash recovery makes no attempt to keep track of in-progress
transactions and doesn't make a special journey to the clog to
specifically mark them as aborted - they just are by default.
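To illustrate the point about the clog, here is a minimal Python sketch (a toy model, not the actual backend code). The class and method names are hypothetical, and the numeric values other than 0 for the indeterminate state are assumptions made for the example. It only shows why a transaction that never reached its commit record needs no explicit "mark as aborted" step.

```python
from enum import IntEnum


class XidStatus(IntEnum):
    # 0 is the indeterminate starting state described above;
    # the other two values are illustrative assumptions.
    INDETERMINATE = 0
    COMMITTED = 1
    ABORTED = 2


class ToyClog:
    """Hypothetical stand-in for the commit log: xid -> status."""

    def __init__(self):
        self._status = {}

    def set_status(self, xid, status):
        self._status[xid] = status

    def get_status(self, xid):
        # An xid that was never updated is still in the starting state.
        return self._status.get(xid, XidStatus.INDETERMINATE)

    def is_visible(self, xid):
        # Only an explicit commit makes a transaction's changes visible;
        # aborted and indeterminate are treated the same way.
        return self.get_status(xid) == XidStatus.COMMITTED


clog = ToyClog()
clog.set_status(2, XidStatus.COMMITTED)   # commit record for xid 2 was replayed
# xid 1 was still in progress when redo stopped: nothing is written for it

print(clog.is_visible(2))   # True
print(clog.is_visible(1))   # False
```

In this model, stopping recovery early costs nothing extra: xid 1 is invisible simply because its entry was never changed.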

So, what we mean by "stop at a transactionId" is "stop applying redo at
the commit/abort record for that transactionId." It has to be an exact
match, not a "greater than", for exactly the reason you mention. That
means that although we stop at the commit record of transactionId X, we
may also have applied records for transactions with later transactionIds
e.g. X+1, X+2...etc (without limit or restriction).
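The exact-match rule can be sketched in Python as follows. This is a simplified model under stated assumptions: the log is a list of hypothetical `(kind, xid)` tuples, and the target's own commit/abort record is applied before stopping (the inclusive case). It is not the real redo loop.

```python
def replay_until_xid(log, target_xid):
    """Apply records in log order and stop at the commit/abort record
    whose xid exactly matches target_xid (assumed inclusive here).

    Returns (number of records applied, set of committed xids).
    """
    committed = set()
    applied = 0
    for kind, xid in log:
        applied += 1                      # "apply" the record
        if kind == "commit":
            committed.add(xid)
        # Exact match only: a commit by a later xid must not stop us.
        if kind in ("commit", "abort") and xid == target_xid:
            break
    return applied, committed


# Hypothetical log: xid 3 commits before xid 1, and xid 2 never commits
# before the stopping point.
log = [
    ("update", 1),
    ("update", 2),
    ("update", 3),
    ("commit", 3),
    ("update", 1),
    ("commit", 1),
    ("commit", 2),
]

applied, committed = replay_until_xid(log, target_xid=1)
print(applied)             # 6
print(sorted(committed))   # [1, 3]
```

With target xid 1, records for xids 2 and 3 have been applied as well. xid 3 is committed because its commit record came first in the log, while xid 2 is left in progress and so is treated as aborted. A "greater than" test would have stopped at the commit of xid 3 and lost xid 1 entirely.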

(I'll even admit that at first I did think we could get away with the
"less than" test that you are warning me about. Overall, I've spent more
time on theory/analysis than on coding, on the idea that you can improve
poor code, but wrong code just needs to be thrown away).

Timestamps are more vague...When time is used, there might easily be 10+
transactions whose commit/abort records have identical timestamp values.
So we either stop at the first or last record depending upon whether we
specified inclusive or exclusive on the recovery target value.
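The effect of inclusive versus exclusive on a time target can be shown with a small Python sketch. The function name, the `(xid, timestamp)` tuples and the integer-second timestamps are assumptions made for the example; only commit/abort records are modelled, since those are the only ones carrying timestamps.

```python
def commits_applied(commit_records, target_time, inclusive):
    """Return the xids whose commit/abort records are applied before
    recovery stops at target_time (whole seconds).

    inclusive=True  : stop after the last record stamped target_time
    inclusive=False : stop before the first record stamped target_time
    """
    applied = []
    for xid, ts in commit_records:
        past_target = ts > target_time if inclusive else ts >= target_time
        if past_target:
            break
        applied.append(xid)
    return applied


# Three transactions commit within the same second (101).
records = [(10, 100), (11, 101), (12, 101), (13, 101), (14, 102)]

print(commits_applied(records, 101, inclusive=True))    # [10, 11, 12, 13]
print(commits_applied(records, 101, inclusive=False))   # [10]
```

Neither setting can separate xids 11, 12 and 13 from each other, which is why a transactionId target is the better choice when that precision matters.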

The hard bit, IMHO, is what we do with the part of the log that we have
chosen not to apply....which has been discussed on list in detail also.

Thanks for keeping an eye out for possible errors - this one is
something I'd thought through and catered for (there are comments in my
current latest published code to that effect, although I have not yet
finished coding the clean-up-after-stopping part). 

This implies nothing with regard to other possible errors or oversights
and so I very much welcome any questioning of this nature - I am as
prone to error as the next man. It's important we get this right.

Best regards, Simon Riggs
