Re: XLog: how to log? - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: XLog: how to log? |
Date | |
Msg-id | 1084307137.3028.2020.camel@stromboli |
In response to | Re: XLog: how to log? (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: XLog: how to log?; Re: XLog: how to log? |
List | pgsql-hackers |
On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > > rollforward point. Not sure if that can be done though, or how to do
> > > it.
> >
> > Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> > same as a main trans, which it'd not do would it?
>
> I agree that a subtrans xid should not be a valid rollforward point.

Forgive me discussing what seems like obvious points - I'm sure you appreciate we need an exact statement of how/when to terminate recovery and that might be found in looking harder at the subtrans questions.

This is my third re-write of this e-mail, since I keep thinking of additional things while going for the "definitive statement". I had thought this was straightforward...

Currently, recovery loops until end of xlogs. There is no exit condition from the loop. There is not currently a timestamp on the xlogs - anywhere apart from the file date on each xlog.

Xids are assigned sequentially to transactions as they start. However, Xids are not committed sequentially. Moreover, checkpoint records do not wait for transactions to complete, so a checkpoint could record an Xid, yet a lower Xid might still be in progress and commit sometime after the checkpoint.

So, when we do a backup, we might take with us a pg_control that has a particular Xid, only to find lots of later committed, but earlier Xids in the xlogs. So Xid can have no lower bound. (and a fully formed clog is essential to recovery).

If we go searching for a particular Xid, there is no way to tell whether an Xid suggested by a user is too big or too small for use as a recovery target. We need to recover - it is the only way to tell; if we find an Xid that matches, we stop. If not, we keep going until end of logs, when we need to issue a "recovered fully - the Xid you gave was not valid", which may take some time and is also very clearly not what was wanted. (If they had wanted full recovery, they would have asked).

So searching on an Xid is inherently a poor way to recover. Which is a shame, because it seemed like an easy target. Unless of course, we live with this vagueness and get on and build the XLogSpy...

Xlog records ARE written sequentially, so a timestamp written to the xlogs COULD be used as a target for halting recovery. We would be able to decide, ahead of starting recovery, whether we would be able to sensibly recover to that point by using the pg_control checkpoint time as the lower bound and the file write times of the highest xlog as the upper bound. Once decided that the target timestamp lies between upper and lower bounds, we begin recovery, knowing exactly where it will complete.

During recovery, we would search for a timestamp. If found exactly, stop. If exceeded, stop. Any transactions not committed at that point are, as we say, out of luck.

....This approach has a certainty about it that I think is much better than the error prone Xid hunting approach, and is also more attuned to the human reality (time matters, Xids don't).

Earlier, Bruce and I had discussed that for reasons of time pressure, the PITR code for this release would consist of
a) recovery to a particular Xid
b) later, a utility that allowed xlogs to be inspected to allow DBA to decide which is the correct Xid to recover to.

Those ideas don't sound as good now....

Therefore: action on me?
- add a timestamp to EACH xlog record - something I had been shying away from.
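To make the contrast concrete, here is a minimal sketch of the two stopping rules, written in Python rather than backend C. Everything in it is an assumption made for illustration: the `XLogRecord` shape, the per-record `timestamp` field (which does not exist in the xlog today), and the function names. It also assumes that a record stamped exactly at the target is replayed before stopping, which is only one reading of "if found exactly, stop".

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class XLogRecord:
    """Hypothetical, simplified xlog record; not the real on-disk layout."""
    xid: int
    timestamp: float  # assumed per-record timestamp (seconds)
    is_commit: bool


def target_in_bounds(target: float, checkpoint_time: float,
                     last_xlog_write_time: float) -> bool:
    """Check the target before recovery starts: the pg_control checkpoint
    time is the lower bound, the newest xlog file's write time the upper."""
    return checkpoint_time <= target <= last_xlog_write_time


def recover_to_timestamp(records: Iterable[XLogRecord],
                         target: float) -> Set[int]:
    """Replay in log order and stop at the first record past the target.
    Returns the Xids whose commit records were replayed."""
    committed: Set[int] = set()
    for rec in records:
        if rec.timestamp > target:
            break  # exceeded: later commits are out of luck
        if rec.is_commit:
            committed.add(rec.xid)
    return committed


def recover_to_xid(records: Iterable[XLogRecord],
                   target_xid: int) -> Tuple[Set[int], bool]:
    """Replay until the target Xid commits. A bad target is only detected
    after every record has been replayed."""
    committed: Set[int] = set()
    for rec in records:
        if rec.is_commit:
            committed.add(rec.xid)
            if rec.xid == target_xid:
                return committed, True
    return committed, False


# Commits arrive out of Xid order: 103 commits before 102.
log = [
    XLogRecord(xid=101, timestamp=10.0, is_commit=True),
    XLogRecord(xid=103, timestamp=20.0, is_commit=True),
    XLogRecord(xid=102, timestamp=30.0, is_commit=True),
]
```

With this log, `target_in_bounds(25.0, 5.0, 30.0)` can be answered before any replay, and `recover_to_timestamp(log, 25.0)` stops after the second record. By contrast, `recover_to_xid(log, 999)` replays the whole log before it can report that the Xid was never found.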
On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
> (Unrelated: note that after main transaction commit, a committed
> subtransaction is indistinguishable from a committed main transaction --
> and with the current idea of XLog I have, after recovering a transaction
> tree from XLog there won't be any mark in pg_subtrans. So the system
> will not be exactly as it was before but it won't matter.)

I don't think we need a subtrans commit directly, since if the top-level commits after the subtrans has committed, then we're good. However, if a subtrans aborts, yet the top-level commits, there will be data written to the database about an aborted transaction. We don't have Undo, so the subtrans clog must be updated to show that the subtrans aborted, otherwise we would read both the committed (top-level) and the uncommitted data (subtrans).

Another way of putting it - if it was worth writing before a crash, it is worth recovering after a crash. Shurely?

> > We could allow specification of a subtrans ID to be interpreted the same
> > as specification of its parent main trans. Dunno if that's actually
> > useful to anyone. Actually, I'd think that people would generally
> > specify recovery up to a particular timestamp, and not be interested in
> > xact numbers at all ...
>
> I don't think timestamp is going to be precise enough. Basically I can
> see someone saying I want recovery up to 4am, but anything more specific
> will need xid. I suggested that we write an xlog dump tool so you can
> see the xids (with some xid details) and rough timestamps stored in the
> WAL file and choose the xid for recovery.

Bruce,

As I started this e-mail (1st time), I completely agreed with you. I've now had to switch my thinking. (Doesn't affect archiving architecture....)

I'm a little dazed....comments anyone?

Best regards, Simon Riggs