Thread: XLog: how to log?
Hackers, I'm trying to figure out how to log the subtransaction operations. A simple idea would be to add the committed subxact Xids to the xl_xact_commit and xl_xact_abort structs. However, this will be a variable length array, and those structs already have variable length arrays at the end (filenodes to drop). I don't see how would I overlay that into a C struct. A different idea would be to write a XLog record for subxact aborts (which would include Xids of all committed child subxacts), and at main transaction commit write another (different) XLog record, where XLogRecord.Xid = mainTransactionXid, which would carry the committed subtransactions. Does this make any sense? Ideas, comments? -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No reniegues de lo que alguna vez creíste"
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > A simple idea would be to add the committed subxact Xids to the > xl_xact_commit and xl_xact_abort structs. However, this will be a > variable length array, and those structs already have variable length > arrays at the end (filenodes to drop). I don't see how would I overlay > that into a C struct. Just means you have to do some address arithmetic instead of being able to reference the additional data as a struct member. Tedious but hardly difficult. See for instance the handling of "move" data in XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN. I'd not recommend emitting additional xlog records unless there's a genuine log-semantics-related reason for doing so. (Thinks about it...) It seems like you'd want to emit a subtrans abort record to carry information about file deletions, but there's no real need to emit subtrans commit records, is there? regards, tom lane
On Tue, 2004-05-11 at 04:05, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > A simple idea would be to add the committed subxact Xids to the > > xl_xact_commit and xl_xact_abort structs. However, this will be a > > variable length array, and those structs already have variable length > > arrays at the end (filenodes to drop). I don't see how would I overlay > > that into a C struct. > > Just means you have to do some address arithmetic instead of being able > to reference the additional data as a struct member. Tedious but hardly > difficult. See for instance the handling of "move" data in > XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN. > > I'd not recommend emitting additional xlog records unless there's a > genuine log-semantics-related reason for doing so. (Thinks about it...) > It seems like you'd want to emit a subtrans abort record to carry > information about file deletions, but there's no real need to emit > subtrans commit records, is there? > Should we write a subtrans commit record when WAL_DEBUG is set? We could remove it completely in the future when the interaction between PITR recovery and subtransactions have been fully covered off. My thinking is that subtransactions themselves are not valid recovery targets. As long as we can tell the top level commits from the subtransactions then it should be easy to recover. My thinking is that if a subtransaction id is wrongly quoted as a recover point, then this would be treated as a "rollforward to txnid X, but do not include this one". Does that make sense? Does everything we have support that? Best Regards, Simon Riggs
On Tue, May 11, 2004 at 07:44:53AM +0100, Simon Riggs wrote: > Tom Lane wrote: > > Just means you have to do some address arithmetic instead of being able > > to reference the additional data as a struct member. Tedious but hardly > > difficult. See for instance the handling of "move" data in > > XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN. Got it. I had skimmed looking for an example of this but didn't find this. > Should we write a subtrans commit record when WAL_DEBUG is set? We could > remove it completely in the future when the interaction between PITR > recovery and subtransactions have been fully covered off. Actually, the only thing a subtransaction has to recover is mark its Xid as either committed of aborted in clog, and possible file deletions, so there shouldn't be too much interaction with your work. (Unrelated: note that after main transaction commit, a committed subtransaction is indistinguishable from a committed main transaction -- and with the current idea of XLog I have, after recovering a transaction tree from XLog there won't be any mark in pg_subtrans. So the system will not be exactly as it was before but it won't matter.) > My thinking is that subtransactions themselves are not valid recovery > targets. As long as we can tell the top level commits from the > subtransactions then it should be easy to recover. My thinking is that > if a subtransaction id is wrongly quoted as a recover point, then this > would be treated as a "rollforward to txnid X, but do not include this > one". Does that make sense? Does everything we have support that? Hmm ... I think it should be forbidden to quote a subtrans Xid as rollforward point. Not sure if that can be done though, or how to do it. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La gente vulgar solo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo"
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Hmm ... I think it should be forbidden to quote a subtrans Xid as > rollforward point. Not sure if that can be done though, or how to do > it. Seems like a nonissue, unless the XLOG trace makes a subtrans look the same as a main trans, which it'd not do would it? We could allow specification of a subtrans ID to be interpreted the same as specification of its parent main trans. Dunno if that's actually useful to anyone. Actually, I'd think that people would generally specify recovery up to a particular timestamp, and not be interested in xact numbers at all ... regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Hmm ... I think it should be forbidden to quote a subtrans Xid as > > rollforward point. Not sure if that can be done though, or how to do > > it. > > Seems like a nonissue, unless the XLOG trace makes a subtrans look the > same as a main trans, which it'd not do would it? > > We could allow specification of a subtrans ID to be interpreted the same > as specification of its parent main trans. Dunno if that's actually > useful to anyone. Actually, I'd think that people would generally > specify recovery up to a particular timestamp, and not be interested in > xact numbers at all ... I don't think timestamp is going to be precise enough. Basically I can see someone saying I want recovery up to 4am, but anything more specific will need xid. I suggested that we write an xlog dump tool so you can see the xids (with some xid details) and rough timestamps stored in the WAL file and choose the xid for recovery. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, May 11, 2004 at 10:37:49AM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Hmm ... I think it should be forbidden to quote a subtrans Xid as > > rollforward point. Not sure if that can be done though, or how to do > > it. > > Seems like a nonissue, unless the XLOG trace makes a subtrans look the > same as a main trans, which it'd not do would it? Huh, certainly not. > We could allow specification of a subtrans ID to be interpreted the same > as specification of its parent main trans. Dunno if that's actually > useful to anyone. Probably not. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "If it wasn't for my companion, I believe I'd be having the time of my life" (John Dunbar)
Simon Riggs <simon@2ndquadrant.com> writes: > Currently, recovery loops until end of xlogs. There is no exit condition > from the loop. There is not currently a timestamp on the xlogs - > anywhere apart from the file date on each xlog. Sure there is: every transaction commit record (and I think abort as well) carries a timestamp. This was put in with PITR in mind in the very beginning. regards, tom lane
On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > Hmm ... I think it should be forbidden to quote a subtrans Xid as > > > rollforward point. Not sure if that can be done though, or how to do > > > it. > > > > Seems like a nonissue, unless the XLOG trace makes a subtrans look the > > same as a main trans, which it'd not do would it? > > I agree that a subtrans xid should not be a valid rollforward point. Forgive me discussing what seems like obvious points - I'm sure you appreciate we need an exact statement of how/when to terminate recovery and that might be found in looking harder at the subtrans questions. This is my third re-write of this e-mail, since I keep thinking of additional things while going for the "definitive statement". I had thought this was straightforward... Currently, recovery loops until end of xlogs. There is no exit condition from the loop. There is not currently a timestamp on the xlogs - anywhere apart from the file date on each xlog. Xids are assigned sequentially to transactions as they start. However, Xids are not committed sequentially. Moreover, checkpoint records do not wait for transactions to complete, so a checkpoint could record an Xid, yet a lower Xid might still be in progress and commit sometime after the checkpoint. So, when we do a backup, we might take with us a pg_control that has a particular Xid, only to find lots of later committed, but earlier Xids in the xlogs. So Xid can have no lower bound. (and a fully formed clog is essential to recovery). If we go searching for a particular Xid, there is no way to tell whether an Xid suggested by a user is too big or too small for use as a recovery target. We need to recover - it is the only way to tell; if we find an Xid that matches, we stop. If not, we keep going until end of logs, when we need to issue a "recovered fully - the Xid you gave was not valid", which may take some time and is also very clearly not what was wanted. (If they had wanted full recovery, they would have asked). So searching on an Xid is inherently a poor way to recover. Which is a shame, because it seemed like an easy target. Unless of course, we live with this vagueness and get on and build the XLogSpy... Xlog records ARE written sequentially, so a timestamp written to the xlogs COULD be used as a target for halting recovery. We would be able to decide, ahead of starting recovery, whether we would be able to sensibly recover to that point by using the pg_control checkpoint time as the lower bound and the file write times of the highest xlog as the upper bound. Once decided that the target timestamp lies between upper and lower bounds, we begin recovery, knowing exactly where it will complete. During recovery, we would search for a timestamp. If found exactly, stop. If exceeded, stop. Any transactions not committed at that point are, as we say, out of luck. ....This approach has a certainty about it that I think is much better than the error prone Xid hunting approach, and is also more attuned to the human reality (time matters, Xids don't). Earlier, Bruce and I had discussed that for reasons of time pressure, the PITR code for this release would consist of a) recovery to a particular Xid b) later, a utility that allowed xlogs to be inspected to allow DBA to decide which is the correct Xid to recover to. Those ideas don't sound as good now.... Therefore: action on me? - add a timestamp to EACH xlog record - something I had been shying away from. On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: > (Unrelated: note that after main transaction commit, a committed > subtransaction is indistinguishable from a committed main transaction -- > and with the current idea of XLog I have, after recovering a transaction > tree from XLog there won't be any mark in pg_subtrans. So the system > will not be exactly as it was before but it won't matter.) I don't think we need a subtrans commit directly, since if the top-level commits after the subtrans has committed, then we're good. However, if a subtrans aborts, yet the top-level commits there will be data written to the database about an aborted transaction. We don't have Undo, so the subtrans clog must be updated to show that the subtrans aborted, otherwise we would read both the committed (top-level) and the uncommitted data (subtrans). Another way of putting it - if it was worth writing before a crash, it is worth recovering after a crash. Shurely? > > We could allow specification of a subtrans ID to be interpreted the same > > as specification of its parent main trans. Dunno if that's actually > > useful to anyone. Actually, I'd think that people would generally > > specify recovery up to a particular timestamp, and not be interested in > > xact numbers at all ... > > I don't think timestamp is going to be precise enough. Basically I can > see someone saying I want recovery up to 4am, but anything more specific > will need xid. I suggested that we write an xlog dump tool so you can > see the xids (with some xid details) and rough timestamps stored in the > WAL file and choose the xid for recovery. Bruce, As I started this e-mail (1st time), I completely agreed with you. I've now had to switch my thinking. (Doesn't effect archiving architecture....) I'm a little dazed....comments anyone? Best regards, Simon Riggs
Simon Riggs wrote: > During recovery, we would search for a timestamp. If found exactly, > stop. If exceeded, stop. Any transactions not committed at that point > are, as we say, out of luck. ....This approach has a certainty about it > that I think is much better than the error prone Xid hunting approach, > and is also more attuned to the human reality (time matters, Xids > don't). > > Earlier, Bruce and I had discussed that for reasons of time pressure, > the PITR code for this release would consist of > a) recovery to a particular Xid > b) later, a utility that allowed xlogs to be inspected to allow DBA to > decide which is the correct Xid to recover to. > Those ideas don't sound as good now.... > > Therefore: action on me? - add a timestamp to EACH xlog record - > something I had been shying away from. Many transactions are going to have the same timestamp because that just isn't precise enough to choose a particular transaction. I agree finding a particular xid in the logs is hard. We could just scan the logs to see if we find the xid before doing the recovery. Anyway, I though we agreed to just get total recovery working for 7.5 and we can deal with recovery to a particular point later. Or we can just add timestamps to the wal file header and restore to a particular wal file date timestamp. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote: > On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote: > > Tom Lane wrote: > > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > > > Hmm ... I think it should be forbidden to quote a subtrans Xid as > > > > rollforward point. Not sure if that can be done though, or how to do > > > > it. > > > > > > Seems like a nonissue, unless the XLOG trace makes a subtrans look the > > > same as a main trans, which it'd not do would it? > > I agree that a subtrans xid should not be a valid rollforward point. If you try to do that you'll fail because there will be no XLog record signalling the commit of a subtransaction. They will be marked committed as necessary as a subproduct of main transaction committing. > Currently, recovery loops until end of xlogs. There is no exit condition > from the loop. There is not currently a timestamp on the xlogs - > anywhere apart from the file date on each xlog. Both xact commit and abort have timestamps in the XLog. I think valid recovery points are transaction commit/abort, not transaction start. > If we go searching for a particular Xid, there is no way to tell whether > an Xid suggested by a user is too big or too small for use as a recovery > target. We need to recover - it is the only way to tell; if we find an > Xid that matches, we stop. If not, we keep going until end of logs, when > we need to issue a "recovered fully - the Xid you gave was not valid", > which may take some time and is also very clearly not what was wanted. I think the user should first examine the logs with whatever tools are provided, and use a timestamp or a Xid listed in the XLog. If they use a Xid that's not listed, it's not our fault ... > b) later, a utility that allowed xlogs to be inspected to allow DBA to > decide which is the correct Xid to recover to. Why is this difficult? There are lots of subsys_desc() functions which already returns what's in each log record as a string. The tool could initially just dump that ... > Therefore: action on me? - add a timestamp to EACH xlog record - > something I had been shying away from. You only need timestamps in xl_xact_commit and xl_xact_abort, which are already there. > On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: > > (Unrelated: note that after main transaction commit, a committed > > subtransaction is indistinguishable from a committed main transaction -- > > and with the current idea of XLog I have, after recovering a transaction > > tree from XLog there won't be any mark in pg_subtrans. So the system > > will not be exactly as it was before but it won't matter.) > > I don't think we need a subtrans commit directly, since if the top-level > commits after the subtrans has committed, then we're good. If the subxact wrote a tuple, its Xid has to be in the pg_clog. Thus we need to recover the pg_clog write. > However, if a subtrans aborts, yet the top-level commits there will be > data written to the database about an aborted transaction. We don't have > Undo, so the subtrans clog must be updated to show that the subtrans > aborted, otherwise we would read both the committed (top-level) and the > uncommitted data (subtrans). If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog. > Another way of putting it - if it was worth writing before a crash, it > is worth recovering after a crash. Surely? Right. What I was saying is that we don't need pg_subtrans info, because that's only needed while the subtransaction is marked as "subcommitted" but it's parent hasn't committed or aborted yet. The subcommitted status is changed to committed/aborted when the main transaction commits or aborts; at recovery time, we already know if that happenned or not so we can mark it right away. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La vida es para el que se aventura"
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Both xact commit and abort have timestamps in the XLog. I think valid > recovery points are transaction commit/abort, not transaction start. A transaction abort cannot be an interesting stop point either; you may as well stop at the latest preceding commit. Whatever the aborted transaction did need not be entered into the database. The abort record timestamps may be useful for navigation (to help determine where you want to stop) but AFAICS only main transaction commit records are useful stopping points. regards, tom lane
On Tue, 2004-05-11 at 21:29, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Currently, recovery loops until end of xlogs. There is no exit condition > > from the loop. There is not currently a timestamp on the xlogs - > > anywhere apart from the file date on each xlog. > > Sure there is: every transaction commit record (and I think abort as > well) carries a timestamp. This was put in with PITR in mind in the > very beginning. > When being wrong has a purpose, I just love it. I'll put my code-goggles back on the right way round. Thanks, Simon
On Tue, 2004-05-11 at 22:01, Bruce Momjian wrote: > Anyway, I though we agreed to just get total recovery working for 7.5 > and we can deal with recovery to a particular point later. Back on target now! Regards, Simon
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Many transactions are going to have the same timestamp because that just > isn't precise enough to choose a particular transaction. I think this argument is largely a red herring ... but if it makes you feel better, we could change the contents of the commit timestamp to be gettimeofday() output (seconds+microseconds) instead of just time() output. That should be precise enough for practical purposes. regards, tom lane
Thanks to all of you for such swift advice and correction! Have a good evening... On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote: > On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote: > > On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote: > > > Tom Lane wrote: > > > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Both xact commit and abort have timestamps in the XLog. I think valid > recovery points are transaction commit/abort, not transaction start. > Agreed. > > > If we go searching for a particular Xid, there is no way to tell whether > > an Xid suggested by a user is too big or too small for use as a recovery > > target. We need to recover - it is the only way to tell; if we find an > > Xid that matches, we stop. If not, we keep going until end of logs, when > > we need to issue a "recovered fully - the Xid you gave was not valid", > > which may take some time and is also very clearly not what was wanted. > > I think the user should first examine the logs with whatever tools are > provided, and use a timestamp or a Xid listed in the XLog. If they use > a Xid that's not listed, it's not our fault ... Blame is in the eye of the beholder. Anyway, because of above, no worries. > > > b) later, a utility that allowed xlogs to be inspected to allow DBA to > > decide which is the correct Xid to recover to. > > Why is this difficult? There are lots of subsys_desc() functions which > already returns what's in each log record as a string. The tool could > initially just dump that ... > It's not hard, its all a question of finite time resources. > > > Therefore: action on me? - add a timestamp to EACH xlog record - > > something I had been shying away from. > > You only need timestamps in xl_xact_commit and xl_xact_abort, which are > already there. > > > On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: > > > (Unrelated: note that after main transaction commit, a committed > > > subtransaction is indistinguishable from a committed main transaction -- > > > and with the current idea of XLog I have, after recovering a transaction > > > tree from XLog there won't be any mark in pg_subtrans. So the system > > > will not be exactly as it was before but it won't matter.) > > > > I don't think we need a subtrans commit directly, since if the top-level > > commits after the subtrans has committed, then we're good. > > If the subxact wrote a tuple, its Xid has to be in the pg_clog. Thus > we need to recover the pg_clog write. > > > However, if a subtrans aborts, yet the top-level commits there will be > > data written to the database about an aborted transaction. We don't have > > Undo, so the subtrans clog must be updated to show that the subtrans > > aborted, otherwise we would read both the committed (top-level) and the > > uncommitted data (subtrans). > > If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog. > > > Another way of putting it - if it was worth writing before a crash, it > > is worth recovering after a crash. Surely? > > Right. What I was saying is that we don't need pg_subtrans info, > because that's only needed while the subtransaction is marked as > "subcommitted" but it's parent hasn't committed or aborted yet. The > subcommitted status is changed to committed/aborted when the main > transaction commits or aborts; at recovery time, we already know if that > happenned or not so we can mark it right away. Maybe I'm just not following you, which seems likely. Surely it is possible that a subtrans could abort, yet the fulltrans commits. Yet that doesn't in any way imply that the subtrans should change from aborted to committed? Anyway, I've raised my thoughts, so if you're still OK, thats fine by me. I'll get back to the wheel to see what can be spun... Best regards, Simon Riggs
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Many transactions are going to have the same timestamp because that just > > isn't precise enough to choose a particular transaction. > > I think this argument is largely a red herring ... but if it makes you > feel better, we could change the contents of the commit timestamp to > be gettimeofday() output (seconds+microseconds) instead of just time() > output. That should be precise enough for practical purposes. I am saying timestamp as used for specifying a recovery location might not be unique enough, no? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote: >> Another way of putting it - if it was worth writing before a crash, it >> is worth recovering after a crash. Surely? > Right. Wrong. Only actions that were *committed* before the crash, or in general the selected recovery stopping time, need be recovered. The actions-so-far of transactions that were in progress at that time don't have to be redone. It doesn't really matter if we do do them; we have a consistent database state either way. I'm just pointing this out for the sake of clear thinking. > What I was saying is that we don't need pg_subtrans info, > because that's only needed while the subtransaction is marked as > "subcommitted" but it's parent hasn't committed or aborted yet. The > subcommitted status is changed to committed/aborted when the main > transaction commits or aborts; at recovery time, we already know if that > happenned or not so we can mark it right away. Agreed, if pg_subtrans is never consulted except for children of in-progress (or recently-completed) main transactions, then it need not be recovered. However, I'm not sure this is really true. Couldn't there be an entry in pg_clog showing a SUBTRANS COMMIT state immediately after a recovery? If the main transaction never gets to commit or abort, there would be nothing in XLOG to cue the recovery process to go and clear that clog entry. So without some further mechanism I'd expect the first backend that visits the tuple in question to go and consult pg_subtrans. There would need to be a way of knowing "XIDs older than this are by definition not to be sought in pg_subtrans". With a cutoff, you could assume that SUBTRANS COMMIT entries older than the cutoff can be changed to ABORT state. What's not clear to me is how we can advance that cutoff; it might have to go along with the clog truncation logic. regards, tom lane
On Tue, May 11, 2004 at 11:07:47PM +0100, Simon Riggs wrote: > On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote: > > Right. What I was saying is that we don't need pg_subtrans info, > > because that's only needed while the subtransaction is marked as > > "subcommitted" but it's parent hasn't committed or aborted yet. The > > subcommitted status is changed to committed/aborted when the main > > transaction commits or aborts; at recovery time, we already know if that > > happenned or not so we can mark it right away. > > Maybe I'm just not following you, which seems likely. > > Surely it is possible that a subtrans could abort, yet the fulltrans > commits. Yet that doesn't in any way imply that the subtrans should > change from aborted to committed? The subtransaction will be recorded as aborted in pg_clog (I mean there will be a XLog record saying that it aborted). That protects it from showing as committed, no matter what happens to the parent xact. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La tristeza es un muro entre dos jardines" (Khalil Gibran)
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> I think this argument is largely a red herring ... but if it makes you >> feel better, we could change the contents of the commit timestamp to >> be gettimeofday() output (seconds+microseconds) instead of just time() >> output. That should be precise enough for practical purposes. > I am saying timestamp as used for specifying a recovery location might > not be unique enough, no? Why not? I don't think there are going to be practical situations where the user knows that he wants transactions up till exactly 6:48:52 PM anyway. He'll be lucky if he knows that the junior DBA dropped the wrong table around 6:30 :-(. It's even less likely that he desperately needs to revert to before such a disaster but still get in a transaction that happened to commit at the same second. Take it down to the microsecond level and the use-case becomes vanishingly small. What is *very* likely, but has no chance of being supported by PITR, is that he wishes he could keep the results of transactions committed shortly after the error he's trying to undo. At the microsecond level, the transaction commit order is not that predictable anyway (except of course when one transaction's lock blocks another). I think it's pointless to get all exercised about whether you can stop on *this* transaction and not *that* one when they are committing so close together. Half the time the user will be wishing he could omit the first and include the second, but he doesn't get to do that, and so I say being able to point to the stopping xact by xid is just an exercise in technical hair-splitting. I don't think it solves any real-world problem. Another way of putting this is that one-second resolution in the states you have available to restore to is plenty good enough for real-world cases. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> I think this argument is largely a red herring ... but if it makes you > >> feel better, we could change the contents of the commit timestamp to > >> be gettimeofday() output (seconds+microseconds) instead of just time() > >> output. That should be precise enough for practical purposes. > > > I am saying timestamp as used for specifying a recovery location might > > not be unique enough, no? > > Why not? I don't think there are going to be practical situations where > the user knows that he wants transactions up till exactly 6:48:52 PM > anyway. He'll be lucky if he knows that the junior DBA dropped the > wrong table around 6:30 :-(. It's even less likely that he desperately > needs to revert to before such a disaster but still get in a transaction > that happened to commit at the same second. Take it down to the > microsecond level and the use-case becomes vanishingly small. Here is my logic. Once they have a way to dump the WAL contents, folks trying to recover to a specific point in the past are going to look at the WAL dump and hopefully identify the transaction that was bad. They then will want to roll back to just before that transaction. Do they subtract one second from the transaction? Seems it would be easier to just pick the xid that was just before the bad one. Also, considering the various time formats and timezone issues that it is simpler to just have them specify an xid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Wed, 2004-05-12 at 04:47, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Tom Lane wrote: > > >> I think this argument is largely a red herring ... but if it makes you > > >> feel better, we could change the contents of the commit timestamp to > > >> be gettimeofday() output (seconds+microseconds) instead of just time() > > >> output. That should be precise enough for practical purposes. > > > > > I am saying timestamp as used for specifying a recovery location might > > > not be unique enough, no? > > > > Why not? I don't think there are going to be practical situations where > > the user knows that he wants transactions up till exactly 6:48:52 PM > > anyway. He'll be lucky if he knows that the junior DBA dropped the > > wrong table around 6:30 :-(. It's even less likely that he desperately > > needs to revert to before such a disaster but still get in a transaction > > that happened to commit at the same second. Take it down to the > > microsecond level and the use-case becomes vanishingly small. > > Here is my logic. Once they have a way to dump the WAL contents, folks > trying to recover to a specific point in the past are going to look at > the WAL dump and hopefully identify the transaction that was bad. They > then will want to roll back to just before that transaction. > > Do they subtract one second from the transaction? Seems it would be > easier to just pick the xid that was just before the bad one. Also, > considering the various time formats and timezone issues that it is > simpler to just have them specify an xid. Well, I think I agree with both sides of this debate. Solution: provide both timestamp AND Xid capability. We assume that if they specify Xid, it is because they know and, for whatever reason, care, about the exact specification of where recovery stops. If you know a large statement just executed in error, then you want to restore back to just before the error. My earlier angst was based upon mistaking that there was no timestamp. There is now a simple choice of recovery targets and fairly simple to implement both. Design is now clear for me. Best Regards, Simon Riggs
> > I think this argument is largely a red herring ... but if it makes you > > feel better, we could change the contents of the commit timestamp to > > be gettimeofday() output (seconds+microseconds) instead of just time() > > output. That should be precise enough for practical purposes. > > I am saying timestamp as used for specifying a recovery location might > not be unique enough, no? Maybe the api should allow a xid only in addition to a timestamp, for use when timestamp alone is not precise enough. That would solve the problem of not finding the xid. I would stay away from microseconds. Andreas