Thread: PITR Phase 2 - Design Planning
Since Phase 1 is functioning and should hopefully soon be complete, we can now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to specify recovery to a given point in time, rather than the default, which is to recover all the way to the end of the (available) xlogs.

Recovering to a specific point in time forces us to consider what the granularity of time is. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so (1) is straightforward, but probably not the best we can do. We would roll forward until the xlog file time > desired point in time.

To make (2) work we would have to have a timestamp associated with each transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record that had a timestamp > desired point-in-time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in any less than 2 bytes. We don't need a timezone. The timestamp could be a number of seconds since the last checkpoint, since checkpoints are already forced every so often by a GUC. Although the code avoids a checkpoint if no updates have taken place, we wouldn't be too remiss to use a forced checkpoint every 32,000 seconds (9 hours) -- assuming that the accuracy of the point-in-time is of the order of seconds. If we went to 0.1-second accuracy, we could force a checkpoint every 40 minutes or so. All of that seems too restrictive. If we went to milliseconds, then we could use a 4-byte value and force a checkpoint every 284 hours, or 1.5 weeks. Thoughts?

Clog uses 2 bits per transaction, so even 2 bytes extra per transaction would make the clog 9 times larger than originally intended. This could well cause it to segment quicker, and I'm sure no one would be happy with that. So, let's not add anything to the clog.

The alternative is to make the last part of the XlogHeader record a timestamp value, increasing with each xlog write. It might be possible to make this part of the header optional depending upon whether or not PITR was required, but my preference is against such dynamic coding.

So, I propose:
- appending 8 bytes of date/time data to the xlog file header record
- appending 4 bytes of time offset to each xlog record
- altering the recovery logic to compare the calculated time of each xlog record (file header + offset) against the desired point-in-time, delivered to it by GUC.

Input is sought from anybody with detailed NTP knowledge, since NTP drift correction may have some subtle interplay with this proposal.

Also, while that code is being altered, some additional log records need to be added when recovery of each new xlog starts, with timing, to allow DBAs watching a recovery to calculate expected completion times, which is essential for long recovery situations.

I am also considering any changes that may be required to prepare the way for a future implementation of parallel redo recovery.

Best regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com
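A minimal sketch in C of what this proposal amounts to, with invented names (XLogFileTimeHdr, past_recovery_target -- none of this is actual PostgreSQL code): an 8-byte base time written once per xlog file, a 4-byte per-record offset, and a stop test against the GUC-supplied target time.

    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t pg_time64;          /* 8-byte date/time value */

    typedef struct XLogFileTimeHdr
    {
        pg_time64   file_base_time;     /* 8 bytes in the xlog file header */
    } XLogFileTimeHdr;

    /* each xlog record would carry a 4-byte offset from file_base_time,
       e.g. in milliseconds */

    static bool
    past_recovery_target(pg_time64 file_base_time,
                         uint32_t rec_time_offset,
                         pg_time64 recovery_target_time)
    {
        pg_time64 rec_time = file_base_time + (pg_time64) rec_time_offset;

        /* replay stops at the first record calculated to be later than
           the desired point-in-time */
        return rec_time > recovery_target_time;
    }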
Simon Riggs wrote:
> Transaction log files currently have timestamps, so that is
> straightforward, but probably not the best we can do. We would
> roll forward until the xlog file time > desired point in time.
>
> To make (2) work we would have to have a timestamp associated with each
> transaction. This could be in one of two places:
> 1. the transaction record in the clog
> 2. the log record in the xlog
> [...]
> If we went to milliseconds, then we could use a 4-byte value and force
> a checkpoint every 284 hours, or 1.5 weeks. Thoughts?

I was thinking --- how would someone know the time to use for restore? Certainly they will not know it to subsecond accuracy? Second-level accuracy is probably enough, _except_ when they want everything restored up to a DROP TABLE transaction or some other major problem. Is there a way to give users a list of transactions in a log backup? Can we show them the username, database, or commands or something? Would they be able to restore up to a specific transaction in that case?

Basically, we could give them sub-second recovery, but what value would it be?

--
Bruce Momjian                     | http://candle.pha.pa.us
pgman@candle.pha.pa.us            | (610) 359-1001
+ If your life is a hard drive,   | 13 Roberts Road
+ Christ can be your backup.      | Newtown Square, Pennsylvania 19073
On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Transaction log files currently have timestamps, so that is
> > straightforward, but probably not the best we can do. We would
> > roll forward until the xlog file time > desired point in time.
>
> I was thinking --- how would someone know the time to use for restore?

I think there should be a way to get a TransactionId and restore up to that point. It'd be cool, but not required, if the system showed what valid TransactionIds there are, and roughly what they did (the xlog code already has "describers" everywhere AFAICS).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El realista sabe lo que quiere; el idealista quiere lo que sabe" (Anónimo)
("The realist knows what he wants; the idealist wants what he knows")
On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
> I think there should be a way to get a TransactionId and restore up to
> that point. It'd be cool, but not required, if the system showed what
> valid TransactionIds there are, and roughly what they did (the xlog
> code already has "describers" everywhere AFAICS).

You're right. I think we should start by implementing rollforward to a txnid before we consider rollforward to a specified point-in-time. All the hooks for that are already there...

Best regards, Simon Riggs
On Mon, 2004-04-26 at 22:05, Bruce Momjian wrote:
> I was thinking --- how would someone know the time to use for restore?
> Certainly they will not know it to subsecond accuracy? Second-level
> accuracy is probably enough, _except_ when they want everything restored
> up to a DROP TABLE transaction or some other major problem. Is there a
> way to give users a list of transactions in a log backup? Can we show
> them the username, database, or commands or something? Would they be
> able to restore up to a specific transaction in that case?
>
> Basically, we could give them sub-second recovery, but what value would
> it be?

Yes, you remind me of a whole train of thought... There should also be a switch to allow you to specify the txnid you wish to recover up until.

You raise the point of how you know what time to recover to. That is in fact the very hardest part of recovery for a DBA. That's a good reason for being able to list xlog contents, as you can with Oracle. Sounds like we need an XlogMiner utility...

> Can we show them the username, database, or commands or something?

Yes, that sounds fairly straightforward, using a modification of the ReadRecord functions at the bottom of xlog.c -- which is why security of the xlogs is important. It's also a good reason for being able to pause and restart recovery, so you can see what the database is like before continuing further.

Usually you are trying to sync up the contents of the database with all of the other things that were being updated too. Often these will define either the required transaction contents, or give a time to use.

Whatever level of time accuracy you choose, we would always need to handle the case where multiple transactions have been committed with exactly the same time (after rounding) and yet we may wish to split them. Rolling forward to a txnid would help there.

Best regards, Simon
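For illustration, the listing loop such an "XlogMiner" might run could look like the sketch below; read_next_record() and the XlogRecInfo fields are invented for this example, the real hooks being the ReadRecord functions in xlog.c and the per-rmgr describers.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct XlogRecInfo
    {
        uint32_t    xid;        /* transaction id the record belongs to */
        int64_t     rec_time;   /* calculated record time, once PITR adds one */
        const char *describe;   /* rmgr "describer" text for the record */
    } XlogRecInfo;

    extern int read_next_record(XlogRecInfo *rec);  /* hypothetical helper */

    static void
    list_xlog(void)
    {
        XlogRecInfo rec;

        /* print one line per record so a DBA can locate a stopping point */
        while (read_next_record(&rec))
            printf("xid=%u time=%lld %s\n",
                   rec.xid, (long long) rec.rec_time, rec.describe);
    }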
Simon Riggs wrote:
> You're right. I think we should start by implementing rollforward to a
> txnid before we consider rollforward to a specified point-in-time. All
> the hooks for that are already there...

Yep, sounds like a plan.

--
Bruce Momjian                     | http://candle.pha.pa.us
pgman@candle.pha.pa.us            | (610) 359-1001
+ If your life is a hard drive,   | 13 Roberts Road
+ Christ can be your backup.      | Newtown Square, Pennsylvania 19073
Simon Riggs wrote:
> Since Phase 1 is functioning and should hopefully soon be complete, we
> can now start thinking about Phase 2: full recovery to a point in time.
> [...]
> So, I propose:
>
> - appending 8 bytes of date/time data to the xlog file header record
> - appending 4 bytes of time offset to each xlog record
> - altering the recovery logic to compare the calculated time of each
>   xlog record (file header + offset) against the desired point-in-time,
>   delivered to it by GUC.
>
> Input is sought from anybody with detailed NTP knowledge, since NTP
> drift correction may have some subtle interplay with this proposal.

Simon,

I have one question which might be important: if we use timestamps inside the WAL system to find out where to stop, what happens if somebody changes the time of the system (e.g. correcting the system clock by calling ntpdate)? Wouldn't it confuse the PITR system? How do you plan to handle that? Unfortunately, time is not something that can be used as a key (at least not from my point of view).

Just some lousy ideas early in the morning ...

Regards, Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at
On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:
> Simon Riggs wrote:
> > You're right. I think we should start by implementing rollforward to
> > a txnid before we consider rollforward to a specified point-in-time.
> > All the hooks for that are already there...
>
> Yep, sounds like a plan.

Speaking as a DBA, what I usually want to do is restore to "immediately before I started the payroll calculation". An actual wall-clock time is mostly irrelevant to me.

Suggestion: how about a pg_trans_note table (crap name, I know) that only permits inserts -- records (backend-pid, timestamp, notes). My app inserts "starting payroll calc" and "ending payroll calc" entries because those are the points I might wish to sync to. If I want to sync for each individual transaction in my calculations, my app can do that too.

From a usability point of view you might want to automatically insert rows on client connection/table creation etc. You could also delete any rows more than a week old when archiving WAL files.

--
Richard Huxton
Archonet Ltd
On Tue, Apr 27, 2004 at 10:38:45 +0100, Richard Huxton <dev@archonet.com> wrote:
> Speaking as a DBA, what I usually want to do is restore to "immediately
> before I started the payroll calculation". An actual wall-clock time is
> mostly irrelevant to me.

For long-running transactions where you want to recover as much as possible, one might also want to recover up until just before a specific transaction committed (as opposed to started).
On Tue, 2004-04-27 at 08:56, Hans-Jürgen Schönig wrote:
> I have one question which might be important: if we use timestamps
> inside the WAL system to find out where to stop, what happens if
> somebody changes the time of the system (e.g. correcting the system
> clock by calling ntpdate)? Wouldn't it confuse the PITR system? How do
> you plan to handle that? Unfortunately, time is not something that can
> be used as a key (at least not from my point of view).

Interesting question.

The timestamps written within WAL would be "just data" once written. The recovery process would refer only to that timestamp data, so it would not refer to any external notion of time. This is required to ensure that the same recovery will happen identically, no matter how many times you choose to re-run it (or are forced to by external circumstances).

If you change the time of the system backwards, this might invalidate the transaction log history... it would effectively create two (or more) xlog records with the same timestamp on them, and it would be logically indeterminate which one should limit recovery. In practical terms, I would implement this as "apply all records <= PIT". This would mean that recovery would stop only when the time became larger than PIT, which would only occur at the second (or last) record that had a timestamp equal to PIT. I guess I could put a WARNING in to say "time just went backwards...spoohw"

Overall, I'd refer back to the points Bruce raised -- you certainly do need a way of finding out the time to recover to, and as others have said, time isn't the only desirable "recovery point".

Best regards, Simon Riggs
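The "apply all records <= PIT" rule plus the suggested WARNING might reduce to something like this sketch (illustrative only; the record times here are the calculated file-header-plus-offset values from the proposal above):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static bool
    keep_applying(int64_t rec_time, int64_t last_rec_time,
                  int64_t recovery_target_time)
    {
        /* a clock stepped backwards shows up as non-monotonic record times */
        if (rec_time < last_rec_time)
            fprintf(stderr, "WARNING: time just went backwards in xlog\n");

        /* apply all records <= PIT; stop at the first record beyond it,
           i.e. after the last of any records that share the target time */
        return rec_time <= recovery_target_time;
    }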
On Tue, 2004-04-27 at 10:38, Richard Huxton wrote:
> Speaking as a DBA, what I usually want to do is restore to "immediately
> before I started the payroll calculation". An actual wall-clock time is
> mostly irrelevant to me.
>
> Suggestion: how about a pg_trans_note table (crap name, I know) that
> only permits inserts -- records (backend-pid, timestamp, notes). My app
> inserts "starting payroll calc" and "ending payroll calc" entries
> because those are the points I might wish to sync to.

Not sure I like your implementation, but the overall idea is great. I'd suggest extending the CHECKPOINT command so you can say:

CHECKPOINT <text message>

e.g. CHECKPOINT 'starting payroll Feb04';

(I'm sure some other DBMS does this... head spinning, can't recall...)

The text could just appear in the xlog record data packet... That could then be used as the target recovery point.

Best Regards, Simon Riggs
On Tue, 2004-04-27 at 18:43, Bruno Wolff III wrote:
> For long-running transactions where you want to recover as much as
> possible, one might also want to recover up until just before a specific
> transaction committed (as opposed to started).

Sounds like the difference between > and >=, so it should be possible...

Best Regards, Simon Riggs
> Overall, I'd refer back to the points Bruce raised -- you certainly do
> need a way of finding out the time to recover to, and as others have
> said, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure that all the parameters the user might want to use to find that ID can be made available in the log files?
On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> Wouldn't it be sufficient to simply use the transaction ID and ensure
> that all the parameters the user might want to use to find that ID can
> be made available in the log files?

Yes, of course; all methods of locating a particular xlog file to stop at are effectively equivalent. The discussion is mostly about what is convenient for the user in a real recovery situation.

From all that has been said so far, I would implement:

1. Recovery to a specific txnid, which is fairly straightforward
2. Recovery to a specific date/time, either by
   a) implementing a log inspection tool that shows the txnid for a PIT, or
   b) implementing recovery to a PIT directly
3. Recovery to a named checkpoint

Best Regards, Simon Riggs
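Sketched in C, the three stop conditions could share one test like the following (types and names invented for this example, not actual PostgreSQL code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    typedef enum { TARGET_XID, TARGET_TIME, TARGET_NAME } TargetKind;

    typedef struct RecoveryTarget
    {
        TargetKind  kind;
        uint32_t    xid;    /* 1: stop at this transaction */
        int64_t     time;   /* 2: stop once record time exceeds this */
        const char *name;   /* 3: stop at this named checkpoint */
    } RecoveryTarget;

    static bool
    reached_target(const RecoveryTarget *t, uint32_t rec_xid,
                   int64_t rec_time, const char *rec_name)
    {
        switch (t->kind)
        {
            case TARGET_XID:
                return rec_xid == t->xid;
            case TARGET_TIME:
                return rec_time > t->time;
            case TARGET_NAME:
                return rec_name != NULL && strcmp(rec_name, t->name) == 0;
        }
        return false;
    }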
Bruno Wolff III wrote:
> For long-running transactions where you want to recover as much as
> possible, one might also want to recover up until just before a specific
> transaction committed (as opposed to started).

If your DB has died and you are recovering it, how do you re-establish a session so that a transaction can complete? Don't all client connections assume that a transaction has failed if the connection to the DB fails? Or am I just being naive, as usual?

Peter
On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:
> Yes, of course; all methods of locating a particular xlog file to stop
> at are effectively equivalent. The discussion is mostly about what is
> convenient for the user in a real recovery situation.

I see... The first thing I would need to do is look at /var/log/pgsql. At that point it really doesn't matter what the identifier is, so long as the identifier is there.
On Tue, 2004-04-27 at 23:11, Rod Taylor wrote:
> I see... The first thing I would need to do is look at /var/log/pgsql.
> At that point it really doesn't matter what the identifier is, so long
> as the identifier is there.

PITR works on the assumption that /var/log/pgsql no longer exists at all. It is suitable for use in bare-metal recovery situations, as well as usage-induced ones. You pick up the pieces, work out what the best identifier is, then plan on using that... It might not be a pgsql log; it might be:
i) literally wall-clock -- "the power went off about 2"
ii) other systems' logs
iii) etc.

Best Regards, Simon Riggs
On Fri, 2004-05-28 at 00:02, Peter Galbavy wrote:
> If your DB has died and you are recovering it, how do you re-establish a
> session so that a transaction can complete? Don't all client connections
> assume that a transaction has failed if the connection to the DB fails?

Reasonable question... You re-establish the connection, but cannot resume the failed transaction. PostgreSQL already has crash recovery... this is for restore-from-backup scenarios.

Best Regards, Simon Riggs
Another idea would be to provide some means to roll a database forwards and backwards. If you're doing a recovery because you did something like an accidental UPDATE ... SET field = blah without a WHERE clause, what you really care about is going up to the point right before that update. If there's a log viewer with enough detail, that would suffice; otherwise you'd need to test for some specific condition.

Of course, timestamps would still be useful in this scenario, since they can get you close to the transaction in question.

--
Jim C. Nasby, Database Consultant  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Simon Riggs wrote:
> From all that has been said so far, I would implement:
>
> 1. Recovery to a specific txnid, which is fairly straightforward
> 2. Recovery to a specific date/time, either by
>    a) implementing a log inspection tool that shows the txnid for a PIT, or
>    b) implementing recovery to a PIT directly
> 3. Recovery to a named checkpoint

What if we added the transaction id to log_line_prefix? The user could then log all queries and find the xid where they want to stop -- but of course that assumes they have enabled such logging, and that they have access to the logs.

--
Bruce Momjian                     | http://candle.pha.pa.us
pgman@candle.pha.pa.us            | (610) 359-1001
+ If your life is a hard drive,   | 13 Roberts Road
+ Christ can be your backup.      | Newtown Square, Pennsylvania 19073
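For what it's worth, later PostgreSQL releases did grow exactly this: log_line_prefix gained a %x escape for the transaction ID. A minimal setting (modern postgresql.conf syntax, well after this thread):

    # %m = timestamp, %p = backend pid, %x = transaction id
    log_line_prefix = '%m [%p] xid=%x '
    log_statement = 'all'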
On Thu, May 27, 2004 at 23:02:42 +0000, Peter Galbavy <peter.galbavy@knowtion.net> wrote:
> If your DB has died and you are recovering it, how do you re-establish a
> session so that a transaction can complete?

The context of my suggestion was recovering up until a transaction that messed things up was committed. I did not want the problem transaction to be committed. If the problem transaction ran for a long time, there might be other transactions that I want to keep, if possible, that committed after the problem transaction started and before it ended.
Bruno Wolff III wrote:
> The context of my suggestion was recovering up until a transaction that
> messed things up was committed. I did not want the problem transaction
> to be committed. If the problem transaction ran for a long time, there
> might be other transactions that I want to keep, if possible, that
> committed after the problem transaction started and before it ended.

Ah! followed by Eek! Now I see the light. It's very bright and painful.

What I can see is that expressing this accurately and unambiguously is going to be _difficult_. How do you know accurately the point just before a transaction was completed? There must be a good subset of candidates that can be labelled. Is there any way to label/name a transaction that can be kept somewhere? Like "begin transaction 'bigtransaction26';" -- is there any allowance in the SQL standards for naming transactions?

PS I have fixed my system clock -- apologies for my earlier reply being a month ahead.

rgds,
--
Peter
On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
> What if we added the transaction id to log_line_prefix? The user could
> then log all queries and find the xid where they want to stop -- but of
> course that assumes they have enabled such logging, and that they have
> access to the logs.

Good thinking. I'll have a look at this and come back to you.

Best Regards, Simon Riggs
Simon Riggs wrote:
> On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
> > What if we added the transaction id to log_line_prefix? The user
> > could then log all queries and find the xid where they want to stop...
>
> Good thinking. I'll have a look at this and come back to you.

log_statement-triggered logging happens very early in the process -- if you are logging all statements, it happens before they are even parsed. Would we have an xid to log sensibly at that stage?

Perhaps with log_min_duration_statement = 0 we would (i.e. log statements when finished rather than when started).

cheers

andrew
On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:
> log_statement-triggered logging happens very early in the process -- if
> you are logging all statements, it happens before they are even parsed.
> Would we have an xid to log sensibly at that stage?
>
> Perhaps with log_min_duration_statement = 0 we would (i.e. log
> statements when finished rather than when started).

Let's call this XLogSpy.

For starters, we only need to look at write transactions. Many read-only transactions would not need to be examined, just known to be read-only.

Remembering that we're using xlogs for recovery, we perhaps should not assume that we have anything other than what has been archived. Currently, that is just the xlogs. So really we need to work off what is within them, and right now that isn't much at all.

We might optionally include the parsed statement data in the xlog, since this might be smaller than the exact text itself and would also allow us to filter the xlogs based upon any aspect of the parse trees.

Not really happy with any of these ideas yet.

Best Regards, Simon Riggs
Simon Riggs said:
> We might optionally include the parsed statement data in the xlog,
> since this might be smaller than the exact text itself and would also
> allow us to filter the xlogs based upon any aspect of the parse trees.
>
> Not really happy with any of these ideas yet.

I don't see how this relates to the log_line_prefix idea, or are you abandoning that?

cheers

andrew
Simon Riggs wrote:
> I'd suggest extending the CHECKPOINT command so you can say:
> CHECKPOINT <text message>
> e.g. CHECKPOINT 'starting payroll Feb04';
> (I'm sure some other DBMS does this... head spinning, can't recall...)
> The text could just appear in the xlog record data packet...

I believe you are thinking of a savepoint rather than a checkpoint.
On Thu, Apr 29, 2004 at 05:09:19PM +0200, Peter Eisentraut wrote:
> I believe you are thinking of a savepoint rather than a checkpoint.

But a "savepoint" has a very precise meaning in the SQL standard, which relates to points in a transaction you can roll back to. I don't think you want to overload it with this other meaning, which I see as putting a special mark in the xlog -- completely unrelated.

ISTM CHECKPOINT 'something' would flush all xlogs, insert some kind of special xlog record with 'something' in it, and then archive everything (up to that point?).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML.
On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:
> I believe you are thinking of a savepoint rather than a checkpoint.

Perhaps that was the inspiration, but no, I definitely meant a CHECKPOINT.

But now you come to mention it, it would be better just to have a command that simply wrote a named record to the xlog, so it can be searched for later...

COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'

Best Regards, Simon Riggs
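This idea is essentially what later appeared in PostgreSQL 9.1 as pg_create_restore_point() together with the recovery_target_name setting. As a sketch of the record such a command would write (layout and names invented here, not actual PostgreSQL code):

    #include <stdint.h>
    #include <string.h>

    #define MARKER_MAX_LEN 64

    typedef struct XLogNamedMarker
    {
        int64_t     marker_time;                  /* when the marker was written */
        char        marker_name[MARKER_MAX_LEN];  /* e.g. "starting payroll Feb04" */
    } XLogNamedMarker;

    /* hypothetical helper: build the marker record before inserting it
       into the xlog stream, where recovery can later search for the name */
    static void
    make_marker(XLogNamedMarker *rec, int64_t now, const char *name)
    {
        rec->marker_time = now;
        strncpy(rec->marker_name, name, MARKER_MAX_LEN - 1);
        rec->marker_name[MARKER_MAX_LEN - 1] = '\0';
    }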
Simon Riggs wrote:
> But now you come to mention it, it would be better just to have a
> command that simply wrote a named record to the xlog, so it can be
> searched for later...
>
> COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'

FWIW, Oracle's COMMIT syntax has an optional COMMENT specifier which is used for documenting a distributed transaction. In-doubt transactions can then be manually committed or aborted by referencing the transaction associated with the comment. Example:

COMMIT WORK COMMENT 'A complex distributed Tx';

Perhaps there is some common ground between the 2PC implementation and PITR?

Mike Mascari
Alvaro Herrera wrote:
> But a "savepoint" has a very precise meaning in the SQL standard, which
> relates to points in a transaction you can roll back to. I don't think
> you want to overload it with this other meaning, which I see as putting
> a special mark in the xlog -- completely unrelated.

They seem completely unrelated because you're considering them on two different levels. From the user-interface level they are both a place to roll back to. Whether they are inside the same transaction or not is like the difference between a normal and a holdable cursor, but it's the same interface.
I put 360000+ rows in a table, and now any SELECT, UPDATE, ANALYZE ... command fails. The log shows "ERROR: heapgettup: failed ReadBuffer", but any INSERT command succeeds. The table schema is:

        Column     |            Type             | Modifiers
    ---------------+-----------------------------+-----------
     test_id       | integer                     | not null
     snapshot      | timestamp without time zone |
     ip_client     | inet                        |
     ip_server     | inet                        |
     conn_time     | integer                     |
     response_time | integer                     |
     response_head | character varying           |
    Check constraints:
        "invalid_conn_time" CHECK (conn_time >= 0)
        "invalid_resp_time" CHECK (response_time >= 0)

I didn't create any index. Does anyone know why this happens, or have a suggestion to save the unreadable database? Or is it something I misconfigured?

Thanks for your help.
June-Yen
jihuang <jihuang@iis.sinica.edu.tw> writes:
> I put 360000+ rows in a table, and now any SELECT, UPDATE, ANALYZE ...
> command fails. The log shows "ERROR: heapgettup: failed ReadBuffer",

What Postgres version is this? AFAICS that error has been impossible for quite some time ...

regards, tom lane
Tom Lane wrote:
> What Postgres version is this? AFAICS that error has been impossible
> for quite some time ...

Thanks for your real-time response! The problem was solved after I upgraded PostgreSQL from 7.3.4 to 7.4.2.

By the way, is there any bug-tracking website for PostgreSQL? I followed [HOMEPAGE] -> [DEVELOPERS] -> and found nothing like a Bugzilla; following [GBORG] leads to PostgreSQL-related projects, but not PostgreSQL itself?

Let me show an advertisement... quote from O'Reilly's Developer Weblogs:

> RT foundry is being developed in Taiwan as part of the Open Foundry
> Project, which is aimed at encouraging FS/OSS development in Taiwan.
> The foundry is SF-like, except using better technologies (RT for
> bug/request tracking, subversion for source control, etc ...)

The following link is the issue-and-comments log for solving the problem I described:
http://rt.openfoundry.org/Foundry/Project/Tracker/Display.html?Queue=90&id=2653

There are some Chinese characters mixed in, but I just wanted to show that hosting a dedicated issue/bug tracking system may improve a software project's evolution.

June-Yen