Thread: PITR Phase 2 - Design Planning

PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
Since Phase 1 is functioning and should hopefully be complete soon, we
can now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to
specify recovery to a given point in time, rather than the default,
which will be to recover all the way to the end of the (available)
xlogs.

Recovering to a specific point in time forces us to consider what the
granularity of time is. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
rollforward until the xlog file time > desired point in time.

To make (2) work we would have to have a timestamp associated with each
transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record
that had a timestamp > desired point-in-time.
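In toy Python (record shapes invented purely for illustration, standing in for the real WAL machinery), the stop rule for option (2) is just:

```python
# Toy model of option (2): replay xlog records one by one, applying every
# record whose timestamp is <= the desired point-in-time, and stopping at
# the first record whose timestamp is beyond it.

def replay_to_point_in_time(records, target):
    """records: iterable of (timestamp, payload) pairs in log order."""
    applied = []
    for ts, payload in records:
        if ts > target:          # first record past the PIT ends recovery
            break
        applied.append(payload)  # stand-in for redoing the record
    return applied
```

With records timestamped 1, 2, 2, 5 and a target of 2, this applies the first three and stops before the fourth.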

Currently, neither of these places has a timestamp. Hmmmm. We can't use
pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in less than 2
bytes. We don't need a timezone. The timestamp could be the number of
seconds since the last checkpoint, since checkpoint frequency is already
limited by a GUC that forces checkpoints every so often. Although the
code avoids a checkpoint if no updates have taken place, we wouldn't be
too remiss to use a forced checkpoint every 32,000 seconds (9 hours).
That assumes the accuracy of the point-in-time is of the order of
seconds. If we went to 0.1 second accuracy, we could checkpoint (force)
every 55 minutes or so. All of that seems too restrictive.
If we went to milliseconds, then we could use a 4-byte value and use a
checkpoint (force) every 596 hours or so, around 3.5 weeks.
Thoughts?
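Checking that arithmetic, assuming signed tick counters (a minimal Python sketch, not PostgreSQL code):

```python
# Range covered by an N-byte signed tick counter at a given resolution.
def offset_range_seconds(n_bytes, ticks_per_second):
    max_ticks = 2 ** (n_bytes * 8 - 1) - 1   # signed maximum
    return max_ticks / ticks_per_second

hours_2b_seconds = offset_range_seconds(2, 1) / 3600     # ~9.1 hours
minutes_2b_tenths = offset_range_seconds(2, 10) / 60     # ~54.6 minutes
hours_4b_millis = offset_range_seconds(4, 1000) / 3600   # ~596.5 hours
```

That gives about 9.1 hours for 2 bytes of whole seconds, about 55 minutes at 0.1-second resolution, and about 596 hours (roughly 3.5 weeks) for a 4-byte millisecond counter.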

Clog uses 2 bits per transaction, so even 2 bytes extra per transaction
would make the clog 9 times larger than originally intended. This could
well cause it to segment quicker, and I'm sure no one would be happy
with that. So, let's not add anything to the clog.

The alternative is to make the last part of the XlogHeader record a
timestamp value, increasing the size of each xlog write. It might be
possible to make this part of the header optional depending upon whether
or not PITR was required, though my preference is against such dynamic
coding.

So, I propose:

- appending an 8-byte date/time value to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each
xlog record (file header + offset) against the desired point-in-time,
delivered to it by a GUC.
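As a minimal sketch of that comparison, assuming the file header carries an absolute base time and each record a millisecond offset (Python stand-in; the names are invented for illustration):

```python
from datetime import datetime, timedelta

def record_time(file_base_time, offset_ms):
    """Reconstruct a record's time from the 8-byte file header base time
    plus the record's 4-byte offset (here taken as milliseconds)."""
    return file_base_time + timedelta(milliseconds=offset_ms)

def stops_recovery(file_base_time, offset_ms, point_in_time):
    """True once a record's reconstructed time passes the desired PIT
    (the value that would be delivered via the GUC)."""
    return record_time(file_base_time, offset_ms) > point_in_time
```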

Input is sought from anybody with detailed NTP knowledge, since the
working of NTP drift correction may have some subtle interplay with this
proposal.

Also, while that code is being altered, some additional log records
(with timing information) need to be written as recovery of each new
xlog file starts, to allow DBAs watching a recovery to calculate
expected completion times, which is essential in long recovery
situations.

I am also considering any changes that may be required to prepare the
way for a future implementation of parallel redo recovery.

Best regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com





Re: PITR Phase 2 - Design Planning

From
Bruce Momjian
Date:
Simon Riggs wrote:
> Transaction log files currently have timestamps, so that is
> straightforward, but probably not the best we can do. We would
> rollforward until the xlog file time > desired point in time.
> 
> To make (2) work we would have to have a timestamp associated with each
> transaction. This could be in one of two places:
> 1. the transaction record in the clog
> 2. the log record in the xlog
> We would then recover the xlog record by record, until we found a record
> that had a timestamp > desired point-in-time.
> 
> Currently, neither of these places have a timestamp. Hmmmm. We can't use
> pg_control because we are assuming that it needs recovery...
> 
> I can't see any general way of adding a timestamp in any less than 2
> bytes. We don't need a timezone. The timestamp could refer to a number
> of seconds since last checkpoint; since this is limited already by a GUC
> to force checkpoints every so often. Although code avoids a checkpoint
> if no updates have taken place, we wouldn't be too remiss to use a
> forced checkpoint every 32,000 seconds (9 hours).
> Assuming that accuracy of the point-in-time was of the order of
> seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
> every 40 minutes or so. All of that seems too restrictive.
> If we went to milliseconds, then we could use a 4 byte value and use a
> checkpoint (force) every 284 hours or 1.5 weeks.
> Thoughts?

I was thinking: how would someone know the time to use for restore?
Certainly they will not know it to subsecond accuracy?  Probably
second-level accuracy is enough, _except_ when they want everything
restored up to a DROP TABLE transaction or some other major problem.  Is
there a way to give users a list of transactions on a log backup?  Can
we show them the username, database, or commands or something?  Would
they be able to restore up to a specific transaction in that case?

Basically, we could give them sub-second recovery, but what value would
it be?

--
Bruce Momjian                   |  http://candle.pha.pa.us
pgman@candle.pha.pa.us          |  (610) 359-1001
+ If your life is a hard drive, |  13 Roberts Road
+ Christ can be your backup.    |  Newtown Square, Pennsylvania 19073
 


Re: PITR Phase 2 - Design Planning

From
Alvaro Herrera
Date:
On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Transaction log files currently have timestamps, so that is
> > straightforward, but probably not the best we can do. We would
> > rollforward until the xlog file time > desired point in time.
> 
> I was thinking ---  how would someone know the time to use for restore?

I think there should be a way to get a TransactionId and restore up to
that point.  It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El realista sabe lo que quiere; el idealista quiere lo que sabe" (Anónimo)


Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
> On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > Transaction log files currently have timestamps, so that is
> > > straightforward, but probably not the best we can do. We would
> > > rollforward until the xlog file time > desired point in time.
> > 
> > I was thinking ---  how would someone know the time to use for restore?
> 
> I think there should be a way to get a TransactionId and restore up to
> that point.  It'd be cool, but not required, if the system showed what
> valid TransactionIds there are, and roughly what they did (the xlog
> code already has "describers" everywhere AFAICS).

You're right, I think we should start by implementing the rollforward to
a txnid before we consider the rollforward to a specified point-in-time.
All the hooks for that are already there...

Best regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Mon, 2004-04-26 at 22:05, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Transaction log files currently have timestamps, so that is
> > straightforward, but probably not the best we can do. We would
> > rollforward until the xlog file time > desired point in time.
> > 
> > To make (2) work we would have to have a timestamp associated with each
> > transaction. This could be in one of two places:
> > 1. the transaction record in the clog
> > 2. the log record in the xlog
> > We would then recover the xlog record by record, until we found a record
> > that had a timestamp > desired point-in-time.
> > 
> > Currently, neither of these places have a timestamp. Hmmmm. We can't use
> > pg_control because we are assuming that it needs recovery...
> > 
> > I can't see any general way of adding a timestamp in any less than 2
> > bytes. We don't need a timezone. The timestamp could refer to a number
> > of seconds since last checkpoint; since this is limited already by a GUC
> > to force checkpoints every so often. Although code avoids a checkpoint
> > if no updates have taken place, we wouldn't be too remiss to use a
> > forced checkpoint every 32,000 seconds (9 hours).
> > Assuming that accuracy of the point-in-time was of the order of
> > seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
> > every 40 minutes or so. All of that seems too restrictive.
> > If we went to milliseconds, then we could use a 4 byte value and use a
> > checkpoint (force) every 284 hours or 1.5 weeks.
> > Thoughts?
> 
> I was thinking ---  how would someone know the time to use for restore?
> Certainly they will not know subsecond accuracy?  Probably second-level
> accuracy is enough, _except_ when they want everything restored up to a
> DROP TABLE transaction or some major problem.  Is there a way to give
> users a list of transactions on a log backup?  Can we show them the
> username, database, or commands or something?  Would they be able to
> restore up to a specific transaction in that case?
> 
> Basically, we could give them sub-second recovery, but what value would
> it be?

Yes, you remind me of a whole train of thought...

There should be a switch to allow you to specify the txnid you wish to
recover up to, as well.

You raise the point of how you know what time to recover to. That is in
fact the very hardest part of recovery for a DBA. That's a good reason
for being able to list xlog contents, as you can with Oracle. Sounds
like we need an XlogMiner utility...

> Can we show them the username, database, or commands or something?

Yes, that sounds fairly straightforward, possible using a modification
of the ReadRecord functions at the bottom of xlog.c (which is why
security of the xlogs is important).

It's also a good reason for being able to pause and restart recovery, so
you can see what it's like before continuing further.

Usually you are trying to sync up the contents of the database with all
of the other things that were being updated too. Often these will define
either the required transaction contents, or give a time to use.

Whatever level of time accuracy you choose, we would always need to
handle the case where multiple transactions have been committed with
exactly the same time (after rounding) and yet we may wish to split
them. Rolling forward to a txnid would help there.
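A toy illustration of that tie, and of how a txnid target can split what a rounded timestamp cannot (record shapes here are hypothetical):

```python
# Two commits share the same rounded timestamp; a time target must take
# both, while a txnid target can stop between them.

def replay(records, stop):
    """records: (xid, rounded_ts, payload) in log order; stop(xid, ts)
    returning True ends replay before applying that record."""
    applied = []
    for xid, ts, payload in records:
        if stop(xid, ts):
            break
        applied.append(payload)
    return applied

records = [(100, 5, "a"), (101, 5, "b"), (102, 6, "c")]
by_time = replay(records, lambda xid, ts: ts > 5)     # takes both ts=5 commits
by_xid = replay(records, lambda xid, ts: xid >= 101)  # splits the tie
```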

Best regards, Simon
 



Re: PITR Phase 2 - Design Planning

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
> > On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > Transaction log files currently have timestamps, so that is
> > > > straightforward, but probably not the best we can do. We would
> > > > rollforward until the xlog file time > desired point in time.
> > > 
> > > I was thinking ---  how would someone know the time to use for restore?
> > 
> > I think there should be a way to get a TransactionId and restore up to
> > that point.  It'd be cool, but not required, if the system showed what
> > valid TransactionIds there are, and roughly what they did (the xlog
> > code already has "describers" everywhere AFAICS).
> 
> You're right, I think we should start by implementing the rollforward to
> a txnid before we consider the rollforward to a specified point-in-time.
> All the hooks for that are already there...

Yep, sounds like a plan.
--
Bruce Momjian                   |  http://candle.pha.pa.us
pgman@candle.pha.pa.us          |  (610) 359-1001
+ If your life is a hard drive, |  13 Roberts Road
+ Christ can be your backup.    |  Newtown Square, Pennsylvania 19073
 


Re: PITR Phase 2 - Design Planning

From
Hans-Jürgen Schönig
Date:
Simon Riggs wrote:
> Since Phase1 is functioning and should hopefully soon complete, we can
> now start thinking about Phase 2: full recovery to a point-in-time.
> 
> Previous thinking was that a command line switch would be used to
> specify recover to a given point in time, rather than the default, which
> will be recover all the way to end of (available) xlogs.
> 
> Recovering to a specific point in time forces us to consider what the
> granularity is of time.
> We could recover:
> 1.to end of a full transaction log file
> 2.to end of a full transaction
> 
> Transaction log files currently have timestamps, so that is
> straightforward, but probably not the best we can do. We would
> rollforward until the xlog file time > desired point in time.
> 
> To make (2) work we would have to have a timestamp associated with each
> transaction. This could be in one of two places:
> 1. the transaction record in the clog
> 2. the log record in the xlog
> We would then recover the xlog record by record, until we found a record
> that had a timestamp > desired point-in-time.
> 
> Currently, neither of these places have a timestamp. Hmmmm. We can't use
> pg_control because we are assuming that it needs recovery...
> 
> I can't see any general way of adding a timestamp in any less than 2
> bytes. We don't need a timezone. The timestamp could refer to a number
> of seconds since last checkpoint; since this is limited already by a GUC
> to force checkpoints every so often. Although code avoids a checkpoint
> if no updates have taken place, we wouldn't be too remiss to use a
> forced checkpoint every 32,000 seconds (9 hours).
> Assuming that accuracy of the point-in-time was of the order of
> seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
> every 40 minutes or so. All of that seems too restrictive.
> If we went to milliseconds, then we could use a 4 byte value and use a
> checkpoint (force) every 284 hours or 1.5 weeks.
> Thoughts?
> 
> Clog uses 2 bits per transaction, so even 2 bytes extra per transaction
> will make the clog 9 times larger than originally intended. This could
> well cause it to segment quicker, but I'm sure no one would be happy
> with that. So, lets not add anything to the clog.
> 
> The alternative is to make the last part of the XlogHeader record a
> timestamp value, increasing each xlog write. It might be possible to
> make this part of the header optional depending upon whether or not PITR
> was required, but then my preference is against such dynamic coding.
> 
> So, I propose:
> 
> - appending 8 byte date/time data into xlog file header record
> - appending 4 bytes of time offset onto each xlog record
> - altering the recovery logic to compare the calculated time of each
> xlog record (file header + offset) against the desired point-in-time,
> delivered to it by GUC.
> 
> Input is sought from anybody with detailed NTP knowledge, since the
> working of NTP drift correction may have some subtle interplay with this
> proposal.
> 
> Also, while that code is being altered, some additional log records need
> to be added when recovery of each new xlog starts, with timing, to allow
> DBAs watching a recovery to calculate expected completion times for the
> recovery, which is essential for long recovery situations. 
> 
> I am also considering any changes that may be required to prepare the
> way for a future implementation of parallel redo recovery.
> 
> Best regards, Simon Riggs, 2ndQuadrant
> http://www.2ndquadrant.com
> 
> 
> 
> 


Simon,

I have one question which might be important: if we use timestamps
inside the WAL system to find out where to stop, what happens if
somebody changes the system time (e.g. correcting the system clock by
calling ntpdate)? Wouldn't it confuse the PITR system? How do you plan
to handle that? Unfortunately, time is not something that can be used
as a key (at least not from my point of view).

Just some lousy ideas early in the morning ...
Regards,
    Hans


-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at



Re: PITR Phase 2 - Design Planning

From
Richard Huxton
Date:
On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
> > > On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> > > > I was thinking ---  how would someone know the time to use for
> > > > restore?
> > >
> > > I think there should be a way to get a TransactionId and restore up to
> > > that point.  It'd be cool, but not required, if the system showed what
> > > valid TransactionIds there are, and roughly what they did (the xlog
> > > code already has "describers" everywhere AFAICS).
> >
> > You're right, I think we should start by implementing the rollforward to
> > a txnid before we consider the rollforward to a specified point-in-time.
> > All the hooks for that are already there...
>
> Yep, sounds like a plan.

Speaking as a DBA, what I usually want to do is restore to "immediately before 
I started the payroll calculation". An actual wall-clock time is mostly 
irrelevant to me.

Suggestion: How about a pg_trans_note table (crap name, I know) that only 
permits inserts - records (backend-pid, timestamp, notes). My app inserts 
"starting payroll calc" and "ending payroll calc" entries because those are 
the points I might wish to sync to. If I want to sync for each individual 
transaction in my calculations, my app can do that too.
From a usability point of view you might want to automatically insert rows on 
client connection/table creation etc. You could also delete any rows more 
than a week old when archiving WAL files.

--  Richard Huxton Archonet Ltd


Re: PITR Phase 2 - Design Planning

From
Bruno Wolff III
Date:
On Tue, Apr 27, 2004 at 10:38:45 +0100, Richard Huxton <dev@archonet.com> wrote:
> 
> Speaking as a DBA, what I usually want to do is restore to "immediately before 
> I started the payroll calculation". An actual wall-clock time is mostly 
> irrelevant to me.

For long running transactions where you want to recover as much as possible,
one might also want to recover up until just before a specific transaction
committed (as opposed to started).


Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Tue, 2004-04-27 at 08:56, Hans-Jürgen Schönig wrote:
> Simon Riggs wrote:
> > Since Phase1 is functioning and should hopefully soon complete, we can
> > now start thinking about Phase 2: full recovery to a point-in-time.
> >
> > Previous thinking was that a command line switch would be used to
> > specify recover to a given point in time, rather than the default, which
> > will be recover all the way to end of (available) xlogs.
> >
> > Recovering to a specific point in time forces us to consider what the
> > granularity is of time.
> > We could recover:
> > 1.to end of a full transaction log file
> > 2.to end of a full transaction
> >
> > Transaction log files currently have timestamps, so that is
> > straightforward, but probably not the best we can do. We would
> > rollforward until the xlog file time > desired point in time.
> >
> > To make (2) work we would have to have a timestamp associated with each
> > transaction.

...

> > So, I propose:
> >
> > - appending 8 byte date/time data into xlog file header record
> > - appending 4 bytes of time offset onto each xlog record
> > - altering the recovery logic to compare the calculated time of each
> > xlog record (file header + offset) against the desired point-in-time,
> > delivered to it by GUC.
> >
> > Input is sought from anybody with detailed NTP knowledge, since the
> > working of NTP drift correction may have some subtle interplay with this
> > proposal.
> >
> >
> I have one question which might be important: If we use timestamps
> inside the WAL system to find out where to stop. What happens if
> somebody changes the time of the system? (e.g. correcting the system
> clock by calling ntpdate). Wouldn't it confuse the PITR system? How do
> you plan to handle that? Unfortunately time is nothing which can be used
> as a key (at least not from my point of view).
>

Interesting question.

The timestamps written within WAL would be "just data" once written. The
recovery process would refer only to that timestamp data, so would not
refer to any external notion of time. This is required to ensure that
the same recovery will happen identically, no matter how many times you
choose to re-run it (or are forced to by external circumstances).

If you change the time of the system backwards, this might invalidate
the transaction log history: it would effectively create two (or more)
xlog records with the same timestamp on them, and it would be logically
indeterminate which one should limit recovery. In practical terms, I
would implement this as "apply all records <= PIT". This would mean
that recovery would stop only when the time became larger than the PIT,
which would only occur at the second (or last) record that had a
timestamp equal to the PIT. I guess I could put a WARNING in to say
"time just went backwards...spoohw"
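That behaviour could be sketched like this (plain Python stand-in; the record shapes and warning text are illustrative only):

```python
def replay_all_leq(records, point_in_time):
    """Apply all records whose timestamp is <= the PIT, stopping at the
    first one beyond it, and warn if the recorded time moves backwards."""
    applied, warnings = [], []
    last_ts = None
    for ts, payload in records:
        if last_ts is not None and ts < last_ts:
            warnings.append("time just went backwards...spoohw")
        last_ts = ts
        if ts > point_in_time:
            break
        applied.append(payload)
    return applied, warnings
```

Note the recovery decision depends only on the timestamps stored in the records, never on the clock at replay time, so re-running the recovery gives the same result.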

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Best regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Tue, 2004-04-27 at 10:38, Richard Huxton wrote:
> On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:
> > > > On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:
> > > > > I was thinking ---  how would someone know the time to use for
> > > > > restore?
> > > >
> > > > I think there should be a way to get a TransactionId and restore up to
> > > > that point.  It'd be cool, but not required, if the system showed what
> > > > valid TransactionIds there are, and roughly what they did (the xlog
> > > > code already has "describers" everywhere AFAICS).
> > >
> > > You're right, I think we should start by implementing the rollforward to
> > > a txnid before we consider the rollforward to a specified point-in-time.
> > > All the hooks for that are already there...
> >
> > Yep, sounds like a plan.
> 
> Speaking as a DBA, what I usually want to do is restore to "immediately before 
> I started the payroll calculation". An actual wall-clock time is mostly 
> irrelevant to me.
> 
> Suggestion: How about a pg_trans_note table (crap name, I know) that only 
> permits inserts - records (backend-pid, timestamp, notes). My app inserts 
> "starting payroll calc" and "ending payroll calc" entries because those are 
> the points I might wish to sync to. If I want to sync for each individual 
> transaction in my calculations, my app can do that too.
> From a usability point of view you might want to automatically insert rows on 
> client connection/table creation etc. You could also delete any rows more 
> than a week old when archiving WAL files.

Not sure I like your implementation, but the overall idea is great.

I'd suggest extending the CHECKPOINT command so you can say:
CHECKPOINT <text message>
e.g. CHECKPOINT 'starting payroll Feb04';
(I'm sure some other DBMS does this... head spinning, can't recall...)
The text could just appear in the xlog record data packet...

That could then be used as the target recovery point.
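For illustration, stopping at such a named mark might look like this (toy Python; the record encoding is made up):

```python
def replay_to_named_checkpoint(records, label):
    """records: (kind, value) pairs in log order; a ('checkpoint', text)
    record carrying the requested label ends recovery just before it."""
    applied = []
    for kind, value in records:
        if kind == "checkpoint" and value == label:
            break
        applied.append((kind, value))
    return applied
```

If no record carries the label, replay simply runs to the end of the available log.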

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Tue, 2004-04-27 at 18:43, Bruno Wolff III wrote:
> On Tue, Apr 27, 2004 at 10:38:45 +0100,
>   Richard Huxton <dev@archonet.com> wrote:
> > 
> > Speaking as a DBA, what I usually want to do is restore to "immediately before 
> > I started the payroll calculation". An actual wall-clock time is mostly 
> > irrelevant to me.
> 
> For long running transactions where you want to recover as much as possible,
> one might also want to recover up until just before a specific transaction
> committed (as opposed to started).

Sounds like the difference between > and >=, so should be possible...

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Rod Taylor
Date:
> Overall, I'd refer back to the points Bruce raised - you certainly do
> need a way of finding out the time to recover to, and as others have
> said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> > Overall, I'd refer back to the points Bruce raised - you certainly do
> > need a way of finding out the time to recover to, and as others have
> > said also, time isn't the only desirable "recovery point".
> 
> Wouldn't it be sufficient to simply use the transaction ID and ensure
> that all the parameters the user might want to use to find that ID can
> be made available in the log files?
> 

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

From all that has been said so far, I would implement:

1. Recovery to a specific txnid, which is fairly straightforward
2. Recovery to a specific date/time, either:
   a) by implementing a log inspection tool that shows the txnid for a
      PIT, or
   b) by implementing recovery to a PIT directly
3. Recovery to a named checkpoint

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Peter Galbavy
Date:
Bruno Wolff III wrote:
> For long running transactions where you want to recover as much as possible,
> one might also want to recover up until just before a specific transaction
> committed (as opposed to started).

If your DB has died and you are recovering it, how do you re-establish
a session so that a transaction can complete? Don't all client
connections assume that a transaction has failed if the connection to
the DB fails?

Or am I just being naive, as usual?

Peter


Re: PITR Phase 2 - Design Planning

From
Rod Taylor
Date:
On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:
> On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> > > Overall, I'd refer back to the points Bruce raised - you certainly do
> > > need a way of finding out the time to recover to, and as others have
> > > said also, time isn't the only desirable "recovery point".
> > 
> > Wouldn't it be sufficient to simply use the transaction ID and ensure
> > that all the parameters the user might want to use to find that ID can
> > be made available in the log files?

> Yes, of course, all methods of locating a particular xlog file to stop
> at are effectively equivalent. The discussion is mostly about what is
> convenient for the user in a real recovery situation.

I see.. The first thing I would need to do is look at /var/log/pgsql. At
that point it really doesn't matter what the identifier is so long as
the identifier is there.



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Tue, 2004-04-27 at 23:11, Rod Taylor wrote:
> On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:
> > On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> > > > Overall, I'd refer back to the points Bruce raised - you certainly do
> > > > need a way of finding out the time to recover to, and as others have
> > > > said also, time isn't the only desirable "recovery point".
> > > 
> > > Wouldn't it be sufficient to simply use the transaction ID and ensure
> > > that all the parameters the user might want to use to find that ID can
> > > be made available in the log files?
> 
> > Yes, of course, all methods of locating a particular xlog file to stop
> > at are effectively equivalent. The discussion is mostly about what is
> > convenient for the user in a real recovery situation.
> 
> I see.. The first thing I would need to do is look at /var/log/pgsql. At
> that point it really doesn't matter what the identifier is so long as
> the identifier is there.
> 

PITR works on the assumption that /var/log/pgsql no longer exists at
all. It is suitable for use in bare-metal recovery situations, as well
as usage-induced situations.

You pick up the pieces, work out what the best identifier is, then plan
on using that... it might not be a pgsql log, it might be:
i) literal wall-clock time ("the power went off about 2")
ii) other systems' logs
iii) etc.

Best Regards, Simon Riggs




Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Fri, 2004-05-28 at 00:02, Peter Galbavy wrote:
> Bruno Wolff III wrote:
> > For long running transactions where you want to recover as much as possible,
> > one might also want to recover up until just before a specific transaction
> > committed (as opposed to started).
> 
> If your DB has died and you are recovering it, how do you reestablish a 
> session so that a transaction can complete ? Doesn't all client 
> connections assume that a transaction has failed if the connection to 
> the DB fails ?
> 

Reasonable question...

You re-establish connection, but cannot resume the failed transaction.

PostgreSQL already has crash recovery...this is for restore from backup
scenarios. 

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
"Jim C. Nasby"
Date:
Another idea would be to provide some means to roll a database forwards
and backwards. If you're doing a recovery because you did something like
an accidental UPDATE ... SET field = blah without a WHERE clause, what
you really care about is getting up to the point right before that
update. If there's a log viewer with enough detail, that would suffice;
otherwise you'd need to test for some specific condition.

Of course timestamps would still be useful in this scenario since they
can get you close to the transaction in question.
-- 
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: PITR Phase 2 - Design Planning

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> > > Overall, I'd refer back to the points Bruce raised - you certainly do
> > > need a way of finding out the time to recover to, and as others have
> > > said also, time isn't the only desirable "recovery point".
> > 
> > Wouldn't it be sufficient to simply use the transaction ID and ensure
> > that all the parameters the user might want to use to find that ID can
> > be made available in the log files?
> > 
> 
> Yes, of course, all methods of locating a particular xlog file to stop
> at are effectively equivalent. The discussion is mostly about what is
> convenient for the user in a real recovery situation.
> 
> From all that has been said so far, I would implement:
> 
> 1. Recovery to a specific txnid, which is fairly straightforward
> 2. Recovery to a specific date/time 
> a) either by implementing a log inspection tool that shows the txnid for
> a PIT
> b) implementing recovery to a PIT directly
> 3. Recovery to a named checkpoint

What if we added transaction id to log_line_prefix?  The user could then
log all queries and find the xid where they want to stop, but of course
that assumes they have enabled such logging, and they have access to the
logs.
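
For illustration, such logging might be configured along these lines in postgresql.conf (a sketch only: the xid escape, written %x here, is an assumption at this point, though later PostgreSQL releases adopted exactly that escape):

```ini
# Hypothetical settings: log every statement, prefixing each line with
# timestamp, process id, and transaction id.  The %x escape for the
# transaction id is assumed here; it did not exist when this was written.
log_line_prefix = '%t [%p] xid=%x '
log_statement = 'all'
```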

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: PITR Phase 2 - Design Planning

From
Bruno Wolff III
Date:
On Thu, May 27, 2004 at 23:02:42 +0000, Peter Galbavy <peter.galbavy@knowtion.net> wrote:
> Bruno Wolff III wrote:
> >For long running transactions where you want to recover as much as 
> >possible,
> >one might also want to recover up until just before a specific transaction
> >committed (as opposed to started).
> 
> If your DB has died and you are recovering it, how do you re-establish a 
> session so that a transaction can complete? Don't all client 
> connections assume that a transaction has failed if the connection to 
> the DB fails?

The context of my suggestion was for recovering up until a transaction which
messed things up was committed. I did not want the problem transaction to
be committed. If the problem transaction ran for a long time, there might
be other transactions that I want to keep, if possible, that committed
after the problem transaction started and before it ended.
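
Bruno's scenario can be illustrated with a toy replay loop (Python; the xlog positions and xids are invented for the example). Replay stops just before the problem transaction's commit record, so every transaction that committed earlier survives, even ones that started after the problem transaction began:

```python
# Commit records in xlog order: (position_in_xlog, xid).  The long-running
# problem transaction is xid 101; xid 102 started after it but committed
# before it, so we want to keep 102.
commits = [
    (10, 100),  # short transaction, committed early
    (20, 102),  # started after the problem tx, committed before it
    (30, 101),  # the long-running problem transaction
    (40, 103),  # committed after the problem tx's commit: lost
]
problem_xid = 101

def surviving_xids(commits, problem_xid):
    """Replay commit records in xlog order, stopping just before the
    problem transaction's commit record."""
    kept = []
    for pos, xid in sorted(commits):
        if xid == problem_xid:
            break  # stop replay here; the problem tx never commits
        kept.append(xid)
    return kept

print(surviving_xids(commits, problem_xid))  # -> [100, 102]
```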


Re: PITR Phase 2 - Design Planning

From
Peter Galbavy
Date:
Bruno Wolff III wrote:
> The context of my suggestion was for recovering up until a transaction which
> messed things up was committed. I did not want the problem transaction to
> be committed. If the problem transaction ran for a long time, there might
> be other transactions that I want to keep, if possible, that committed
> after the problem transaction started and before it ended.

Ah! followed by Eek! Now I see the light. It's very bright and painful.

What I can see is that expressing this accurately and unambiguously is 
going to be _difficult_. How do you accurately know the point just 
before a transaction completed? There must be a good subset of 
candidate points that can be labelled.

Is there any way to label/name a transaction so the name can be kept 
somewhere? Like "begin transaction 'bigtransaction26';" - is there any 
allowance in the SQL standards for naming transactions?

PS I have fixed my system clock - apologies for my earlier reply being a 
month ahead.

rgds,
--
Peter


Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:
> > > > Overall, I'd refer back to the points Bruce raised - you certainly do
> > > > need a way of finding out the time to recover to, and as others have
> > > > said also, time isn't the only desirable "recovery point".
> > > 
> > > Wouldn't it be sufficient to simply use the transaction ID and ensure
> > > that all the parameters the user might want to use to find that ID can
> > > be made available in the log files?
> > > 
> > 
> > Yes, of course, all methods of locating a particular xlog file to stop
> > at are effectively equivalent. The discussion is mostly about what is
> > convenient for the user in a real recovery situation.
> > 
> > From all that has been said so far, I would implement:
> > 
> > 1. Recovery to a specific txnid, which is fairly straightforward
> > 2. Recovery to a specific date/time 
> > a) either by implementing a log inspection tool that shows the txnid for
> > a PIT
> > b) implementing recovery to a PIT directly
> > 3. Recovery to a named checkpoint
> 
> What if we added transaction id to log_line_prefix?  The user could then
> log all queries and find the xid where they want to stop, but of course
> that assumes they have enabled such logging, and they have access to the
> logs.

Good thinking.

I'll have a look at this and come back to you.

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Andrew Dunstan
Date:
Simon Riggs wrote:

>On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
>  
>
>>What if we added transaction id to log_line_prefix?  The user could then
>>log all queries and find the xid where they want to stop, but of course
>>that assumes they have enabled such logging, and they have access to the
>>logs.
>>    
>>
>
>Good thinking.
>
>I'll have a look at this and come back to you.
>
>  
>

log_statement triggered logging happens very early in the process - if 
you are logging them all it happens before the statements are even 
parsed. Would we have an xid to log sensibly at that stage?

Perhaps with log_min_duration_statement = 0 we would (i.e. log statements 
when finished rather than when started).
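
If statements were logged together with their xid, as suggested upthread, locating the stop point could be a simple scan of the server log. A toy sketch in Python, assuming a hypothetical "timestamp [pid] xid=N" prefix format (the log lines below are invented):

```python
import re
from datetime import datetime

# Hypothetical prefix format: "YYYY-MM-DD HH:MM:SS [pid] xid=N ...".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[\d+\] xid=(\d+) ")

def last_xid_before(log_lines, cutoff):
    """Return the highest xid logged strictly before the cutoff time."""
    best = None
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without the expected prefix
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        xid = int(m.group(2))
        if ts < cutoff and (best is None or xid > best):
            best = xid
    return best

log = [
    "2004-04-28 10:00:01 [4711] xid=1001 LOG: statement: UPDATE accounts ...",
    "2004-04-28 10:00:05 [4712] xid=1002 LOG: statement: UPDATE payroll ...",
    "2004-04-28 10:00:09 [4713] xid=1003 LOG: statement: INSERT ...",
]
print(last_xid_before(log, datetime(2004, 4, 28, 10, 0, 5)))  # -> 1001
```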

cheers

andrew



Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:
> Simon Riggs wrote:
> 
> >On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
> >  
> >
> >>What if we added transaction id to log_line_prefix?  The user could then
> >>log all queries and find the xid where they want to stop, but of course
> >>that assumes they have enabled such logging, and they have access to the
> >>logs.
> >>    
> >>
> >
> >Good thinking.
> >
> >I'll have a look at this and come back to you.
> >
> log_statement triggered logging happens very early in the process - if 
> you are logging them all it happens before the statements are even 
> parsed. Would we have an xid to log sensibly at that stage?
> 
> Perhaps with log_min_duration_statement = 0 we would (i.e. log statements 
> when finished rather than when started).
> 

Let's call this XLogSpy.

For starters, we only need to look at write transactions. Read-only
transactions would not need to be examined at all; we would just need to
know that they were read-only.

Remembering that we're using xlogs for recovery, we perhaps should not
assume that we have anything other than that which has been archived.
Currently, that is just the xlogs. So really we need to work off what is
within them and right now that isn't much at all.

We might optionally include the parsed statement data into the xlog,
since this might be smaller than the exact text itself and would also
allow us to filter the xlogs based upon any aspect of the lists.

Not really happy with any of these ideas yet.

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
"Andrew Dunstan"
Date:
Simon Riggs said:
> On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:
>> Simon Riggs wrote:
>>
>> >On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:
>> >
>> >
>> >>What if we added transaction id to log_line_prefix?  The user could
>> >>then log all queries and find the xid where they want to stop, but
>> >>of course that assumes they have enabled such logging, and they have
>> >>access to the logs.
>> >>
>> >>
>> >
>> >Good thinking.
>> >
>> >I'll have a look at this and come back to you.
>> >
>> log_statement triggered logging happens very early in the process - if
>>  you are logging them all it happens before the statements are even
>> parsed. Would we have an xid to log sensibly at that stage?
>>
>> Perhaps with log_min_duration_statement = 0 we would (i.e. log
>> statements when finished rather than when started).
>>
>
> Let's call this XLogSpy.
>
> For starters, we only need to look at write transactions. Read-only
> transactions would not need to be examined at all; we would just need
> to know that they were read-only.
>
> Remembering that we're using xlogs for recovery, we perhaps should not
> assume that we have anything other than that which has been archived.
> Currently, that is just the xlogs. So really we need to work off what
> is within them and right now that isn't much at all.
>
> We might optionally include the parsed statement data into the xlog,
> since this might be smaller than the exact text itself and would also
> allow us to filter the xlogs based upon any aspect of the lists.
>
> Not really happy with any of these ideas yet.
>


I don't see how this relates to the log_line_prefix idea, or are you
abandoning that?

cheers

andrew




Re: PITR Phase 2 - Design Planning

From
Peter Eisentraut
Date:
Simon Riggs wrote:
> I'd suggest extending the CHECKPOINT command so you can say:
> CHECKPOINT <text message>
> e.g. CHECKPOINT 'starting payroll Feb04';
> (I'm sure some other DBMS does this...head spinning, can't recall...)
> the text could just appear in the xlog record data packet...

I believe you are thinking of a savepoint rather than a checkpoint.



Re: PITR Phase 2 - Design Planning

From
Alvaro Herrera
Date:
On Thu, Apr 29, 2004 at 05:09:19PM +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > I'd suggest extending the CHECKPOINT command so you can say:
> > CHECKPOINT <text message>
> > e.g. CHECKPOINT 'starting payroll Feb04';
> > (I'm sure some other DBMS does this...head spinning, can't recall...)
> > the text could just appear in the xlog record data packet...
> 
> I believe you are thinking of a savepoint rather than a checkpoint.

But a "savepoint" has a very precise meaning in the SQL standard, which
relates to points in a transaction you can roll back to.  I don't think
you want to overload with this other meaning, which I see as putting a
special mark in the XLog -- completely unrelated.

ISTM Checkpoint 'something' would flush all xlogs, insert some kind of
special xlog record with 'something' on it, and then archive everything
(up to that point?).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.


Re: PITR Phase 2 - Design Planning

From
Simon Riggs
Date:
On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > I'd suggest extending the CHECKPOINT command so you can say:
> > CHECKPOINT <text message>
> > e.g. CHECKPOINT 'starting payroll Feb04';
> > (I'm sure some other DBMS does this...head spinning, can't recall...)
> > the text could just appear in the xlog record data packet...
> 
> I believe you are thinking of a savepoint rather than a checkpoint.
> 

Perhaps that was the inspiration, but no, I definitely meant a
CHECKPOINT.

But now you come to mention it, it would be better just to have a
command that simply wrote a named record to the xlog, so it can be
searched for later...

COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04' 

Best Regards, Simon Riggs



Re: PITR Phase 2 - Design Planning

From
Mike Mascari
Date:
Simon Riggs wrote:

> On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:
> 
> Perhaps that was the inspiration, but no, I definitely meant a 
> CHECKPOINT.
> 
> But now you come to mention it, it would be better just to have a
>  command that simply wrote a named record to the xlog, so it can
> be searched for later...
> 
> COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'

FWIW,

Oracle's COMMIT syntax has an optional COMMENT specifier which is 
used for documenting a distributed transaction. In-doubt 
transactions can then be manually committed or aborted by 
referencing the transaction associated with the comment. Example:

COMMIT WORK COMMENT 'A complex distributed Tx';

Perhaps there is some common ground between the 2PC implementation 
and PITR?

Mike Mascari




Re: PITR Phase 2 - Design Planning

From
Peter Eisentraut
Date:
Alvaro Herrera wrote:
> But a "savepoint" has a very precise meaning in the SQL standard,
> which relates to points in a transaction you can roll back to.  I
> don't think you want to overload with this other meaning, which I see
> as putting a special mark in the XLog -- completely unrelated.

They are completely unrelated because you're considering them on two 
different levels.  From the user interface level they are both a place 
to roll back to.  Whether they are inside the same transaction or not 
is like the difference between a normal and a holdable cursor, but it's 
the same interface.



ERROR: heapgettup: failed ReadBuffer

From
jihuang
Date:
I put 360,000+ rows in a table, and now any SELECT, UPDATE, ANALYZE,
... command fails.
The log shows "ERROR: heapgettup: failed ReadBuffer",
but any INSERT SQL command succeeds.


The table schema is:

      row        |             type            |  modifiers
 ----------------+-----------------------------+------------
  test_id        | integer                     | not null
  snapshot       | timestamp without time zone |
  ip_client      | inet                        |
  ip_server      | inet                        |
  conn_time      | integer                     |
  response_time  | integer                     |
  response_head  | character varying           |

 Check constraints:
     "invalid_conn_time" CHECK (conn_time >= 0)
     "invalid_resp_time" CHECK (response_time >= 0)


I didn't create any index.
Does anyone know why this happens, or have a suggestion for saving the
unreadable database? Or is it something I misconfigured?

Thanks for your help.

June-Yen



Re: ERROR: heapgettup: failed ReadBuffer

From
Tom Lane
Date:
jihuang <jihuang@iis.sinica.edu.tw> writes:
> I put  360000+ rows in a table , and now any select , update , analyze 
> ... command fail.
> the log shows  "ERROR: heapgettup: failed ReadBuffer",

What Postgres version is this?  AFAICS that error has been impossible
for quite some time ...
        regards, tom lane


Re: ERROR: heapgettup: failed ReadBuffer

From
jihuang
Date:
Thanks for your real-time response!

The problem was solved after I upgraded PostgreSQL from 7.3.4 to 7.4.2.

By the way, is there any bug-tracking website for PostgreSQL?
Following [HOMEPAGE] -> [DEVELOPERS] I find nothing resembling a
bugzilla-like item; following [GBorg] there are PostgreSQL-related
projects, but not PostgreSQL itself?

Let me show an advertisement... quoting from O'Reilly's Developer Weblogs:

> RT foundry is being developed in Taiwan as part of the Open Foundry
> Project, which is aimed at encouraging FS/OSS development in Taiwan.
> The foundry is SF-like, except using better technologies (RT for
> bug/request tracking, subversion for source control, etc ...)

The following link is the issue and comment log for solving the
problem I described:
http://rt.openfoundry.org/Foundry/Project/Tracker/Display.html?Queue=90&id=2653

There are some Chinese characters mixed in, but I just want to show
that hosting a dedicated issue/bug tracking system may improve a
software project's evolution.

June-Yen

Tom Lane wrote:

>jihuang <jihuang@iis.sinica.edu.tw> writes:
>  
>
>>I put  360000+ rows in a table , and now any select , update , analyze 
>>... command fail.
>>the log shows  "ERROR: heapgettup: failed ReadBuffer",
>>    
>>
>
>What Postgres version is this?  AFAICS that error has been impossible
>for quite some time ...
>
>            regards, tom lane
>  
>