Re: Documentation on PITR still scarce - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Documentation on PITR still scarce
Msg-id 1099739614.6942.174.camel@localhost.localdomain
In response to Re: Documentation on PITR still scarce  (Joachim Wieland <joe@mcknight.de>)
Responses Re: Documentation on PITR still scarce  (Joachim Wieland <joe@mcknight.de>)
List pgsql-hackers
On Sat, 2004-11-06 at 00:54, Joachim Wieland wrote:
> Hi,
> 
> On Fri, Nov 05, 2004 at 10:26:55PM +0000, Simon Riggs wrote:
> > That is exactly the situation Timelines are designed to avoid. This
> > should not have happened. What leads you to think it has? My guess is
> > that it has not. If it has, its a bug.
> 
> Hmm. I did the following:
> 
> - I recovered to one PIT.
> - I verified that everything was fine.
> - If I shut down postmaster now and try to recover to another PIT,
>   everything will work fine. (by re-restoring the original backup as you
>   pointed out)
> 
> However if I:
> 
>  - Shut down postmaster and restart it in normal mode (without a new
>    recovery.conf) and then do some database operations, it seems to
>    overwrite a file from my archive:
> 

Right. You have not done a correct archive recovery and so, yes, you
will get that failure. The database can't know about your activities -
you do, and you know they are wrong, so you should expect an error.

The timeline code only comes into effect when you request an archive
recovery. If you do not, it has no way of knowing it "should have".

This error is possible because of two things:
i) when PostgreSQL starts up, the only things it knows about are in the
files in the data directory. It has no other "memory" like humans
do. If you put an incorrect set of files there for it, then it will
be incorrect.
ii) PostgreSQL hands off responsibility for management of the archive to
you. Using a simple copy command is not the best way to protect your
important data archives - it's just an example for understanding and
testing.

It doesn't and can't know what you have done, so it cannot itself avoid
*requesting* the overwrite. You are the only one who can determine that
the *request* to archive would cause an error.

I can see that this exposes a window for user error, and we should
document this. The correct way to get around this potential error is to:
i) follow the instructions
ii) or, for safety, write a script that checks for the existence of the
file in the archive before it does the copy.

so then set archive_command = "copy2myarchive ...."

where copy2myarchive:
- checks for the file's existence in the archive and aborts if it exists
- otherwise does the copy
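
As a rough illustration of those two steps, here is a minimal sketch of
what such a script might look like in Python. The function names and the
argument order (source file, then archive directory) are my assumptions
for the example, not anything PostgreSQL prescribes. It relies on the
archiver treating a non-zero exit status as a failed archive attempt.

```python
import os
import shutil
import sys


def archive_segment(src_path, archive_dir):
    """Copy src_path into archive_dir without ever overwriting a file.

    Returns 0 on success and 1 if the file is already in the archive.
    """
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))
    with open(src_path, "rb") as src:
        try:
            # O_EXCL makes the create fail if the file already exists, so
            # the existence check and the create are one atomic step.
            fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            print("refusing to overwrite " + dest_path, file=sys.stderr)
            return 1
        with os.fdopen(fd, "wb") as dest:
            shutil.copyfileobj(src, dest)
            # Push the data to disk before reporting success.
            dest.flush()
            os.fsync(dest.fileno())
    return 0


def main(argv):
    """argv is [source_file, archive_dir]; returns the exit status."""
    if len(argv) != 2:
        print("usage: copy2myarchive SRC ARCHIVE_DIR", file=sys.stderr)
        return 2
    return archive_segment(argv[0], argv[1])
```

To use it as a command, end the script with
sys.exit(main(sys.argv[1:])) under the usual __main__ guard, then set
something like archive_command = 'copy2myarchive %p /path/to/archive',
where %p is replaced by the path of the file to archive and the archive
path is only a placeholder.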

Timelines are brilliant, but they don't protect you from everything.

> [...recovery...]
> LOG:  archive recovery complete
> LOG:  database system is ready
> LOG:  archived transaction log file "00000002.history"
> 
> Now we are at timeline 2 I guess.
> 
> [...normal startup...]
> LOG:  checkpoint record is at 0/22701F8
> LOG:  redo record is at 0/22701F8; undo record is at 0/0; shutdown TRUE
> LOG:  next transaction ID: 2595; next OID: 231915
> LOG:  database system is ready
> [...I do some database action...]
> LOG:  archived transaction log file "000000010000000000000001"
> LOG:  archived transaction log file "000000020000000000000002"
> 
> 
> If I stop postmaster again, wipe out my data/ dir and re-restore the
> original backup, I can't do any PITRs any more... If I re-install my archive
> as well, it works again.
> 
> 
> > > My question is: When I've restored up to the time t_0, how can I go on
> > > to restore up to another point in time, later than t_0 but before the
> > > end of my log files.
> 
> > You need to re-restore the original backup.
> 
> Ah. Ok. I had the impression that the timelines save me from re-restoring
> the original files and that I could start off directly from there. Ok,
> that's why it didn't work out that well  ;-)
> 

Once you have brought up a database in timeline N+1, you can't use it as
the base to recover to a point in timeline N, because the data file
contents cannot be trusted to be identical to the way they were in
timeline N. Re-restoring the backup sounds like something that needs
optimization, but it is required for transactional correctness.
[There is some slight room for improvement, but I don't wish to explain
it because mentioning it might lure people into error... the code
currently requires re-restoring]
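
If it helps when inspecting an archive, the timeline a file belongs to
can be read from its name: the first 8 hex digits of a log segment name
or of a .history file name are the timeline ID. A small sketch in
Python, where the helper name timeline_of is just an illustration:

```python
import re

# 8 hex digits of timeline ID, followed either by 16 more hex digits
# (a log segment) or by ".history" (a timeline history file).
_NAME_RE = re.compile(r"^([0-9A-F]{8})(?:[0-9A-F]{16}|\.history)$")


def timeline_of(file_name):
    """Return the timeline ID encoded in an archived file name."""
    match = _NAME_RE.match(file_name)
    if match is None:
        raise ValueError("not a log segment or history file: " + file_name)
    return int(match.group(1), 16)
```

For the files in your log output, timeline_of("000000010000000000000001")
gives 1, while timeline_of("000000020000000000000002") and
timeline_of("00000002.history") both give 2.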

-- 
Best Regards, Simon Riggs


